parallel load balancing: Topics by Science.gov

Sample records for parallel load balancing

Dynamic load balancing of applications

DOEpatents

Wheat, Stephen R.

1997-01-01

An application-level method for dynamically maintaining global load balance on a parallel computer, particularly on massively parallel MIMD computers. Global load balancing is achieved by overlapping neighborhoods of processors, where each neighborhood performs local load balancing. The method supports a large class of finite element and finite difference based applications and provides an automatic element management system to which applications are easily integrated.
Dynamic load balancing of applications

DOEpatents

Wheat, S.R.

1997-05-13

An application-level method for dynamically maintaining global load balance on a parallel computer, particularly on massively parallel MIMD computers, is disclosed. Global load balancing is achieved by overlapping neighborhoods of processors, where each neighborhood performs local load balancing. The method supports a large class of finite element and finite difference based applications and provides an automatic element management system to which applications are easily integrated. 13 figs.
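The idea of reaching global balance through overlapping local neighborhoods can be illustrated with a first-order diffusion scheme. This is a minimal sketch under stated assumptions, not the patented method: the ring topology, the exchange fraction `alpha`, and the function name `diffuse` are all hypothetical choices made for illustration.

```python
def diffuse(loads, neighbors, alpha=0.25, steps=200):
    """Repeatedly let every processor exchange a fraction `alpha` of the
    load difference with each of its neighbors (first-order diffusion).

    loads     -- initial load per processor
    neighbors -- dict: processor index -> list of neighbor indices
                 (assumed symmetric, so total load is conserved)
    """
    loads = [float(x) for x in loads]
    for _ in range(steps):
        new = loads[:]
        for i, nbrs in neighbors.items():
            for j in nbrs:
                # Load flows from the heavier to the lighter processor.
                new[i] += alpha * (loads[j] - loads[i])
        loads = new
    return loads


# Hypothetical example: 6 processors on a ring, all work starts on one.
# Each neighborhood is a processor plus its two ring neighbors; the
# neighborhoods overlap, which is what spreads load across the machine.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
balanced = diffuse([60, 0, 0, 0, 0, 0], ring)
```

No processor ever inspects the whole machine here. Load only moves between neighbors, and the overlap between neighborhoods carries it further on each step.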
Dynamic Load Balancing for Grid Partitioning on a SP-2 Multiprocessor: A Framework

NASA Technical Reports Server (NTRS)

Sohn, Andrew; Simon, Horst; Lasinski, T. A. (Technical Monitor)

1994-01-01

Computational requirements of full scale computational fluid dynamics change as computation progresses on a parallel machine. The change in computational intensity causes workload imbalance of processors, which in turn requires a large amount of data movement at runtime. If parallel CFD is to be successful on a parallel or massively parallel machine, balancing of the runtime load is indispensable. Here a framework is presented for dynamic load balancing for CFD applications, called Jove. One processor is designated as a decision maker Jove while others are assigned to computational fluid dynamics. Processors running CFD send flags to Jove in a predetermined number of iterations to initiate load balancing. Jove starts working on load balancing while other processors continue working with the current data and load distribution. Jove goes through several steps to decide if the new data should be taken, including preliminary evaluation, partition, processor reassignment, cost evaluation, and decision. Jove running on a single IBM SP2 node has been completely implemented. Preliminary experimental results show that the Jove approach to dynamic load balancing can be effective for full scale grid partitioning on the target machine IBM SP2.
Dynamic Load Balancing For Grid Partitioning on a SP-2 Multiprocessor: A Framework

NASA Technical Reports Server (NTRS)

Sohn, Andrew; Simon, Horst; Lasinski, T. A. (Technical Monitor)

1994-01-01

Computational requirements of full scale computational fluid dynamics change as computation progresses on a parallel machine. The change in computational intensity causes workload imbalance of processors, which in turn requires a large amount of data movement at runtime. If parallel CFD is to be successful on a parallel or massively parallel machine, balancing of the runtime load is indispensable. Here a framework is presented for dynamic load balancing for CFD applications, called Jove. One processor is designated as a decision maker Jove while others are assigned to computational fluid dynamics. Processors running CFD send flags to Jove in a predetermined number of iterations to initiate load balancing. Jove starts working on load balancing while other processors continue working with the current data and load distribution. Jove goes through several steps to decide if the new data should be taken, including preliminary evaluation, partition, processor reassignment, cost evaluation, and decision. Jove running on a single IBM SP2 node has been completely implemented. Preliminary experimental results show that the Jove approach to dynamic load balancing can be effective for full scale grid partitioning on the target machine IBM SP2.
Global Load Balancing with Parallel Mesh Adaption on Distributed-Memory Systems

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Oliker, Leonid; Sohn, Andrew

1996-01-01

Dynamic mesh adaption on unstructured grids is a powerful tool for efficiently computing unsteady problems to resolve solution features of interest. Unfortunately, this causes load imbalance among processors on a parallel machine. This paper describes the parallel implementation of a tetrahedral mesh adaption scheme and a new global load balancing method. A heuristic remapping algorithm is presented that assigns partitions to processors such that the redistribution cost is minimized. Results indicate that the parallel performance of the mesh adaption code depends on the nature of the adaption region and show a 35.5X speedup on 64 processors of an SP2 when 35% of the mesh is randomly adapted. For large-scale scientific computations, our load balancing strategy gives almost a sixfold reduction in solver execution times over non-balanced loads. Furthermore, our heuristic remapper yields processor assignments that are less than 3% off the optimal solutions but requires only 1% of the computational time.
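A remapping step of this kind can be sketched as a greedy pass over a similarity matrix. The abstract does not spell out the algorithm, so the following is an illustrative assumption: the matrix layout, the function names `greedy_remap` and `data_moved`, and the sample numbers are hypothetical.

```python
def greedy_remap(similarity):
    """Assign new partitions to processors so that much data stays put.

    similarity[i][j] -- amount of data belonging to new partition j that
                        already resides on processor i (assumed square:
                        one partition per processor)
    Returns proc_of, where proc_of[j] is the processor given partition j.
    """
    n = len(similarity)
    # Visit matrix entries from largest to smallest overlap.
    entries = sorted(
        ((similarity[i][j], i, j) for i in range(n) for j in range(n)),
        key=lambda e: -e[0],
    )
    proc_of = [None] * n
    proc_used = [False] * n
    for _, i, j in entries:
        if not proc_used[i] and proc_of[j] is None:
            proc_used[i] = True
            proc_of[j] = i
    return proc_of


def data_moved(similarity, proc_of):
    """Redistribution cost: everything that is not already in place."""
    total = sum(sum(row) for row in similarity)
    kept = sum(similarity[p][j] for j, p in enumerate(proc_of))
    return total - kept


# Hypothetical 3-processor example.
S = [[10, 50, 0],
     [40, 5, 20],
     [0, 30, 45]]
assignment = greedy_remap(S)
cost = data_moved(S, assignment)
```

An exact answer would require solving an assignment problem. The greedy pass trades a small loss in quality for a single sort over the matrix entries, which is the kind of trade-off the abstract reports.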
Load balancing for massively-parallel soft-real-time systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hailperin, M.

1988-09-01

Global load balancing, if practical, would allow the effective use of massively-parallel ensemble architectures for large soft-real-time problems. The challenge is to replace quick global communications, which is impractical in a massively-parallel system, with statistical techniques. In this vein, the author proposes a novel approach to decentralized load balancing based on statistical time-series analysis. Each site estimates the system-wide average load using information about past loads of individual sites and attempts to equal that average. This estimation process is practical because the soft-real-time systems of interest naturally exhibit loads that are periodic, in a statistical sense akin to seasonality in econometrics. It is shown how this load-characterization technique can be the foundation for a load-balancing system in an architecture employing cut-through routing and an efficient multicast protocol.
Data decomposition method for parallel polygon rasterization considering load balancing

NASA Astrophysics Data System (ADS)

Zhou, Chen; Chen, Zhenjie; Liu, Yongxue; Li, Feixue; Cheng, Liang; Zhu, A.-xing; Li, Manchun

2015-12-01

It is essential to adopt parallel computing technology to rapidly rasterize massive polygon data. In parallel rasterization, it is difficult to design an effective data decomposition method. Conventional methods ignore load balancing of polygon complexity in parallel rasterization and thus fail to achieve high parallel efficiency. In this paper, a novel data decomposition method based on polygon complexity (DMPC) is proposed. First, four factors that possibly affect the rasterization efficiency were investigated. Then, a metric represented by the boundary number and raster pixel number in the minimum bounding rectangle was developed to calculate the complexity of each polygon. Using this metric, polygons were rationally allocated according to the polygon complexity, and each process could achieve balanced loads of polygon complexity. To validate the efficiency of DMPC, it was used to parallelize different polygon rasterization algorithms and tested on different datasets. Experimental results showed that DMPC could effectively parallelize polygon rasterization algorithms. Furthermore, the implemented parallel algorithms with DMPC could achieve good speedup ratios of at least 15.69 and generally outperformed conventional decomposition methods in terms of parallel efficiency and load balancing. In addition, the results showed that DMPC exhibited consistently better performance for different spatial distributions of polygons.
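Complexity-aware allocation of this sort can be sketched with a weighted metric and a greedy "largest first" assignment. The weights, the linear form of the metric, and the names `complexity` and `allocate` are assumptions for illustration; the abstract does not give the actual DMPC formula.

```python
import heapq


def complexity(boundary_points, mbr_pixels, w_boundary=1.0, w_pixels=0.5):
    """Hypothetical polygon cost: weighted sum of the number of boundary
    points and the number of raster pixels in the bounding rectangle."""
    return w_boundary * boundary_points + w_pixels * mbr_pixels


def allocate(polygons, n_procs):
    """Give each polygon, most complex first, to the least loaded process.

    polygons -- iterable of (polygon_id, boundary_points, mbr_pixels)
    Returns (bins, loads): polygon ids and total complexity per process.
    """
    heap = [(0.0, p) for p in range(n_procs)]  # (current load, process)
    heapq.heapify(heap)
    bins = [[] for _ in range(n_procs)]
    costed = sorted(
        ((complexity(b, px), pid) for pid, b, px in polygons), reverse=True
    )
    for cost, pid in costed:
        load, p = heapq.heappop(heap)
        bins[p].append(pid)
        heapq.heappush(heap, (load + cost, p))
    loads = [0.0] * n_procs
    for load, p in heap:
        loads[p] = load
    return bins, loads


# Hypothetical data: (id, boundary points, pixels in bounding rectangle).
polys = [("a", 100, 200), ("b", 20, 10), ("c", 60, 80),
         ("d", 10, 2), ("e", 80, 40)]
bins, loads = allocate(polys, 2)
```

Splitting by polygon count would give one process three polygons and the other two regardless of their cost. Splitting by estimated complexity keeps the two totals close.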
Dynamic Load-Balancing for Distributed Heterogeneous Computing of Parallel CFD Problems

NASA Technical Reports Server (NTRS)

Ecer, A.; Chien, Y. P.; Boenisch, T.; Akay, H. U.

2000-01-01

The developed methodology is aimed at improving the efficiency of executing block-structured algorithms on parallel, distributed, heterogeneous computers. The basic approach of these algorithms is to divide the flow domain into many subdomains called blocks, and solve the governing equations over these blocks. The dynamic load balancing problem is defined as the efficient distribution of the blocks among the available processors over a period of several hours of computations. In environments with computers of different architecture, operating systems, CPU speed, memory size, load, and network speed, balancing the loads and managing the communication between processors becomes crucial. Load balancing software tools for mutually dependent parallel processes have been created to efficiently utilize an advanced computation environment and algorithms. These tools are dynamic in nature because of the changes in the computer environment during execution time. More recently, these tools were extended to a second operating system: NT. In this paper, the problems associated with this application will be discussed. Also, the developed algorithms were combined with the load sharing capability of LSF to efficiently utilize workstation clusters for parallel computing. Finally, results will be presented on running a NASA based code ADPAC to demonstrate the developed tools for dynamic load balancing.
Global Load Balancing with Parallel Mesh Adaption on Distributed-Memory Systems

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Oliker, Leonid; Sohn, Andrew

1996-01-01

Dynamic mesh adaptation on unstructured grids is a powerful tool for efficiently computing unsteady problems to resolve solution features of interest. Unfortunately, this causes load imbalances among processors on a parallel machine. This paper describes the parallel implementation of a tetrahedral mesh adaption scheme and a new global load balancing method. A heuristic remapping algorithm is presented that assigns partitions to processors such that the redistribution cost is minimized. Results indicate that the parallel performance of the mesh adaption code depends on the nature of the adaption region and show a 35.5X speedup on 64 processors of an SP2 when 35 percent of the mesh is randomly adapted. For large scale scientific computations, our load balancing strategy gives an almost sixfold reduction in solver execution times over non-balanced loads. Furthermore, our heuristic remapper yields processor assignments that are less than 3 percent off the optimal solutions, but requires only 1 percent of the computational time.
The Feasibility of Adaptive Unstructured Computations On Petaflops Systems

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Oliker, Leonid; Heber, Gerd; Gao, Guang; Saini, Subhash (Technical Monitor)

1999-01-01

This viewgraph presentation covers the advantages of mesh adaptation, unstructured grids, and dynamic load balancing. It illustrates parallel adaptive communications, and explains PLUM (Parallel dynamic load balancing for adaptive unstructured meshes), and PSAW (Proper Self Avoiding Walks).
Parallel Processing of Adaptive Meshes with Load Balancing

NASA Technical Reports Server (NTRS)

Das, Sajal K.; Harvey, Daniel J.; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

2001-01-01

Many scientific applications involve grids that lack a uniform underlying structure. These applications are often also dynamic in nature in that the grid structure significantly changes between successive phases of execution. In parallel computing environments, mesh adaptation of unstructured grids through selective refinement/coarsening has proven to be an effective approach. However, achieving load balance while minimizing interprocessor communication and redistribution costs is a difficult problem. Traditional dynamic load balancers are mostly inadequate because they lack a global view of system loads across processors. In this paper, we propose a novel and general-purpose load balancer that utilizes symmetric broadcast networks (SBN) as the underlying communication topology, and compare its performance with a successful global load balancing environment, called PLUM, specifically created to handle adaptive unstructured applications. Our experimental results on an IBM SP2 demonstrate that the SBN-based load balancer achieves lower redistribution costs than that under PLUM by overlapping processing and data migration.
Data Partitioning and Load Balancing in Parallel Disk Systems

NASA Technical Reports Server (NTRS)

Scheuermann, Peter; Weikum, Gerhard; Zabback, Peter

1997-01-01

Parallel disk systems provide opportunities for exploiting I/O parallelism in two possible ways, namely via inter-request and intra-request parallelism. In this paper we discuss the main issues in performance tuning of such systems, namely striping and load balancing, and show their relationship to response time and throughput. We outline the main components of an intelligent, self-reliant file system that aims to optimize striping by taking into account the requirements of the applications and performs load balancing by judicious file allocation and dynamic redistributions of the data when access patterns change. Our system uses simple but effective heuristics that incur only little overhead. We present performance experiments based on synthetic workloads and real-life traces.
Load Balancing Scientific Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pearce, Olga Tkachyshyn

2014-12-01

The largest supercomputers have millions of independent processors, and concurrency levels are rapidly increasing. For ideal efficiency, developers of the simulations that run on these machines must ensure that computational work is evenly balanced among processors. Assigning work evenly is challenging because many large modern parallel codes simulate behavior of physical systems that evolve over time, and their workloads change over time. Furthermore, the cost of imbalanced load increases with scale because most large-scale scientific simulations today use a Single Program Multiple Data (SPMD) parallel programming model, and an increasing number of processors will wait for the slowest one at the synchronization points. To address load imbalance, many large-scale parallel applications use dynamic load balance algorithms to redistribute work evenly. The research objective of this dissertation is to develop methods to decide when and how to load balance the application, and to balance it effectively and affordably. We measure and evaluate the computational load of the application, and develop strategies to decide when and how to correct the imbalance. Depending on the simulation, a fast, local load balance algorithm may be suitable, or a more sophisticated and expensive algorithm may be required. We developed a model for comparison of load balance algorithms for a specific state of the simulation that enables the selection of a balancing algorithm that will minimize overall runtime.
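The cost of imbalance in an SPMD code, and a simple rule for when rebalancing pays off, can be sketched as follows. This is an illustrative model only, not the dissertation's: it assumes every step ends at a synchronization point, that a rebalance would bring all processors to the mean time, and that the names `imbalance` and `should_rebalance` are hypothetical.

```python
def imbalance(step_times):
    """Relative imbalance: how far the slowest processor is above the mean.
    0.0 means perfectly balanced."""
    mean = sum(step_times) / len(step_times)
    return max(step_times) / mean - 1.0


def should_rebalance(step_times, steps_remaining, rebalance_cost):
    """Rebalance only if the projected time saved exceeds the cost.

    With a barrier after each step, a step takes max(step_times).
    Assuming a perfect rebalance, it would take the mean instead.
    """
    mean = sum(step_times) / len(step_times)
    saving_per_step = max(step_times) - mean
    return saving_per_step * steps_remaining > rebalance_cost


# Hypothetical per-processor times (seconds) for one simulation step.
times = [1.0, 1.0, 1.0, 2.0]
lam = imbalance(times)
cheap = should_rebalance(times, steps_remaining=100, rebalance_cost=50.0)
costly = should_rebalance(times, steps_remaining=100, rebalance_cost=100.0)
```

In this example three of the four processors idle for half of every step. Rebalancing still only pays off when its cost is below the 75 seconds it is projected to save.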
Parallel Tetrahedral Mesh Adaptation with Dynamic Load Balancing

NASA Technical Reports Server (NTRS)

Oliker, Leonid; Biswas, Rupak; Gabow, Harold N.

1999-01-01

The ability to dynamically adapt an unstructured grid is a powerful tool for efficiently solving computational problems with evolving physical features. In this paper, we report on our experience parallelizing an edge-based adaptation scheme, called 3D_TAG, using message passing. Results show excellent speedup when a realistic helicopter rotor mesh is randomly refined. However, performance deteriorates when the mesh is refined using a solution-based error indicator since mesh adaptation for practical problems occurs in a localized region, creating a severe load imbalance. To address this problem, we have developed PLUM, a global dynamic load balancing framework for adaptive numerical computations. Even though PLUM primarily balances processor workloads for the solution phase, it reduces the load imbalance problem within mesh adaptation by repartitioning the mesh after targeting edges for refinement but before the actual subdivision. This dramatically improves the performance of parallel 3D_TAG since refinement occurs in a more load balanced fashion. We also present optimal and heuristic algorithms that, when applied to the default mapping of a parallel repartitioner, significantly reduce the data redistribution overhead. Finally, portability is examined by comparing performance on three state-of-the-art parallel machines.
Multithreaded Model for Dynamic Load Balancing Parallel Adaptive PDE Computations

NASA Technical Reports Server (NTRS)

Chrisochoides, Nikos

1995-01-01

We present a multithreaded model for the dynamic load-balancing of numerical, adaptive computations required for the solution of Partial Differential Equations (PDEs) on multiprocessors. Multithreading is used as a means of exploring concurrency at the processor level in order to tolerate synchronization costs inherent to traditional (non-threaded) parallel adaptive PDE solvers. Our preliminary analysis for parallel, adaptive PDE solvers indicates that multithreading can be used as a mechanism to mask overheads required for the dynamic balancing of processor workloads with computations required for the actual numerical solution of the PDEs. Also, multithreading can simplify the implementation of dynamic load-balancing algorithms, a task that is very difficult for traditional data parallel adaptive PDE computations. Unfortunately, multithreading does not always simplify program complexity, often makes code re-usability not an easy task, and increases software complexity.
Scan Directed Load Balancing for Highly-Parallel Mesh-Connected Computers

DTIC Science & Technology

1991-07-01

AD-A242 045. Scan Directed Load Balancing for Highly-Parallel Mesh-Connected Computers. Edoardo S. Biagioni, Jan F. Prins... Department of Computer Science, University of North Carolina, Chapel Hill, N.C. 27599-3175, USA. biagioni@cs.unc.edu, prins@cs.unc.edu. Abstract: Scan Directed... MasPar Computer Corporation. Bibliography: [1] Edoardo S. Biagioni. Scan Directed Load Balancing. PhD thesis, University of North Carolina, Chapel Hill
A Framework for Load Balancing of Tensor Contraction Expressions via Dynamic Task Partitioning

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lai, Pai-Wei; Stock, Kevin; Rajbhandari, Samyam

In this paper, we introduce the Dynamic Load-balanced Tensor Contractions (DLTC), a domain-specific library for efficient task parallel execution of tensor contraction expressions, a class of computation encountered in quantum chemistry and physics. Our framework decomposes each contraction into smaller units of tasks, represented by an abstraction referred to as iterators. We exploit an extra level of parallelism by having tasks across independent contractions executed concurrently through a dynamic load balancing runtime. We demonstrate the improved performance, scalability, and flexibility for the computation of tensor contraction expressions on parallel computers using examples from coupled cluster methods.
Load Balancing Unstructured Adaptive Grids for CFD Problems

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Oliker, Leonid

1996-01-01

Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. A dynamic load balancing method is presented that balances the workload across all processors with a global view. After each parallel tetrahedral mesh adaption, the method first determines if the new mesh is sufficiently unbalanced to warrant a repartitioning. If so, the adapted mesh is repartitioned, with new partitions assigned to processors so that the redistribution cost is minimized. The new partitions are accepted only if the remapping cost is compensated by the improved load balance. Results indicate that this strategy is effective for large-scale scientific computations on distributed-memory multiprocessors.
Scalable load balancing for massively parallel distributed Monte Carlo particle transport

DOE Office of Scientific and Technical Information (OSTI.GOV)

O'Brien, M. J.; Brantley, P. S.; Joy, K. I.

2013-07-01

In order to run computer simulations efficiently on massively parallel computers with hundreds of thousands or millions of processors, care must be taken that the calculation is load balanced across the processors. Examining the workload of every processor leads to an unscalable algorithm, with run time at least as large as O(N), where N is the number of processors. We present a scalable load balancing algorithm, with run time O(log(N)), that involves iterated processor-pair-wise balancing steps, ultimately leading to a globally balanced workload. We demonstrate scalability of the algorithm up to 2 million processors on the Sequoia supercomputer at Lawrence Livermore National Laboratory. (authors)
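One well-known way to reach a global balance in a logarithmic number of pairwise steps is dimension exchange on a hypercube, sketched below. It is shown only to illustrate the idea of iterated pair-wise balancing. Whether the authors' algorithm takes this form is not stated in the abstract, and the function name `pairwise_balance`, the power-of-two restriction, and the divisible-load assumption are all hypothetical.

```python
def pairwise_balance(loads):
    """Balance loads in log2(N) rounds of pairwise averaging.

    In round d, rank r pairs with rank r XOR 2**d and the two split
    their combined load evenly. Assumes N is a power of two and that
    load is arbitrarily divisible.
    Returns (balanced_loads, number_of_rounds).
    """
    n = len(loads)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("number of processors must be a power of two")
    loads = [float(x) for x in loads]
    rounds = 0
    bit = 1
    while bit < n:
        for rank in range(n):
            partner = rank ^ bit
            if rank < partner:  # handle each pair once
                avg = (loads[rank] + loads[partner]) / 2.0
                loads[rank] = loads[partner] = avg
        bit <<= 1
        rounds += 1
    return loads, rounds


# Hypothetical example: 8 processors, all work starts on rank 0.
result, n_rounds = pairwise_balance([80, 0, 0, 0, 0, 0, 0, 0])
```

Each processor talks to only one partner per round, so no processor ever examines all N workloads.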
Applying graph partitioning methods in measurement-based dynamic load balancing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bhatele, Abhinav; Fourestier, Sebastien; Menon, Harshitha

Load imbalance leads to an increasing waste of resources as an application is scaled to more and more processors. Achieving the best parallel efficiency for a program requires optimal load balancing, which is an NP-hard problem. However, finding near-optimal solutions to this problem for complex computational science and engineering applications is becoming increasingly important. Charm++, a migratable objects based programming model, provides a measurement-based dynamic load balancing framework. This framework instruments and then migrates over-decomposed objects to balance computational load and communication at runtime. This paper explores the use of graph partitioning algorithms, traditionally used for partitioning physical domains/meshes, for measurement-based dynamic load balancing of parallel applications. In particular, we present repartitioning methods developed in a graph partitioning toolbox called SCOTCH that consider the previous mapping to minimize migration costs. We also discuss a new imbalance reduction algorithm for graphs with irregular load distributions. We compare several load balancing algorithms using microbenchmarks on Intrepid and Ranger and evaluate the effect of communication, number of cores and number of objects on the benefit achieved from load balancing. New algorithms developed in SCOTCH lead to better performance compared to the METIS partitioners for several cases, both in terms of the application execution time and fewer number of objects migrated.

PLUM: Parallel Load Balancing for Adaptive Unstructured Meshes

NASA Technical Reports Server (NTRS)

Oliker, Leonid; Biswas, Rupak; Saini, Subhash (Technical Monitor)

1998-01-01

Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. We present a novel method called PLUM to dynamically balance the processor workloads with a global view. This paper presents the implementation and integration of all major components within our dynamic load balancing strategy for adaptive grid calculations. Mesh adaption, repartitioning, processor assignment, and remapping are critical components of the framework that must be accomplished rapidly and efficiently so as not to cause a significant overhead to the numerical simulation. A data redistribution model is also presented that predicts the remapping cost on the SP2. This model is required to determine whether the gain from a balanced workload distribution offsets the cost of data movement. Results presented in this paper demonstrate that PLUM is an effective dynamic load balancing strategy which remains viable on a large number of processors.
A Domain Decomposition Parallelization of the Fast Marching Method

NASA Technical Reports Server (NTRS)

Herrmann, M.

2003-01-01

In this paper, the first domain decomposition parallelization of the Fast Marching Method for level sets has been presented. Parallel speedup has been demonstrated in both the optimal and non-optimal domain decomposition case. The parallel performance of the proposed method is strongly dependent on separately load balancing the number of nodes on each side of the interface. A load imbalance of nodes on either side of the domain leads to an increase in communication and rollback operations. Furthermore, the amount of inter-domain communication can be reduced by aligning the inter-domain boundaries with the interface normal vectors. In the case of optimal load balancing and aligned inter-domain boundaries, the proposed parallel FMM algorithm is highly efficient, reaching efficiency factors of up to 0.98. Future work will focus on the extension of the proposed parallel algorithm to higher order accuracy. Also, to further enhance parallel performance, the coupling of the domain decomposition parallelization to the G_0-based parallelization will be investigated.
Load Balancing in Stochastic Networks: Algorithms, Analysis, and Game Theory

DTIC Science & Technology

2014-04-16

The classic randomized load balancing model is the so-called supermarket model, which describes a system in which customers arrive to a service center with n parallel servers according... Keywords: mean-field limits, supermarket model, thresholds, game, randomized load balancing.
A tool for simulating parallel branch-and-bound methods

NASA Astrophysics Data System (ADS)

Golubeva, Yana; Orlov, Yury; Posypkin, Mikhail

2016-01-01

The Branch-and-Bound method is known as one of the most powerful but very resource-consuming global optimization methods. Parallel and distributed computing can efficiently cope with this issue. The major difficulty in the parallel B&B method is the need for dynamic load redistribution. Therefore, the design and study of load balancing algorithms is a separate and very important research topic. This paper presents a tool for simulating the parallel Branch-and-Bound method. The simulator allows one to run load balancing algorithms with various numbers of processors, sizes of the search tree, and characteristics of the supercomputer's interconnect, thereby fostering deep study of load distribution strategies. The process of resolution of the optimization problem by the B&B method is replaced by a stochastic branching process. Data exchanges are modeled using the concept of logical time. The user-friendly graphical interface to the simulator provides efficient visualization and convenient performance analysis.
Unstructured Adaptive Grid Computations on an Array of SMPs

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Pramanick, Ira; Sohn, Andrew; Simon, Horst D.

1996-01-01

Dynamic load balancing is necessary for parallel adaptive methods to solve unsteady CFD problems on unstructured grids. We present such a dynamic load balancing framework, called JOVE, in this paper. Results on a four-POWERnode POWER CHALLENGEarray demonstrated that load balancing gives significant performance improvements over no load balancing for such adaptive computations. The parallel speedup of JOVE, implemented using MPI on the POWER CHALLENGEarray, was significant, being as high as 31 for 32 processors. An implementation of JOVE that exploits 'an array of SMPs' architecture was also studied; this hybrid JOVE outperformed flat JOVE by up to 28% on the meshes and adaption models tested. With large, realistic meshes and actual flow-solver and adaption phases incorporated into JOVE, hybrid JOVE can be expected to yield significant advantage over flat JOVE, especially as the number of processors is increased, thus demonstrating the scalability of an array of SMPs architecture.
Load Balancing Strategies for Multi-Block Overset Grid Applications

NASA Technical Reports Server (NTRS)

Djomehri, M. Jahed; Biswas, Rupak; Lopez-Benitez, Noe; Biegel, Bryan (Technical Monitor)

2002-01-01

The multi-block overset grid method is a powerful technique for high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process uses a grid system that discretizes the problem domain by using separately generated but overlapping structured grids that periodically update and exchange boundary information through interpolation. For efficient high performance computations of large-scale realistic applications using this methodology, the individual grids must be properly partitioned among the parallel processors. Overall performance, therefore, largely depends on the quality of load balancing. In this paper, we present three different load balancing strategies for overset grids and analyze their effects on the parallel efficiency of a Navier-Stokes CFD application running on an SGI Origin2000 machine.
Parallel simulation today

NASA Technical Reports Server (NTRS)

Nicol, David; Fujimoto, Richard

1992-01-01

This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.
Multiprocessing the Sieve of Eratosthenes

NASA Technical Reports Server (NTRS)

Bokhari, S.

1986-01-01

The Sieve of Eratosthenes for finding prime numbers in recent years has seen much use as a benchmark algorithm for serial computers while its intrinsically parallel nature has gone largely unnoticed. The implementation of a parallel version of this algorithm for a real parallel computer, the Flex/32, is described and its performance discussed. It is shown that the algorithm is sensitive to several fundamental performance parameters of parallel machines, such as spawning time, signaling time, memory access, and overhead of process switching. Because of the nature of the algorithm, it is impossible to get any speedup beyond 4 or 5 processors unless some form of dynamic load balancing is employed. We describe the performance of our algorithm with and without load balancing and compare it with theoretical lower bounds and simulated results. It is straightforward to understand this algorithm and to check the final results. However, its efficient implementation on a real parallel machine requires thoughtful design, especially if dynamic load balancing is desired. The fundamental operations required by the algorithm are very simple: this means that the slightest overhead appears prominently in performance data. The Sieve thus serves not only as a very severe test of the capabilities of a parallel processor but is also an interesting challenge for the programmer.
Portable Parallel Programming for the Dynamic Load Balancing of Unstructured Grid Applications

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Das, Sajal K.; Harvey, Daniel; Oliker, Leonid

1999-01-01

The ability to dynamically adapt an unstructured grid (or mesh) is a powerful tool for solving computational problems with evolving physical features; however, an efficient parallel implementation is rather difficult, particularly from the viewpoint of portability on various multiprocessor platforms. We address this problem by developing PLUM, an automatic and architecture-independent framework for adaptive numerical computations in a message-passing environment. Portability is demonstrated by comparing performance on an SP2, an Origin2000, and a T3E, without any code modifications. We also present a general-purpose load balancer that utilizes symmetric broadcast networks (SBN) as the underlying communication pattern, with the goal of providing a global view of system loads across processors. Experiments on an SP2 and an Origin2000 demonstrate the portability of our approach, which achieves superb load balance at the cost of minimal extra overhead.
Dynamic Load Balancing Based on Constrained K-D Tree Decomposition for Parallel Particle Tracing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Jiang; Guo, Hanqi; Yuan, Xiaoru

Particle tracing is a fundamental technique in flow field data visualization. In this work, we present a novel dynamic load balancing method for parallel particle tracing. Specifically, we employ a constrained k-d tree decomposition approach to dynamically redistribute tasks among processes. Each process is initially assigned a regularly partitioned block along with a duplicated ghost layer, subject to the memory limit. During particle tracing, the k-d tree decomposition is dynamically performed by constraining the cutting planes to the overlap range of the duplicated data. This ensures that particles are reassigned to processes as evenly as possible, while the particles newly assigned to a process always lie within its block. Results show good load balance and high efficiency for our method.
Dynamic load balancing algorithm for molecular dynamics based on Voronoi cells domain decompositions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fattebert, J.-L.; Richards, D.F.; Glosli, J.N.

2012-12-01

We present a new algorithm for automatic parallel load balancing in classical molecular dynamics. It assumes a spatial domain decomposition of particles into Voronoi cells. It is a gradient method which attempts to minimize a cost function by displacing Voronoi sites associated with each processor/sub-domain along steepest descent directions. Excellent load balance has been obtained for quasi-2D and 3D practical applications, with up to 440 × 10^6 particles on 65,536 MPI tasks.
Impact of Load Balancing on Unstructured Adaptive Grid Computations for Distributed-Memory Multiprocessors

NASA Technical Reports Server (NTRS)

Sohn, Andrew; Biswas, Rupak; Simon, Horst D.

1996-01-01

The computational requirements for an adaptive solution of unsteady problems change as the simulation progresses. This causes workload imbalance among processors on a parallel machine which, in turn, requires significant data movement at runtime. We present a new dynamic load-balancing framework, called JOVE, that balances the workload across all processors with a global view. Whenever the computational mesh is adapted, JOVE is activated to eliminate the load imbalance. JOVE has been implemented on an IBM SP2 distributed-memory machine in MPI for portability. Experimental results for two model meshes demonstrate that mesh adaption with load balancing gives more than a sixfold improvement over one without load balancing. We also show that JOVE gives a 24-fold speedup on 64 processors compared to sequential execution.
Parallel DSMC Solution of Three-Dimensional Flow Over a Finite Flat Plate

NASA Technical Reports Server (NTRS)

Nance, Robert P.; Wilmoth, Richard G.; Moon, Bongki; Hassan, H. A.; Saltz, Joel

1994-01-01

This paper describes a parallel implementation of the direct simulation Monte Carlo (DSMC) method. Runtime library support is used for scheduling and execution of communication between nodes, and domain decomposition is performed dynamically to maintain a good load balance. Performance tests are conducted using the code to evaluate various remapping and remapping-interval policies, and it is shown that a one-dimensional chain-partitioning method works best for the problems considered. The parallel code is then used to simulate the Mach 20 nitrogen flow over a finite-thickness flat plate. It is shown that the parallel algorithm produces results which compare well with experimental data. Moreover, it yields significantly faster execution times than the scalar code, as well as very good load-balance characteristics.
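One-dimensional chain partitioning, which the abstract reports as the best-performing remapping method, can be sketched as follows. This is a generic formulation, not the code used in the paper: it assumes integer cell weights laid out along a chain and searches for the smallest bottleneck load that allows the chain to be cut into at most `parts` contiguous pieces. The function names are hypothetical.

```python
def probe(weights, parts, cap):
    """Greedy check: return cut positions if the chain fits into at most
    `parts` contiguous pieces of load <= cap, otherwise None."""
    cuts, load = [], 0
    for i, w in enumerate(weights):
        if w > cap:
            return None
        if load + w > cap:
            cuts.append(i)  # start a new piece at index i
            load = 0
        load += w
    return cuts if len(cuts) < parts else None


def chain_partition(weights, parts):
    """Binary search on the bottleneck load; assumes positive integer weights."""
    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if probe(weights, parts, mid) is not None:
            hi = mid
        else:
            lo = mid + 1
    return lo, probe(weights, parts, lo)
```

For example, `chain_partition([4, 1, 1, 1, 5, 2, 2], 3)` returns a bottleneck of 6 with cuts before indices 3 and 5, giving pieces with loads 6, 6, and 4.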
Multidimensional spectral load balancing

DOEpatents

Hendrickson, Bruce A.; Leland, Robert W.

1996-12-24

A method of and apparatus for graph partitioning involving the use of a plurality of eigenvectors of the Laplacian matrix of the graph of the problem for which load balancing is desired. The invention is particularly useful for optimizing parallel computer processing of a problem and for minimizing total pathway lengths of integrated circuits in the design stage.
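The idea of partitioning a graph with Laplacian eigenvectors can be illustrated with the simplest case, bisection with a single eigenvector. The patented method uses a plurality of eigenvectors; the sketch below is only the one-eigenvector special case, and it approximates that eigenvector (the Fiedler vector) with plain power iteration rather than a production eigensolver. The function names, the adjacency-list input, and the fixed iteration count are assumptions made for illustration, and the graph is assumed to be connected.

```python
def fiedler_vector(adj, iters=500):
    """Approximate the Laplacian eigenvector of the second-smallest eigenvalue
    by power iteration on (c*I - L) with the constant vector projected out."""
    n = len(adj)
    c = 2 * max(len(nb) for nb in adj)        # upper bound on the largest eigenvalue of L
    x = [i - (n - 1) / 2 for i in range(n)]   # deterministic start that sums to zero
    for _ in range(iters):
        # y = (c*I - L) x, where (L x)_i = deg(i) * x_i - sum of neighbour values
        y = [(c - len(adj[i])) * x[i] + sum(x[j] for j in adj[i]) for i in range(n)]
        mean = sum(y) / n
        y = [v - mean for v in y]             # remove the eigenvalue-0 component
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return x


def spectral_bisection(adj):
    """Split the vertices into two equal halves by sorting on the Fiedler vector."""
    x = fiedler_vector(adj)
    order = sorted(range(len(adj)), key=lambda i: x[i])
    part = [0] * len(adj)
    for i in order[len(adj) // 2:]:
        part[i] = 1
    return part
```

On two triangles joined by a single edge, `spectral_bisection([[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]])` places each triangle in its own half, so only the bridging edge is cut.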
ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers.

PubMed

Xing, Yuting; Wu, Chengkun; Yang, Xi; Wang, Wei; Zhu, En; Yin, Jianping

2018-04-27

A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods on unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, utilizing three types of named entity recognition tasks as demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to other tasks of biomedical text mining besides NER.
Internet traffic load balancing using dynamic hashing with flow volume

NASA Astrophysics Data System (ADS)

Jo, Ju-Yeon; Kim, Yoohwan; Chao, H. Jonathan; Merat, Francis L.

2002-07-01

Sending IP packets over multiple parallel links is in extensive use in today's Internet and its use is growing due to its scalability, reliability and cost-effectiveness. To maximize the efficiency of parallel links, load balancing is necessary among the links, but it may cause the problem of packet reordering. Since packet reordering impairs TCP performance, it is important to reduce the amount of reordering. Hashing offers a simple solution to keep the packet order by sending a flow over a unique link, but static hashing does not guarantee an even distribution of the traffic amount among the links, which could lead to packet loss under heavy load. Dynamic hashing offers some degree of load balancing but suffers from load fluctuations and excessive packet reordering. To overcome these shortcomings, we have enhanced the dynamic hashing algorithm to utilize the flow volume information in order to reassign only the appropriate flows. This new method, called dynamic hashing with flow volume (DHFV), eliminates unnecessary flow reassignments of small flows and achieves load balancing very quickly without load fluctuation by accurately predicting the amount of transferred load between the links. In this paper we provide the general framework of DHFV and address the challenges in implementing DHFV. We then introduce two algorithms of DHFV with different flow selection strategies and show their performances through simulation.
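The static hashing baseline that the abstract starts from can be sketched in a few lines. This is not DHFV: it only shows why hashing a flow identifier preserves packet order (every packet of a flow lands on the same link) while leaving the traffic volume per link uncontrolled. The 5-tuple layout, the choice of CRC32, and the function names are assumptions for illustration.

```python
import zlib
from collections import Counter


def link_for_flow(flow, n_links):
    """Map a flow identifier (hypothetical 5-tuple) to one of n_links parallel links."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % n_links


def bytes_per_link(packets, n_links):
    """packets: iterable of (flow, size_in_bytes). Returns the traffic volume on each link."""
    volume = Counter()
    for flow, size in packets:
        volume[link_for_flow(flow, n_links)] += size
    return [volume[i] for i in range(n_links)]
```

Because the mapping depends only on the flow identifier, one very large flow can dominate a link regardless of how evenly the flow count is spread, which is the imbalance that flow-volume information is meant to correct.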
Method of up-front load balancing for local memory parallel processors

NASA Technical Reports Server (NTRS)

Baffes, Paul Thomas (Inventor)

1990-01-01

In a parallel processing computer system with multiple processing units and shared memory, a method is disclosed for uniformly balancing the aggregate computational load in, and utilizing minimal memory by, a network having identical computations to be executed at each connection therein. Read-only and read-write memory are subdivided into a plurality of process sets, which function like artificial processing units. Said plurality of process sets is iteratively merged and reduced to the number of processing units without exceeding the balance load. Said merger is based upon the value of a partition threshold, which is a measure of the memory utilization. The turnaround time and memory savings of the instant method are functions of the number of processing units available and the number of partitions into which the memory is subdivided. Typical results of the preferred embodiment yielded memory savings of from sixty to seventy five percent.
Development of a Stiffness-Based Chemistry Load Balancing Scheme, and Optimization of Input/Output and Communication, to Enable Massively Parallel High-Fidelity Internal Combustion Engine Simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kodavasal, Janardhan; Harms, Kevin; Srivastava, Priyesh

A closed-cycle gasoline compression ignition engine simulation near top dead center (TDC) was used to profile the performance of a parallel commercial engine computational fluid dynamics code, as it was scaled on up to 4096 cores of an IBM Blue Gene/Q supercomputer. The test case has 9 million cells near TDC, with a fixed mesh size of 0.15 mm, and was run on configurations ranging from 128 to 4096 cores. Profiling was done for a small duration of 0.11 crank angle degrees near TDC during ignition. Optimization of input/output performance resulted in a significant speedup in reading restart files, and in an over 100-times speedup in writing restart files and files for post-processing. Improvements to communication resulted in a 1400-times speedup in the mesh load balancing operation during initialization, on 4096 cores. An improved, “stiffness-based” algorithm for load balancing chemical kinetics calculations was developed, which results in an over 3-times faster run-time near ignition on 4096 cores relative to the original load balancing scheme. With this improvement to load balancing, the code achieves over 78% scaling efficiency on 2048 cores, and over 65% scaling efficiency on 4096 cores, relative to 256 cores.
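The scaling-efficiency figures can be turned into implied speedups with a small calculation. The sketch assumes the usual strong-scaling definition, efficiency = (T_base × N_base) / (T_N × N), which the abstract does not state explicitly; the function name is hypothetical.

```python
def implied_speedup(efficiency, cores, base_cores):
    """Speedup over the baseline run implied by a strong-scaling efficiency."""
    return efficiency * cores / base_cores


# Figures quoted in the abstract, relative to 256 cores.
speedup_2048 = implied_speedup(0.78, 2048, 256)  # about 6.24 out of an ideal 8
speedup_4096 = implied_speedup(0.65, 4096, 256)  # about 10.4 out of an ideal 16
```

Under that definition, the quoted efficiencies correspond to running more than roughly 6.2 times faster on 2048 cores and more than roughly 10.4 times faster on 4096 cores than on 256 cores.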
Adaptive mesh refinement and load balancing based on multi-level block-structured Cartesian mesh

NASA Astrophysics Data System (ADS)

Misaka, Takashi; Sasaki, Daisuke; Obayashi, Shigeru

2017-11-01

We developed a framework for a distributed-memory parallel computer that enables dynamic data management for adaptive mesh refinement and load balancing. We employed the simple data structure of the building cube method (BCM), in which a computational domain is divided into multi-level cubic domains and each cube has the same number of grid points inside, realising a multi-level block-structured Cartesian mesh. Solution adaptive mesh refinement, which works efficiently with the help of the dynamic load balancing, was implemented by dividing cubes based on mesh refinement criteria. The framework was investigated with the Laplace equation in terms of adaptive mesh refinement, load balancing and the parallel efficiency. It was then applied to the incompressible Navier-Stokes equations to simulate a turbulent flow around a sphere. We considered wall-adaptive cube refinement where a non-dimensional wall distance y+ near the sphere is used as a criterion for mesh refinement. The result showed that the load imbalance due to y+ adaptive mesh refinement was corrected by the present approach. To utilise the BCM framework more effectively, we also tested cube-wise algorithm switching, in which explicit and implicit time integration schemes are switched depending on the local Courant-Friedrichs-Lewy (CFL) condition in each cube.
Three-phase Power Flow Calculation of Low Voltage Distribution Network Considering Characteristics of Residents Load

NASA Astrophysics Data System (ADS)

Wang, Yaping; Lin, Shunjiang; Yang, Zhibin

2017-05-01

In the traditional three-phase power flow calculation of the low voltage distribution network, the load model is described as constant power. Since this model cannot reflect the characteristics of actual loads, the result of the traditional calculation always differs from the actual situation. In this paper, a load model in which a dynamic load, represented by air conditioners, is connected in parallel with a static load, represented by lighting loads, is used to describe the characteristics of residential loads, and a three-phase power flow calculation model is proposed. The power flow calculation model includes the power balance equations of the three phases (A, B, C), the current balance equations of phase 0, and the torque balance equations of the induction motors in the air conditioners. An alternating iterative algorithm, which alternates between the induction motor torque balance equations and the node balance equations, is then proposed to solve the three-phase power flow model. This method is applied to an actual low voltage distribution network with residential loads, and the calculation of three different operating states of the air conditioners demonstrates the effectiveness of the proposed model and algorithm.

Parallelized reliability estimation of reconfigurable computer networks

NASA Technical Reports Server (NTRS)

Nicol, David M.; Das, Subhendu; Palumbo, Dan

1990-01-01

A parallelized system, ASSURE, for computing the reliability of embedded avionics flight control systems which are able to reconfigure themselves in the event of failure is described. ASSURE accepts a grammar that describes a reliability semi-Markov state-space. From this it creates a parallel program that simultaneously generates and analyzes the state-space, placing upper and lower bounds on the probability of system failure. ASSURE is implemented on a 32-node Intel iPSC/860, and has achieved high processor efficiencies on real problems. Through a combination of improved algorithms, exploitation of parallelism, and use of an advanced microprocessor architecture, ASSURE has reduced the execution time on substantial problems by a factor of one thousand over previous workstation implementations. Furthermore, ASSURE's parallel execution rate on the iPSC/860 is an order of magnitude faster than its serial execution rate on a Cray-2 supercomputer. While dynamic load balancing is necessary for ASSURE's good performance, it is needed only infrequently; the particular method of load balancing used does not substantially affect performance.
Load Balancing Strategies for Multiphase Flows on Structured Grids

NASA Astrophysics Data System (ADS)

Olshefski, Kristopher; Owkes, Mark

2017-11-01

The computation time required to perform large simulations of complex systems is currently one of the leading bottlenecks of computational research. Parallelization allows multiple processing cores to perform calculations simultaneously and reduces computational times. However, load imbalances between processors waste computing resources as processors wait for others to complete imbalanced tasks. In multiphase flows, these imbalances arise due to the additional computational effort required at the gas-liquid interface. However, many current load balancing schemes are only designed for unstructured grid applications. The purpose of this research is to develop a load balancing strategy while maintaining the simplicity of a structured grid. Several approaches are investigated including brute force oversubscription, node oversubscription through Message Passing Interface (MPI) commands, and shared memory load balancing using OpenMP. Each of these strategies is tested with a simple one-dimensional model prior to implementation into the three-dimensional NGA code. Current results show load balancing will reduce computational time by at least 30%.
Dynamic load balancing for petascale quantum Monte Carlo applications: The Alias method

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sudheer, C. D.; Krishnan, S.; Srinivasan, A.

Diffusion Monte Carlo is the most accurate widely used Quantum Monte Carlo method for the electronic structure of materials, but it requires frequent load balancing or population redistribution steps to maintain efficiency and avoid accumulation of systematic errors on parallel machines. The load balancing step can be a significant factor affecting performance, and will become more important as the number of processing elements increases. We propose a new dynamic load balancing algorithm, the Alias Method, and evaluate it theoretically and empirically. An important feature of the new algorithm is that the load can be perfectly balanced with each process receiving at most one message. It is also optimal in the maximum size of messages received by any process. We also optimize its implementation to reduce network contention, a process facilitated by the low messaging requirement of the algorithm. Empirical results on the petaflop Cray XT Jaguar supercomputer at ORNL show up to 30% improvement in performance on 120,000 cores. The load balancing algorithm may be straightforwardly implemented in existing codes. The algorithm may also be employed by any method with many near identical computational tasks that requires load balancing.
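The property that every process receives at most one message can be reproduced with a pairing scheme modelled on the construction of Walker's alias table. The sketch below is an illustration under that assumption, not the authors' implementation: it assumes integer loads (for example walker counts) whose total divides evenly among the processes, and the function name is hypothetical.

```python
def alias_balance(loads):
    """Return transfers (sender, receiver, amount) that equalize the loads,
    with every process appearing as a receiver at most once."""
    n = len(loads)
    total = sum(loads)
    if total % n != 0:
        raise ValueError("sketch assumes the total load divides evenly")
    target = total // n
    residual = list(loads)
    small = [i for i in range(n) if residual[i] < target]
    large = [i for i in range(n) if residual[i] > target]
    transfers = []
    while small:
        i = small.pop()
        j = large.pop()                    # some process is above target whenever one is below
        amount = target - residual[i]
        transfers.append((j, i, amount))   # i is topped up exactly once
        residual[i] = target
        residual[j] -= amount
        if residual[j] > target:
            large.append(j)                # j still has surplus to give away
        elif residual[j] < target:
            small.append(j)                # j gave away too much and will receive once
    return transfers
```

For loads `[10, 0, 2, 4]` the sketch produces two transfers from process 0, of 2 units to process 2 and 4 units to process 1, after which every process holds 4 units. A sender only ever gives away load it held at the start, so all messages can be sent in a single round.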
Tile-based Level of Detail for the Parallel Age

DOE Office of Scientific and Technical Information (OSTI.GOV)

Niski, K; Cohen, J D

Today's PCs incorporate multiple CPUs and GPUs and are easily arranged in clusters for high-performance, interactive graphics. We present an approach based on hierarchical, screen-space tiles to parallelizing rendering with level of detail. Adapt tiles, render tiles, and machine tiles are associated with CPUs, GPUs, and PCs, respectively, to efficiently parallelize the workload with good resource utilization. Adaptive tile sizes provide load balancing while our level of detail system allows total and independent management of the load on CPUs and GPUs. We demonstrate our approach on parallel configurations consisting of both single PCs and a cluster of PCs.
Improve load balancing and coding efficiency of tiles in high efficiency video coding by adaptive tile boundary

NASA Astrophysics Data System (ADS)

Chan, Chia-Hsin; Tu, Chun-Chuan; Tsai, Wen-Jiin

2017-01-01

High efficiency video coding (HEVC) not only improves the coding efficiency drastically compared to the well-known H.264/AVC but also introduces coding tools for parallel processing, one of which is tiles. Tile partitioning is allowed to be arbitrary in HEVC, but how to decide tile boundaries remains an open issue. An adaptive tile boundary (ATB) method is proposed to select a better tile partitioning to improve load balancing (ATB-LoadB) and coding efficiency (ATB-Gain) with a unified scheme. Experimental results show that, compared to ordinary uniform-space partitioning, the proposed ATB can save up to 17.65% of encoding times in parallel encoding scenarios and can reduce up to 0.8% of total bit rates for coding efficiency.
Performance Analysis and Optimization on the UCLA Parallel Atmospheric General Circulation Model Code

NASA Technical Reports Server (NTRS)

Lou, John; Ferraro, Robert; Farrara, John; Mechoso, Carlos

1996-01-01

An analysis is presented of several factors influencing the performance of a parallel implementation of the UCLA atmospheric general circulation model (AGCM) on massively parallel computer systems. Several modifications to the original parallel AGCM code aimed at improving its numerical efficiency, interprocessor communication cost, load-balance and issues affecting single-node code performance are discussed.
PLUM: Parallel Load Balancing for Unstructured Adaptive Meshes. Degree awarded by Colorado Univ.

NASA Technical Reports Server (NTRS)

Oliker, Leonid

1998-01-01

Dynamic mesh adaption on unstructured grids is a powerful tool for computing large-scale problems that require grid modifications to efficiently resolve solution features. By locally refining and coarsening the mesh to capture physical phenomena of interest, such procedures make standard computational methods more cost effective. Unfortunately, an efficient parallel implementation of these adaptive methods is rather difficult to achieve, primarily due to the load imbalance created by the dynamically-changing nonuniform grid. This requires significant communication at runtime, leading to idle processors and adversely affecting the total execution time. Nonetheless, it is generally thought that unstructured adaptive-grid techniques will constitute a significant fraction of future high-performance supercomputing. Various dynamic load balancing methods have been reported to date; however, most of them either lack a global view of loads across processors or do not apply their techniques to realistic large-scale applications.
Xyce

DOE Office of Scientific and Technical Information (OSTI.GOV)

Thornquist, Heidi K.; Fixel, Deborah A.; Fett, David Brian

The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE and transient mode using standard analog (DAE) and/or device (PDE) models, including several age and radiation aware devices. It supports a variety of computing platforms, both serial and parallel. Lastly, it uses a variety of modern solution algorithms, including dynamic parallel load balancing and iterative solvers.
Work stealing for GPU-accelerated parallel programs in a global address space framework: WORK STEALING ON GPU-ACCELERATED SYSTEMS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Arafat, Humayun; Dinan, James; Krishnamoorthy, Sriram

Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a function of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain.
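The basic work-stealing discipline can be shown with a small, deterministic simulation. It models only the generic scheme (each worker takes tasks from one end of its own queue, and an idle worker steals from the other end of a randomly chosen victim); the CPU-GPU distinctions, data movement costs, and global address space that the paper studies are not modelled. The function name, the time-step model, and the seeded random victim choice are assumptions made for illustration.

```python
import random
from collections import deque


def simulate_work_stealing(tasks_per_worker, seed=0):
    """tasks_per_worker: list of lists of positive integer task costs.
    Returns (time steps until all work is done, work units done per worker)."""
    rng = random.Random(seed)
    queues = [deque(tasks) for tasks in tasks_per_worker]
    n = len(queues)
    busy = [0] * n   # remaining time units of each worker's current task
    done = [0] * n   # work units executed by each worker
    steps = 0
    while any(queues) or any(busy):
        for w in range(n):
            if busy[w] == 0:
                if queues[w]:
                    busy[w] = queues[w].pop()                # owner takes its newest task
                else:
                    victim = rng.randrange(n)
                    if victim != w and queues[victim]:
                        busy[w] = queues[victim].popleft()   # thief takes the oldest task
            if busy[w] > 0:
                busy[w] -= 1
                done[w] += 1
        steps += 1
    return steps, done
```

Starting with all tasks on one worker, for example `simulate_work_stealing([[3, 1, 4, 1, 5, 9, 2, 6], [], [], []])`, the idle workers pick up part of the 31 units of work, so the run finishes in fewer steps than serial execution would need, though never in fewer than the longest single task.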
Work stealing for GPU-accelerated parallel programs in a global address space framework

DOE Office of Scientific and Technical Information (OSTI.GOV)

Arafat, Humayun; Dinan, James; Krishnamoorthy, Sriram

Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a function of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain.
Parallelizing serial code for a distributed processing environment with an application to high frequency electromagnetic scattering

NASA Astrophysics Data System (ADS)

Work, Paul R.

1991-12-01

This thesis investigates the parallelization of existing serial programs in computational electromagnetics for use in a parallel environment. Existing algorithms for calculating the radar cross section of an object are covered, and a ray-tracing code is chosen for implementation on a parallel machine. Current parallel architectures are introduced and a suitable parallel machine is selected for the implementation of the chosen ray-tracing algorithm. The standard techniques for the parallelization of serial codes are discussed, including load balancing and decomposition considerations, and appropriate methods for the parallelization effort are selected. A load balancing algorithm is modified to increase the efficiency of the application, and a high level design of the structure of the serial program is presented. A detailed design of the modifications for the parallel implementation is also included, with both the high level and the detailed design specified in a high level design language called UNITY. The correctness of the design is proven using UNITY and standard logic operations. The theoretical and empirical results show that it is possible to achieve an efficient parallel application for a serial computational electromagnetic program where the characteristics of the algorithm and the target architecture critically influence the development of such an implementation.
Short-term Power Load Forecasting Based on Balanced KNN

NASA Astrophysics Data System (ADS)

Lv, Xianlong; Cheng, Xingong; YanShuang; Tang, Yan-mei

2018-03-01

To improve the accuracy of load forecasting, a short-term load forecasting model based on a balanced KNN algorithm is proposed. According to the load characteristics, the massive historical power load data are divided into scenes by the K-means algorithm. To handle unbalanced load scenes, the balanced KNN algorithm is proposed to classify the scenes accurately. The locally weighted linear regression algorithm is used to fit and predict the load. Using the Apache Hadoop programming framework for cloud computing, the proposed algorithm model is parallelized and improved to enhance its ability to deal with massive, high-dimensional data. The household electricity consumption data for a residential district are analyzed on a 23-node cloud computing cluster, and experimental results show that the load forecasting accuracy and execution time of the proposed model are better than those of the traditional forecasting algorithm.
Investigation of the applicability of a functional programming model to fault-tolerant parallel processing for knowledge-based systems

NASA Technical Reports Server (NTRS)

Harper, Richard

1989-01-01

In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.
BALANCING THE LOAD: A VORONOI BASED SCHEME FOR PARALLEL COMPUTATIONS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Steinberg, Elad; Yalinewich, Almog; Sari, Re'em

2015-01-01

One of the key issues when running a simulation on multiple CPUs is maintaining a proper load balance throughout the run and minimizing communications between CPUs. We propose a novel method of utilizing a Voronoi diagram to achieve a nearly perfect load balance without the need of any global redistributions of data. As a showcase, we implement our method in RICH, a two-dimensional moving mesh hydrodynamical code, but it can be extended trivially to other codes in two or three dimensions. Our tests show that this method is indeed efficient and can be used in a large variety of existing hydrodynamical codes.
Parallel processing for nonlinear dynamics simulations of structures including rotating bladed-disk assemblies

NASA Technical Reports Server (NTRS)

Hsieh, Shang-Hsien

1993-01-01

The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.
DNS load balancing in the CERN cloud

NASA Astrophysics Data System (ADS)

Reguero Naredo, Ignacio; Lobato Pardavila, Lorena

2017-10-01

Load Balancing is one of the technologies enabling deployment of large-scale applications on cloud resources. A DNS Load Balancer Daemon (LBD) has been developed at CERN as a cost-effective way to balance applications accepting DNS timing dynamics and not requiring persistence. It currently serves over 450 load-balanced aliases with two small VMs acting as master and slave. The aliases are mapped to DNS subdomains. These subdomains are managed with DDNS according to a load metric, which is collected from the alias member nodes with SNMP. In recent years, several improvements have been made to the software, for instance: support for IPv6, parallelization of the status requests, implementation of the client in Python to allow for multiple aliases with differentiated states on the same machine, and support for application state. The configuration of the Load Balancer is currently managed by a Puppet type. It discovers the alias member nodes and gets the alias definitions from the Ermis REST service. The Aiermis self-service GUI for the management of the LB aliases has been produced and is based on the Ermis service above, which implements a form of Load Balancing as a Service (LBaaS). The Ermis REST API has authorisation based on Foreman hostgroups. The CERN DNS LBD is open software under the Apache 2 license.
Analytical study of pressure balancing in gas film seals

NASA Technical Reports Server (NTRS)

Zuk, J.

1973-01-01

The load factor is investigated for subsonic and choked flow conditions, laminar and turbulent flows, and various seal entrance conditions. Both parallel sealing surfaces and surfaces with small linear deformation were investigated. The load factor for subsonic flow depends strongly on pressure ratio; under choked flow conditions, however, the load factor is found to depend more strongly on film thickness and flow entrance conditions than on pressure ratio. The importance of generating hydrodynamic forces to keep the seal balanced under severe and multipoint operation is also discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Gamblin, T; de Supinski, B R; Schulz, M

Good load balance is crucial on very large parallel systems, but the most sophisticated algorithms introduce dynamic imbalances through adaptation in domain decomposition or use of adaptive solvers. To observe and diagnose imbalance, developers need system-wide, temporally-ordered measurements from full-scale runs. This potentially requires data collection from multiple code regions on all processors over the entire execution. Doing this instrumentation naively can, in combination with the application itself, exceed available I/O bandwidth and storage capacity, and can induce severe behavioral perturbations. We present and evaluate a novel technique for scalable, low-error load balance measurement. This uses a parallel wavelet transform and other parallel encoding methods. We show that our technique collects and reconstructs system-wide measurements with low error. Compression time scales sublinearly with system size and data volume is several orders of magnitude smaller than the raw data. The overhead is low enough for online use in a production environment.
Automatic mesh refinement and parallel load balancing for Fokker-Planck-DSMC algorithm

NASA Astrophysics Data System (ADS)

Küchlin, Stephan; Jenny, Patrick

2018-06-01

Recently, a parallel Fokker-Planck-DSMC algorithm for rarefied gas flow simulation in complex domains at all Knudsen numbers was developed by the authors. Fokker-Planck-DSMC (FP-DSMC) is an augmentation of the classical DSMC algorithm, which mitigates the near-continuum deficiencies in terms of computational cost of pure DSMC. At each time step, based on a local Knudsen number criterion, the discrete DSMC collision operator is dynamically switched to the Fokker-Planck operator, which is based on the integration of continuous stochastic processes in time, and has fixed computational cost per particle, rather than per collision. In this contribution, we present an extension of the previous implementation with automatic local mesh refinement and parallel load-balancing. In particular, we show how the properties of discrete approximations to space-filling curves enable an efficient implementation. Exemplary numerical studies highlight the capabilities of the new code.
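As an illustration of how a space-filling curve can drive load balancing, the sketch below orders 2D cells along a Morton (Z-order) curve and cuts the ordered list into contiguous chunks of roughly equal weight. This is a generic sketch and not the authors' implementation: the choice of the Morton curve, the function names `morton_key` and `partition_by_curve`, and the per-cell weights are all assumptions made for the example.

```python
def morton_key(ix, iy, bits=16):
    """Interleave the bits of two non-negative cell indices (Z-order key)."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)
        key |= ((iy >> b) & 1) << (2 * b + 1)
    return key


def partition_by_curve(cells, weights, n_parts):
    """Sort cells along the curve, then cut the curve into n_parts
    contiguous chunks of roughly equal total weight.

    cells   -- list of (ix, iy) integer cell indices (hypothetical layout)
    weights -- per-cell cost estimate, e.g. particle count (assumed > 0 in total)
    """
    order = sorted(range(len(cells)), key=lambda i: morton_key(*cells[i]))
    total = sum(weights)
    parts = [[] for _ in range(n_parts)]
    running = 0.0
    for i in order:
        # The chunk index follows from the weight accumulated before this cell.
        p = min(int(running * n_parts / total), n_parts - 1)
        parts[p].append(cells[i])
        running += weights[i]
    return parts


if __name__ == "__main__":
    grid = [(x, y) for x in range(4) for y in range(4)]
    for rank, chunk in enumerate(partition_by_curve(grid, [1] * 16, 4)):
        print(rank, sorted(chunk))
```

Because neighbouring positions on the curve tend to be neighbours in space, each chunk stays spatially compact, and rebalancing after refinement only requires moving the cut points along the one-dimensional ordering.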
A Framework to Analyze the Performance of Load Balancing Schemes for Ensembles of Stochastic Simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ahn, Tae-Hyuk; Sandu, Adrian; Watson, Layne T.

2015-08-01

Ensembles of simulations are employed to estimate the statistics of possible future states of a system, and are widely used in important applications such as climate change and biological modeling. Ensembles of runs can naturally be executed in parallel. However, when the CPU times of individual simulations vary considerably, a simple strategy of assigning an equal number of tasks per processor can lead to serious work imbalances and low parallel efficiency. This paper presents a new probabilistic framework to analyze the performance of dynamic load balancing algorithms for ensembles of simulations where many tasks are mapped onto each processor, and where the individual compute times vary considerably among tasks. Four load balancing strategies are discussed: most-dividing, all-redistribution, random-polling, and neighbor-redistribution. Simulation results with a stochastic budding yeast cell cycle model are consistent with the theoretical analysis. It is especially significant that there is a provable global decrease in load imbalance for the local rebalancing algorithms due to scalability concerns for the global rebalancing algorithms. The overall simulation time is reduced by up to 25 %, and the total processor idle time by 85 %.
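The imbalance described in this abstract can be reproduced with a toy model. The sketch below compares the finishing time of an equal-count static assignment with a simple dynamic scheme in which an idle processor takes the next unstarted task. The dynamic scheme here is a plain central-queue model chosen for brevity; it is not one of the four strategies analyzed in the paper, and the task times are made up for the example.

```python
import heapq


def static_makespan(times, n_procs):
    """Equal number of tasks per processor, assigned in contiguous blocks."""
    per = -(-len(times) // n_procs)  # ceiling division
    loads = [sum(times[p * per:(p + 1) * per]) for p in range(n_procs)]
    return max(loads)


def dynamic_makespan(times, n_procs):
    """Each processor takes the next unstarted task as soon as it is idle."""
    free_at = [0.0] * n_procs  # min-heap of times at which processors become idle
    heapq.heapify(free_at)
    for t in times:
        start = heapq.heappop(free_at)   # earliest idle processor
        heapq.heappush(free_at, start + t)
    return max(free_at)


if __name__ == "__main__":
    # Hypothetical task times with two expensive simulations in the ensemble.
    times = [9, 7, 1, 1, 1, 1, 1, 1]
    print("static :", static_makespan(times, 4))
    print("dynamic:", dynamic_makespan(times, 4))
```

With these made-up times the static split puts both expensive tasks on the same processor, while the dynamic scheme spreads them out, which is the effect the equal-count strategy suffers from when CPU times vary considerably.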

Dynamic Multiple Work Stealing Strategy for Flexible Load Balancing

NASA Astrophysics Data System (ADS)

Adnan; Sato, Mitsuhisa

Lazy-task creation is an efficient method of overcoming the overhead of the grain-size problem in parallel computing. Work stealing is an effective load balancing strategy for parallel computing. In this paper, we present dynamic work stealing strategies in a lazy-task creation technique for efficient fine-grain task scheduling. The basic idea is to control load balancing granularity depending on the number of task parents in a stack. The dynamic-length strategy of work stealing uses run-time information, which is information on the load of the victim, to determine the number of tasks that a thief is allowed to steal. We compare it with the bottommost first work stealing strategy used in StackThread/MP, and the fixed-length strategy of work stealing, where a thief requests to steal a fixed number of tasks, as well as other multithreaded frameworks such as Cilk and OpenMP task implementations. The experiments show that the dynamic-length strategy of work stealing performs well in irregular workloads such as in UTS benchmarks, as well as in regular workloads such as Fibonacci, Strassen's matrix multiplication, FFT, and Sparse-LU factorization. The dynamic-length strategy works better than the fixed-length strategy because it is more flexible than the latter; this strategy can avoid load imbalance due to overstealing.
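The difference between fixed-length and dynamic-length stealing can be sketched with two deques. The rule used below for the dynamic case (steal a fixed fraction of the victim's current queue) is an assumption made for illustration; the abstract only states that the number of stolen tasks is derived from run-time information on the victim's load. The function names are likewise hypothetical.

```python
from collections import deque


def steal_fixed(victim, thief, count):
    """Fixed-length strategy: the thief asks for `count` tasks regardless of
    how loaded the victim is, and gets as many as are available."""
    n = min(count, len(victim))
    for _ in range(n):
        thief.append(victim.popleft())  # take the oldest tasks first
    return n


def steal_dynamic(victim, thief, fraction=0.5):
    """Dynamic-length strategy (assumed rule): the number of stolen tasks
    scales with the victim's current load, so a lightly loaded victim is
    never emptied."""
    n = int(len(victim) * fraction)
    for _ in range(n):
        thief.append(victim.popleft())
    return n


if __name__ == "__main__":
    victim, thief = deque(["a", "b"]), deque()
    print(steal_fixed(victim, thief, 4), list(victim))    # victim left empty
    victim, thief = deque(["a", "b"]), deque()
    print(steal_dynamic(victim, thief), list(victim))     # victim keeps work
```

In the small case above the fixed-length request drains the victim completely, which is the overstealing the abstract refers to, while the load-dependent rule leaves the victim with work to do.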
Dynamic Load Balancing for Adaptive Computations on Distributed-Memory Machines

NASA Technical Reports Server (NTRS)

1999-01-01

Dynamic load balancing is central to adaptive mesh-based computations on large-scale parallel computers. The principal investigator has investigated various issues on the dynamic load balancing problem under NASA JOVE and JAG grants. The major accomplishments of the project are two graph partitioning algorithms and a load balancing framework. The S-HARP dynamic graph partitioner is known to be the fastest among the known dynamic graph partitioners to date. It can partition a graph of over 100,000 vertices in 0.25 seconds on a 64-processor Cray T3E distributed-memory multiprocessor while maintaining the scalability of over 16-fold speedup. Other known and widely used dynamic graph partitioners take over a second or two while giving low scalability of a few fold speedup on 64 processors. These results have been published in journals and peer-reviewed flagship conferences.
Design of Unstructured Adaptive (UA) NAS Parallel Benchmark Featuring Irregular, Dynamic Memory Accesses

NASA Technical Reports Server (NTRS)

Feng, Hui-Yu; VanderWijngaart, Rob; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

2001-01-01

We describe the design of a new method for the measurement of the performance of modern computer systems when solving scientific problems featuring irregular, dynamic memory accesses. The method involves the solution of a stylized heat transfer problem on an unstructured, adaptive grid. A Spectral Element Method (SEM) with an adaptive, nonconforming mesh is selected to discretize the transport equation. The relatively high order of the SEM lowers the fraction of wall clock time spent on inter-processor communication, which eases the load balancing task and allows us to concentrate on the memory accesses. The benchmark is designed to be three-dimensional. Parallelization and load balance issues of a reference implementation will be described in detail in future reports.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Rajbhandari, Samyam; Nikam, Akshay; Lai, Pai-Wei

Tensor contractions represent the most compute-intensive core kernels in ab initio computational quantum chemistry and nuclear physics. Symmetries in these tensor contractions make them difficult to load balance and scale to large distributed systems. In this paper, we develop an efficient and scalable algorithm to contract symmetric tensors. We introduce a novel approach that avoids data redistribution in contracting symmetric tensors while also avoiding redundant storage and maintaining load balance. We present experimental results on two parallel supercomputers for several symmetric contractions that appear in the CCSD quantum chemistry method. We also present a novel approach to tensor redistribution that can take advantage of parallel hyperplanes when the initial distribution has replicated dimensions, and use collective broadcast when the final distribution has replicated dimensions, making the algorithm very efficient.
S-HARP: A parallel dynamic spectral partitioner

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sohn, A.; Simon, H.

1998-01-01

Computational science problems with adaptive meshes involve dynamic load balancing when implemented on parallel machines. This dynamic load balancing requires fast partitioning of computational meshes at run time. The authors present in this report a fast parallel dynamic partitioner, called S-HARP. The underlying principles of S-HARP are the fast feature of inertial partitioning and the quality feature of spectral partitioning. S-HARP partitions a graph from scratch, requiring no partition information from previous iterations. Two types of parallelism have been exploited in S-HARP, fine grain loop level parallelism and coarse grain recursive parallelism. The parallel partitioner has been implemented in Message Passing Interface on Cray T3E and IBM SP2 for portability. Experimental results indicate that S-HARP can partition a mesh of over 100,000 vertices into 256 partitions in 0.2 seconds on a 64 processor Cray T3E. S-HARP is much more scalable than other dynamic partitioners, giving over 15 fold speedup on 64 processors while ParaMeTiS1.0 gives a few fold speedup. Experimental results demonstrate that S-HARP is three to 10 times faster than the dynamic partitioners ParaMeTiS and Jostle on six computational meshes of size over 100,000 vertices.
Employing Nested OpenMP for the Parallelization of Multi-Zone Computational Fluid Dynamics Applications

NASA Technical Reports Server (NTRS)

Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Jost, Gabriele

2004-01-01

In this paper we describe the parallelization of the multi-zone code versions of the NAS Parallel Benchmarks employing multi-level OpenMP parallelism. For our study we use the NanosCompiler, which supports nesting of OpenMP directives and provides clauses to control the grouping of threads, load balancing, and synchronization. We report the benchmark results, compare the timings with those of different hybrid parallelization paradigms and discuss OpenMP implementation issues which affect the performance of multi-level parallel applications.
Scan line graphics generation on the massively parallel processor

NASA Technical Reports Server (NTRS)

Dorband, John E.

1988-01-01

Described here is how researchers implemented a scan line graphics generation algorithm on the Massively Parallel Processor (MPP). Pixels are computed in parallel and their results are applied to the Z buffer in large groups. To perform pixel value calculations, facilitate load balancing across the processors and apply the results to the Z buffer efficiently in parallel requires special virtual routing (sort computation) techniques developed by the author especially for use on single-instruction multiple-data (SIMD) architectures.
A parallel implementation of an off-lattice individual-based model of multicellular populations

NASA Astrophysics Data System (ADS)

Harvey, Daniel G.; Fletcher, Alexander G.; Osborne, James M.; Pitt-Francis, Joe

2015-07-01

As computational models of multicellular populations include ever more detailed descriptions of biophysical and biochemical processes, the computational cost of simulating such models limits their ability to generate novel scientific hypotheses and testable predictions. While developments in microchip technology continue to increase the power of individual processors, parallel computing offers an immediate increase in available processing power. To make full use of parallel computing technology, it is necessary to develop specialised algorithms. To this end, we present a parallel algorithm for a class of off-lattice individual-based models of multicellular populations. The algorithm divides the spatial domain between computing processes and comprises communication routines that ensure the model is correctly simulated on multiple processors. The parallel algorithm is shown to accurately reproduce the results of a deterministic simulation performed using a pre-existing serial implementation. We test the scaling of computation time, memory use and load balancing as more processes are used to simulate a cell population of fixed size. We find approximate linear scaling of both speed-up and memory consumption on up to 32 processor cores. Dynamic load balancing is shown to provide speed-up for non-regular spatial distributions of cells in the case of a growing population.
Parallel STEPS: Large Scale Stochastic Spatial Reaction-Diffusion Simulation with High Performance Computers

PubMed Central

Chen, Weiliang; De Schutter, Erik

2017-01-01

Stochastic, spatial reaction-diffusion simulations have been widely used in systems biology and computational neuroscience. However, the increasing scale and complexity of models and morphologies have exceeded the capacity of any serial implementation. This led to the development of parallel solutions that benefit from the boost in performance of modern supercomputers. In this paper, we describe an MPI-based, parallel operator-splitting implementation for stochastic spatial reaction-diffusion simulations with irregular tetrahedral meshes. The performance of our implementation is first examined and analyzed with simulations of a simple model. We then demonstrate its application to real-world research by simulating the reaction-diffusion components of a published calcium burst model in both Purkinje neuron sub-branch and full dendrite morphologies. Simulation results indicate that our implementation is capable of achieving super-linear speedup for balanced loading simulations with reasonable molecule density and mesh quality. In the best scenario, a parallel simulation with 2,000 processes runs more than 3,600 times faster than its serial SSA counterpart, and achieves more than 20-fold speedup relative to parallel simulation with 100 processes. In a more realistic scenario with dynamic calcium influx and data recording, the parallel simulation with 1,000 processes and no load balancing is still 500 times faster than the conventional serial SSA simulation. PMID:28239346
Parallel implementation of the particle simulation method with dynamic load balancing: Toward realistic geodynamical simulation

NASA Astrophysics Data System (ADS)

Furuichi, M.; Nishiura, D.

2015-12-01

Fully Lagrangian methods such as Smoothed Particle Hydrodynamics (SPH) and Discrete Element Method (DEM) have been widely used to solve the continuum and particles motions in the computational geodynamics field. These mesh-free methods are suitable for the problems with the complex geometry and boundary. In addition, their Lagrangian nature allows non-diffusive advection useful for tracking history dependent properties (e.g. rheology) of the material. These potential advantages over the mesh-based methods offer effective numerical applications to the geophysical flow and tectonic processes, which are for example, tsunami with free surface and floating body, magma intrusion with fracture of rock, and shear zone pattern generation of granular deformation. In order to investigate such geodynamical problems with the particle based methods, over millions to billion particles are required for the realistic simulation. Parallel computing is therefore important for handling such huge computational cost. An efficient parallel implementation of SPH and DEM methods is however known to be difficult especially for the distributed-memory architecture. Lagrangian methods inherently show workload imbalance problem for parallelization with the fixed domain in space, because particles move around and workloads change during the simulation. Therefore dynamic load balance is key technique to perform the large scale SPH and DEM simulation. In this work, we present the parallel implementation technique of SPH and DEM method utilizing dynamic load balancing algorithms toward the high resolution simulation over large domain using the massively parallel super computer system. Our method utilizes the imbalances of the executed time of each MPI process as the nonlinear term of parallel domain decomposition and minimizes them with the Newton like iteration method. In order to perform flexible domain decomposition in space, the slice-grid algorithm is used. 
Numerical tests show that our approach is suitable for solving the particles with different calculation costs (e.g. boundary particles) as well as the heterogeneous computer architecture. We analyze the parallel efficiency and scalability on the super computer systems (K-computer, Earth simulator 3, etc.).
Implementation of a fully-balanced periodic tridiagonal solver on a parallel distributed memory architecture

NASA Technical Reports Server (NTRS)

Eidson, T. M.; Erlebacher, G.

1994-01-01

While parallel computers offer significant computational performance, it is generally necessary to evaluate several programming strategies. Two programming strategies for a fairly common problem - a periodic tridiagonal solver - are developed and evaluated. Simple model calculations as well as timing results are presented to evaluate the various strategies. The particular tridiagonal solver evaluated is used in many computational fluid dynamic simulation codes. The feature that makes this algorithm unique is that these simulation codes usually require simultaneous solutions for multiple right-hand-sides (RHS) of the system of equations. Each RHS solution is independent and thus can be computed in parallel. Thus a Gaussian elimination type algorithm can be used in a parallel computation and the more complicated approaches such as cyclic reduction are not required. The two strategies are a transpose strategy and a distributed solver strategy. For the transpose strategy, the data is moved so that a subset of all the RHS problems is solved on each of the several processors. This usually requires significant data movement between processor memories across a network. The second strategy attempts to have the algorithm allow the data across processor boundaries in a chained manner. This usually requires significantly less data movement. An approach to accomplish this second strategy in a near-perfect load-balanced manner is developed. In addition, an algorithm will be shown to directly transform a sequential Gaussian elimination type algorithm into the parallel chained, load-balanced algorithm.
GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation.

PubMed

Hess, Berk; Kutzner, Carsten; van der Spoel, David; Lindahl, Erik

2008-03-01

Molecular simulation is an extremely useful, but computationally very expensive tool for studies of chemical and biomolecular systems. Here, we present a new implementation of our molecular simulation toolkit GROMACS which now both achieves extremely high performance on single processors from algorithmic optimizations and hand-coded routines and simultaneously scales very well on parallel machines. The code encompasses a minimal-communication domain decomposition algorithm, full dynamic load balancing, a state-of-the-art parallel constraint solver, and efficient virtual site algorithms that allow removal of hydrogen atom degrees of freedom to enable integration time steps up to 5 fs for atomistic simulations also in parallel. To improve the scaling properties of the common particle mesh Ewald electrostatics algorithms, we have in addition used a Multiple-Program, Multiple-Data approach, with separate node domains responsible for direct and reciprocal space interactions. Not only does this combination of algorithms enable extremely long simulations of large systems but also it provides that simulation performance on quite modest numbers of standard cluster nodes.
Iterative load-balancing method with multigrid level relaxation for particle simulation with short-range interactions

NASA Astrophysics Data System (ADS)

Furuichi, Mikito; Nishiura, Daisuke

2017-10-01

We developed dynamic load-balancing algorithms for Particle Simulation Methods (PSM) involving short-range interactions, such as Smoothed Particle Hydrodynamics (SPH), Moving Particle Semi-implicit method (MPS), and Discrete Element method (DEM). These are needed to handle billions of particles modeled in large distributed-memory computer systems. Our method utilizes flexible orthogonal domain decomposition, allowing the sub-domain boundaries in the column to be different for each row. The imbalances in the execution time between parallel logical processes are treated as a nonlinear residual. Load-balancing is achieved by minimizing the residual within the framework of an iterative nonlinear solver, combined with a multigrid technique in the local smoother. Our iterative method is suitable for adjusting the sub-domain frequently by monitoring the performance of each computational process because it is computationally cheaper in terms of communication and memory costs than non-iterative methods. Numerical tests demonstrated the ability of our approach to handle workload imbalances arising from a non-uniform particle distribution, differences in particle types, or heterogeneous computer architecture which was difficult with previously proposed methods. We analyzed the parallel efficiency and scalability of our method using Earth simulator and K-computer supercomputer systems.
Efficient Load Balancing and Data Remapping for Adaptive Grid Calculations

NASA Technical Reports Server (NTRS)

Oliker, Leonid; Biswas, Rupak

1997-01-01

Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. We present a novel method to dynamically balance the processor workloads with a global view. This paper presents, for the first time, the implementation and integration of all major components within our dynamic load balancing strategy for adaptive grid calculations. Mesh adaption, repartitioning, processor assignment, and remapping are critical components of the framework that must be accomplished rapidly and efficiently so as not to cause a significant overhead to the numerical simulation. Previous results indicated that mesh repartitioning and data remapping are potential bottlenecks for performing large-scale scientific calculations. We resolve these issues and demonstrate that our framework remains viable on a large number of processors.
Performance tradeoffs in static and dynamic load balancing strategies

NASA Technical Reports Server (NTRS)

Iqbal, M. A.; Saltz, J. H.; Bokhari, S. H.

1986-01-01

The problem of uniformly distributing the load of a parallel program over a multiprocessor system was considered. A program was analyzed whose structure permits the computation of the optimal static solution. Then four strategies for load balancing were described and their performance compared. The strategies are: (1) the optimal static assignment algorithm which is guaranteed to yield the best static solution, (2) the static binary dissection method which is very fast but sub-optimal, (3) the greedy algorithm, a static fully polynomial time approximation scheme, which estimates the optimal solution to arbitrary accuracy, and (4) the predictive dynamic load balancing heuristic which uses information on the precedence relationships within the program and outperforms any of the static methods. It is also shown that the overhead incurred by the dynamic heuristic is reduced considerably if it is started off with a static assignment provided by any of the other three strategies.
Performance Analysis and Portability of the PLUM Load Balancing System

NASA Technical Reports Server (NTRS)

Oliker, Leonid; Biswas, Rupak; Gabow, Harold N.

1998-01-01

The ability to dynamically adapt an unstructured mesh is a powerful tool for solving computational problems with evolving physical features; however, an efficient parallel implementation is rather difficult. To address this problem, we have developed PLUM, an automatic portable framework for performing adaptive numerical computations in a message-passing environment. PLUM requires that all data be globally redistributed after each mesh adaption to achieve load balance. We present an algorithm for minimizing this remapping overhead by guaranteeing an optimal processor reassignment. We also show that the data redistribution cost can be significantly reduced by applying our heuristic processor reassignment algorithm to the default mapping of the parallel partitioner. Portability is examined by comparing performance on a SP2, an Origin2000, and a T3E. Results show that PLUM can be successfully ported to different platforms without any code modifications.
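The processor reassignment step can be pictured as choosing which processor should own each new partition so that as much data as possible stays where it already is. The sketch below is a simple greedy heuristic in that spirit, not PLUM's actual algorithm, and it does not guarantee an optimal assignment. The `similarity` matrix, in which entry `[p][q]` is the amount of data of new partition `q` that currently resides on processor `p`, and both function names are assumptions made for the example.

```python
def greedy_reassign(similarity):
    """Assign each new partition to a distinct processor, visiting matrix
    entries from largest to smallest (greedy, not guaranteed optimal).

    Returns proc_of_part, where proc_of_part[q] is the processor that
    will own partition q.
    """
    n = len(similarity)
    entries = sorted(
        ((similarity[p][q], p, q) for p in range(n) for q in range(n)),
        key=lambda e: -e[0],
    )
    proc_of_part = [None] * n
    proc_used = [False] * n
    for _amount, p, q in entries:
        if not proc_used[p] and proc_of_part[q] is None:
            proc_of_part[q] = p
            proc_used[p] = True
    return proc_of_part


def data_moved(similarity, proc_of_part):
    """Total data that must leave its current processor under a mapping."""
    total = sum(map(sum, similarity))
    kept = sum(similarity[p][q] for q, p in enumerate(proc_of_part))
    return total - kept


if __name__ == "__main__":
    # Hypothetical 3-processor example: rows are processors, columns partitions.
    s = [[1, 8, 1],
         [7, 2, 1],
         [2, 0, 8]]
    mapping = greedy_reassign(s)
    print("mapping:", mapping)
    print("moved with default mapping:", data_moved(s, [0, 1, 2]))
    print("moved with greedy mapping :", data_moved(s, mapping))
```

In this made-up example the default mapping (partition q on processor q) would move far more data than the greedy mapping, which illustrates why the reassignment step can reduce the redistribution cost.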
Energy to the Edge (E2E) U.S. Army Rapid Equipping Force

DTIC Science & Technology

2014-03-21

generators, parallel multiple sources, prioritize loads, and balance loads. Smart grids are based on complex algorithms and controls. 3. Reduce...stations are not able to be serviced by prime power because of their location in the middle of a very active airfield and fueling a system that c ist
Load Forecasting Based Distribution System Network Reconfiguration -- A Distributed Data-Driven Approach

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jiang, Huaiguang; Zhang, Yingchen; Muljadi, Eduard

In this paper, a short-term load forecasting approach based network reconfiguration is proposed in a parallel manner. Specifically, a support vector regression (SVR) based short-term load forecasting approach is designed to provide an accurate load prediction and benefit the network reconfiguration. Because of the nonconvexity of the three-phase balanced optimal power flow, a second-order cone program (SOCP) based approach is used to relax the optimal power flow problem. Then, the alternating direction method of multipliers (ADMM) is used to compute the optimal power flow in a distributed manner. Considering the limited number of the switches and the increasing computation capability, the proposed network reconfiguration is solved in a parallel way. The numerical results demonstrate the feasibility and effectiveness of the proposed approach.
Gilgamesh: A Multithreaded Processor-In-Memory Architecture for Petaflops Computing

NASA Technical Reports Server (NTRS)

Sterling, T. L.; Zima, H. P.

2002-01-01

Processor-in-Memory (PIM) architectures avoid the von Neumann bottleneck in conventional machines by integrating high-density DRAM and CMOS logic on the same chip. Parallel systems based on this new technology are expected to provide higher scalability, adaptability, robustness, fault tolerance and lower power consumption than current MPPs or commodity clusters. In this paper we describe the design of Gilgamesh, a PIM-based massively parallel architecture, and elements of its execution model. Gilgamesh extends existing PIM capabilities by incorporating advanced mechanisms for virtualizing tasks and data and providing adaptive resource management for load balancing and latency tolerance. The Gilgamesh execution model is based on macroservers, a middleware layer which supports object-based runtime management of data and threads allowing explicit and dynamic control of locality and load balancing. The paper concludes with a discussion of related research activities and an outlook to future work.

Distributed Parallel Processing and Dynamic Load Balancing Techniques for Multidisciplinary High Speed Aircraft Design

NASA Technical Reports Server (NTRS)

Krasteva, Denitza T.

1998-01-01

Multidisciplinary design optimization (MDO) for large-scale engineering problems poses many challenges (e.g., the design of an efficient concurrent paradigm for global optimization based on disciplinary analyses, expensive computations over vast data sets, etc.) This work focuses on the application of distributed schemes for massively parallel architectures to MDO problems, as a tool for reducing computation time and solving larger problems. The specific problem considered here is configuration optimization of a high speed civil transport (HSCT), and the efficient parallelization of the embedded paradigm for reasonable design space identification. Two distributed dynamic load balancing techniques (random polling and global round robin with message combining) and two necessary termination detection schemes (global task count and token passing) were implemented and evaluated in terms of effectiveness and scalability to large problem sizes and a thousand processors. The effect of certain parameters on execution time was also inspected. Empirical results demonstrated stable performance and effectiveness for all schemes, and the parametric study showed that the selected algorithmic parameters have a negligible effect on performance.
Novel techniques for data decomposition and load balancing for parallel processing of vision systems: Implementation and evaluation using a motion estimation system

NASA Technical Reports Server (NTRS)

Choudhary, Alok Nidhi; Leung, Mun K.; Huang, Thomas S.; Patel, Janak H.

1989-01-01

Computer vision systems employ a sequence of vision algorithms in which the output of an algorithm is the input of the next algorithm in the sequence. Algorithms that constitute such systems exhibit vastly different computational characteristics, and therefore, require different data decomposition techniques and efficient load balancing techniques for parallel implementation. However, since the input data for a task is produced as the output data of the previous task, this information can be exploited to perform knowledge based data decomposition and load balancing. Presented here are algorithms for a motion estimation system. The motion estimation is based on the point correspondence between the involved images which are a sequence of stereo image pairs. Researchers propose algorithms to obtain point correspondences by matching feature points among stereo image pairs at any two consecutive time instants. Furthermore, the proposed algorithms employ non-iterative procedures, which results in saving considerable amounts of computation time. The system consists of the following steps: (1) extraction of features; (2) stereo match of images in one time instant; (3) time match of images from consecutive time instants; (4) stereo match to compute final unambiguous points; and (5) computation of motion parameters.
Evaluating SPLASH-2 Applications Using MapReduce

NASA Astrophysics Data System (ADS)

Zhu, Shengkai; Xiao, Zhiwei; Chen, Haibo; Chen, Rong; Zhang, Weihua; Zang, Binyu

MapReduce has been prevalent for running data-parallel applications. By hiding other non-functionality parts such as parallelism, fault tolerance and load balance from programmers, MapReduce significantly simplifies the programming of large clusters. Due to the mentioned features of MapReduce above, researchers have also explored the use of MapReduce on other application domains, such as machine learning, textual retrieval and statistical translation, among others.
A parallelized three-dimensional cellular automaton model for grain growth during additive manufacturing

NASA Astrophysics Data System (ADS)

Lian, Yanping; Lin, Stephen; Yan, Wentao; Liu, Wing Kam; Wagner, Gregory J.

2018-05-01

In this paper, a parallelized 3D cellular automaton computational model is developed to predict grain morphology for solidification of metal during the additive manufacturing process. Solidification phenomena are characterized by highly localized events, such as the nucleation and growth of multiple grains. As a result, parallelization requires careful treatment of load balancing between processors as well as interprocess communication in order to maintain a high parallel efficiency. We give a detailed summary of the formulation of the model, as well as a description of the communication strategies implemented to ensure parallel efficiency. Scaling tests on a representative problem with about half a billion cells demonstrate parallel efficiency of more than 80% on 8 processors and around 50% on 64; loss of efficiency is attributable to load imbalance due to near-surface grain nucleation in this test problem. The model is further demonstrated through an additive manufacturing simulation with resulting grain structures showing reasonable agreement with those observed in experiments.
A parallelized three-dimensional cellular automaton model for grain growth during additive manufacturing

NASA Astrophysics Data System (ADS)

Lian, Yanping; Lin, Stephen; Yan, Wentao; Liu, Wing Kam; Wagner, Gregory J.

2018-01-01

In this paper, a parallelized 3D cellular automaton computational model is developed to predict grain morphology for solidification of metal during the additive manufacturing process. Solidification phenomena are characterized by highly localized events, such as the nucleation and growth of multiple grains. As a result, parallelization requires careful treatment of load balancing between processors as well as interprocess communication in order to maintain a high parallel efficiency. We give a detailed summary of the formulation of the model, as well as a description of the communication strategies implemented to ensure parallel efficiency. Scaling tests on a representative problem with about half a billion cells demonstrate parallel efficiency of more than 80% on 8 processors and around 50% on 64; loss of efficiency is attributable to load imbalance due to near-surface grain nucleation in this test problem. The model is further demonstrated through an additive manufacturing simulation with resulting grain structures showing reasonable agreement with those observed in experiments.
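As a reading aid for the scaling figures quoted in the abstract above, the standard definition of parallel efficiency can be worked through. The symbols T_1, T_p, S and E are the usual textbook notation and are assumptions here, not names taken from the paper.

```latex
% Speedup and parallel efficiency on p processors
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p}

% Reported: E(8) > 0.80 and E(64) \approx 0.50, hence
S(8) = 8 \, E(8) > 6.4, \qquad S(64) = 64 \, E(64) \approx 32
```

Read this way, going from 8 to 64 processors (8 times the hardware) raises the speedup by roughly a factor of 5, which is consistent with the load imbalance the authors report.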
Dynamic load balance scheme for the DSMC algorithm

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Jin; Geng, Xiangren; Jiang, Dingwu

The direct simulation Monte Carlo (DSMC) algorithm, devised by Bird, has been used over a wide range of rarefied flow problems in the past 40 years. While the DSMC is suitable for parallel implementation on powerful multi-processor architectures, it also introduces a large load imbalance across the processor array, even for small examples. The load imposed on a processor by a DSMC calculation is determined to a large extent by the total number of simulator particles upon it. Since most flows are impulsively started with an initial distribution of particles that is quite different from the steady state, the total number of simulator particles will change dramatically. The load balance based upon an initial distribution of particles will break down as the steady state of flow is reached. The load imbalance and huge computational cost of DSMC have limited its application to rarefied or simple transitional flows. In this paper, by taking advantage of METIS, a software package for partitioning unstructured graphs, and taking the total number of simulator particles in each cell as weight information, repartitioning has been achieved based upon the principle that each processor handles approximately the same total number of simulator particles. The computation must pause several times to renew the total number of simulator particles in each processor and repartition the whole domain again. Thus the load balance across the processor array holds for the duration of the computation, and the parallel efficiency can be improved effectively. The benchmark solution of a cylinder submerged in hypersonic flow has been simulated numerically. In addition, hypersonic flow past a complex wing-body configuration has been simulated. The results show that, for both cases, the computational time can be reduced by about 50%.
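The idea of using particle counts as weights can be illustrated with a minimal sketch. It only measures imbalance and decides when a repartition is due; it does not call METIS. The function names, the `tolerance` threshold and the sample numbers are hypothetical, chosen for illustration.

```python
def imbalance(particles_per_cell, owner, num_procs):
    """Return (max processor load) / (mean processor load).

    particles_per_cell[i] is the particle count (the weight) of cell i,
    owner[i] is the processor that currently holds cell i.
    """
    loads = [0] * num_procs
    for cell, count in enumerate(particles_per_cell):
        loads[owner[cell]] += count
    mean = sum(loads) / num_procs
    return max(loads) / mean if mean > 0 else 1.0


def needs_repartition(particles_per_cell, owner, num_procs, tolerance=1.1):
    """Hypothetical trigger: repartition once imbalance exceeds tolerance."""
    return imbalance(particles_per_cell, owner, num_procs) > tolerance


if __name__ == "__main__":
    owner = [0, 0, 1, 1]
    initial = [10, 10, 10, 10]   # impulsive start: uniform particle counts
    evolved = [40, 20, 10, 10]   # particles have piled up in the first cells
    print(imbalance(initial, owner, 2), needs_repartition(initial, owner, 2))
    print(imbalance(evolved, owner, 2), needs_repartition(evolved, owner, 2))
```

In a real code the refreshed per-cell counts would be handed to a graph partitioner as vertex weights at each pause; the threshold test above is only one plausible way to decide when to pause.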
Neuron splitting in compute-bound parallel network simulations enables runtime scaling with twice as many processors.

PubMed

Hines, Michael L; Eichner, Hubert; Schürmann, Felix

2008-08-01

Neuron tree topology equations can be split into two subtrees and solved on different processors with no change in accuracy, stability, or computational effort; communication costs involve only sending and receiving two double precision values by each subtree at each time step. Splitting cells is useful in attaining load balance in neural network simulations, especially when there is a wide range of cell sizes and the number of cells is about the same as the number of processors. For compute-bound simulations load balance results in almost ideal runtime scaling. Application of the cell splitting method to two published network models exhibits good runtime scaling on twice as many processors as could be effectively used with whole-cell balancing.
A Multi-Level Parallelization Concept for High-Fidelity Multi-Block Solvers

NASA Technical Reports Server (NTRS)

Hatay, Ferhat F.; Jespersen, Dennis C.; Guruswamy, Guru P.; Rizk, Yehia M.; Byun, Chansup; Gee, Ken; VanDalsem, William R. (Technical Monitor)

1997-01-01

The integration of high-fidelity Computational Fluid Dynamics (CFD) analysis tools with the industrial design process benefits greatly from robust implementations that are transportable across a wide range of computer architectures. In the present work, a hybrid domain-decomposition and parallelization concept was developed and implemented into the widely used NASA multi-block CFD packages ENSAERO and OVERFLOW. The new parallel solver concept, PENS (Parallel Euler Navier-Stokes Solver), employs both fine and coarse granularity in data partitioning as well as data coalescing to obtain the desired load-balance characteristics on the available computer platforms. This multi-level parallelism implementation itself introduces no changes to the numerical results, hence the original fidelity of the packages is identically preserved. The present implementation uses the Message Passing Interface (MPI) library for interprocessor message passing and memory accessing. By choosing an appropriate combination of the available partitioning and coalescing capabilities only during the execution stage, the PENS solver becomes adaptable to different computer architectures from shared-memory to distributed-memory platforms with varying degrees of parallelism. The PENS implementation on the IBM SP2 distributed memory environment at the NASA Ames Research Center obtains 85 percent scalable parallel performance using fine-grain partitioning of single-block CFD domains using up to 128 wide computational nodes. Multi-block CFD simulations of complete aircraft achieve 75 percent perfect load-balanced executions using data coalescing and the two levels of parallelism. SGI PowerChallenge, SGI Origin 2000, and a cluster of workstations are the other platforms where the robustness of the implementation is tested. The performance behavior on the other computer platforms with a variety of realistic problems will be included as this on-going study progresses.
Parallel deterministic neutronics with AMR in 3D

DOE Office of Scientific and Technical Information (OSTI.GOV)

Clouse, C.; Ferguson, J.; Hendrickson, C.

1997-12-31

AMTRAN, a three dimensional Sn neutronics code with adaptive mesh refinement (AMR) has been parallelized over spatial domains and energy groups and runs on the Meiko CS-2 with MPI message passing. Block refined AMR is used with linear finite element representations for the fluxes, which allows for a straightforward interpretation of fluxes at block interfaces with zoning differences. The load balancing algorithm assumes 8 spatial domains, which minimizes idle time among processors.
Database Reorganization in Parallel Disk Arrays with I/O Service Stealing

NASA Technical Reports Server (NTRS)

Zabback, Peter; Onyuksel, Ibrahim; Scheuermann, Peter; Weikum, Gerhard

1996-01-01

We present a model for data reorganization in parallel disk systems that is geared towards load balancing in an environment with periodic access patterns. Data reorganization is performed by disk cooling, i.e. migrating files or extents from the hottest disks to the coldest ones. We develop an approximate queueing model for determining the effective arrival rates of cooling requests and discuss its use in assessing the costs versus benefits of cooling.
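Disk cooling as described above can be sketched as a greedy loop. This is a minimal illustration, not the authors' method: the "heat" of a file is taken to be a plain number (for example, its access rate), and the paper's queueing model for weighing costs against benefits is left out entirely. The names `disk_heat` and `cool_step` are hypothetical.

```python
def disk_heat(files):
    """Heat of a disk: the sum of the heats of the files it stores."""
    return sum(files.values())


def cool_step(disks):
    """Move one file from the hottest disk to the coldest disk.

    disks maps a disk name to a dict of {file name: heat}.
    Only files whose heat is smaller than the heat gap are eligible, so the
    target never ends up hotter than the source was before the move.
    Returns (file, source, target), or None when no move helps.
    """
    hottest = max(disks, key=lambda d: disk_heat(disks[d]))
    coldest = min(disks, key=lambda d: disk_heat(disks[d]))
    gap = disk_heat(disks[hottest]) - disk_heat(disks[coldest])
    candidates = {f: h for f, h in disks[hottest].items() if h < gap}
    if not candidates:
        return None
    name = max(candidates, key=candidates.get)  # move the hottest eligible file
    disks[coldest][name] = disks[hottest].pop(name)
    return name, hottest, coldest


if __name__ == "__main__":
    disks = {"d0": {"a": 50, "b": 30}, "d1": {"c": 10}}
    while (move := cool_step(disks)) is not None:
        print("moved", move)
    print({d: disk_heat(f) for d, f in disks.items()})
```

A production scheme would also charge each migration for the I/O it consumes, which is the cost-versus-benefit question the abstract's queueing model addresses.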
Algorithms for parallel flow solvers on message passing architectures

NASA Technical Reports Server (NTRS)

Vanderwijngaart, Rob F.

1995-01-01

The purpose of this project has been to identify and test suitable technologies for implementation of fluid flow solvers -- possibly coupled with structures and heat equation solvers -- on MIMD parallel computers. In the course of this investigation much attention has been paid to efficient domain decomposition strategies for ADI-type algorithms. Multi-partitioning derives its efficiency from the assignment of several blocks of grid points to each processor in the parallel computer. A coarse-grain parallelism is obtained, and a near-perfect load balance results. In uni-partitioning every processor receives responsibility for exactly one block of grid points instead of several. This necessitates fine-grain pipelined program execution in order to obtain a reasonable load balance. Although fine-grain parallelism is less desirable on many systems, especially high-latency networks of workstations, uni-partition methods are still in wide use in production codes for flow problems. Consequently, it remains important to achieve good efficiency with this technique that has essentially been superseded by multi-partitioning for parallel ADI-type algorithms. Another reason for the concentration on improving the performance of pipeline methods is their applicability in other types of flow solver kernels with stronger implied data dependence. Analytical expressions can be derived for the size of the dynamic load imbalance incurred in traditional pipelines. From these it can be determined what is the optimal first-processor retardation that leads to the shortest total completion time for the pipeline process. Theoretical predictions of pipeline performance with and without optimization match experimental observations on the iPSC/860 very well. Analysis of pipeline performance also highlights the effect of uncareful grid partitioning in flow solvers that employ pipeline algorithms. 
If grid blocks at boundaries are not at least as large in the wall-normal direction as those immediately adjacent to them, then the first processor in the pipeline will receive a computational load that is less than that of subsequent processors, magnifying the pipeline slowdown effect. Extra compensation is needed for grid boundary effects, even if all grid blocks are equally sized.
I/O load balancing for big data HPC applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Paul, Arnab K.; Goyal, Arpit; Wang, Feiyi

High Performance Computing (HPC) big data problems require efficient distributed storage systems. However, at scale, such storage systems often experience load imbalance and resource contention due to two factors: the bursty nature of scientific application I/O; and the complex I/O path that is without centralized arbitration and control. For example, the extant Lustre parallel file system, which supports many HPC centers, comprises numerous components connected via custom network topologies, and serves varying demands of a large number of users and applications. Consequently, some storage servers can be more loaded than others, which creates bottlenecks and reduces overall application I/O performance. Existing solutions typically focus on per application load balancing, and thus are not as effective given their lack of a global view of the system. In this paper, we propose a data-driven approach to load balance the I/O servers at scale, targeted at Lustre deployments. To this end, we design a global mapper on Lustre Metadata Server, which gathers runtime statistics from key storage components on the I/O path, and applies Markov chain modeling and a minimum-cost maximum-flow algorithm to decide where data should be placed. Evaluation using a realistic system simulator and a real setup shows that our approach yields better load balancing, which in turn can improve end-to-end performance.
Parallel implementation and evaluation of motion estimation system algorithms on a distributed memory multiprocessor using knowledge based mappings

NASA Technical Reports Server (NTRS)

Choudhary, Alok Nidhi; Leung, Mun K.; Huang, Thomas S.; Patel, Janak H.

1989-01-01

Several techniques for static and dynamic load balancing in vision systems are presented. These techniques are novel in the sense that they capture the computational requirements of a task by examining the data when it is produced. Furthermore, they can be applied to many vision systems because many algorithms in different systems are either the same, or have similar computational characteristics. These techniques are evaluated by applying them on a parallel implementation of the algorithms in a motion estimation system on a hypercube multiprocessor system. The motion estimation system consists of the following steps: (1) extraction of features; (2) stereo match of images in one time instant; (3) time match of images from different time instants; (4) stereo match to compute final unambiguous points; and (5) computation of motion parameters. It is shown that the performance gains when these data decomposition and load balancing techniques are used are significant and the overhead of using these techniques is minimal.
Object-Oriented Implementation of the NAS Parallel Benchmarks using Charm++

NASA Technical Reports Server (NTRS)

Krishnan, Sanjeev; Bhandarkar, Milind; Kale, Laxmikant V.

1996-01-01

This report describes experiences with implementing the NAS Computational Fluid Dynamics benchmarks using a parallel object-oriented language, Charm++. Our main objective in implementing the NAS CFD kernel benchmarks was to develop a code that could be used to easily experiment with different domain decomposition strategies and dynamic load balancing. We also wished to leverage the object-orientation provided by the Charm++ parallel object-oriented language, to develop reusable abstractions that would simplify the process of developing parallel applications. We first describe the Charm++ parallel programming model and the parallel object array abstraction, then go into detail about each of the Scalar Pentadiagonal (SP) and Lower/Upper Triangular (LU) benchmarks, along with performance results. Finally we conclude with an evaluation of the methodology used.
Accelerating semantic graph databases on commodity clusters

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morari, Alessandro; Castellana, Vito G.; Haglin, David J.

We are developing a full software system for accelerating semantic graph databases on commodity clusters that scales to hundreds of nodes while maintaining constant query throughput. Our framework comprises a SPARQL to C++ compiler, a library of parallel graph methods and a custom multithreaded runtime layer, which provides a Partitioned Global Address Space (PGAS) programming model with fork/join parallelism and automatic load balancing over commodity clusters. We present preliminary results for the compiler and for the runtime.
Automated and Assistive Tools for Accelerated Code migration of Scientific Computing on to Heterogeneous MultiCore Systems

DTIC Science & Technology

2017-04-13

modelling code, a parallel benchmark , and a communication avoiding version of the QR algorithm. Further, several improvements to the OmpSs model were...movement; and a port of the dynamic load balancing library to OmpSs. Finally, several updates to the tools infrastructure were accomplished, including: an...OmpSs: a basic algorithm on image processing applications, a mini application representative of an ocean modelling code, a parallel benchmark , and a
Latency Hiding in Dynamic Partitioning and Load Balancing of Grid Computing Applications

NASA Technical Reports Server (NTRS)

Das, Sajal K.; Harvey, Daniel J.; Biswas, Rupak

2001-01-01

The Information Power Grid (IPG) concept developed by NASA aims to provide a metacomputing platform for large-scale distributed computations, by hiding the intricacies of a highly heterogeneous environment while maintaining adequate security. In this paper, we propose a latency-tolerant partitioning scheme that dynamically balances processor workloads on the IPG, and minimizes data movement and runtime communication. By simulating an unsteady adaptive mesh application on a wide area network, we study the performance of our load balancer under the Globus environment. The number of IPG nodes, the number of processors per node, and the interconnect speeds are parameterized to derive conditions under which the IPG would be suitable for parallel distributed processing of such applications. Experimental results demonstrate that effective solutions are achieved when the IPG nodes are connected by a high-speed asynchronous interconnection network.
Lumped transmission line avalanche pulser

DOEpatents

Booth, R.

1995-07-18

A lumped linear avalanche transistor pulse generator utilizes stacked transistors in parallel within a stage and couples a plurality of said stages, in series with increasing zener diode limited voltages per stage and decreasing balanced capacitance load per stage to yield a high voltage, high and constant current, very short pulse. 8 figs.
Lumped transmission line avalanche pulser

DOEpatents

Booth, Rex

1995-01-01

A lumped linear avalanche transistor pulse generator utilizes stacked transistors in parallel within a stage and couples a plurality of said stages, in series with increasing zener diode limited voltages per stage and decreasing balanced capacitance load per stage to yield a high voltage, high and constant current, very short pulse.
A general parallel sparse-blocked matrix multiply for linear scaling SCF theory

NASA Astrophysics Data System (ADS)

Challacombe, Matt

2000-06-01

A general approach to the parallel sparse-blocked matrix-matrix multiply is developed in the context of linear scaling self-consistent-field (SCF) theory. The data-parallel message passing method uses non-blocking communication to overlap computation and communication. The space filling curve heuristic is used to achieve data locality for sparse matrix elements that decay with “separation”. Load balance is achieved by solving the bin packing problem for blocks with variable size. With this new method as the kernel, parallel performance of the simplified density matrix minimization (SDMM) for solution of the SCF equations is investigated for RHF/6-31G** water clusters and RHF/3-21G estane globules. Sustained rates above 5.7 GFLOPS for the SDMM have been achieved for (H2O)200 with 95 Origin 2000 processors. Scalability is found to be limited by load imbalance, which increases with decreasing granularity, due primarily to the inhomogeneous distribution of variable block sizes.
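Balancing blocks of variable size across processors can be sketched with a common greedy heuristic. The abstract does not say which bin packing method is used, so the longest-processing-time rule below is a swapped-in stand-in; `pack_blocks` and the sample costs are hypothetical.

```python
import heapq


def pack_blocks(block_costs, num_procs):
    """Assign blocks to processors, largest first, to the least-loaded one.

    Returns (assignment, loads), where assignment[p] lists the block indices
    given to processor p and loads[p] is the summed cost on processor p.
    """
    heap = [(0, p) for p in range(num_procs)]  # (current load, processor)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_procs)]
    order = sorted(range(len(block_costs)),
                   key=lambda i: block_costs[i], reverse=True)
    for idx in order:
        load, p = heapq.heappop(heap)          # least-loaded processor so far
        assignment[p].append(idx)
        heapq.heappush(heap, (load + block_costs[idx], p))
    loads = [sum(block_costs[i] for i in part) for part in assignment]
    return assignment, loads


if __name__ == "__main__":
    assignment, loads = pack_blocks([7, 5, 4, 3, 1], num_procs=2)
    print(assignment, loads)
```

The sketch also hints at why imbalance grows as granularity decreases: with few blocks per processor, a single large block can dominate one processor's load and no assignment can even it out.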

Resolutions of the Coulomb operator: VIII. Parallel implementation using the modern programming language X10.

PubMed

Limpanuparb, Taweetham; Milthorpe, Josh; Rendell, Alistair P

2014-10-30

Use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. By using X10, a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner including use of both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of integral calculation using X10's work stealing runtime, and report performance results for long-range HF energy calculation of large molecule/high quality basis running on up to 1024 cores of a high performance cluster machine. Copyright © 2014 Wiley Periodicals, Inc.
A Robust and Scalable Software Library for Parallel Adaptive Refinement on Unstructured Meshes

NASA Technical Reports Server (NTRS)

Lou, John Z.; Norton, Charles D.; Cwik, Thomas A.

1999-01-01

The design and implementation of Pyramid, a software library for performing parallel adaptive mesh refinement (PAMR) on unstructured meshes, is described. This software library can be easily used in a variety of unstructured parallel computational applications, including parallel finite element, parallel finite volume, and parallel visualization applications using triangular or tetrahedral meshes. The library contains a suite of well-designed and efficiently implemented modules that perform operations in a typical PAMR process. Among these are mesh quality control during successive parallel adaptive refinement (typically guided by a local-error estimator), parallel load-balancing, and parallel mesh partitioning using the ParMeTiS partitioner. The Pyramid library is implemented in Fortran 90 with an interface to the Message-Passing Interface (MPI) library, supporting code efficiency, modularity, and portability. An EM waveguide filter application, adaptively refined using the Pyramid library, is illustrated.
mGrid: A load-balanced distributed computing environment for the remote execution of the user-defined Matlab code

PubMed Central

Karpievitch, Yuliya V; Almeida, Jonas S

2006-01-01

Background: Matlab, a powerful and productive language that allows for rapid prototyping, modeling and simulation, is widely used in computational biology. Modeling and simulation of large biological systems often require more computational resources than are available on a single computer. Existing distributed computing environments like the Distributed Computing Toolbox, MatlabMPI, Matlab*G and others allow for the remote (and possibly parallel) execution of Matlab commands with varying support for features like an easy-to-use application programming interface, load-balanced utilization of resources, extensibility over the wide area network, and minimal system administration skill requirements. However, all of these environments require some level of access to participating machines to manually distribute the user-defined libraries that the remote call may invoke. Results: mGrid augments the usual process distribution seen in other similar distributed systems by adding facilities for user code distribution. mGrid's client-side interface is an easy-to-use native Matlab toolbox that transparently executes user-defined code on remote machines (i.e. the user is unaware that the code is executing somewhere else). Run-time variables are automatically packed and distributed with the user-defined code and automated load-balancing of remote resources enables smooth concurrent execution. mGrid is an open source environment. Apart from the programming language itself, all other components are also open source, freely available tools: light-weight PHP scripts and the Apache web server. Conclusion: Transparent, load-balanced distribution of user-defined Matlab toolboxes and rapid prototyping of many simple parallel applications can now be done with a single easy-to-use Matlab command. Because mGrid utilizes only Matlab, light-weight PHP scripts and the Apache web server, installation and configuration are very simple. Moreover, the web-based infrastructure of mGrid allows for it to be easily extensible over the Internet. PMID:16539707
mGrid: a load-balanced distributed computing environment for the remote execution of the user-defined Matlab code.

PubMed

Karpievitch, Yuliya V; Almeida, Jonas S

2006-03-15

Matlab, a powerful and productive language that allows for rapid prototyping, modeling and simulation, is widely used in computational biology. Modeling and simulation of large biological systems often require more computational resources than are available on a single computer. Existing distributed computing environments like the Distributed Computing Toolbox, MatlabMPI, Matlab*G and others allow for the remote (and possibly parallel) execution of Matlab commands with varying support for features like an easy-to-use application programming interface, load-balanced utilization of resources, extensibility over the wide area network, and minimal system administration skill requirements. However, all of these environments require some level of access to participating machines to manually distribute the user-defined libraries that the remote call may invoke. mGrid augments the usual process distribution seen in other similar distributed systems by adding facilities for user code distribution. mGrid's client-side interface is an easy-to-use native Matlab toolbox that transparently executes user-defined code on remote machines (i.e. the user is unaware that the code is executing somewhere else). Run-time variables are automatically packed and distributed with the user-defined code and automated load-balancing of remote resources enables smooth concurrent execution. mGrid is an open source environment. Apart from the programming language itself, all other components are also open source, freely available tools: light-weight PHP scripts and the Apache web server. Transparent, load-balanced distribution of user-defined Matlab toolboxes and rapid prototyping of many simple parallel applications can now be done with a single easy-to-use Matlab command. Because mGrid utilizes only Matlab, light-weight PHP scripts and the Apache web server, installation and configuration are very simple. Moreover, the web-based infrastructure of mGrid allows for it to be easily extensible over the Internet.
A nonrecursive order N preconditioned conjugate gradient: Range space formulation of MDOF dynamics

NASA Technical Reports Server (NTRS)

Kurdila, Andrew J.

1990-01-01

While excellent progress has been made in deriving algorithms that are efficient for certain combinations of system topologies and concurrent multiprocessing hardware, several issues must be resolved to incorporate transient simulation in the control design process for large space structures. Specifically, strategies must be developed that are applicable to systems with numerous degrees of freedom. In addition, the algorithms must have a growth potential in that they must also be amenable to implementation on forthcoming parallel system architectures. For mechanical system simulation, this fact implies that algorithms are required that induce parallelism on a fine scale, suitable for the emerging class of highly parallel processors; and transient simulation methods must be automatically load balancing for a wider collection of system topologies and hardware configurations. These problems are addressed by employing a combination range space/preconditioned conjugate gradient formulation of multi-degree-of-freedom dynamics. The method described has several advantages. In a sequential computing environment, the method has the features that: by employing regular ordering of the system connectivity graph, an extremely efficient preconditioner can be derived from the 'range space metric', as opposed to the system coefficient matrix; because of the effectiveness of the preconditioner, preliminary studies indicate that the method can achieve performance rates that depend linearly upon the number of substructures, hence the title 'Order N'; and the method is non-assembling. Furthermore, the approach is promising as a potential parallel processing algorithm in that the method exhibits a fine parallel granularity suitable for a wide collection of combinations of physical system topologies/computer architectures; and the method is easily load balanced among processors, and does not rely upon system topology to induce parallelism.
Linear scaling computation of the Fock matrix. VI. Data parallel computation of the exchange-correlation matrix

NASA Astrophysics Data System (ADS)

Gan, Chee Kwan; Challacombe, Matt

2003-05-01

Recently, early onset linear scaling computation of the exchange-correlation matrix has been achieved using hierarchical cubature [J. Chem. Phys. 113, 10037 (2000)]. Hierarchical cubature differs from other methods in that the integration grid is adaptive and purely Cartesian, which allows for a straightforward domain decomposition in parallel computations; the volume enclosing the entire grid may be simply divided into a number of nonoverlapping boxes. In our data parallel approach, each box requires only a fraction of the total density to perform the necessary numerical integrations due to the finite extent of Gaussian-orbital basis sets. This inherent data locality may be exploited to reduce communications between processors as well as to avoid memory and copy overheads associated with data replication. Although the hierarchical cubature grid is Cartesian, naive boxing leads to irregular work loads due to strong spatial variations of the grid and the electron density. In this paper we describe equal time partitioning, which employs time measurement of the smallest sub-volumes (corresponding to the primitive cubature rule) to load balance grid-work for the next self-consistent-field iteration. After start-up from a heuristic center of mass partitioning, equal time partitioning exploits smooth variation of the density and grid between iterations to achieve load balance. With the 3-21G basis set and a medium quality grid, equal time partitioning applied to taxol (62 heavy atoms) attained a speedup of 61 out of 64 processors, while for a 110 molecule water cluster at standard density it achieved a speedup of 113 out of 128. The efficiency of equal time partitioning applied to hierarchical cubature improves as the grid work per processor increases. With a fine grid and the 6-311G(df,p) basis set, calculations on the 26 atom molecule α-pinene achieved a parallel efficiency better than 99% with 64 processors. 
For more coarse grained calculations, superlinear speedups are found to result from reduced computational complexity associated with data parallelism.
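The core of equal time partitioning, using times measured in one iteration to place the cuts for the next, can be sketched in one dimension. This is a simplification under stated assumptions: the sub-volumes are taken to lie in a fixed linear order, whereas the paper partitions a three-dimensional Cartesian grid. The helper names are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def equal_time_cuts(times, num_parts):
    """Return cut positions that split `times` into num_parts contiguous
    chunks whose summed times are as close as possible to equal.

    times[i] is the measured time of sub-volume i in the previous iteration.
    """
    prefix = list(accumulate(times))
    total = prefix[-1]
    cuts = []
    for k in range(1, num_parts):
        target = total * k / num_parts
        i = bisect_left(prefix, target)  # first running total >= target
        # Step back one sub-volume if that lands closer to the target.
        if i > 0 and target - prefix[i - 1] <= prefix[i] - target:
            i -= 1
        cuts.append(i + 1)
    return cuts


def chunk_times(times, cuts):
    """Summed time of each chunk defined by the cut positions."""
    bounds = [0] + cuts + [len(times)]
    return [sum(times[a:b]) for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    measured = [2, 1, 3, 2, 2, 2]
    cuts = equal_time_cuts(measured, num_parts=3)
    print(cuts, chunk_times(measured, cuts))
```

The scheme relies on the property stated in the abstract: the density and grid vary smoothly between SCF iterations, so last iteration's timings remain a good predictor of the next iteration's work.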
Numerical computation of solar neutrino flux attenuated by the MSW mechanism

NASA Astrophysics Data System (ADS)

Kim, Jai Sam; Chae, Yoon Sang; Kim, Jung Dae

1999-07-01

We compute the survival probability of an electron neutrino in its flight through the solar core experiencing the Mikheyev-Smirnov-Wolfenstein effect with all three neutrino species considered. We adopted a hybrid method that uses an accurate approximation formula in the non-resonance region and numerical integration in the non-adiabatic resonance region. The key of our algorithm is to use the importance sampling method for sampling the neutrino creation energy and position and to find the optimum radii to start and stop numerical integration. We further developed a parallel algorithm for a message passing parallel computer. By using an idea of job token, we have developed a dynamical load balancing mechanism which is effective under any irregular load distributions.
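The abstract does not spell out the job token mechanism, so the sketch below rests on an assumption: a token stands for one unit of work, and an idle worker claims the next available token instead of receiving a fixed share up front. Threads and a shared queue stand in for the message passing machine; `run_with_tokens` is a hypothetical name.

```python
import queue
import threading


def run_with_tokens(jobs, num_workers, work):
    """Process every job exactly once; idle workers claim the next token."""
    tokens = queue.Queue()
    for job in jobs:
        tokens.put(job)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                job = tokens.get_nowait()  # claim a token, or stop when none left
            except queue.Empty:
                return
            value = work(job)
            with lock:
                results[job] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    squares = run_with_tokens(range(20), num_workers=4, work=lambda j: j * j)
    print(sorted(squares.items())[:5])
```

Because a worker only takes a new token after finishing its current one, jobs with very different costs spread themselves across workers without any prior cost estimate, which is the property claimed for irregular load distributions.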
Parallel performance optimizations on unstructured mesh-based simulations

DOE PAGES

Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas; ...

2015-06-01

This paper addresses two key parallelization challenges in the unstructured mesh-based ocean modeling code, MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitioning with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.
nem_spread Ver. 5.10

DOE Office of Scientific and Technical Information (OSTI.GOV)

HENNIGAN, GARY; SHADID, JOHN; SJAARDEMA, GREGORY

2009-06-08

Nem_spread reads its input command file (default name nem_spread.inp), takes the named ExodusII geometry definition, and spreads the geometry (and optionally results) contained in that file out to a parallel disk system. The decomposition is taken from a scalar Nemesis load balance file generated by the companion utility nem_slice.
Multi-jagged: A scalable parallel spatial partitioning algorithm

DOE PAGES

Deveci, Mehmet; Rajamanickam, Sivasankaran; Devine, Karen D.; ...

2015-03-18

Geometric partitioning is fast and effective for load-balancing dynamic applications, particularly those requiring geometric locality of data (particle methods, crash simulations). We present, to our knowledge, the first parallel implementation of a multidimensional-jagged geometric partitioner. In contrast to the traditional recursive coordinate bisection algorithm (RCB), which recursively bisects subdomains perpendicular to their longest dimension until the desired number of parts is obtained, our algorithm does recursive multi-section with a given number of parts in each dimension. By computing multiple cut lines concurrently and intelligently deciding when to migrate data while computing the partition, we minimize data movement compared to efficient implementations of recursive bisection. We demonstrate the algorithm's scalability and quality relative to the RCB implementation in Zoltan on both real and synthetic datasets. Our experiments show that the proposed algorithm performs and scales better than RCB in terms of run-time without degrading the load balance. Lastly, our implementation partitions 24 billion points into 65,536 parts within a few seconds and exhibits near perfect weak scaling up to 6K cores.
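The contrast between recursive bisection and multi-section can be illustrated with a small serial sketch. It is a toy version under stated assumptions: `rcb`, `split_even`, and `multi_jagged_2d` are hypothetical helper names, cuts are placed by point count (unit weights), and none of the parallel cut computation or data migration described in the abstract is modeled.

```python
def rcb(points, n_parts):
    """Recursive coordinate bisection: cut perpendicular to the longest
    dimension, then recurse on each side."""
    if n_parts == 1:
        return [points]
    if not points:
        return [[] for _ in range(n_parts)]
    dims = range(len(points[0]))
    # dimension with the largest extent
    axis = max(dims, key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    ordered = sorted(points, key=lambda p: p[axis])
    left_parts = n_parts // 2
    cut = len(ordered) * left_parts // n_parts
    return rcb(ordered[:cut], left_parts) + rcb(ordered[cut:], n_parts - left_parts)

def split_even(ordered, k):
    """Cut an ordered list into k consecutive pieces of near-equal size."""
    n = len(ordered)
    return [ordered[n * i // k: n * (i + 1) // k] for i in range(k)]

def multi_jagged_2d(points, px, py):
    """Multi-section: px slabs along x, then py parts along y within each
    slab, so the y cuts may differ from slab to slab (hence 'jagged')."""
    slabs = split_even(sorted(points, key=lambda p: p[0]), px)
    return [part for slab in slabs
            for part in split_even(sorted(slab, key=lambda p: p[1]), py)]

points = [(x, y) for x in range(4) for y in range(4)]
print([len(p) for p in rcb(points, 4)])
print([len(p) for p in multi_jagged_2d(points, 2, 2)])
```

For the same number of parts, multi-section needs fewer levels of recursion than bisection, since each level produces several parts at once.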
Efficient parallelization for AMR MHD multiphysics calculations; implementation in AstroBEAR

NASA Astrophysics Data System (ADS)

Carroll-Nellenback, Jonathan J.; Shroyer, Brandon; Frank, Adam; Ding, Chen

2013-03-01

Current adaptive mesh refinement (AMR) simulations require algorithms that are highly parallelized and manage memory efficiently. As compute engines grow larger, AMR simulations will require algorithms that achieve new levels of efficient parallelization and memory management. We have attempted to employ new techniques to achieve both of these goals. Patch or grid based AMR often employs ghost cells to decouple the hyperbolic advances of each grid on a given refinement level. This decoupling allows each grid to be advanced independently. In AstroBEAR we utilize this independence by threading the grid advances on each level with preference going to the finer level grids. This allows for global load balancing instead of level by level load balancing and allows for greater parallelization across both physical space and AMR level. Threading of level advances can also improve performance by interleaving communication with computation, especially in deep simulations with many levels of refinement. While we see improvements of up to 30% on deep simulations run on a few cores, the speedup is typically more modest (5-20%) for larger scale simulations. To improve memory management we have employed a distributed tree algorithm that requires processors to only store and communicate local sections of the AMR tree structure with neighboring processors. Using this distributed approach we are able to get reasonable scaling efficiency (>80%) out to 12288 cores and up to 8 levels of AMR - independent of the use of threading.
Dynamic programming on a shared-memory multiprocessor

NASA Technical Reports Server (NTRS)

Edmonds, Phil; Chu, Eleanor; George, Alan

1993-01-01

Three new algorithms for solving dynamic programming problems on a shared-memory parallel computer are described. All three algorithms attempt to balance work load, while keeping synchronization cost low. In particular, for a multiprocessor having $p$ processors, an analysis of the best algorithm shows that the arithmetic cost is $O(n^3/6p)$ and that the synchronization cost is $O(|\log_C n|)$ if $p \ll n$, where $C = (2p-1)/(2p+1)$ and $n$ is the size of the problem. The low synchronization cost is important for machines where synchronization is expensive. Analysis and experiments show that the best algorithm is effective in balancing the work load and producing high efficiency.
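The synchronization bound quoted in the abstract is easier to read once the logarithm base is expanded. The following short derivation uses only the definition of C given above; it is a worked approximation added for illustration and is not taken from the paper.

```latex
\[
C = \frac{2p-1}{2p+1} < 1
\quad\Longrightarrow\quad
\left|\log_C n\right| = \frac{\ln n}{\ln\dfrac{2p+1}{2p-1}} .
\]
With $x = 1/(2p)$,
\[
\ln\frac{2p+1}{2p-1} = \ln\frac{1+x}{1-x}
= 2\left(x + \frac{x^{3}}{3} + \cdots\right)
= \frac{1}{p} + \frac{1}{12p^{3}} + \cdots ,
\]
so for moderately large $p$,
\[
\left|\log_C n\right| \approx p \ln n .
\]
```

Under this approximation the synchronization cost grows roughly as p ln n, while the arithmetic cost per processor falls as 1/p.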
Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms

NASA Astrophysics Data System (ADS)

Yu, Leiming; Nina-Paravecino, Fanny; Kaeli, David; Fang, Qianqian

2018-01-01

We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs.
Implicit schemes and parallel computing in unstructured grid CFD

NASA Technical Reports Server (NTRS)

Venkatakrishnan, V.

1995-01-01

The development of implicit schemes for obtaining steady state solutions to the Euler and Navier-Stokes equations on unstructured grids is outlined. Applications are presented that compare the convergence characteristics of various implicit methods. Next, the development of explicit and implicit schemes to compute unsteady flows on unstructured grids is discussed. Next, the issues involved in parallelizing finite volume schemes on unstructured meshes in an MIMD (multiple instruction/multiple data stream) fashion are outlined. Techniques for partitioning unstructured grids among processors and for extracting parallelism in explicit and implicit solvers are discussed. Finally, some dynamic load balancing ideas, which are useful in adaptive transient computations, are presented.
Parallel rendering

NASA Technical Reports Server (NTRS)

Crockett, Thomas W.

1995-01-01

This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.
Balancing Conflicting Requirements for Grid and Particle Decomposition in Continuum-Lagrangian Solvers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sitaraman, Hariswaran; Grout, Ray

2015-10-30

This paper presents load balancing strategies for hybrid solvers that couple grid-based partial differential equation solution with particle tracking. A typical Message Passing Interface (MPI) based parallelization of grid-based solves is done using a spatial domain decomposition, while particle tracking is primarily done using one of two techniques. One technique distributes the particles to the MPI ranks whose grid they belong to, while the other shares the particles equally among all ranks, irrespective of their spatial location. The former technique provides spatial locality for field interpolation but cannot assure load balance in terms of number of particles, which is achieved by the latter. The two techniques are compared for a case of particle tracking in a homogeneous isotropic turbulence box as well as a turbulent jet case. We performed a strong scaling study for more than 32,000 cores, which results in particle densities representative of anticipated exascale machines. The use of alternative implementations of MPI collectives and efficient load equalization strategies is studied to reduce data communication overheads.
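The trade-off between the two particle distribution techniques can be made concrete with a small calculation. This is a minimal sketch under stated assumptions: a one-dimensional domain cut into equal slabs, made-up particle positions clustered like a jet, and hypothetical helper names (`owner_rank`, `spatial_counts`, `equal_share_counts`, `imbalance`); no MPI calls are involved.

```python
from collections import Counter

def owner_rank(x, n_ranks, domain=1.0):
    """Rank whose slab of the grid contains position x (assumed 1D layout)."""
    return min(int(x / domain * n_ranks), n_ranks - 1)

def spatial_counts(positions, n_ranks):
    """Technique 1: each particle lives on the rank that owns its grid cell."""
    counts = Counter(owner_rank(x, n_ranks) for x in positions)
    return [counts.get(r, 0) for r in range(n_ranks)]

def equal_share_counts(n_particles, n_ranks):
    """Technique 2: particles are dealt out evenly, ignoring location."""
    base, extra = divmod(n_particles, n_ranks)
    return [base + (1 if r < extra else 0) for r in range(n_ranks)]

def imbalance(counts):
    """Maximum load divided by mean load; 1.0 is perfectly balanced."""
    return max(counts) / (sum(counts) / len(counts))

# Jet-like cloud: most particles sit near x = 0.1.
positions = [0.1] * 6 + [0.4, 0.6, 0.9, 0.95]
by_space = spatial_counts(positions, 4)
by_share = equal_share_counts(len(positions), 4)
print(by_space, imbalance(by_space))
print(by_share, imbalance(by_share))
```

The spatial assignment keeps each particle next to the field data it interpolates from but concentrates work on one rank; the equal-share assignment evens out the counts at the price of fetching field data from other ranks.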
A parallel algorithm for multi-level logic synthesis using the transduction method. M.S. Thesis

NASA Technical Reports Server (NTRS)

Lim, Chieng-Fai

1991-01-01

The Transduction Method has been shown to be a powerful tool in the optimization of multilevel networks. Many tools such as the SYLON synthesis system (X90), (CM89), (LM90) have been developed based on this method. A parallel implementation is presented of SYLON-XTRANS (XM89) on an eight processor Encore Multimax shared memory multiprocessor. It minimizes multilevel networks consisting of simple gates through parallel pruning, gate substitution, gate merging, generalized gate substitution, and gate input reduction. This implementation, called Parallel TRANSduction (PTRANS), also uses partitioning to break large circuits up and performs inter- and intra-partition dynamic load balancing. With this, good speedups and high processor efficiencies are achievable without sacrificing the resulting circuit quality.
Performance analysis of parallel branch and bound search with the hypercube architecture

NASA Technical Reports Server (NTRS)

Mraz, Richard T.

1987-01-01

With the availability of commercial parallel computers, researchers are examining new classes of problems which might benefit from parallel computing. This paper presents results of an investigation of the class of search intensive problems. The specific problem discussed is the Least-Cost Branch and Bound search method of deadline job scheduling. The object-oriented design methodology was used to map the problem into a parallel solution. While the initial design was good for a prototype, the best performance resulted from fine-tuning the algorithm for a specific computer. The experiments analyze the computation time, the speedup over a VAX 11/785, and the load balance of the problem when using a loosely coupled multiprocessor system based on the hypercube architecture.
The Distributed Diagonal Force Decomposition Method for Parallelizing Molecular Dynamics Simulations

PubMed Central

Boršnik, Urban; Miller, Benjamin T.; Brooks, Bernard R.; Janežič, Dušanka

2011-01-01

Parallelization is an effective way to reduce the computational time needed for molecular dynamics simulations. We describe a new parallelization method, the distributed-diagonal force decomposition method, with which we extend and improve the existing force decomposition methods. Our new method requires less data communication during molecular dynamics simulations than replicated data and current force decomposition methods, increasing the parallel efficiency. It also dynamically load-balances the processors' computational load throughout the simulation. The method is readily implemented in existing molecular dynamics codes and it has been incorporated into the CHARMM program, allowing its immediate use in conjunction with the many molecular dynamics simulation techniques that are already present in the program. We also present the design of the Force Decomposition Machine, a cluster of personal computers and networks that is tailored to running molecular dynamics simulations using the distributed diagonal force decomposition method. The design is expandable and provides various degrees of fault resilience. This approach is easily adaptable to computers with Graphics Processing Units because it is independent of the processor type being used. PMID:21793007
Analysis of series resonant converter with series-parallel connection

NASA Astrophysics Data System (ADS)

Lin, Bor-Ren; Huang, Chien-Lan

2011-02-01

In this study, a parallel inductor-inductor-capacitor (LLC) resonant converter series-connected on the primary side and parallel-connected on the secondary side is presented for server power supply systems. Based on series resonant behaviour, the power metal-oxide-semiconductor field-effect transistors are turned on at zero voltage switching and the rectifier diodes are turned off at zero current switching. Thus, the switching losses on the power semiconductors are reduced. In the proposed converter, the primary windings of the two LLC converters are connected in series. Thus, the two converters have the same primary currents to ensure that they can supply the balance load current. On the output side, two LLC converters are connected in parallel to share the load current and to reduce the current stress on the secondary windings and the rectifier diodes. In this article, the principle of operation, steady-state analysis and design considerations of the proposed converter are provided and discussed. Experiments with a laboratory prototype with a 24 V/21 A output for server power supply were performed to verify the effectiveness of the proposed converter.

Parallel Computing Strategies for Irregular Algorithms

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

2002-01-01

Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Jain, Atul K.

The overall objective of this DOE-funded project is to combine scientific and computational challenges in climate modeling by expanding our understanding of the biogeophysical-biogeochemical processes and their interactions in the northern high latitudes (NHLs) using an earth system modeling (ESM) approach, and by adopting an adaptive parallel runtime system in an ESM to achieve efficient and scalable climate simulations through improved load balancing algorithms.
A Proposal for Kelly Criterion-Based Lossy Network Compression

DTIC Science & Technology

2016-03-01

warehousing and data mining techniques for cyber security. New York (NY): Springer; 2007. p. 83–108. 34. Münz G, Li S, Carle G. Traffic anomaly...p. 188–196. 48. Kim NU, Park MW, Park SH, Jung SM, Eom JH, Chung TM. A study on effective hash-based load balancing scheme for parallel nids. In
Comparing the Performance of Two Dynamic Load Distribution Methods

NASA Technical Reports Server (NTRS)

Kale, L. V.

1987-01-01

Parallel processing of symbolic computations on a message-passing multi-processor presents one challenge: to effectively utilize the available processors, the load must be distributed uniformly to all the processors. However, the structure of these computations cannot be predicted in advance. So, static scheduling methods are not applicable. In this paper, we compare the performance of two dynamic, distributed load balancing methods with extensive simulation studies. The two schemes are: the Contracting Within a Neighborhood (CWN) scheme proposed by us, and the Gradient Model proposed by Lin and Keller. We conclude that although simpler, the CWN is significantly more effective at distributing the work than the Gradient Model.
Research in Parallel Algorithms and Software for Computational Aerosciences

NASA Technical Reports Server (NTRS)

Domel, Neal D.

1996-01-01

Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
An efficient dynamic load balancing algorithm

NASA Astrophysics Data System (ADS)

Lagaros, Nikos D.

2014-01-01

In engineering problems, randomness and uncertainties are inherent. Robust design procedures, formulated in the framework of multi-objective optimization, have been proposed in order to take into account sources of randomness and uncertainty. These design procedures require orders of magnitude more computational effort than conventional analysis or optimum design processes, since a very large number of finite element analyses must be performed. It is therefore imperative to exploit the capabilities of computing resources in order to deal with this kind of problem. In particular, parallel computing can be implemented at the level of metaheuristic optimization, by exploiting the physical parallelization feature of the nondominated sorting evolution strategies method, as well as at the level of repeated structural analyses required for assessing the behavioural constraints and for calculating the objective functions. In this study an efficient dynamic load balancing algorithm for optimum exploitation of available computing resources is proposed and, without loss of generality, is applied for computing the desired Pareto front. In such problems the computation of the complete Pareto front with feasible designs only constitutes a very challenging task. The proposed algorithm achieves linear speedup factors and almost 100% speedup factor values with reference to the sequential procedure.
Simulation of reaction diffusion processes over biologically relevant size and time scales using multi-GPU workstations

PubMed Central

Hallock, Michael J.; Stone, John E.; Roberts, Elijah; Fry, Corey; Luthey-Schulten, Zaida

2014-01-01

Simulation of in vivo cellular processes with the reaction-diffusion master equation (RDME) is a computationally expensive task. Our previous software enabled simulation of inhomogeneous biochemical systems for small bacteria over long time scales using the MPD-RDME method on a single GPU. Simulations of larger eukaryotic systems exceed the on-board memory capacity of individual GPUs, and long time simulations of modest-sized cells such as yeast are impractical on a single GPU. We present a new multi-GPU parallel implementation of the MPD-RDME method based on a spatial decomposition approach that supports dynamic load balancing for workstations containing GPUs of varying performance and memory capacity. We take advantage of high-performance features of CUDA for peer-to-peer GPU memory transfers and evaluate the performance of our algorithms on state-of-the-art GPU devices. We present parallel efficiency and performance results for simulations using multiple GPUs as system size, particle counts, and number of reactions grow. We also demonstrate multi-GPU performance in simulations of the Min protein system in E. coli. Moreover, our multi-GPU decomposition and load balancing approach can be generalized to other lattice-based problems. PMID:24882911
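One simple way to balance a spatial decomposition across devices of unequal speed is to size each device's slab in proportion to its measured throughput. The sketch below illustrates that general idea only; the abstract does not describe the authors' actual balancing rule, and `rebalance_planes`, the per-plane timings, and the plane count are all assumptions.

```python
def rebalance_planes(n_planes, seconds_per_plane):
    """Assign lattice planes to devices in proportion to measured speed."""
    speeds = [1.0 / s for s in seconds_per_plane]
    total = sum(speeds)
    ideal = [n_planes * v / total for v in speeds]
    counts = [int(x) for x in ideal]
    # hand leftover planes to the devices with the largest fractional remainder
    leftovers = n_planes - sum(counts)
    order = sorted(range(len(ideal)), key=lambda i: ideal[i] - counts[i], reverse=True)
    for i in order[:leftovers]:
        counts[i] += 1
    return counts

# Two fast devices and one that takes twice as long per plane (made-up timings).
counts = rebalance_planes(64, [1.0, 1.0, 2.0])
print(counts)
```

Re-running the split with fresh timings every so often is what would make such a scheme dynamic; a memory-capacity cap per device would be a natural extra constraint and is omitted here.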
Simulation of reaction diffusion processes over biologically relevant size and time scales using multi-GPU workstations.

PubMed

Hallock, Michael J; Stone, John E; Roberts, Elijah; Fry, Corey; Luthey-Schulten, Zaida

2014-05-01

Simulation of in vivo cellular processes with the reaction-diffusion master equation (RDME) is a computationally expensive task. Our previous software enabled simulation of inhomogeneous biochemical systems for small bacteria over long time scales using the MPD-RDME method on a single GPU. Simulations of larger eukaryotic systems exceed the on-board memory capacity of individual GPUs, and long time simulations of modest-sized cells such as yeast are impractical on a single GPU. We present a new multi-GPU parallel implementation of the MPD-RDME method based on a spatial decomposition approach that supports dynamic load balancing for workstations containing GPUs of varying performance and memory capacity. We take advantage of high-performance features of CUDA for peer-to-peer GPU memory transfers and evaluate the performance of our algorithms on state-of-the-art GPU devices. We present parallel efficiency and performance results for simulations using multiple GPUs as system size, particle counts, and number of reactions grow. We also demonstrate multi-GPU performance in simulations of the Min protein system in E. coli. Moreover, our multi-GPU decomposition and load balancing approach can be generalized to other lattice-based problems.
Parallel Performance Optimizations on Unstructured Mesh-based Simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas

2015-01-01

© The Authors. Published by Elsevier B.V. This paper addresses two key parallelization challenges in the unstructured mesh-based ocean modeling code, MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitioning with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.
Large scale cardiac modeling on the Blue Gene supercomputer.

PubMed

Reumann, Matthias; Fitch, Blake G; Rayshubskiy, Aleksandr; Keller, David U; Weiss, Daniel L; Seemann, Gunnar; Dössel, Olaf; Pitman, Michael C; Rice, John J

2008-01-01

Multi-scale, multi-physical heart models have not yet been able to include a high degree of accuracy and resolution with respect to model detail and spatial resolution due to computational limitations of current systems. We propose a framework to compute large scale cardiac models. Decomposition of anatomical data in segments to be distributed on a parallel computer is carried out by optimal recursive bisection (ORB). The algorithm takes into account a computational load parameter which has to be adjusted according to the cell models used. The diffusion term is realized by the monodomain equations. The anatomical data-set was given by both ventricles of the Visible Female data-set in a 0.2 mm resolution. Heterogeneous anisotropy was included in the computation. Model weights as input for the decomposition and load balancing were set to (a) 1 for tissue and 0 for non-tissue elements; (b) 10 for tissue and 1 for non-tissue elements. Scaling results for 512, 1024, 2048, 4096 and 8192 computational nodes were obtained for 10 ms simulation time. The simulations were carried out on an IBM Blue Gene/L parallel computer. A 1 s simulation was then carried out on 2048 nodes for the optimal model load. Load balances did not differ significantly across computational nodes even if the number of data elements distributed to each node differed greatly. Since the ORB algorithm did not take into account computational load due to communication cycles, the speedup is close to optimal for the computation time but not optimal overall due to the communication overhead. However, the simulation times were reduced from 87 minutes on 512 to 11 minutes on 8192 nodes. This work demonstrates that it is possible to run simulations of the presented detailed cardiac model within hours for the simulation of a heart beat.
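The two timings quoted above are enough to work out the overall scaling between the smallest and largest runs. The calculation below uses only those figures (87 minutes on 512 nodes, 11 minutes on 8192 nodes); the function names are illustrative.

```python
def relative_speedup(t_base, t_scaled):
    """How many times faster the larger run finished."""
    return t_base / t_scaled

def relative_efficiency(t_base, n_base, t_scaled, n_scaled):
    """Achieved speedup divided by the increase in node count."""
    return relative_speedup(t_base, t_scaled) / (n_scaled / n_base)

speedup = relative_speedup(87, 11)
efficiency = relative_efficiency(87, 512, 11, 8192)
print(f"speedup {speedup:.2f}x on {8192 // 512}x the nodes, "
      f"relative efficiency {efficiency:.0%}")
```

A roughly 7.9-fold speedup on 16 times as many nodes, about half of ideal, is consistent with the abstract's remark that communication overhead keeps the overall speedup below optimal.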
Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications

NASA Technical Reports Server (NTRS)

OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)

1998-01-01

This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).
Neural simulations on multi-core architectures.

PubMed

Eichner, Hubert; Klug, Tobias; Borst, Alexander

2009-01-01

Neuroscience is witnessing increasing knowledge about the anatomy and electrophysiological properties of neurons and their connectivity, leading to an ever increasing computational complexity of neural simulations. At the same time, a rather radical change in personal computer technology emerges with the establishment of multi-cores: high-density, explicitly parallel processor architectures for both high performance as well as standard desktop computers. This work introduces strategies for the parallelization of biophysically realistic neural simulations based on the compartmental modeling technique and results of such an implementation, with a strong focus on multi-core architectures and automation, i.e. user-transparent load balancing.
Neural Simulations on Multi-Core Architectures

PubMed Central

Eichner, Hubert; Klug, Tobias; Borst, Alexander

2009-01-01

Neuroscience is witnessing increasing knowledge about the anatomy and electrophysiological properties of neurons and their connectivity, leading to an ever increasing computational complexity of neural simulations. At the same time, a rather radical change in personal computer technology emerges with the establishment of multi-cores: high-density, explicitly parallel processor architectures for both high performance as well as standard desktop computers. This work introduces strategies for the parallelization of biophysically realistic neural simulations based on the compartmental modeling technique and results of such an implementation, with a strong focus on multi-core architectures and automation, i.e. user-transparent load balancing. PMID:19636393
Grid computing in large pharmaceutical molecular modeling.

PubMed

Claus, Brian L; Johnson, Stephen R

2008-07-01

Most major pharmaceutical companies have employed grid computing to expand their compute resources with the intention of minimizing additional financial expenditure. Historically, one of the issues restricting widespread utilization of the grid resources in molecular modeling is the limited set of suitable applications amenable to coarse-grained parallelization. Recent advances in grid infrastructure technology coupled with advances in application research and redesign will enable fine-grained parallel problems, such as quantum mechanics and molecular dynamics, which were previously inaccessible to the grid environment. This will enable new science as well as increase resource flexibility to load balance and schedule existing workloads.
Performance and Application of Parallel OVERFLOW Codes on Distributed and Shared Memory Platforms

NASA Technical Reports Server (NTRS)

Djomehri, M. Jahed; Rizk, Yehia M.

1999-01-01

The presentation discusses recent studies on the performance of the two parallel versions of the aerodynamics CFD code, OVERFLOW_MPI and _MLP. Developed at NASA Ames, the serial version, OVERFLOW, is a multidimensional Navier-Stokes flow solver based on overset (Chimera) grid technology. The code has recently been parallelized in two ways. One is based on the explicit message-passing interface (MPI) across processors and uses the _MPI communication package. This approach is primarily suited for distributed memory systems and workstation clusters. The second, termed the multi-level parallel (MLP) method, is simple and uses shared memory for all communications. The _MLP code is suitable for distributed-shared memory systems. For both methods, the message passing takes place across the processors or processes at the advancement of each time step. This procedure is, in effect, the Chimera boundary conditions update, which is done in an explicit "Jacobi" style. In contrast, the update in the serial code is done in more of a "Gauss-Seidel" fashion. Programming the _MPI code is more complicated than programming the _MLP code; the former requires modification of the outer and some inner shells of the serial code, whereas the latter focuses only on the outer shell of the code. The _MPI version offers a great deal of flexibility in distributing grid zones across a specified number of processors in order to achieve load balancing. The approach is capable of partitioning zones across multiple processors or sending each zone and/or cluster of several zones to a single processor. The message passing across the processors consists of Chimera boundary and/or an overlap of "halo" boundary points for each partitioned zone. The MLP version is a new coarse-grain parallel concept at the zonal and intra-zonal levels. A grouping strategy is used to distribute zones into several groups forming sub-processes which will run in parallel. 
The total volume of grid points in each group is approximately balanced. A proper number of threads is initially allocated to each group, and in subsequent iterations during the run-time, the number of threads is adjusted to achieve load balancing across the processes. Each process exploits the multitasking directives already established in OVERFLOW.
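The zone-grouping idea in this abstract, distributing zones into groups whose total grid-point counts are roughly equal, can be illustrated with a minimal sketch. It assumes a greedy largest-first heuristic, which the abstract does not specify; the function names `group_zones` and `group_totals` and the zone sizes in the usage note are hypothetical.

```python
import heapq

def group_zones(zone_points, n_groups):
    """Assign zones to groups so that grid points per group are roughly equal.

    Greedy largest-first heuristic: an illustrative assumption,
    not the grouping strategy actually used in OVERFLOW-MLP.
    """
    # Min-heap of (current total points, group index)
    heap = [(0, g) for g in range(n_groups)]
    heapq.heapify(heap)
    groups = [[] for _ in range(n_groups)]
    # Place the largest zones first, each into the currently lightest group
    for zone, pts in sorted(enumerate(zone_points), key=lambda zp: -zp[1]):
        total, g = heapq.heappop(heap)
        groups[g].append(zone)
        heapq.heappush(heap, (total + pts, g))
    return groups

def group_totals(zone_points, groups):
    """Total number of grid points assigned to each group."""
    return [sum(zone_points[z] for z in grp) for grp in groups]
```

For example, `group_zones([90, 60, 50, 40, 30, 30], 3)` places every zone in exactly one group. A run-time adjustment of thread counts per group, as described in the abstract, would sit on top of such an initial grouping and is not modelled here.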
Real-world hydrologic assessment of a fully-distributed hydrological model in a parallel computing environment

NASA Astrophysics Data System (ADS)

Vivoni, Enrique R.; Mascaro, Giuseppe; Mniszewski, Susan; Fasel, Patricia; Springer, Everett P.; Ivanov, Valeriy Y.; Bras, Rafael L.

2011-10-01

A major challenge in the use of fully-distributed hydrologic models has been the lack of computational capabilities for high-resolution, long-term simulations in large river basins. In this study, we present the parallel model implementation and real-world hydrologic assessment of the Triangulated Irregular Network (TIN)-based Real-time Integrated Basin Simulator (tRIBS). Our parallelization approach is based on the decomposition of a complex watershed using the channel network as a directed graph. The resulting sub-basin partitioning divides effort among processors and handles hydrologic exchanges across boundaries. Through numerical experiments in a set of nested basins, we quantify parallel performance relative to serial runs for a range of processors, simulation complexities and lengths, and sub-basin partitioning methods, while accounting for inter-run variability on a parallel computing system. In contrast to serial simulations, the parallel model speed-up depends on the variability of hydrologic processes. Load balancing significantly improves parallel speed-up with proportionally faster runs as simulation complexity (domain resolution and channel network extent) increases. The best strategy for large river basins is to combine a balanced partitioning with an extended channel network, with potential savings through a lower TIN resolution. Based on these advances, a wider range of applications for fully-distributed hydrologic models is now possible. This is illustrated through a set of ensemble forecasts that account for precipitation uncertainty derived from a statistical downscaling model.
[Parallel virtual reality visualization of extreme large medical datasets].

PubMed

Tang, Min

2010-04-01

On the basis of a brief description of grid computing, the essence and critical techniques of parallel visualization of extremely large medical datasets are discussed in connection with the intranets and common-configuration computers of hospitals. This paper introduces several core techniques, including the hardware structure, software framework, load balancing and virtual reality visualization. The Maximum Intensity Projection algorithm is implemented in parallel on a common PC cluster. In the virtual reality world, three-dimensional models can be rotated, zoomed, translated and cut interactively and conveniently through a control panel built on the virtual reality modeling language (VRML). Experimental results demonstrate that this method provides promising, real-time results and can serve as a good assistant in making clinical diagnoses.
ComprehensiveBench: a Benchmark for the Extensive Evaluation of Global Scheduling Algorithms

NASA Astrophysics Data System (ADS)

Pilla, Laércio L.; Bozzetti, Tiago C.; Castro, Márcio; Navaux, Philippe O. A.; Méhaut, Jean-François

2015-10-01

Parallel applications that present tasks with imbalanced loads or complex communication behavior usually do not exploit the underlying resources of parallel platforms to their full potential. In order to mitigate this issue, global scheduling algorithms are employed. As finding the optimal task distribution is an NP-Hard problem, identifying the most suitable algorithm for a specific scenario and comparing algorithms are not trivial tasks. In this context, this paper presents ComprehensiveBench, a benchmark for global scheduling algorithms that enables the variation of a vast range of parameters that affect performance. ComprehensiveBench can be used to assist in the development and evaluation of new scheduling algorithms, to help choose a specific algorithm for an arbitrary application, to emulate other applications, and to enable statistical tests. We illustrate its use in this paper with an evaluation of Charm++ periodic load balancers that stresses their characteristics.
Time Warp Operating System, Version 2.5.1

NASA Technical Reports Server (NTRS)

Bellenot, Steven F.; Gieselman, John S.; Hawley, Lawrence R.; Peterson, Judy; Presley, Matthew T.; Reiher, Peter L.; Springer, Paul L.; Tupman, John R.; Wedel, John J., Jr.; Wieland, Frederick P.;

1993-01-01

Time Warp Operating System, TWOS, is special purpose computer program designed to support parallel simulation of discrete events. Complete implementation of Time Warp software mechanism, which implements distributed protocol for virtual synchronization based on rollback of processes and annihilation of messages. Supports simulations and other computations in which both virtual time and dynamic load balancing used. Program utilizes underlying resources of operating system. Written in C programming language.

Mobile Thread Task Manager

NASA Technical Reports Server (NTRS)

Clement, Bradley J.; Estlin, Tara A.; Bornstein, Benjamin J.

2013-01-01

The Mobile Thread Task Manager (MTTM) is being applied to parallelizing existing flight software to understand the benefits and to develop new techniques and architectural concepts for adapting software to multicore architectures. It allocates and load-balances tasks for a group of threads that migrate across processors to improve cache performance. In order to balance load across threads, the MTTM augments a basic map-reduce strategy to draw jobs from a global queue. In a multicore processor, memory may be "homed" to the cache of a specific processor and must be accessed from that processor. The MTTM architecture wraps access to data with thread management to move threads to the home processor for that data so that the computation follows the data in an attempt to avoid L2 cache misses. Cache homing is also handled by a memory manager that translates identifiers to processor IDs where the data will be homed (according to rules defined by the user). The user can also specify the number of threads and processors separately, which is important for tuning performance for different patterns of computation and memory access. MTTM efficiently processes tasks in parallel on a multiprocessor computer. It also provides an interface to make it easier to adapt existing software to a multiprocessor environment.
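The idea of threads drawing jobs from a global queue can be illustrated with a minimal sketch. It is written in Python with the standard `queue` and `threading` modules rather than in flight software, it does not model thread migration or cache homing, and the function name `run_jobs` is hypothetical.

```python
import queue
import threading

def run_jobs(jobs, n_threads):
    """Run callables from one shared queue; each idle thread pulls the next job.

    Illustrative only: not the MTTM interface.
    """
    q = queue.Queue()
    for i, job in enumerate(jobs):
        q.put((i, job))
    results = [None] * len(jobs)

    def worker():
        while True:
            try:
                i, job = q.get_nowait()
            except queue.Empty:
                return  # no work left for this thread
            results[i] = job()  # each index is written by exactly one thread

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because a thread takes a new job only when it finishes the previous one, long and short jobs even out across threads without any advance knowledge of job cost.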
Parallel volume ray-casting for unstructured-grid data on distributed-memory architectures

NASA Technical Reports Server (NTRS)

Ma, Kwan-Liu

1995-01-01

As computing technology continues to advance, computational modeling of scientific and engineering problems produces data of increasing complexity: large in size and unstructured in shape. Volume visualization of such data is a challenging problem. This paper proposes a distributed parallel solution that makes ray-casting volume rendering of unstructured-grid data practical. Both the data and the rendering process are distributed among processors. At each processor, ray-casting of local data is performed independent of the other processors. The global image composing processes, which require inter-processor communication, are overlapped with the local ray-casting processes to achieve maximum parallel efficiency. This algorithm differs from previous ones in four ways: it is completely distributed, less view-dependent, reasonably scalable, and flexible. Without using dynamic load balancing, test results on the Intel Paragon using from two to 128 processors show, on average, about 60% parallel efficiency.
Parallel processing approach to transform-based image coding

NASA Astrophysics Data System (ADS)

Normile, James O.; Wright, Dan; Chu, Ken; Yeh, Chia L.

1991-06-01

This paper describes a flexible parallel processing architecture designed for use in real time video processing. The system consists of floating point DSP processors connected to each other via fast serial links; each processor has access to a globally shared memory. A multiple bus architecture in combination with a dual ported memory allows communication with a host control processor. The system has been applied to prototyping of video compression and decompression algorithms. The decomposition of transform based algorithms for decompression into a form suitable for parallel processing is described. A technique for automatic load balancing among the processors is developed and discussed, and results are presented with image statistics and data rates. Finally, techniques for accelerating the system throughput are analyzed and results from the application of one such modification are described.
Parallel adaptive wavelet collocation method for PDEs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nejadmalayeri, Alireza, E-mail: Alireza.Nejadmalayeri@gmail.com; Vezolainen, Alexei, E-mail: Alexei.Vezolainen@Colorado.edu; Brown-Dymkoski, Eric, E-mail: Eric.Browndymkoski@Colorado.edu

2015-10-01

A parallel adaptive wavelet collocation method for solving a large class of Partial Differential Equations is presented. The parallelization is achieved by developing an asynchronous parallel wavelet transform, which allows one to perform parallel wavelet transform and derivative calculations with only one data synchronization at the highest level of resolution. The data are stored using a tree-like structure with tree roots starting at an a priori defined level of resolution. Both static and dynamic domain partitioning approaches are developed. For the dynamic domain partitioning, trees are considered to be the minimum quanta of data to be migrated between the processes. This allows fully automated and efficient handling of non-simply connected partitioning of a computational domain. Dynamic load balancing is achieved via domain repartitioning during the grid adaptation step and reassigning trees to the appropriate processes to ensure approximately the same number of grid points on each process. The parallel efficiency of the approach is discussed based on parallel adaptive wavelet-based Coherent Vortex Simulations of homogeneous turbulence with linear forcing at effective non-adaptive resolutions up to 2048^3 using as many as 2048 CPU cores.
Towards scalable Byzantine fault-tolerant replication

NASA Astrophysics Data System (ADS)

Zbierski, Maciej

2017-08-01

Byzantine fault-tolerant (BFT) replication is a powerful technique, enabling distributed systems to remain available and correct even in the presence of arbitrary faults. Unfortunately, existing BFT replication protocols are mostly load-unscalable, i.e. they fail to respond with adequate performance increase whenever new computational resources are introduced into the system. This article proposes a universal architecture facilitating the creation of load-scalable distributed services based on BFT replication. The suggested approach exploits parallel request processing to fully utilize the available resources, and uses a load balancer module to dynamically adapt to the properties of the observed client workload. The article additionally provides a discussion on selected deployment scenarios, and explains how the proposed architecture could be used to increase the dependability of contemporary large-scale distributed systems.
Reduced temperature hydrolysis at 134 °C before thermophilic anaerobic digestion of waste activated sludge at increasing organic load.

PubMed

Gianico, A; Braguglia, C M; Cesarini, R; Mininni, G

2013-09-01

The performance of thermophilic digestion of waste activated sludge, either untreated or thermal pretreated, was evaluated through semi-continuous tests carried out at organic loading rates in the range of 1-3.7 kg VS/m(3)d. Although the thermal pretreatment at T=134 °C proved to be effective in solubilizing organic matter, no significant gain in organics degradation was observed. However, the digestion of pretreated sludge showed significant soluble COD removal (more than 55%) whereas no removal occurred in control reactors. The lower the initial sludge biodegradability, the higher the efficiency of thermal pretreated digestion was observed, in particular as regards higher biogas and methane production rates with respect to the parallel untreated sludge digestion. Heat balance of the combined thermal hydrolysis/thermophilic digestion process, applied on full-scale scenarios, showed positive values for direct combustion of methane. In case of combined heat and power generation, attractive electric energy recoveries were obtained, with a positive heat balance at high load. Copyright © 2013. Published by Elsevier Ltd.
Parallelization Issues and Particle-In-Cell Codes.

NASA Astrophysics Data System (ADS)

Elster, Anne Cathrine

1994-01-01

"Everything should be made as simple as possible, but not simpler." Albert Einstein. The field of parallel scientific computing has concentrated on parallelization of individual modules such as matrix solvers and factorizers. However, many applications involve several interacting modules. Our analyses of a particle-in-cell code modeling charged particles in an electric field show that these accompanying dependencies affect data partitioning and lead to new parallelization strategies concerning processor, memory and cache utilization. Our test-bed, a KSR1, is a distributed memory machine with a globally shared addressing space. However, most of the new methods presented hold generally for hierarchical and/or distributed memory systems. We introduce a novel approach that uses dual pointers on the local particle arrays to keep the particle locations automatically partially sorted. Complexity and performance analyses with accompanying KSR benchmarks have been included for both this scheme and for the traditional replicated grids approach. The latter approach maintains load-balance with respect to particles. However, our results demonstrate it fails to scale properly for problems with large grids (say, greater than 128-by-128) running on as few as 15 KSR nodes, since the extra storage and computation time associated with adding the grid copies becomes significant. Our grid partitioning scheme, although harder to implement, does not need to replicate the whole grid. Consequently, it scales well for large problems on highly parallel systems. It may, however, require load balancing schemes for non-uniform particle distributions. Our dual pointer approach may facilitate this through dynamically partitioned grids. We also introduce hierarchical data structures that store neighboring grid-points within the same cache-line by reordering the grid indexing. This alignment produces a 25% savings in cache-hits for a 4-by-4 cache. 
A consideration of the input data's effect on the simulation may lead to further improvements. For example, in the case of mean particle drift, it is often advantageous to partition the grid primarily along the direction of the drift. The particle-in-cell codes for this study were tested using physical parameters, which lead to predictable phenomena including plasma oscillations and two-stream instabilities. An overview of the most central references related to parallel particle codes is also given.
The Refinement-Tree Partition for Parallel Solution of Partial Differential Equations

PubMed Central

Mitchell, William F.

1998-01-01

Dynamic load balancing is considered in the context of adaptive multilevel methods for partial differential equations on distributed memory multiprocessors. An approach that periodically repartitions the grid is taken. The important properties of a partitioning algorithm are presented and discussed in this context. A partitioning algorithm based on the refinement tree of the adaptive grid is presented and analyzed in terms of these properties. Theoretical and numerical results are given. PMID:28009355
The Refinement-Tree Partition for Parallel Solution of Partial Differential Equations.

PubMed

Mitchell, William F

1998-01-01

Dynamic load balancing is considered in the context of adaptive multilevel methods for partial differential equations on distributed memory multiprocessors. An approach that periodically repartitions the grid is taken. The important properties of a partitioning algorithm are presented and discussed in this context. A partitioning algorithm based on the refinement tree of the adaptive grid is presented and analyzed in terms of these properties. Theoretical and numerical results are given.
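One way to picture partitioning by a refinement tree is to walk the tree depth first and cut the resulting sequence of leaves into contiguous, equally weighted pieces. The sketch below is only loosely in the spirit of the abstract: the nested-tuple tree encoding, the midpoint rule for choosing a partition, and the names `leaves_in_order` and `partition_tree` are all assumptions, not the algorithm analyzed in the paper.

```python
def leaves_in_order(node):
    """Yield (name, weight) for the leaves of a tree, depth first.

    A node is (name, weight) for a leaf or (name, [children]) otherwise.
    """
    name, payload = node
    if isinstance(payload, list):
        for child in payload:
            yield from leaves_in_order(child)
    else:
        yield name, payload

def partition_tree(root, n_parts):
    """Split the depth-first leaf sequence into n_parts contiguous pieces.

    Assumes the total leaf weight is positive.
    """
    leaves = list(leaves_in_order(root))
    total = sum(w for _, w in leaves)
    parts = [[] for _ in range(n_parts)]
    running = 0.0
    for name, w in leaves:
        # The midpoint of this leaf's weight interval selects its partition
        mid = running + w / 2.0
        p = min(int(mid * n_parts / total), n_parts - 1)
        parts[p].append(name)
        running += w
    return parts
```

Because leaves that are close in the traversal tend to share ancestors in the tree, contiguous pieces of the sequence tend to correspond to neighboring regions of the grid; that locality is the motivation for ordering by the tree rather than by arbitrary element index.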
Scalable isosurface visualization of massive datasets on commodity off-the-shelf clusters

PubMed Central

Bajaj, Chandrajit

2009-01-01

Tomographic imaging and computer simulations are increasingly yielding massive datasets. Interactive and exploratory visualizations have rapidly become indispensable tools to study large volumetric imaging and simulation data. Our scalable isosurface visualization framework on commodity off-the-shelf clusters is an end-to-end parallel and progressive platform, from initial data access to the final display. Interactive browsing of extracted isosurfaces is made possible by using parallel isosurface extraction, and rendering in conjunction with a new specialized piece of image compositing hardware called Metabuffer. In this paper, we focus on the back end scalability by introducing a fully parallel and out-of-core isosurface extraction algorithm. It achieves scalability by using both parallel and out-of-core processing and parallel disks. It statically partitions the volume data to parallel disks with a balanced workload spectrum, and builds I/O-optimal external interval trees to minimize the number of I/O operations of loading large data from disk. We also describe an isosurface compression scheme that is efficient for progress extraction, transmission and storage of isosurfaces. PMID:19756231
An architecture for real-time vision processing

NASA Technical Reports Server (NTRS)

Chien, Chiun-Hong

1994-01-01

To study the feasibility of developing an architecture for real time vision processing, a task queue server and parallel algorithms for two vision operations were designed and implemented on an i860-based Mercury Computing System 860VS array processor. The proposed architecture treats each vision function as a task or set of tasks which may be recursively divided into subtasks and processed by multiple processors coordinated by a task queue server accessible by all processors. Each idle processor subsequently fetches a task and associated data from the task queue server for processing and posts the result to shared memory for later use. Load balancing can be carried out within the processing system without the requirement for a centralized controller. The author concludes that real time vision processing cannot be achieved without both sequential and parallel vision algorithms and a good parallel vision architecture.
Parallel discontinuous Galerkin FEM for computing hyperbolic conservation law on unstructured grids

NASA Astrophysics Data System (ADS)

Ma, Xinrong; Duan, Zhijian

2018-04-01

The high-order Discontinuous Galerkin finite element method (DGFEM) is known as a good method for solving the Euler and Navier-Stokes equations on unstructured grids, but it consumes substantial computational resources. An efficient parallel algorithm is presented for solving the compressible Euler equations. Moreover, a multigrid strategy based on a three-stage, third-order TVD Runge-Kutta scheme is used to improve the computational efficiency of DGFEM and accelerate the convergence of the solution of the unsteady compressible Euler equations. To maintain load balance across processors, the domain decomposition method is employed. Numerical experiments were performed for inviscid transonic flow around the NACA0012 airfoil and the M6 wing. The results indicate that the parallel algorithm significantly improves speedup and efficiency, which makes it suitable for computing complex flows.
Implementation of High-Order Multireference Coupled-Cluster Methods on Intel Many Integrated Core Architecture.

PubMed

Aprà, E; Kowalski, K

2016-03-08

In this paper we discuss the implementation of multireference coupled-cluster formalism with singles, doubles, and noniterative triples (MRCCSD(T)), which is capable of taking advantage of the processing power of the Intel Xeon Phi coprocessor. We discuss the integration of two levels of parallelism underlying the MRCCSD(T) implementation with computational kernels designed to offload the computationally intensive parts of the MRCCSD(T) formalism to Intel Xeon Phi coprocessors. Special attention is given to the enhancement of the parallel performance by task reordering that has improved load balancing in the noniterative part of the MRCCSD(T) calculations. We also discuss aspects regarding efficient optimization and vectorization strategies.
Breaking Barriers to Low-Cost Modular Inverter Production & Use

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bogdan Borowy; Leo Casey; Jerry Foshage

2005-05-31

The goal of this cost share contract is to advance key technologies to reduce size, weight and cost while enhancing performance and reliability of Modular Inverter Product for Distributed Energy Resources (DER). Efforts address technology development to meet technical needs of DER market protection, isolation, reliability, and quality. Program activities build on SatCon Technology Corporation inverter experience (e.g., AIPM, Starsine, PowerGate) for Photovoltaic, Fuel Cell, Energy Storage applications. Efforts focused on four technical areas: capacitors, cooling, voltage sensing, and control of parallel inverters. Capacitor efforts developed a hybrid capacitor approach for conditioning SatCon's AIPM unit supply voltages by incorporating several types and sizes to store energy and filter at high, medium and low frequencies while minimizing parasitics (ESR and ESL). Cooling efforts converted the liquid cooled AIPM module to an air-cooled unit using augmented fin, impingement flow cooling. Voltage sensing efforts successfully modified the existing AIPM sensor board to allow several application dependent configurations and to enable voltage sensor galvanic isolation. Parallel inverter control efforts realized a reliable technique to control individual inverters, connected in a parallel configuration, without a communication link. Individual inverter currents, AC and DC, were balanced in the paralleled modules by introducing a delay to the individual PWM gate pulses. The load current sharing is robust and independent of load types (i.e., linear and nonlinear, resistive and/or inductive). It is a simple yet powerful method for paralleling individual devices, and it dramatically improves reliability and fault tolerance of parallel inverter power systems. A patent application has been made based on this control technology.
Machine Learning Based Online Performance Prediction for Runtime Parallelization and Task Scheduling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, J; Ma, X; Singh, K

2008-10-09

With the emerging many-core paradigm, parallel programming must extend beyond its traditional realm of scientific applications. Converting existing sequential applications as well as developing next-generation software requires assistance from hardware, compilers and runtime systems to exploit parallelism transparently within applications. These systems must decompose applications into tasks that can be executed in parallel and then schedule those tasks to minimize load imbalance. However, many systems lack a priori knowledge about the execution time of all tasks to perform effective load balancing with low scheduling overhead. In this paper, we approach this fundamental problem using machine learning techniques, first to generate performance models for all tasks and then to apply those models to perform automatic performance prediction across program executions. We also extend an existing scheduling algorithm to use generated task cost estimates for online task partitioning and scheduling. We implement the above techniques in the pR framework, which transparently parallelizes scripts in the popular R language, and evaluate their performance and overhead with both a real-world application and a large number of synthetic representative test scripts. Our experimental results show that our proposed approach significantly improves task partitioning and scheduling, with maximum improvements of 21.8%, 40.3% and 22.1% and average improvements of 15.9%, 16.9% and 4.2% for LMM (a real R application) and synthetic test cases with independent and dependent tasks, respectively.
Dynamics of a split torque helicopter transmission

NASA Technical Reports Server (NTRS)

Rashidi, Majid; Krantz, Timothy

1992-01-01

A high reduction ratio split torque gear train has been proposed as an alternative to a planetary configuration for the final stage of a helicopter transmission. A split torque design allows a high ratio of power-to-weight for the transmission. The design studied in this work includes a pivoting beam that acts to balance thrust loads produced by the helical gear meshes in each of two parallel power paths. When the thrust loads are balanced, the torque is split evenly. A mathematical model was developed to study the dynamics of the system. The effects of time varying gear mesh stiffness, static transmission errors, and flexible bearing supports are included in the model. The model was demonstrated with a test case. Results show that although the gearbox has a symmetric configuration, the simulated dynamic behavior of the first compound gear differs from that of the second. Also, results show that shaft location and mesh stiffness tuning are significant design parameters that influence the motions of the system.
Load balance in total knee arthroplasty: an in vitro analysis.

PubMed

El-Hawary, Ron; Roth, Sandra E; King, Graham J W; Chess, David G; Johnson, James A

2006-09-01

One of the goals of total knee arthroplasty (TKA) is to balance the loads between the compartments of the knee. An instrumented load cell that measures compartment loads in real time is utilized to evaluate conventional, qualitative methods of achieving this balance. TKA was performed on 10 cadaveric knees. Prior to and after load balancing, compartment forces were measured at flexion angles of 0-90 degrees. Knees were randomly assigned into one of two groups, based upon whether or not the surgeons could visualize the load cell's output during balancing. Prior to attempting load balance, there were significant differences between the medial and lateral compartment loads for all knees (p < 0.05). After attempting balance with the aid of the load cell, there was equal load balance at all angles studied. Without the aid of the load cell, balance was not consistently achieved at every angle. Conventional load balancing techniques in TKA are not perfect. Copyright 2006 John Wiley & Sons, Ltd.
New Parallel computing framework for radiation transport codes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kostin, M.A.; /Michigan State U., NSCL; Mokhov, N.V.

A new parallel computing framework has been developed to use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. The module is significantly independent of radiation transport codes it can be used with, and is connected to the codes by means of a number of interface functions. The framework was integrated with the MARS15 code, and an effort is under way to deploy it in PHITS. Besides the parallel computing functionality, the framework offers a checkpoint facility that allows restarting calculations with a saved checkpoint file. The checkpoint facility can be used in single process calculations as well as in the parallel regime. Several checkpoint files can be merged into one thus combining results of several calculations. The framework also corrects some of the known problems with the scheduling and load balancing found in the original implementations of the parallel computing functionality in MARS15 and PHITS. The framework can be used efficiently on homogeneous systems and networks of workstations, where the interference from the other users is possible.
Distributed computing for membrane-based modeling of action potential propagation.

PubMed

Porras, D; Rogers, J M; Smith, W M; Pollard, A E

2000-08-01

Action potential propagation simulations with physiologic membrane currents and macroscopic tissue dimensions are computationally expensive. We, therefore, analyzed distributed computing schemes to reduce execution time in workstation clusters by parallelizing solutions with message passing. Four schemes were considered in two-dimensional monodomain simulations with the Beeler-Reuter membrane equations. Parallel speedups measured with each scheme were compared to theoretical speedups, recognizing the relationship between speedup and code portions that executed serially. A data decomposition scheme based on total ionic current provided the best performance. Analysis of communication latencies in that scheme led to a load-balancing algorithm in which measured speedups at 89 +/- 2% and 75 +/- 8% of theoretical speedups were achieved in homogeneous and heterogeneous clusters of workstations. Speedups in this scheme with the Luo-Rudy dynamic membrane equations exceeded 3.0 with eight distributed workstations. Cluster speedups were comparable to those measured during parallel execution on a shared memory machine.
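The relationship between speedup and the serially executed portion of a code is commonly expressed by Amdahl's law. The abstract does not name the formula used, so the sketch below assumes that form; the function names and the example serial fraction are hypothetical and are not taken from the paper.

```python
def amdahl_speedup(serial_fraction, n_workers):
    """Theoretical speedup when a fixed fraction of the run time is serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_workers)

def fraction_of_theoretical(measured_speedup, serial_fraction, n_workers):
    """Measured speedup expressed as a fraction of the theoretical speedup."""
    return measured_speedup / amdahl_speedup(serial_fraction, n_workers)
```

With an assumed serial fraction of 0.1 and eight workers, `amdahl_speedup(0.1, 8)` is about 4.71, so the achievable speedup sits well below the worker count even before communication costs are considered.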
Parallel programming of gradient-based iterative image reconstruction schemes for optical tomography.

PubMed

Hielscher, Andreas H; Bartel, Sebastian

2004-02-01

Optical tomography (OT) is a fast-developing novel imaging modality that uses near-infrared (NIR) light to obtain cross-sectional views of optical properties inside the human body. A major challenge remains the time-consuming, computationally intensive image reconstruction problem that converts NIR transmission measurements into cross-sectional images. To increase the speed of iterative image reconstruction schemes that are commonly applied for OT, we have developed and implemented several parallel algorithms on a cluster of workstations. Static process distribution as well as dynamic load balancing schemes suitable for heterogeneous clusters and varying machine performances are introduced and tested. The resulting algorithms are shown to accelerate the reconstruction process to various degrees, substantially reducing the computation times for clinically relevant problems.

Community Detection on the GPU

DOE Office of Scientific and Technical Information (OSTI.GOV)

Naim, Md; Manne, Fredrik; Halappanavar, Mahantesh

We present and evaluate a new GPU algorithm based on the Louvain method for community detection. Our algorithm is the first for this problem that parallelizes the access to individual edges. In this way we can fine-tune the load balance when processing networks with nodes of highly varying degrees. This is achieved by scaling the number of threads assigned to each node according to its degree. Extensive experiments show that we obtain speedups up to a factor of 270 compared to the sequential algorithm. The algorithm consistently outperforms other recent shared memory implementations and is only one order of magnitude slower than the current fastest parallel Louvain method running on a Blue Gene/Q supercomputer using more than 500K threads.
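The idea of scaling the number of threads per node with its degree can be sketched as below. The abstract does not give the actual mapping, so the power-of-two rule, the cap of 128 threads, and both function names are assumptions made for illustration, written in plain Python rather than GPU code.

```python
def threads_for_degree(degree: int, max_threads: int = 128) -> int:
    """Smallest power of two >= degree, clamped to the range [1, max_threads].

    Hypothetical rule: the source only says threads scale with node degree.
    """
    threads = 1
    while threads < degree and threads < max_threads:
        threads *= 2
    return min(threads, max_threads)


def edges_per_thread(degree: int, max_threads: int = 128) -> int:
    """Number of edges each thread in the node's group must process."""
    threads = threads_for_degree(degree, max_threads)
    return -(-degree // threads)  # ceiling division
```

Under these assumptions a node of degree 5 gets a group of 8 threads with at most one edge each, while a hub of degree 1000 is capped at 128 threads that each handle 8 edges, so no single thread is left with the whole hub.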
Parallel Programming Strategies for Irregular Adaptive Applications

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Biegel, Bryan (Technical Monitor)

2001-01-01

Achieving scalable performance for dynamic irregular applications is eminently challenging. Traditional message-passing approaches have been making steady progress towards this goal; however, they suffer from complex implementation requirements. The use of a global address space greatly simplifies the programming task, but can degrade the performance for such computations. In this work, we examine two typical irregular adaptive applications, Dynamic Remeshing and N-Body, under competing programming methodologies and across various parallel architectures. The Dynamic Remeshing application simulates flow over an airfoil, and refines localized regions of the underlying unstructured mesh. The N-Body experiment models two neighboring Plummer galaxies that are about to undergo a merger. Both problems demonstrate dramatic changes in processor workloads and interprocessor communication with time; thus, dynamic load balancing is a required component.
Self-Avoiding Walks Over Adaptive Triangular Grids

NASA Technical Reports Server (NTRS)

Heber, Gerd; Biswas, Rupak; Gao, Guang R.; Saini, Subhash (Technical Monitor)

1999-01-01

Space-filling curves are a popular approach based on a geometric embedding for linearizing computational meshes. We present a new O(n log n) combinatorial algorithm for constructing a self-avoiding walk through a two-dimensional mesh containing n triangles. We show that for hierarchical adaptive meshes, the algorithm can be locally adapted and easily parallelized by taking advantage of the regularity of the refinement rules. The proposed approach should be very useful in the runtime partitioning and load balancing of adaptive unstructured grids.
A comparison of queueing, cluster and distributed computing systems

NASA Technical Reports Server (NTRS)

Kaplan, Joseph A.; Nelson, Michael L.

1993-01-01

Using workstation clusters for distributed computing has become popular with the proliferation of inexpensive, powerful workstations. Workstation clusters offer both a cost effective alternative to batch processing and an easy entry into parallel computing. However, a number of workstations on a network does not constitute a cluster. Cluster management software is necessary to harness the collective computing power. A variety of cluster management and queuing systems are compared: Distributed Queueing Systems (DQS), Condor, Load Leveler, Load Balancer, Load Sharing Facility (LSF - formerly Utopia), Distributed Job Manager (DJM), Computing in Distributed Networked Environments (CODINE), and NQS/Exec. The systems differ in their design philosophy and implementation. Based on published reports on the different systems and conversations with the systems' developers and vendors, a comparison of the systems is made on the integral issues of clustered computing.
Variable Acceleration Force Calibration System (VACS)

NASA Technical Reports Server (NTRS)

Rhew, Ray D.; Parker, Peter A.; Johnson, Thomas H.; Landman, Drew

2014-01-01

Conventionally, force balances have been calibrated manually, using a complex system of free hanging precision weights, bell cranks, and/or other mechanical components. Conventional methods may provide sufficient accuracy in some instances, but are often quite complex and labor-intensive, requiring three to four man-weeks to complete each full calibration. To ensure accuracy, gravity-based loading is typically utilized. However, this often causes difficulty when applying loads in three simultaneous, orthogonal axes. A complex system of levers, cranks, and cables must be used, introducing increased sources of systematic error, and significantly increasing the time and labor intensity required to complete the calibration. One aspect of the VACS is a method wherein the mass utilized for calibration is held constant, and the acceleration is changed to thereby generate relatively large forces with relatively small test masses. Multiple forces can be applied to a force balance without changing the test mass, and dynamic forces can be applied by rotation or oscillating acceleration. If rotational motion is utilized, a mass is rigidly attached to a force balance, and the mass is exposed to a rotational field. A large force can be applied by utilizing a large rotational velocity. A centrifuge or rotating table can be used to create the rotational field, and fixtures can be utilized to position the force balance. The acceleration may also be linear. For example, a table that moves linearly and accelerates in a sinusoidal manner may also be utilized. The test mass does not have to move in a path that is parallel to the ground, and no re-leveling is therefore required. Balance deflection corrections may be applied passively by monitoring the orientation of the force balance with a three-axis accelerometer package. Deflections are measured during each test run, and adjustments with respect to the true applied load can be made during the post-processing stage. This paper will present the development and testing of the VACS concept.
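For the rotational case described above, the size of the applied force follows from the standard centripetal relation. The mass, radius, and angular velocity below are illustrative values chosen for this sketch, not data from the VACS work.

```latex
F = m\,\omega^{2} r, \qquad
m = 1\,\mathrm{kg},\quad r = 0.5\,\mathrm{m},\quad \omega = 20\,\mathrm{rad/s}
\;\Longrightarrow\;
F = 1 \times 20^{2} \times 0.5 = 200\,\mathrm{N}
```

The same 1 kg mass hanging under gravity would apply only about 9.8 N, which illustrates how a small test mass can generate a relatively large force when the acceleration is varied instead of the mass.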
A compositional reservoir simulator on distributed memory parallel computers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rame, M.; Delshad, M.

1995-12-31

This paper presents the application of distributed memory parallel computers to field scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general purpose highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/960 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. A portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes the porting to new parallel platforms straightforward. Results of the distributed memory computing performance of the parallel simulator are presented for field scale applications such as tracer flood and polymer flood. A comparison of the wall-clock times for the same problems on a vector supercomputer is also presented.
Development and Application of a Parallel LCAO Cluster Method

NASA Astrophysics Data System (ADS)

Patton, David C.

1997-08-01

CPU intensive steps in the SCF electronic structure calculations of clusters and molecules with a first-principles LCAO method have been fully parallelized via a message passing paradigm. Identification of the parts of the code that are composed of many independent compute-intensive steps is discussed in detail as they are the most readily parallelized. Most of the parallelization involves spatially decomposing numerical operations on a mesh. One exception is the solution of Poisson's equation which relies on distribution of the charge density and multipole methods. The method we use to parallelize this part of the calculation is quite novel and is covered in detail. We present a general method for dynamically load-balancing a parallel calculation and discuss how we use this method in our code. The results of benchmark calculations of the IR and Raman spectra of PAH molecules such as anthracene (C_14H_10) and tetracene (C_18H_12) are presented. These benchmark calculations were performed on an IBM SP2 and a SUN Ultra HPC server with both MPI and PVM. Scalability and speedup for these calculations are analyzed to determine the efficiency of the code. In addition, performance and usage issues for MPI and PVM are presented.
Three dimensional adaptive mesh refinement on a spherical shell for atmospheric models with lagrangian coordinates

NASA Astrophysics Data System (ADS)

Penner, Joyce E.; Andronova, Natalia; Oehmke, Robert C.; Brown, Jonathan; Stout, Quentin F.; Jablonowski, Christiane; van Leer, Bram; Powell, Kenneth G.; Herzog, Michael

2007-07-01

One of the most important advances needed in global climate models is the development of atmospheric General Circulation Models (GCMs) that can reliably treat convection. Such GCMs require high resolution in local convectively active regions, both in the horizontal and vertical directions. During previous research we have developed an Adaptive Mesh Refinement (AMR) dynamical core that can adapt its grid resolution horizontally. Our approach utilizes a finite volume numerical representation of the partial differential equations with floating Lagrangian vertical coordinates and requires resolving dynamical processes on small spatial scales. For the latter it uses a newly developed general-purpose library, which facilitates 3D block-structured AMR on spherical grids. The library manages neighbor information as the blocks adapt, and handles the parallel communication and load balancing, freeing the user to concentrate on the scientific modeling aspects of their code. In particular, this library defines and manages adaptive blocks on the sphere, provides user interfaces for interpolation routines and supports the communication and load-balancing aspects for parallel applications. We have successfully tested the library in a 2-D (longitude-latitude) implementation. During the past year, we have extended the library to treat adaptive mesh refinement in the vertical direction. Preliminary results are discussed. This research project is characterized by an interdisciplinary approach involving atmospheric science, computer science and mathematical/numerical aspects. The work is done in close collaboration between the Atmospheric Science, Computer Science and Aerospace Engineering Departments at the University of Michigan and NOAA GFDL.
Self-balanced modulation and magnetic rebalancing method for parallel multilevel inverters

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Hui; Shi, Yanjun

A self-balanced modulation method and a closed-loop magnetic flux rebalancing control method for parallel multilevel inverters. The combination of the two methods provides for balancing of the magnetic flux of the inter-cell transformers (ICTs) of the parallel multilevel inverters without deteriorating the quality of the output voltage. In various embodiments a parallel multi-level inverter modulator is provided, including a multi-channel comparator to generate a multiplexed digitized ideal waveform for a parallel multi-level inverter and a finite state machine (FSM) module coupled to the parallel multi-channel comparator, the FSM module to receive the multiplexed digitized ideal waveform and to generate a pulse width modulated gate-drive signal for each switching device of the parallel multi-level inverter. The system and method provide for optimization of the output voltage spectrum without influencing the magnetic balancing.
Effects of Demand Response on Retail and Wholesale Power Markets

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chassin, David P.; Kalsi, Karanjit

2012-07-26

Demand response has grown to be a part of the repertoire of resources used by utilities to manage the balance between generation and load. In recent years, advances in communications and control technology have enabled utilities to consider continuously controlling demand response to meet generation, rather than the other way around. This paper discusses the economic applications of a general method for load resource analysis that parallels the approach used to analyze generation resources and uses the method to examine the results of the US Department of Energy’s Olympic Peninsula Demonstration Testbed. A market-based closed-loop system of controllable assets is discussed with necessary and sufficient conditions on system controllability, observability and stability derived.
Stability improvement of a four cable-driven parallel manipulator using a center of mass balance system

NASA Astrophysics Data System (ADS)

Salafian, Iman; Stewart, Blake; Newman, Matthew; Zygielbaum, Arthur I.; Terry, Benjamin

2017-04-01

A four cable-driven parallel manipulator (CDPM), consisting of sophisticated spectrometers and imagers, is under development for use in acquiring phenotypic and environmental data over an acre-sized crop field. To obtain accurate and high quality data from the instruments, the end effector must be stable during sensing. One of the factors that reduces stability is the center of mass offset of the end effector, which can cause a pendulum effect or undesired tilt angle. The purpose of this work is to develop a system and method for balancing the center of mass of a 12th-scale CDPM to minimize vibration that can cause error in the acquired data. A simple method for balancing the end effector is needed to enable end users of the CDPM to arbitrarily add and remove sensors and imagers from the end effector as their experiments may require. A Center of Mass Balancing System (CMBS) is developed in this study which consists of an adjustable system of weights and a gimbal for tilt mitigation. An electronic circuit board including an orientation sensor, wireless data communication, and load cells was designed to validate the CMBS. To measure improvements gained by the CMBS, several static and dynamic experiments are carried out. In the experiments, the dynamic vibrations due to the translational motion and static orientation were measured with and without CMBS use. The results show that the CMBS system improves the stability of the end-effector by decreasing vibration and static tilt angle.
A Baseline Load Schedule for the Manual Calibration of a Force Balance

NASA Technical Reports Server (NTRS)

Ulbrich, N.; Gisler, R.

2013-01-01

A baseline load schedule for the manual calibration of a force balance was developed that takes current capabilities at the NASA Ames Balance Calibration Laboratory into account. The load schedule consists of 18 load series with a total of 194 data points. It was designed to satisfy six requirements: (i) positive and negative loadings should be applied for each load component; (ii) at least three loadings should be applied between 0 % and 100 % load capacity; (iii) normal and side force loadings should be applied at the forward gage location, the aft gage location, and the balance moment center; (iv) the balance should be used in UP and DOWN orientation to get axial force loadings; (v) the constant normal and side force approaches should be used to get the rolling moment loadings; (vi) rolling moment loadings should be obtained for 0, 90, 180, and 270 degrees balance orientation. Three different approaches are also reviewed that may be used to independently estimate the natural zeros of the balance. These three approaches provide gage output differences that may be used to estimate the weight of both the metric and non-metric part of the balance. Manual calibration data of NASA's MK29A balance and machine calibration data of NASA's MC60D balance are used to illustrate and evaluate different aspects of the proposed baseline load schedule design.
A performance study of sparse Cholesky factorization on INTEL iPSC/860

NASA Technical Reports Server (NTRS)

Zubair, M.; Ghose, M.

1992-01-01

The problem of Cholesky factorization of a sparse matrix has been very well investigated on sequential machines. A number of efficient codes exist for factorizing large unstructured sparse matrices. However, there is a lack of such efficient codes on parallel machines in general, and distributed machines in particular. Some of the issues that are critical to the implementation of sparse Cholesky factorization on a distributed memory parallel machine are ordering, partitioning and mapping, load balancing, and ordering of various tasks within a processor. Here, we focus on the effect of various partitioning schemes on the performance of sparse Cholesky factorization on the Intel iPSC/860. Also, a new partitioning heuristic for structured as well as unstructured sparse matrices is proposed, and its performance is compared with other schemes.
Parallel SOR methods with a parabolic-diffusion acceleration technique for solving an unstructured-grid Poisson equation on 3D arbitrary geometries

NASA Astrophysics Data System (ADS)

Zapata, M. A. Uh; Van Bang, D. Pham; Nguyen, K. D.

2016-05-01

This paper presents a parallel algorithm for the finite-volume discretisation of the Poisson equation on three-dimensional arbitrary geometries. The proposed method is formulated by using a 2D horizontal block domain decomposition and interprocessor data communication techniques with message passing interface. The horizontal unstructured-grid cells are reordered according to the neighbouring relations and decomposed into blocks using a load-balanced distribution to give all processors an equal number of elements. In this algorithm, two parallel successive over-relaxation methods are presented: a multi-colour ordering technique for unstructured grids based on distributed memory and a block method using a reordering index following similar ideas of the partitioning for structured grids. In all cases, the parallel algorithms are implemented in combination with an accelerating iterative solver. This solver is based on a parabolic-diffusion equation introduced to obtain faster solutions of the linear systems arising from the discretisation. Numerical results are given to evaluate the performances of the methods, showing speedups better than linear.
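The load-balanced block distribution mentioned above can be sketched as a contiguous split of the reordered cell list in which block sizes differ by at most one. This is a generic illustration, not the authors' code; the function name and the half-open index convention are assumptions.

```python
def block_ranges(n_items: int, n_procs: int) -> list[tuple[int, int]]:
    """Split indices 0..n_items-1 into n_procs contiguous half-open ranges.

    Block sizes differ by at most one; the first (n_items % n_procs)
    processors receive the extra element.
    """
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    base, extra = divmod(n_items, n_procs)
    ranges = []
    start = 0
    for rank in range(n_procs):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```

For instance, `block_ranges(10, 3)` assigns cells 0-3, 4-6, and 7-9 to the three processors, so no processor holds more than one cell above the average.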
Soft switching resonant converter with duty-cycle control in DC micro-grid system

NASA Astrophysics Data System (ADS)

Lin, Bor-Ren

2018-01-01

Resonant converters have been widely used for their low switching losses and high circuit efficiency. However, wide frequency variation is their main drawback. This paper studies a new modular resonant converter with duty-cycle control to overcome this problem and realise the advantages of low switching losses, no reverse recovery current loss, balanced input split voltages and constant frequency operation for a medium voltage direct current grid or system network. Series full-bridge (FB) converters are used in the studied circuit in order to reduce the voltage stresses and power rating on power semiconductors. A flying capacitor is used between two FB converters to balance input split voltages. Two circuit modules are paralleled on the secondary side to lessen the current rating of rectifier diodes and the size of magnetic components. The resonant tank is operated as an inductive load circuit to help the power switches turn on at zero voltage over a wide load range. The pulse-width modulation scheme is used to regulate output voltage. Experimental verifications are provided to show the performance of the proposed circuit.
Hierarchical fractional-step approximations and parallel kinetic Monte Carlo algorithms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Arampatzis, Giorgos, E-mail: garab@math.uoc.gr; Katsoulakis, Markos A., E-mail: markos@math.umass.edu; Plechac, Petr, E-mail: plechac@math.udel.edu

2012-10-01

We present a mathematical framework for constructing and analyzing parallel algorithms for lattice kinetic Monte Carlo (KMC) simulations. The resulting algorithms have the capacity to simulate a wide range of spatio-temporal scales in spatially distributed, non-equilibrium physiochemical processes with complex chemistry and transport micro-mechanisms. Rather than focusing on constructing exactly the stochastic trajectories, our approach relies on approximating the evolution of observables, such as density, coverage, correlations and so on. More specifically, we develop a spatial domain decomposition of the Markov operator (generator) that describes the evolution of all observables according to the kinetic Monte Carlo algorithm. This domain decomposition corresponds to a decomposition of the Markov generator into a hierarchy of operators and can be tailored to specific hierarchical parallel architectures such as multi-core processors or clusters of Graphical Processing Units (GPUs). Based on this operator decomposition, we formulate parallel Fractional step kinetic Monte Carlo algorithms by employing the Trotter Theorem and its randomized variants; these schemes (a) are partially asynchronous on each fractional step time-window, and (b) are characterized by their communication schedule between processors. The proposed mathematical framework allows us to rigorously justify the numerical and statistical consistency of the proposed algorithms, showing the convergence of our approximating schemes to the original serial KMC. The approach also provides a systematic evaluation of different processor communicating schedules. We carry out a detailed benchmarking of the parallel KMC schemes using available exact solutions, for example, in Ising-type systems and we demonstrate the capabilities of the method to simulate complex spatially distributed reactions at very large scales on GPUs. Finally, we discuss work load balancing between processors and propose a re-balancing scheme based on probabilistic mass transport methods.
On the relationship between parallel computation and graph embedding

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gupta, A.K.

1989-01-01

The problem of efficiently simulating an algorithm designed for an n-processor parallel machine G on an m-processor parallel machine H with n > m arises when parallel algorithms designed for an ideal size machine are simulated on existing machines which are of a fixed size. The author studies this problem when every processor of H takes over the function of a number of processors in G, and he phrases the simulation problem as a graph embedding problem. New embeddings presented address relevant issues arising from the parallel computation environment. The main focus centers around embedding complete binary trees into smaller-sized binary trees, butterflies, and hypercubes. He also considers simultaneous embeddings of r source machines into a single hypercube. Constant factors play a crucial role in his embeddings since they are not only important in practice but also lead to interesting theoretical problems. All of his embeddings minimize dilation and load, which are the conventional cost measures in graph embeddings and determine the maximum amount of time required to simulate one step of G on H. His embeddings also optimize a new cost measure called (α, β)-utilization which characterizes how evenly the processors of H are used by the processors of G. Ideally, the utilization should be balanced (i.e., every processor of H simulates at most (n/m) processors of G) and the (α, β)-utilization measures how far off from a balanced utilization the embedding is. He presents embeddings for the situation when some processors of G have different capabilities (e.g. memory or I/O) than others and the processors with different capabilities are to be distributed uniformly among the processors of H. Placing such conditions on an embedding results in an increase in some of the cost measures.
Mcqueuer

DOE Office of Scientific and Technical Information (OSTI.GOV)

2016-09-12

Mcqueuer is a simple tool that allows anyone from researchers to experienced developers to create multi-node/multi-core jobs by simply creating a file with a list of commands. Users simply combine tasks, which would otherwise each be their own job on the cluster, into a single file that is given to Mcqueuer. Mcqueuer then does the heavy lifting required to process the tasks in parallel in a single multi-node job. In addition, Mcqueuer provides load-balancing, which frees the user from having to worry about complex memory and CPU considerations, and instead focus on the processing itself.
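The workflow described above, a file of independent commands processed in parallel with idle workers picking up the next task, can be sketched as follows. This is not Mcqueuer's implementation or interface: it runs on a single node with a thread pool, and `read_tasks`, `run_command`, and `run_all` are hypothetical names introduced for this illustration.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def read_tasks(text: str) -> list[str]:
    """Return one command per non-empty line, skipping '#' comment lines."""
    return [line.strip() for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]


def run_command(command: str) -> int:
    """Run one command as a child process and return its exit code."""
    return subprocess.run(shlex.split(command), capture_output=True).returncode


def run_all(commands, workers: int = 4, runner=run_command) -> list:
    """Run every command; each idle worker takes the next pending task.

    Results are returned in the same order as the input commands.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(runner, commands))
```

Because tasks are handed out one at a time rather than split into fixed shares up front, a worker that finishes a short task immediately starts another, which is the simplest form of dynamic load balancing for tasks of uneven length.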
Dynamically allocating sets of fine-grained processors to running computations

NASA Technical Reports Server (NTRS)

Middleton, David

1988-01-01

Researchers explore an approach to using general purpose parallel computers which involves mapping hardware resources onto computations instead of mapping computations onto hardware. Problems such as processor allocation, task scheduling and load balancing, which have traditionally proven to be challenging, change significantly under this approach and may become amenable to new attacks. Researchers describe the implementation of this approach used by the FFP Machine whose computation and communication resources are repeatedly partitioned into disjoint groups that match the needs of available tasks from moment to moment. Several consequences of this system are examined.
Adaptive Load-Balancing Algorithms Using Symmetric Broadcast Networks

NASA Technical Reports Server (NTRS)

Das, Sajal K.; Biswas, Rupak; Chancellor, Marisa K. (Technical Monitor)

1997-01-01

In a distributed-computing environment, it is important to ensure that the processor workloads are adequately balanced. Among numerous load-balancing algorithms, a unique approach due to Das and Prasad defines a symmetric broadcast network (SBN) that provides a robust communication pattern among the processors in a topology-independent manner. In this paper, we propose and analyze three novel SBN-based load-balancing algorithms, and implement them on an SP2. A thorough experimental study with Poisson-distributed synthetic loads demonstrates that these algorithms are very effective in balancing system load while minimizing processor idle time. They also compare favorably with several other existing load-balancing techniques. Additional experiments performed with real data demonstrate that the SBN approach is effective in adaptive computational science and engineering applications where dynamic load balancing is extremely crucial.

A Baseline Load Schedule for the Manual Calibration of a Force Balance

NASA Technical Reports Server (NTRS)

Ulbrich, N.; Gisler, R.

2013-01-01

A baseline load schedule for the manual calibration of a force balance is defined that takes current capabilities at the NASA Ames Balance Calibration Laboratory into account. The chosen load schedule consists of 18 load series with a total of 194 data points. It was designed to satisfy six requirements: (i) positive and negative loadings should be applied for each load component; (ii) at least three loadings should be applied between 0 % and 100 % load capacity; (iii) normal and side force loadings should be applied at the forward gage location, aft gage location, and the balance moment center; (iv) the balance should be used in "up" and "down" orientation to get positive and negative axial force loadings; (v) the constant normal and side force approaches should be used to get the rolling moment loadings; (vi) rolling moment loadings should be obtained for 0, 90, 180, and 270 degrees balance orientation. In addition, three different approaches are discussed in the paper that may be used to independently estimate the natural zeros, i.e., the gage outputs of the absolute load datum of the balance. These three approaches provide gage output differences that can be used to estimate the weight of both the metric and non-metric part of the balance. Data from the calibration of a six-component force balance will be used in the final manuscript of the paper to illustrate characteristics of the proposed baseline load schedule.
A Parallel Ghosting Algorithm for The Flexible Distributed Mesh Database

DOE PAGES

Mubarak, Misbah; Seol, Seegyoung; Lu, Qiukai; ...

2013-01-01

Critical to the scalability of parallel adaptive simulations are parallel control functions including load balancing, reduced inter-process communication and optimal data decomposition. In distributed meshes, many mesh-based applications frequently access neighborhood information for computational purposes which must be transmitted efficiently to avoid parallel performance degradation when the neighbors are on different processors. This article presents a parallel algorithm for creating and deleting data copies, referred to as ghost copies, which localize neighborhood data for computation purposes while minimizing inter-process communication. The key characteristics of the algorithm are: (1) It can create ghost copies of any permissible topological order in a 1D, 2D or 3D mesh based on selected adjacencies. (2) It exploits neighborhood communication patterns during the ghost creation process thus eliminating all-to-all communication. (3) For applications that need neighbors of neighbors, the algorithm can create n number of ghost layers up to a point where the whole partitioned mesh can be ghosted. Strong and weak scaling results are presented for the IBM BG/P and Cray XE6 architectures up to a core count of 32,768 processors. The algorithm also leads to scalable results when used in a parallel super-convergent patch recovery error estimator, an application that frequently accesses neighborhood data to carry out computation.
Parallelization of Nullspace Algorithm for the computation of metabolic pathways

PubMed Central

Jevremović, Dimitrije; Trinh, Cong T.; Srienc, Friedrich; Sosa, Carlos P.; Boley, Daniel

2011-01-01

Elementary mode analysis is a useful metabolic pathway analysis tool in understanding and analyzing cellular metabolism, since elementary modes can represent metabolic pathways with unique and minimal sets of enzyme-catalyzed reactions of a metabolic network under steady state conditions. However, computation of the elementary modes of a genome-scale metabolic network with 100–1000 reactions is very expensive and sometimes not feasible with the commonly used serial Nullspace Algorithm. In this work, we develop a distributed memory parallelization of the Nullspace Algorithm to handle efficiently the computation of the elementary modes of a large metabolic network. We give an implementation in C++ with the support of MPI library functions for the parallel communication. Our proposed algorithm is accompanied by an analysis of the complexity and identification of major bottlenecks during computation of all possible pathways of a large metabolic network. The algorithm includes methods to achieve load balancing among the compute-nodes and specific communication patterns to reduce the communication overhead and improve efficiency. PMID:22058581
An implementation of a tree code on a SIMD, parallel computer

NASA Technical Reports Server (NTRS)

Olson, Kevin M.; Dorband, John E.

1994-01-01

We describe a fast tree algorithm for gravitational N-body simulation on SIMD parallel computers. The tree construction uses fast, parallel sorts. The sorted lists are recursively divided along their x, y and z coordinates. This data structure is a completely balanced tree (i.e., each particle is paired with exactly one other particle) and maintains good spatial locality. An implementation of this tree-building algorithm on a 16k processor Maspar MP-1 performs well and constitutes only a small fraction (approximately 15%) of the entire cycle of finding the accelerations. Each node in the tree is treated as a monopole. The tree search and the summation of accelerations also perform well. During the tree search, node data that is needed from another processor is simply fetched. Roughly 55% of the tree search time is spent in communications between processors. We apply the code to two problems of astrophysical interest. The first is a simulation of the close passage of two gravitationally interacting disk galaxies using 65,536 particles. We also simulate the formation of structure in an expanding, model universe using 1,048,576 particles. Our code attains speeds comparable to one head of a Cray Y-MP, so single instruction, multiple data (SIMD) type computers can be used for these simulations. The cost/performance ratio for SIMD machines like the Maspar MP-1 makes them an extremely attractive alternative to either vector processors or large multiple instruction, multiple data (MIMD) type parallel computers. With further optimizations (e.g., more careful load balancing), speeds in excess of today's vector processing computers should be possible.
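The tree construction described above (sort, then divide recursively along x, y and z until particles are paired) can be sketched serially as follows. This is only an illustration under stated assumptions: the function names, the dictionary node layout, unit particle masses, and the cycling order of axes are not taken from the paper, and the original runs as parallel sorts on a SIMD machine rather than as a recursive Python function.

```python
def build_tree(points, depth=0):
    """Build a balanced binary tree by median splits that cycle through x, y, z.

    points: list of (x, y, z) tuples, assumed to have unit mass.
    Each node stores a particle count and a center of mass, so it can be
    treated as a monopole during a later force evaluation.
    """
    n = len(points)
    center = tuple(sum(p[i] for p in points) / n for i in range(3))
    node = {"count": n, "center": center}
    if n <= 2:
        node["particles"] = list(points)  # leaf: a pair of particles
        return node
    axis = depth % 3  # assumption: split along x, then y, then z, repeating
    ordered = sorted(points, key=lambda p: p[axis])
    mid = n // 2
    node["children"] = (
        build_tree(ordered[:mid], depth + 1),
        build_tree(ordered[mid:], depth + 1),
    )
    return node


def leaves(node):
    """Collect the particle lists stored at the leaves."""
    if "particles" in node:
        return [node["particles"]]
    return [leaf for child in node["children"] for leaf in leaves(child)]


# Example: the eight corners of the unit cube.
cube = [(x, y, z) for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)]
tree = build_tree(cube)
```

With a particle count that is a power of two, every leaf holds exactly two particles, which is the pairing property the abstract refers to.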
Deep Constrained Siamese Hash Coding Network and Load-Balanced Locality-Sensitive Hashing for Near Duplicate Image Detection.

PubMed

Hu, Weiming; Fan, Yabo; Xing, Junliang; Sun, Liang; Cai, Zhaoquan; Maybank, Stephen

2018-09-01

We construct a new, efficient near duplicate image detection method using a hierarchical hash code learning neural network and load-balanced locality-sensitive hashing (LSH) indexing. We propose a deep constrained siamese hash coding neural network combined with deep feature learning. Our neural network is able to extract effective features for near duplicate image detection. The extracted features are used to construct an LSH-based index. We propose a load-balanced LSH method to produce load-balanced buckets in the hashing process. The load-balanced LSH significantly reduces the query time. Based on the proposed load-balanced LSH, we design an effective and feasible algorithm for near duplicate image detection. Extensive experiments on three benchmark data sets demonstrate the effectiveness of our deep siamese hash encoding network and load-balanced LSH.
Detection and Use of Load and Gage Output Repeats of Wind Tunnel Strain-Gage Balance Data

NASA Technical Reports Server (NTRS)

Ulbrich, N.

2017-01-01

Criteria are discussed that may be used for the detection of load and gage output repeats of wind tunnel strain-gage balance data. First, empirical thresholds are introduced that help determine if the loads or electrical outputs of a pair of balance calibration or check load data points match. A threshold of 0.01 percent of the load capacity is suggested for the identification of matching loads. Similarly, a threshold of 0.1 microV/V is recommended for the identification of matching electrical outputs. Two examples for the use of load and output repeats are discussed to illustrate benefits of the implementation of a repeat point detection algorithm in a balance data analysis software package. The first example uses the suggested load threshold to identify repeat data points that may be used to compute pure errors of the balance loads. This type of analysis may reveal hidden data quality issues that could potentially be avoided by making calibration process improvements. The second example uses the electrical output threshold for the identification of balance fouling. Data from the calibration of a six-component force balance is used to illustrate the calculation of the pure error of the balance loads.
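The load-repeat criterion can be written down in a few lines. The sketch below is an assumption-laden illustration, not the author's implementation: it assumes the 0.01 percent threshold is applied per load component against that component's capacity, that a difference exactly at the threshold counts as a match, and the function names and sample numbers are invented for the example.

```python
def loads_match(point_a, point_b, capacities, threshold_percent=0.01):
    """Return True if every load component differs by at most the threshold.

    The threshold is expressed as a percentage of each component's load
    capacity (0.01 percent is the value suggested in the abstract).
    """
    return all(
        abs(a - b) <= threshold_percent / 100.0 * capacity
        for a, b, capacity in zip(point_a, point_b, capacities)
    )


def find_load_repeats(points, capacities, threshold_percent=0.01):
    """Return index pairs (i, j), i < j, of data points with matching loads."""
    return [
        (i, j)
        for i in range(len(points))
        for j in range(i + 1, len(points))
        if loads_match(points[i], points[j], capacities, threshold_percent)
    ]


# Hypothetical two-component balance with capacities of 1000 and 500 load units.
capacities = [1000.0, 500.0]
data_points = [
    [100.0, 50.0],
    [100.05, 50.02],  # within 0.1 and 0.05 units of point 0: a repeat
    [100.5, 50.0],    # first component differs by 0.5 units: not a repeat
    [200.0, 50.0],
]
repeats = find_load_repeats(data_points, capacities)
```

The same pattern would apply to electrical outputs with an absolute threshold of 0.1 microV/V instead of a percentage of capacity.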
Research on virtual network load balancing based on OpenFlow

NASA Astrophysics Data System (ADS)

Peng, Rong; Ding, Lei

2017-08-01

A network based on OpenFlow technology separates the control module from the data forwarding module. Deploying a load balancing strategy globally through the control plane's view of the network is fast and efficient. This paper proposes a Weighted Round-Robin Scheduling algorithm for virtual networks and a server load balancing plan based on OpenFlow. The load of the service nodes and the algorithm for distributing load balancing tasks are taken into account.
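The abstract does not say which variant of weighted round-robin is used, so the following is only a generic sketch of the technique, using the "smooth" variant in which heavily weighted servers are interleaved with lighter ones instead of being served in a burst. The function name, server names and weights are hypothetical.

```python
def smooth_weighted_round_robin(weights, num_requests):
    """Assign num_requests to servers in proportion to their weights.

    weights: dict mapping server name -> positive integer weight.
    At each step every server's running score grows by its weight, the
    server with the highest score is chosen, and its score is reduced by
    the total weight. Ties go to the server listed first.
    """
    total = sum(weights.values())
    current = {server: 0 for server in weights}
    schedule = []
    for _ in range(num_requests):
        for server, weight in weights.items():
            current[server] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        schedule.append(chosen)
    return schedule


# Example: server "a" should receive five of every seven requests.
schedule = smooth_weighted_round_robin({"a": 5, "b": 1, "c": 1}, 7)
```

In an OpenFlow setting the weights would presumably be derived by the controller from the measured load of each service node; that step is outside this sketch.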
A single-stage optical load-balanced switch for data centers.

PubMed

Huang, Qirui; Yeo, Yong-Kee; Zhou, Luying

2012-10-22

Load balancing is an attractive technique to achieve maximum throughput and optimal resource utilization in large-scale switching systems. However, current electronic load-balanced switches suffer from severe problems in implementation cost, power consumption and scaling. To overcome these problems, in this paper we propose a single-stage optical load-balanced switch architecture based on an arrayed waveguide grating router (AWGR) in conjunction with fast tunable lasers. By reuse of the fast tunable lasers, the switch achieves both functions of load balancing and switching through the AWGR. With this architecture, proof-of-concept experiments have been conducted to investigate the feasibility of the optical load-balanced switch and to examine its physical performance. Compared to three-stage load-balanced switches, the reported switch needs only half as many optical devices, such as tunable lasers and AWGRs, which can provide a cost-effective solution for future data centers.
A task-based parallelism and vectorized approach to 3D Method of Characteristics (MOC) reactor simulation for high performance computing architectures

NASA Astrophysics Data System (ADS)

Tramm, John R.; Gunow, Geoffrey; He, Tim; Smith, Kord S.; Forget, Benoit; Siegel, Andrew R.

2016-05-01

In this study we present and analyze a formulation of the 3D Method of Characteristics (MOC) technique applied to the simulation of full core nuclear reactors. Key features of the algorithm include a task-based parallelism model that allows independent MOC tracks to be assigned to threads dynamically, ensuring load balancing, and a wide vectorizable inner loop that takes advantage of modern SIMD computer architectures. The algorithm is implemented in a set of highly optimized proxy applications in order to investigate its performance characteristics on CPU, GPU, and Intel Xeon Phi architectures. Speed, power, and hardware cost efficiencies are compared. Additionally, performance bottlenecks are identified for each architecture in order to determine the prospects for continued scalability of the algorithm on next generation HPC architectures.
Evolution of a minimal parallel programming model

DOE PAGES

Lusk, Ewing; Butler, Ralph; Pieper, Steven C.

2017-04-30

Here, we take a historical approach to our presentation of self-scheduled task parallelism, a programming model with its origins in early irregular and nondeterministic computations encountered in automated theorem proving and logic programming. We show how an extremely simple task model has evolved into a system, asynchronous dynamic load balancing (ADLB), and a scalable implementation capable of supporting sophisticated applications on today’s (and tomorrow’s) largest supercomputers; and we illustrate the use of ADLB with a Green’s function Monte Carlo application, a modern, mature nuclear physics code in production use. Our lesson is that by surrendering a certain amount of generality and thus applicability, a minimal programming model (in terms of its basic concepts and the size of its application programmer interface) can achieve extreme scalability without introducing complexity.
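Self-scheduled task parallelism can be sketched with a shared work pool from which idle workers pull the next task, and into which a running task may put new tasks. The example below uses Python threads and a standard-library queue purely for illustration; it does not use or reproduce the ADLB interface, and the names `run_self_scheduled` and `split_or_sum` are hypothetical.

```python
import queue
import threading


def run_self_scheduled(initial_tasks, process, num_workers=4):
    """Run tasks from a shared pool; workers fetch work whenever they are idle.

    process(task) must return (new_tasks, value): tasks to add to the pool
    and an optional result (None if the task produced no result).
    """
    pool = queue.Queue()
    for task in initial_tasks:
        pool.put(task)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            task = pool.get()
            if task is None:  # sentinel: no more work
                pool.task_done()
                return
            new_tasks, value = process(task)
            for new_task in new_tasks:  # put new work before marking this task done
                pool.put(new_task)
            if value is not None:
                with lock:
                    results.append(value)
            pool.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    pool.join()  # returns once every task, including spawned ones, is done
    for _ in threads:
        pool.put(None)
    for thread in threads:
        thread.join()
    return results


def split_or_sum(task):
    """Irregular example task: split large ranges, sum small ones."""
    lo, hi = task
    if hi - lo > 1000:
        mid = (lo + hi) // 2
        return [(lo, mid), (mid, hi)], None
    return [], sum(range(lo, hi))


total = sum(run_self_scheduled([(0, 100000)], split_or_sum))
```

Because no worker is assigned tasks in advance, load balance emerges from the pull pattern itself, which is the property that makes the model attractive for irregular computations.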
Scalability of a Low-Cost Multi-Teraflop Linux Cluster for High-End Classical Atomistic and Quantum Mechanical Simulations

NASA Technical Reports Server (NTRS)

Kikuchi, Hideaki; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya; Shimojo, Fuyuki; Saini, Subhash

2003-01-01

Scalability of a low-cost, Intel Xeon-based, multi-Teraflop Linux cluster is tested for two high-end scientific applications: Classical atomistic simulation based on the molecular dynamics method and quantum mechanical calculation based on the density functional theory. These scalable parallel applications use space-time multiresolution algorithms and feature computational-space decomposition, wavelet-based adaptive load balancing, and spacefilling-curve-based data compression for scalable I/O. Comparative performance tests are performed on a 1,024-processor Linux cluster and a conventional higher-end parallel supercomputer, 1,184-processor IBM SP4. The results show that the performance of the Linux cluster is comparable to that of the SP4. We also study various effects, such as the sharing of memory and L2 cache among processors, on the performance.
An Analysis of Performance Enhancement Techniques for Overset Grid Applications

NASA Technical Reports Server (NTRS)

Djomehri, J. J.; Biswas, R.; Potsdam, M.; Strawn, R. C.; Biegel, Bryan (Technical Monitor)

2002-01-01

The overset grid methodology has significantly reduced time-to-solution of high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process resolves the geometrical complexity of the problem domain by using separately generated but overlapping structured discretization grids that periodically exchange information through interpolation. However, high performance computations of such large-scale realistic applications must be handled efficiently on state-of-the-art parallel supercomputers. This paper analyzes the effects of various performance enhancement techniques on the parallel efficiency of an overset grid Navier-Stokes CFD application running on an SGI Origin2000 machine. Specifically, the role of asynchronous communication, grid splitting, and grid grouping strategies are presented and discussed. Results indicate that performance depends critically on the level of latency hiding and the quality of load balancing across the processors.
An Evaluation of the HVAC Load Potential for Providing Load Balancing Service

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lu, Ning

This paper investigates the potential of providing aggregated intra-hour load balancing services using heating, ventilating, and air-conditioning (HVAC) systems. A direct-load control algorithm is presented. A temperature-priority-list method is used to dispatch the HVAC loads optimally to maintain consumer-desired indoor temperatures and load diversity. Realistic intra-hour load balancing signals were used to evaluate the operational characteristics of the HVAC load under different outdoor temperature profiles and different indoor temperature settings. The number of HVAC units needed is also investigated. Modeling results suggest that the number of HVACs needed to provide a ±1-MW load balancing service 24 hours a day varies significantly with baseline settings, high and low temperature settings, and the outdoor temperatures. The results demonstrate that the intra-hour load balancing service provided by HVAC loads meets the performance requirements and can become a major source of revenue for load-serving entities where the smart grid infrastructure enables direct load control over the HVAC loads.
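A temperature-priority-list dispatch can be sketched as follows. The details are assumptions, since the abstract gives none: the units are taken to be in heating mode, so the coldest rooms are switched on first and the warmest rooms are switched off first; the sketch ignores thermostat deadbands and minimum on/off times; and the function name, field names and sample values are hypothetical.

```python
def dispatch_hvac(units, target_kw):
    """Switch HVAC units on or off so the aggregate load approaches target_kw.

    units: list of dicts with keys "name", "temp" (indoor temperature),
           "on" (bool) and "power_kw". Units are modified in place.
    Returns the aggregate load after dispatch.
    """
    load = sum(u["power_kw"] for u in units if u["on"])
    if load < target_kw:
        # Heating mode assumed: the coldest rooms have the highest priority to turn on.
        for unit in sorted((u for u in units if not u["on"]), key=lambda u: u["temp"]):
            if load + unit["power_kw"] > target_kw:
                break
            unit["on"] = True
            load += unit["power_kw"]
    elif load > target_kw:
        # The warmest rooms can best tolerate being switched off.
        for unit in sorted((u for u in units if u["on"]), key=lambda u: u["temp"], reverse=True):
            if load <= target_kw:
                break
            unit["on"] = False
            load -= unit["power_kw"]
    return load


# Four hypothetical 5 kW units; B and C are running, so the load starts at 10 kW.
hvac_units = [
    {"name": "A", "temp": 19.0, "on": False, "power_kw": 5.0},
    {"name": "B", "temp": 20.5, "on": True, "power_kw": 5.0},
    {"name": "C", "temp": 21.5, "on": True, "power_kw": 5.0},
    {"name": "D", "temp": 19.5, "on": False, "power_kw": 5.0},
]
load_up = dispatch_hvac(hvac_units, target_kw=15.0)   # raises load by turning on A
load_down = dispatch_hvac(hvac_units, target_kw=5.0)  # lowers load by turning off C, then B
```

Ordering by temperature is what keeps the dispatch compatible with occupant comfort: the units asked to change state are those whose rooms are furthest from the limit they would otherwise hit.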
Influence of Primary Gage Sensitivities on the Convergence of Balance Load Iterations

NASA Technical Reports Server (NTRS)

Ulbrich, Norbert Manfred

2012-01-01

The connection between the convergence of wind tunnel balance load iterations and the existence of the primary gage sensitivities of a balance is discussed. First, basic elements of two load iteration equations that the iterative method uses in combination with results of a calibration data analysis for the prediction of balance loads are reviewed. Then, the connection between the primary gage sensitivities, the load format, the gage output format, and the convergence characteristics of the load iteration equation choices is investigated. A new criterion is also introduced that may be used to objectively determine if the primary gage sensitivity of a balance gage exists. Then, it is shown that both load iteration equations will converge as long as a suitable regression model is used for the analysis of the balance calibration data, the combined influence of nonlinear terms of the regression model is very small, and the primary gage sensitivities of all balance gages exist. The last requirement is fulfilled, e.g., if force balance calibration data is analyzed in force balance format. Finally, it is demonstrated that only one of the two load iteration equation choices, i.e., the iteration equation used by the primary load iteration method, converges if one or more primary gage sensitivities are missing. This situation may occur, e.g., if force balance calibration data is analyzed in direct read format using the original gage outputs. Data from the calibration of a six-component force balance is used to illustrate the connection between the convergence of the load iteration equation choices and the existence of the primary gage sensitivities.
Collectively loading programs in a multiple program multiple data environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.

Techniques are disclosed for loading programs efficiently in a parallel computing system. In one embodiment, nodes of the parallel computing system receive a load description file which indicates, for each program of a multiple program multiple data (MPMD) job, nodes which are to load the program. The nodes determine, using collective operations, a total number of programs to load and a number of programs to load in parallel. The nodes further generate a class route for each program to be loaded in parallel, where the class route generated for a particular program includes only those nodes on which the program needs to be loaded. For each class route, a node is selected using a collective operation to be a load leader which accesses a file system to load the program associated with a class route and broadcasts the program via the class route to other nodes which require the program.
Development of Parallel Computing Framework to Enhance Radiation Transport Code Capabilities for Rare Isotope Beam Facility Design

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kostin, Mikhail; Mokhov, Nikolai; Niita, Koji

A parallel computing framework has been developed to use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. It is intended to be used with older radiation transport codes implemented in Fortran77, Fortran 90 or C. The module is significantly independent of radiation transport codes it can be used with, and is connected to the codes by means of a number of interface functions. The framework was developed and tested in conjunction with the MARS15 code. It is possible to use it with other codes such as PHITS, FLUKA and MCNP after certain adjustments. Besides the parallel computing functionality, the framework offers a checkpoint facility that allows restarting calculations with a saved checkpoint file. The checkpoint facility can be used in single process calculations as well as in the parallel regime. The framework corrects some of the known problems with the scheduling and load balancing found in the original implementations of the parallel computing functionality in MARS15 and PHITS. The framework can be used efficiently on homogeneous systems and networks of workstations, where the interference from the other users is possible.
A Parallel Pipelined Renderer for the Time-Varying Volume Data

NASA Technical Reports Server (NTRS)

Chiueh, Tzi-Cker; Ma, Kwan-Liu

1997-01-01

This paper presents a strategy for efficiently rendering time-varying volume data sets on a distributed-memory parallel computer. Time-varying volume data take large storage space and visualizing them requires reading large files continuously or periodically throughout the course of the visualization process. Instead of using all the processors to collectively render one volume at a time, a pipelined rendering process is formed by partitioning processors into groups to render multiple volumes concurrently. In this way, the overall rendering time may be greatly reduced because the pipelined rendering tasks are overlapped with the I/O required to load each volume into a group of processors; moreover, parallelization overhead may be reduced as a result of partitioning the processors. We modify an existing parallel volume renderer to exploit various levels of rendering parallelism and to study how the partitioning of processors may lead to optimal rendering performance. Two factors which are important to the overall execution time are resource utilization efficiency and pipeline startup latency. The optimal partitioning configuration is the one that balances these two factors. Tests on Intel Paragon computers show that in general optimal partitionings do exist for a given rendering task and result in 40-50% savings in overall rendering time.
Visualization Co-Processing of a CFD Simulation

NASA Technical Reports Server (NTRS)

Vaziri, Arsi

1999-01-01

OVERFLOW, a widely used CFD simulation code, is combined with a visualization system, pV3, to experiment with an environment for simulation/visualization co-processing on an SGI Origin 2000 computer (O2K) system. The shared memory version of the solver is used with the O2K 'pfa' preprocessor invoked to automatically discover parallelism in the source code. No other explicit parallelism is enabled. In order to study the scaling and performance of the visualization co-processing system, sample runs are made with different processor groups in the range of 1 to 254 processors. The data exchange between the visualization system and the simulation system is rapid enough for user interactivity when the problem size is small. This shared memory version of OVERFLOW, with minimal parallelization, does not scale well to an increasing number of available processors. The visualization task takes about 18 to 30% of the total processing time and does not appear to be a major contributor to the poor scaling. Improper load balancing and inter-processor communication overhead are contributors to this poor performance. Work is in progress which is aimed at obtaining improved parallel performance of the solver and removing the limitations of serial data transfer to pV3 by examining various parallelization/communication strategies, including the use of explicit message passing.
Automatic partitioning of unstructured meshes for the parallel solution of problems in computational mechanics

NASA Technical Reports Server (NTRS)

Farhat, Charbel; Lesoinne, Michel

1993-01-01

Most of the recently proposed computational methods for solving partial differential equations on multiprocessor architectures stem from the 'divide and conquer' paradigm and involve some form of domain decomposition. For those methods which also require grids of points or patches of elements, it is often necessary to explicitly partition the underlying mesh, especially when working with local memory parallel processors. In this paper, a family of cost-effective algorithms for the automatic partitioning of arbitrary two- and three-dimensional finite element and finite difference meshes is presented and discussed in view of a domain decomposed solution procedure and parallel processing. The influence of the algorithmic aspects of a solution method (implicit/explicit computations), and the architectural specifics of a multiprocessor (SIMD/MIMD, startup/transmission time), on the design of a mesh partitioning algorithm are discussed. The impact of the partitioning strategy on load balancing, operation count, operator conditioning, rate of convergence and processor mapping is also addressed. Finally, the proposed mesh decomposition algorithms are demonstrated with realistic examples of finite element, finite volume, and finite difference meshes associated with the parallel solution of solid and fluid mechanics problems on the iPSC/2 and iPSC/860 multiprocessors.
Recent Progress on the Parallel Implementation of Moving-Body Overset Grid Schemes

NASA Technical Reports Server (NTRS)

Wissink, Andrew; Allen, Edwin (Technical Monitor)

1998-01-01

Viscous calculations about geometrically complex bodies in which there is relative motion between component parts are among the most computationally demanding problems facing CFD researchers today. This presentation documents results from the first two years of a CHSSI-funded effort within the U.S. Army AFDD to develop scalable dynamic overset grid methods for unsteady viscous calculations with moving-body problems. The first part of the presentation will focus on results from OVERFLOW-D1, a parallelized moving-body overset grid scheme that employs traditional Chimera methodology. The two processes that dominate the cost of such problems are the flow solution on each component and the intergrid connectivity solution. Parallel implementations of the OVERFLOW flow solver and DCF3D connectivity software are coupled with a proposed two-part static-dynamic load balancing scheme and tested on the IBM SP and Cray T3E multi-processors. The second part of the presentation will cover some recent results from OVERFLOW-D2, a new flow solver that employs Cartesian grids with various levels of refinement, facilitating solution adaption. A study of the parallel performance of the scheme on large distributed-memory multiprocessor computer architectures will be reported.

Simulation and Analysis of Three-Phase Rectifiers for Aerospace Power Applications

NASA Technical Reports Server (NTRS)

Truong, Long V.; Birchenough, Arthur G.

2004-01-01

Due to the nature of planned planetary missions, fairly large advanced power systems are required for the spacecraft. These future high-power spacecraft are expected to use dynamic power conversion systems incorporating high-speed alternators as the three-phase AC electrical power source. One of the early design considerations in such systems is the type of rectification to be used with the AC source for DC user loads. This paper addresses the issues involved with two different rectification methods, namely the conventional six-pulse and twelve-pulse rectifiers. Two circuit configurations which involve parallel combinations of the six- and twelve-pulse rectifiers were selected for the simulation. The rectifier's input and output power waveforms are examined through simulations. The effects of the parasitic load for power balancing and of the filter components for reducing the ripple voltage at the DC loads are also included in the analysis. Details of the simulation circuits, simulation results, and design examples for reducing the risk of damage to spacecraft engines are presented and discussed.
Multi-agent grid system Agent-GRID with dynamic load balancing of cluster nodes

NASA Astrophysics Data System (ADS)

Satymbekov, M. N.; Pak, I. T.; Naizabayeva, L.; Nurzhanov, Ch. A.

2017-12-01

This study presents a system designed for automated load balancing that analyses the load of compute nodes and then migrates virtual machines from heavily loaded nodes to less loaded ones. The system increases the performance of cluster nodes and helps data to be processed in a timely manner. The grid system balances the work of the cluster nodes; its relevance lies in applying multi-agent balancing to the solution of such problems.
Assessment of the Uniqueness of Wind Tunnel Strain-Gage Balance Load Predictions

NASA Technical Reports Server (NTRS)

Ulbrich, N.

2016-01-01

A new test was developed to assess the uniqueness of wind tunnel strain-gage balance load predictions that are obtained from regression models of calibration data. The test helps balance users to gain confidence in load predictions of non-traditional balance designs. It also makes it possible to better evaluate load predictions of traditional balances that are not used as originally intended. The test works for both the Iterative and Non-Iterative Methods that are used in the aerospace testing community for the prediction of balance loads. It is based on the hypothesis that the total number of independently applied balance load components must always match the total number of independently measured bridge outputs or bridge output combinations. This hypothesis is supported by a control volume analysis of the inputs and outputs of a strain-gage balance. It is concluded from the control volume analysis that the loads and bridge outputs of a balance calibration data set must separately be tested for linear independence because it cannot always be guaranteed that a linearly independent load component set will result in linearly independent bridge output measurements. Simple linear math models for the loads and bridge outputs in combination with the variance inflation factor are used to test for linear independence. A highly unique and reversible mapping between the applied load component set and the measured bridge output set is guaranteed to exist if the maximum variance inflation factor of both sets is less than the literature recommended threshold of five. Data from the calibration of a six-component force balance is used to illustrate the application of the new test to real-world data.
3-D modeling of ductile tearing using finite elements: Computational aspects and techniques

NASA Astrophysics Data System (ADS)

Gullerud, Arne Stewart

This research focuses on the development and application of computational tools to perform large-scale, 3-D modeling of ductile tearing in engineering components under quasi-static to mild loading rates. Two standard models for ductile tearing, the computational cell methodology and crack growth controlled by the crack tip opening angle (CTOA), are described and their 3-D implementations are explored. For the computational cell methodology, quantification of the effects of several numerical issues (computational load step size, procedures for force release after cell deletion, and the porosity for cell deletion) enables construction of computational algorithms to remove the dependence of predicted crack growth on these issues. This work also describes two extensions of the CTOA approach into 3-D: a general 3-D method and a constant front technique. Analyses compare the characteristics of the extensions, and a validation study explores the ability of the constant front extension to predict crack growth in thin aluminum test specimens over a range of specimen geometries, absolute sizes, and levels of out-of-plane constraint. To provide a computational framework suitable for the solution of these problems, this work also describes the parallel implementation of a nonlinear, implicit finite element code. The implementation employs an explicit message-passing approach using the MPI standard to maintain portability, a domain decomposition of element data to provide parallel execution, and a master-worker organization of the computational processes to enhance future extensibility. A linear preconditioned conjugate gradient (LPCG) solver serves as the core of the solution process.
The parallel LPCG solver utilizes an element-by-element (EBE) structure of the computations to permit a dual-level decomposition of the element data: domain decomposition of the mesh provides efficient coarse-grain parallel execution, while decomposition of the domains into blocks of similar elements (same type, constitutive model, etc.) provides fine-grain parallel computation on each processor. A major focus of the LPCG solver is a new implementation of the Hughes-Winget element-by-element (HW) preconditioner. The implementation employs a weighted dependency graph combined with a new coloring algorithm to provide load-balanced scheduling for the preconditioner and overlapped communication/computation. This approach enables efficient parallel application of the HW preconditioner for arbitrary unstructured meshes.
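The role of coloring in scheduling can be shown with a plain greedy coloring of a dependency graph: items that share data receive different colors, so all items of one color can be processed concurrently. This is the textbook greedy heuristic, not the new coloring algorithm the abstract refers to, and it ignores edge weights; the function name and the example graph are hypothetical.

```python
def greedy_coloring(adjacency):
    """Assign each vertex the smallest color not used by its colored neighbors.

    adjacency: dict mapping vertex -> iterable of conflicting vertices
               (for example, element blocks that share mesh nodes).
    Vertices are visited in order of decreasing degree, a common heuristic.
    """
    colors = {}
    for vertex in sorted(adjacency, key=lambda v: len(adjacency[v]), reverse=True):
        used = {colors[n] for n in adjacency[vertex] if n in colors}
        color = 0
        while color in used:
            color += 1
        colors[vertex] = color
    return colors


# Example: four blocks arranged in a ring, each conflicting with its two neighbors.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
ring_colors = greedy_coloring(ring)

# Group the blocks by color; each group is one conflict-free processing phase.
phases = {}
for block, color in ring_colors.items():
    phases.setdefault(color, []).append(block)
```

Balancing the amount of work within each color class, and overlapping communication with the computation of a phase, are the parts a production scheduler would add on top of this basic idea.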
Numerical investigation of plasma edge transport and limiter heat fluxes in Wendelstein 7-X startup plasmas with EMC3-EIRENE

NASA Astrophysics Data System (ADS)

Effenberg, F.; Feng, Y.; Schmitz, O.; Frerichs, H.; Bozhenkov, S. A.; Hölbe, H.; König, R.; Krychowiak, M.; Pedersen, T. Sunn; Reiter, D.; Stephey, L.; W7-X Team

2017-03-01

The results of a first systematic assessment of plasma edge transport processes for the limiter startup configuration at Wendelstein 7-X are presented. This includes an investigation of transport from intrinsic and externally injected impurities and their impact on the power balance and limiter heat fluxes. The fully 3D coupled plasma fluid and kinetic neutral transport Monte Carlo code EMC3-EIRENE is used. The analysis of the magnetic topology shows that the poloidally and toroidally localized limiters cause a 3D helical scrape-off layer (SOL) consisting of magnetic flux tubes of three different connection lengths L_C. The transport in the helical SOL is governed by L_C as topological scale length for the parallel plasma loss channel to the limiters. A clear modulation of the plasma pressure with L_C is seen. The helical flux tube topology results in counter streaming sonic plasma flows. The heterogeneous SOL plasma structure yields an uneven limiter heat load distribution with localized peaking. Assuming spatially constant anomalous transport coefficients, increasing plasma density yields a reduction of the maximum peak heat loads from 12 MW m^-2 to 7.5 MW m^-2 and a broadening of the deposited heat fluxes. The impact of impurities on the limiter heat loads is studied by assuming intrinsic carbon impurities eroded from the limiter surfaces with a gross chemical sputtering yield of 2%. The resulting radiative losses account for less than 10% of the input power in the power balance with marginal impact on the limiter heat loads. It is shown that a significant mitigation of peak heat loads, 40-50%, can be achieved with controlled impurity seeding with nitrogen and neon, which is a method of particular interest for the later island divertor phase.
Combinatorial Algorithms to Enable Computational Science and Engineering: Work from the CSCAPES Institute

DOE Office of Scientific and Technical Information (OSTI.GOV)

Boman, Erik G.; Catalyurek, Umit V.; Chevalier, Cedric

2015-01-16

This final progress report summarizes the work accomplished at the Combinatorial Scientific Computing and Petascale Simulations Institute. We developed Zoltan, a parallel mesh partitioning library that made use of accurate hypergraph models to provide load balancing in mesh-based computations. We developed several graph coloring algorithms for computing Jacobian and Hessian matrices and organized them into a software package called ColPack. We developed parallel algorithms for graph coloring and graph matching problems, and also designed multi-scale graph algorithms. Three PhD students graduated, six more are continuing their PhD studies, and four postdoctoral scholars were advised. Six of these students and Fellows have joined DOE Labs (Sandia, Berkeley), as staff scientists or as postdoctoral scientists. We also organized the SIAM Workshop on Combinatorial Scientific Computing (CSC) in 2007, 2009, and 2011 to continue to foster the CSC community.
A Comparison of Three Programming Models for Adaptive Applications

NASA Technical Reports Server (NTRS)

Shan, Hong-Zhang; Singh, Jaswinder Pal; Oliker, Leonid; Biswas, Rupak; Kwak, Dochan (Technical Monitor)

2000-01-01

We study the performance and programming effort for two major classes of adaptive applications under three leading parallel programming models. We find that all three models can achieve scalable performance on state-of-the-art multiprocessor machines. The basic parallel algorithms needed for different programming models to deliver their best performance are similar, but the implementations differ greatly, far beyond the fact of using explicit messages versus implicit loads/stores. Compared with MPI and SHMEM, CC-SAS (cache-coherent shared address space) provides substantial ease of programming at the conceptual and program orchestration level, which often leads to performance gains. However, it may also suffer from the poor spatial locality of physically distributed shared data on large numbers of processors. Our CC-SAS implementation of the PARMETIS partitioner itself runs faster than in the other two programming models, and generates a more balanced result for our application.
Performance Enhancement Strategies for Multi-Block Overset Grid CFD Applications

NASA Technical Reports Server (NTRS)

Djomehri, M. Jahed; Biswas, Rupak

2003-01-01

The overset grid methodology has significantly reduced time-to-solution of high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process resolves the geometrical complexity of the problem domain by using separately generated but overlapping structured discretization grids that periodically exchange information through interpolation. However, high performance computations of such large-scale realistic applications must be handled efficiently on state-of-the-art parallel supercomputers. This paper analyzes the effects of various performance enhancement strategies on the parallel efficiency of an overset grid Navier-Stokes CFD application running on an SGI Origin2000 machine. Specifically, the roles of asynchronous communication, grid splitting, and grid grouping strategies are presented and discussed. Details of a sophisticated graph partitioning technique for grid grouping are also provided. Results indicate that performance depends critically on the level of latency hiding and the quality of load balancing across the processors.
ls1 mardyn: The Massively Parallel Molecular Dynamics Code for Large Systems.

PubMed

Niethammer, Christoph; Becker, Stefan; Bernreuther, Martin; Buchholz, Martin; Eckhardt, Wolfgang; Heinecke, Alexander; Werth, Stephan; Bungartz, Hans-Joachim; Glass, Colin W; Hasse, Hans; Vrabec, Jadran; Horsch, Martin

2014-10-14

The molecular dynamics simulation code ls1 mardyn is presented. It is a highly scalable code, optimized for massively parallel execution on supercomputing architectures and currently holds the world record for the largest molecular simulation with over four trillion particles. It enables the application of pair potentials to length and time scales that were previously out of scope for molecular dynamics simulation. With an efficient dynamic load balancing scheme, it delivers high scalability even for challenging heterogeneous configurations. Presently, multicenter rigid potential models based on Lennard-Jones sites, point charges, and higher-order polarities are supported. Due to its modular design, ls1 mardyn can be extended to new physical models, methods, and algorithms, allowing future users to tailor it to suit their respective needs. Possible applications include scenarios with complex geometries, such as fluids at interfaces, as well as nonequilibrium molecular dynamics simulation of heat and mass transfer.
Asynchronous multilevel adaptive methods for solving partial differential equations on multiprocessors - Performance results

NASA Technical Reports Server (NTRS)

Mccormick, S.; Quinlan, D.

1989-01-01

The fast adaptive composite grid method (FAC) is an algorithm that uses various levels of uniform grids (global and local) to provide adaptive resolution and fast solution of PDEs. Like all such methods, it offers parallelism by using possibly many disconnected patches per level, but is hindered by the need to handle these levels sequentially. The finest levels must therefore wait for processing to be essentially completed on all the coarser ones. A recently developed asynchronous version of FAC, called AFAC, completely eliminates this bottleneck to parallelism. This paper describes timing results for AFAC, coupled with a simple load balancing scheme, applied to the solution of elliptic PDEs on an Intel iPSC hypercube. These tests include performance of certain processes necessary in adaptive methods, including moving grids and changing refinement. A companion paper reports on numerical and analytical results for estimating convergence factors of AFAC applied to very large scale examples.
Load Balancing Using Time Series Analysis for Soft Real Time Systems with Statistically Periodic Loads

NASA Technical Reports Server (NTRS)

Hailperin, M.

1993-01-01

This thesis provides design and analysis of techniques for global load balancing on ensemble architectures running soft-real-time object-oriented applications with statistically periodic loads. It focuses on estimating the instantaneous average load over all the processing elements. The major contribution is the use of explicit stochastic process models for both the loading and the averaging itself. These models are exploited via statistical time-series analysis and Bayesian inference to provide improved average load estimates, and thus to facilitate global load balancing. This thesis explains the distributed algorithms used and provides some optimality results. It also describes the algorithms' implementation and gives performance results from simulation. These results show that the authors' techniques allow more accurate estimation of the global system loading, resulting in fewer object migrations than local methods. The authors' method is shown to provide superior performance, relative not only to static load-balancing schemes but also to many adaptive load-balancing methods. Results from a preliminary analysis of another system and from simulation with a synthetic load provide some evidence of more general applicability.
Detection of Unexpected High Correlations between Balance Calibration Loads and Load Residuals

NASA Technical Reports Server (NTRS)

Ulbrich, N.; Volden, T.

2014-01-01

An algorithm was developed for the assessment of strain-gage balance calibration data that makes it possible to systematically investigate potential sources of unexpected high correlations between calibration load residuals and applied calibration loads. The algorithm investigates correlations on a load series by load series basis. The linear correlation coefficient is used to quantify the correlations. It is computed for all possible pairs of calibration load residuals and applied calibration loads that can be constructed for the given balance calibration data set. An unexpected high correlation between a load residual and a load is detected if three conditions are met: (i) the absolute value of the correlation coefficient of a residual/load pair exceeds 0.95; (ii) the maximum of the absolute values of the residuals of a load series exceeds 0.25 % of the load capacity; (iii) the load component of the load series is intentionally applied. Data from a baseline calibration of a six-component force balance is used to illustrate the application of the detection algorithm to a real-world data set. This analysis also showed that the detection algorithm can identify load alignment errors as long as repeat load series are contained in the balance calibration data set that do not suffer from load alignment problems.
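The three detection conditions listed in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the `capacity` argument (taken here to be the capacity of the load component the residual belongs to), and the handling of constant series are all hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant series counts as uncorrelated
    return sxy / sqrt(sxx * syy)

def is_unexpected_correlation(residuals, loads, capacity, intentionally_applied,
                              r_limit=0.95, residual_limit=0.0025):
    """Check one residual/load pair of a single load series.

    residuals, loads: values from the same load series (hypothetical layout).
    capacity: load capacity used to scale the residual threshold (assumption).
    """
    r = pearson(residuals, loads)
    max_abs_residual = max(abs(e) for e in residuals)
    return bool(
        abs(r) > r_limit                                  # condition (i)
        and max_abs_residual > residual_limit * capacity  # condition (ii)
        and intentionally_applied                         # condition (iii)
    )
```

In use, such a check would be repeated for every residual/load pair of every load series, which matches the series-by-series investigation the abstract describes.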
Adaptive Load-Balancing Algorithms using Symmetric Broadcast Networks

NASA Technical Reports Server (NTRS)

Das, Sajal K.; Harvey, Daniel J.; Biswas, Rupak; Biegel, Bryan A. (Technical Monitor)

2002-01-01

In a distributed computing environment, it is important to ensure that the processor workloads are adequately balanced. Among numerous load-balancing algorithms, a unique approach due to Das and Prasad defines a symmetric broadcast network (SBN) that provides a robust communication pattern among the processors in a topology-independent manner. In this paper, we propose and analyze three efficient SBN-based dynamic load-balancing algorithms, and implement them on an SGI Origin2000. A thorough experimental study with Poisson distributed synthetic loads demonstrates that our algorithms are effective in balancing system load. By optimizing completion time and idle time, the proposed algorithms are shown to compare favorably with several existing approaches.
Analysis of the thermal balance characteristics for multiple-connected piezoelectric transformers.

PubMed

Park, Joung-Hu; Cho, Bo-Hyung; Choi, Sung-Jin; Lee, Sang-Min

2009-08-01

Because the amount of power that a piezoelectric transformer (PT) can handle is limited, multiple connections of PTs are necessary for the power-capacity improvement of PT-applications. In the connection, thermal imbalance between the PTs should be prevented to avoid the thermal runaway of each PT. The thermal balance of the multiple-connected PTs is dominantly affected by the electrothermal characteristics of individual PTs. In this paper, the thermal balance of both parallel-parallel and parallel-series connections is analyzed by electrical model parameters. For quantitative analysis, the thermal-balance effects are estimated by the simulation of the mechanical loss ratio between the PTs. The analysis results show that with PTs of similar characteristics, the parallel-series connection has better thermal balance characteristics due to the reduced mechanical loss of the higher temperature PT. For experimental verification of the analysis, a hardware-prototype test of a Cs-Lp type 40 W adapter system with radial-vibration mode PTs has been performed.
Scalable Domain Decomposed Monte Carlo Particle Transport

NASA Astrophysics Data System (ADS)

O'Brien, Matthew Joseph

In this dissertation, we present the parallel algorithms necessary to run domain decomposed Monte Carlo particle transport on large numbers of processors (millions of processors). Previous algorithms were not scalable, and the parallel overhead became more computationally costly than the numerical simulation. The main algorithms we consider are:
• Domain decomposition of constructive solid geometry: enables extremely large calculations in which the background geometry is too large to fit in the memory of a single computational node.
• Load Balancing: keeps the workload per processor as even as possible so the calculation runs efficiently.
• Global Particle Find: if particles are on the wrong processor, globally resolve their locations to the correct processor based on particle coordinate and background domain.
• Visualizing constructive solid geometry, sourcing particles, deciding that particle streaming communication is completed and spatial redecomposition.
These algorithms are some of the most important parallel algorithms required for domain decomposed Monte Carlo particle transport. We demonstrate that our previous algorithms were not scalable, prove that our new algorithms are scalable, and run some of the algorithms up to 2 million MPI processes on the Sequoia supercomputer.
Numerical aspects and implementation of a two-layer zonal wall model for LES of compressible turbulent flows on unstructured meshes

NASA Astrophysics Data System (ADS)

Park, George Ilhwan; Moin, Parviz

2016-01-01

This paper focuses on numerical and practical aspects associated with a parallel implementation of a two-layer zonal wall model for large-eddy simulation (LES) of compressible wall-bounded turbulent flows on unstructured meshes. A zonal wall model based on the solution of unsteady three-dimensional Reynolds-averaged Navier-Stokes (RANS) equations on a separate near-wall grid is implemented in an unstructured, cell-centered finite-volume LES solver. The main challenge in its implementation is to couple two parallel, unstructured flow solvers for efficient boundary data communication and simultaneous time integrations. A coupling strategy with good load balancing and low processor underutilization is identified. Face mapping and interpolation procedures at the coupling interface are explained in detail. The method of manufactured solutions is used for verifying the correct implementation of solver coupling, and parallel performance of the combined wall-modeled LES (WMLES) solver is investigated. The method has successfully been applied to several attached and separated flows, including a transitional flow over a flat plate and a separated flow over an airfoil at an angle of attack.
Adaptation of a Multi-Block Structured Solver for Effective Use in a Hybrid CPU/GPU Massively Parallel Environment

NASA Astrophysics Data System (ADS)

Gutzwiller, David; Gontier, Mathieu; Demeulenaere, Alain

2014-11-01

Multi-block structured solvers hold many advantages over their unstructured counterparts, such as a smaller memory footprint and efficient serial performance. Historically, multi-block structured solvers have not been easily adapted for use in a High Performance Computing (HPC) environment, and the recent trend towards hybrid CPU/GPU architectures has further complicated the situation. This paper will elaborate on developments and innovations applied to the NUMECA FINE/Turbo solver that have allowed near-linear scalability with real-world problems on over 250 hybrid CPU/GPU cluster nodes. Discussion will focus on the implementation of virtual partitioning and load balancing algorithms using a novel meta-block concept. This implementation is transparent to the user, allowing all pre- and post-processing steps to be performed using a simple, unpartitioned grid topology. Additional discussion will elaborate on developments that have improved parallel performance, including fully parallel I/O with the ADIOS API and the GPU porting of the computationally heavy CPUBooster convergence acceleration module. Head of HPC and Release Management, Numeca International.
Predicting Flows of Rarefied Gases

NASA Technical Reports Server (NTRS)

LeBeau, Gerald J.; Wilmoth, Richard G.

2005-01-01

DSMC Analysis Code (DAC) is a flexible, highly automated, easy-to-use computer program for predicting flows of rarefied gases -- especially flows of upper-atmospheric, propulsion, and vented gases impinging on spacecraft surfaces. DAC implements the direct simulation Monte Carlo (DSMC) method, which is widely recognized as standard for simulating flows at densities so low that the continuum-based equations of computational fluid dynamics are invalid. DAC enables users to model complex surface shapes and boundary conditions quickly and easily. The discretization of a flow field into computational grids is automated, thereby relieving the user of a traditionally time-consuming task while ensuring (1) appropriate refinement of grids throughout the computational domain, (2) determination of optimal settings for temporal discretization and other simulation parameters, and (3) satisfaction of the fundamental constraints of the method. In so doing, DAC ensures an accurate and efficient simulation. In addition, DAC can utilize parallel processing to reduce computation time. The domain decomposition needed for parallel processing is completely automated, and the software employs a dynamic load-balancing mechanism to ensure optimal parallel efficiency throughout the simulation.
A Latency-Tolerant Partitioner for Distributed Computing on the Information Power Grid

NASA Technical Reports Server (NTRS)

Das, Sajal K.; Harvey, Daniel J.; Biswas, Rupak; Kwak, Dochan (Technical Monitor)

2001-01-01

NASA's Information Power Grid (IPG) is an infrastructure designed to harness the power of geographically distributed computers, databases, and human expertise, in order to solve large-scale realistic computational problems. This type of meta-computing environment is necessary to present a unified virtual machine to application developers that hides the intricacies of a highly heterogeneous environment and yet maintains adequate security. In this paper, we present a novel partitioning scheme, called MinEX, that dynamically balances processor workloads while minimizing data movement and runtime communication, for applications that are executed in a parallel distributed fashion on the IPG. We also analyze the conditions that are required for the IPG to be an effective tool for such distributed computations. Our results show that MinEX is a viable load balancer provided the nodes of the IPG are connected by a high-speed asynchronous interconnection network.
Effects of external loads on balance control during upright stance: experimental results and model-based predictions.

PubMed

Qu, Xingda; Nussbaum, Maury A

2009-01-01

The purpose of this study was to identify the effects of external loads on balance control during upright stance, and to examine the ability of a new balance control model to predict these effects. External loads were applied to 12 young, healthy participants, and effects on balance control were characterized by center-of-pressure (COP) based measures. Several loading conditions were studied, involving combinations of load mass (10% and 20% of individual body mass) and height (at or 15% of stature above the whole-body COM). A balance control model based on an optimal control strategy was used to predict COP time series. It was assumed that a given individual would adopt the same neural optimal control mechanisms, identified in a no-load condition, under diverse external loading conditions. With the application of external loads, COP mean velocity in the anterior-posterior direction and RMS distance in the medial-lateral direction increased 8.1% and 10.4%, respectively. Predicted COP mean velocity and RMS distance in the anterior-posterior direction also increased with external loading, by 11.1% and 2.9%, respectively. Both experimental COP data and model-based predictions provided the same general conclusion, that application of larger external loads and loads more superior to the whole body center of mass lead to less effective postural control and perhaps a greater risk of loss of balance or falls. Thus, it can be concluded that the assumption about consistency in control mechanisms was partially supported, and it is the mechanical changes induced by external loads that primarily affect balance control.

Load Balancing in Structured P2P Networks

NASA Astrophysics Data System (ADS)

Zhu, Yingwu

In this chapter we start by addressing the importance and necessity of load balancing in structured P2P networks, due to three main reasons. First, structured P2P networks assume uniform peer capacities while peer capacities are heterogeneous in deployed P2P networks. Second, resorting to pseudo-uniformity of the hash function used to generate node IDs and data item keys leads to imbalanced overlay address space and item distribution. Lastly, placement of data items cannot be randomized in some applications (e.g., range searching). We then present an overview of load aggregation and dissemination techniques that are required by many load balancing algorithms. Two techniques are discussed including tree structure-based approach and gossip-based approach. They make different tradeoffs between estimate/aggregate accuracy and failure resilience. To address the issue of load imbalance, three main solutions are described: virtual server-based approach, power of two choices, and address-space and item balancing. While different in their designs, they all aim to improve balance on the address space and data item distribution. As a case study, the chapter discusses a virtual server-based load balancing algorithm that strives to ensure fair load distribution among nodes and minimize load balancing cost in bandwidth. Finally, the chapter concludes with future research and a summary.
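As a rough illustration of the "power of two choices" idea named in the abstract above, the following Python sketch places each item on the less loaded of a small number of candidate nodes. It is a toy model under stated assumptions: candidates are drawn from a random number generator rather than from the multiple hash functions a structured P2P system would use, and the function name and parameters are hypothetical.

```python
import random

def place_items(num_nodes, num_items, choices=2, rng=None):
    """Assign items to nodes, each time picking the least loaded of
    `choices` randomly sampled candidate nodes.

    choices=1 reduces to plain random placement, which serves as a baseline.
    """
    rng = rng or random.Random()
    loads = [0] * num_nodes
    for _ in range(num_items):
        # Assumption: candidates are sampled uniformly with replacement.
        candidates = [rng.randrange(num_nodes) for _ in range(choices)]
        target = min(candidates, key=lambda node: loads[node])
        loads[target] += 1
    return loads

if __name__ == "__main__":
    baseline = place_items(100, 10_000, choices=1, rng=random.Random(1))
    two_choice = place_items(100, 10_000, choices=2, rng=random.Random(1))
    print("max load, one choice :", max(baseline))
    print("max load, two choices:", max(two_choice))
```

Comparing the maximum load of the two runs shows why the second candidate matters: the extra lookup costs little, while the heaviest node typically ends up much closer to the average.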
Performance Optimization of Marine Science and Numerical Modeling on HPC Cluster

PubMed Central

Yang, Dongdong; Yang, Hailong; Wang, Luming; Zhou, Yucong; Zhang, Zhiyuan; Wang, Rui; Liu, Yi

2017-01-01

Marine science and numerical modeling (MASNUM) is widely used in forecasting ocean wave movement, through simulating the variation tendency of the ocean wave. Although existing work has improved the performance of MASNUM in various respects, there is still considerable unexplored room for further performance improvement. In this paper, we aim at improving the performance of the propagation solver and data access during the simulation, in addition to the efficiency of output I/O and load balance. Our optimizations include several effective techniques such as algorithm redesign, load distribution optimization, parallel I/O and data access optimization. The experimental results demonstrate that our approach achieves higher performance compared to the state-of-the-art work, about 3.5x speedup without degrading the prediction accuracy. In addition, the parameter sensitivity analysis shows our optimizations are effective under various topography resolutions and output frequencies. PMID:28045972
Hidden Connections between Regression Models of Strain-Gage Balance Calibration Data

NASA Technical Reports Server (NTRS)

Ulbrich, Norbert

2013-01-01

Hidden connections between regression models of wind tunnel strain-gage balance calibration data are investigated. These connections become visible whenever balance calibration data is supplied in its design format and both the Iterative and Non-Iterative Method are used to process the data. First, it is shown how the regression coefficients of the fitted balance loads of a force balance can be approximated by using the corresponding regression coefficients of the fitted strain-gage outputs. Then, data from the manual calibration of the Ames MK40 six-component force balance is chosen to illustrate how estimates of the regression coefficients of the fitted balance loads can be obtained from the regression coefficients of the fitted strain-gage outputs. The study illustrates that load predictions obtained by applying the Iterative or the Non-Iterative Method originate from two related regression solutions of the balance calibration data as long as balance loads are given in the design format of the balance, gage outputs behave highly linearly, strict statistical quality metrics are used to assess regression models of the data, and regression model term combinations of the fitted loads and gage outputs can be obtained by a simple variable exchange.
Quantum load balancing in ad hoc networks

NASA Astrophysics Data System (ADS)

Hasanpour, M.; Shariat, S.; Barnaghi, P.; Hoseinitabatabaei, S. A.; Vahid, S.; Tafazolli, R.

2017-06-01

This paper presents a novel approach to load balancing in ad hoc networks utilizing the properties of quantum game theory. This approach benefits from the instantaneous and information-less capability of entangled particles to synchronize the load balancing strategies in ad hoc networks. The quantum load balancing (QLB) algorithm proposed by this work is implemented on top of OLSR as the baseline routing protocol; its performance is analyzed against the baseline OLSR, and considerable gain is reported regarding some of the main QoS metrics such as delay and jitter. Furthermore, it is shown that the QLB algorithm provides a solid stability gain in terms of throughput, which stands as a proof of concept for the load balancing properties of the proposed theory.
Efficient implementation of a 3-dimensional ADI method on the iPSC/860

DOE Office of Scientific and Technical Information (OSTI.GOV)

Van der Wijngaart, R.F.

1993-12-31

A comparison is made between several domain decomposition strategies for the solution of three-dimensional partial differential equations on a MIMD distributed memory parallel computer. The grids used are structured, and the numerical algorithm is ADI. Important implementation issues regarding load balancing, storage requirements, network latency, and overlap of computations and communications are discussed. Results of the solution of the three-dimensional heat equation on the Intel iPSC/860 are presented for the three most viable methods. It is found that the Bruno-Cappello decomposition delivers optimal computational speed through an almost complete elimination of processor idle time, while providing good memory efficiency.
Concurrent Probabilistic Simulation of High Temperature Composite Structural Response

NASA Technical Reports Server (NTRS)

Abdi, Frank

1996-01-01

A computational structural/material analysis and design tool which would meet industry's future demand for expedience and reduced cost is presented. This unique software 'GENOA' is dedicated to parallel and high speed analysis to perform probabilistic evaluation of high temperature composite response of aerospace systems. The development is based on detailed integration and modification of diverse fields of specialized analysis techniques and mathematical models to combine their latest innovative capabilities into a commercially viable software package. The technique is specifically designed to exploit the availability of processors to perform computationally intense probabilistic analysis assessing uncertainties in structural reliability analysis and composite micromechanics. The primary objectives which were achieved in performing the development were: (1) Utilization of the power of parallel processing and static/dynamic load balancing optimization to make the complex simulation of structure, material and processing of high temperature composite affordable; (2) Computational integration and synchronization of probabilistic mathematics, structural/material mechanics and parallel computing; (3) Implementation of an innovative multi-level domain decomposition technique to identify the inherent parallelism, and increasing convergence rates through high- and low-level processor assignment; (4) Creating the framework for Portable Paralleled architecture for the machine independent Multi Instruction Multi Data (MIMD), Single Instruction Multi Data (SIMD), hybrid and distributed workstation type of computers; and (5) Market evaluation. The results of the Phase-2 effort provide a good basis for continuation and warrant a Phase-3 government and industry partnership.
High-Performance Parallel Analysis of Coupled Problems for Aircraft Propulsion

NASA Technical Reports Server (NTRS)

Felippa, C. A.; Farhat, C.; Park, K. C.; Gumaste, U.; Chen, P.-S.; Lesoinne, M.; Stern, P.

1997-01-01

Applications are described of high-performance computing methods to the numerical simulation of complete jet engines. The methodology focuses on the partitioned analysis of the interaction of the gas flow with a flexible structure and with the fluid mesh motion driven by structural displacements. The latter is treated by an ALE technique that models the fluid mesh motion as that of a fictitious mechanical network laid along the edges of near-field elements. New partitioned analysis procedures to treat this coupled three-component problem were developed. These procedures involved delayed corrections and subcycling, and have been successfully tested on several massively parallel computers, including the iPSC-860, Paragon XP/S and the IBM SP2. The NASA-sponsored ENG10 program was used for the global steady state analysis of the whole engine. This program uses a regular FV-multiblock-grid discretization in conjunction with circumferential averaging to include effects of blade forces, loss, combustor heat addition, blockage, bleeds and convective mixing. A load-balancing preprocessor for parallel versions of ENG10 was developed as well as the capability for the first full 3D aeroelastic simulation of a multirow engine stage. This capability was tested on the IBM SP2 parallel supercomputer at NASA Ames.
The Forest Method as a New Parallel Tree Method with the Sectional Voronoi Tessellation

NASA Astrophysics Data System (ADS)

Yahagi, Hideki; Mori, Masao; Yoshii, Yuzuru

1999-09-01

We have developed a new parallel tree method which will be called the forest method hereafter. This new method uses the sectional Voronoi tessellation (SVT) for the domain decomposition. The SVT decomposes a whole space into polyhedra and allows their flat borders to move by assigning different weights. The forest method determines these weights based on the load balancing among processors by means of the overload diffusion (OLD). Moreover, since all the borders are flat, before receiving the data from other processors, each processor can collect enough data to calculate the gravity force with precision. Both the SVT and the OLD are coded in a highly vectorizable manner to suit vector parallel processors. The parallel code based on the forest method with the Message Passing Interface is run on various platforms so that a wide portability is guaranteed. Extensive calculations with 15 processors of Fujitsu VPP300/16R indicate that the code can calculate the gravity force exerted on 10^5 particles in each second for some ideal dark halo. This code is found to enable an N-body simulation with 10^7 or more particles for a wide dynamic range and is therefore a very powerful tool for the study of galaxy formation and large-scale structure in the universe.
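The abstract above does not spell out how the overload diffusion works, so the following Python sketch shows only the generic idea behind diffusion-style load balancing: each processor repeatedly trades a fraction of its load difference with its neighbours. It is not the authors' OLD scheme; the ring topology, the function name, and the coefficient `alpha` are assumptions made for illustration.

```python
def diffuse(loads, alpha=0.25, steps=1):
    """First-order diffusion of load on a ring of processors.

    Each step, processor i moves alpha * (neighbour load - own load)
    across each of its two ring links, so the total load is conserved.
    Assumption: alpha <= 0.5 keeps the iteration stable on a ring.
    """
    n = len(loads)
    current = list(loads)
    for _ in range(steps):
        nxt = list(current)
        for i in range(n):
            left = current[(i - 1) % n]
            right = current[(i + 1) % n]
            nxt[i] = current[i] + alpha * (left - current[i]) \
                                + alpha * (right - current[i])
        current = nxt
    return current

if __name__ == "__main__":
    start = [80.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(diffuse(start, steps=5))    # load spreading outwards
    print(diffuse(start, steps=200))  # close to the uniform average
```

In a scheme such as the one described, the balanced loads would presumably feed back into the weights that move the domain borders, but that coupling is not modelled here.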
A novel load balanced energy conservation approach in WSN using biogeography based optimization

NASA Astrophysics Data System (ADS)

Kaushik, Ajay; Indu, S.; Gupta, Daya

2017-09-01

Clustering sensor nodes is an effective technique to reduce the energy consumption of the sensor nodes and maximize the lifetime of wireless sensor networks (WSNs). Balancing the load of the cluster heads is an important factor in the long-run operation of WSNs. In this paper we propose a novel load balancing approach using biogeography based optimization (LB-BBO). LB-BBO uses two separate fitness functions to perform load balancing of equal and unequal load respectively. The proposed method is simulated using MATLAB and compared with existing methods. The proposed method shows better performance than all the previous works implemented for energy conservation in WSNs.
Parallel/Vector Integration Methods for Dynamical Astronomy

NASA Astrophysics Data System (ADS)

Fukushima, Toshio

1999-01-01

This paper reviews three recent works on numerical methods to integrate ordinary differential equations (ODEs), which are specially designed for parallel, vector, and/or multi-processor-unit (PU) computers. The first is the Picard-Chebyshev method (Fukushima, 1997a). It obtains a global solution of an ODE in the form of a Chebyshev polynomial of large (> 1000) degree by applying the Picard iteration repeatedly. The iteration converges for smooth problems and/or perturbed dynamics. The method runs around 100-1000 times faster in the vector mode than in the scalar mode of a certain computer with vector processors (Fukushima, 1997b). The second is a parallelization of a symplectic integrator (Saha et al., 1997). It regards the implicit midpoint rules covering thousands of timesteps as large-scale nonlinear equations and solves them by fixed-point iteration. The method is applicable to Hamiltonian systems and is expected to yield an acceleration factor of around 50 on parallel computers with more than 1000 PUs. The last is a parallelization of the extrapolation method (Ito and Fukushima, 1997). It performs trial integrations in parallel. The trial integrations are further accelerated by balancing the computational load among PUs by the technique of folding. The method is all-purpose and achieves an acceleration factor of around 3.5 by using several PUs. Finally, we give a perspective on the parallelization of some implicit integrators which require multiple corrections in solving implicit formulas like the implicit Hermitian integrators (Makino and Aarseth, 1992), (Hut et al., 1995) or the implicit symmetric multistep methods (Fukushima, 1998), (Fukushima, 1999).
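The second method in the abstract above, treating the implicit midpoint rule over many timesteps as one large system solved by fixed-point iteration, can be sketched in Python as follows. This is a minimal serial illustration, not the published algorithm: the harmonic-oscillator right-hand side, the function names, and the fixed number of sweeps are assumptions. The point it tries to show is that, within one sweep, the slope evaluations for all timesteps are independent of each other and could therefore be distributed over processing units.

```python
from math import atan, cos

def f(state):
    """Right-hand side of a test problem (assumption): q' = p, p' = -q."""
    q, p = state
    return (p, -q)

def midpoint_all_steps(y0, h, n_steps, sweeps=60):
    """Solve y[k+1] = y[k] + h * f((y[k] + y[k+1]) / 2) for all k at once
    by fixed-point iteration over the whole trajectory."""
    ys = [y0] * (n_steps + 1)  # initial guess: constant trajectory
    for _ in range(sweeps):
        # Slopes at every interval midpoint of the previous iterate.
        # These evaluations are mutually independent (parallelizable).
        slopes = [
            f(((ys[k][0] + ys[k + 1][0]) / 2, (ys[k][1] + ys[k + 1][1]) / 2))
            for k in range(n_steps)
        ]
        # Cumulative sum of the increments rebuilds the trajectory.
        q, p = y0
        new = [y0]
        for dq, dp in slopes:
            q += h * dq
            p += h * dp
            new.append((q, p))
        ys = new
    return ys

if __name__ == "__main__":
    h, n = 0.1, 20
    q_end, p_end = midpoint_all_steps((1.0, 0.0), h, n)[-1]
    # For this linear test problem the midpoint rule rotates the state by
    # 2*atan(h/2) per step, which gives a closed-form reference value.
    print(q_end, cos(n * 2 * atan(h / 2)))
    print("energy:", q_end ** 2 + p_end ** 2)
```

The sweep count needed for convergence grows with the length of the integration interval, which is one reason such schemes are usually applied window by window.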
Physical load handling and listening comprehension effects on balance control.

PubMed

Qu, Xingda

2010-12-01

The purpose of this study was to determine the physical load handling and listening comprehension effects on balance control. A total of 16 young and 16 elderly participants were recruited in this study. The physical load handling task required holding a 5-kg load in each hand with arms at sides. The listening comprehension task involved attentive listening to a short conversation. Three short questions were asked regarding the conversation right after the testing trial to test the participants' attentiveness during the experiment. Balance control was assessed by centre of pressure-based measures, which were calculated from the force platform data when the participants were quietly standing upright on a force platform. Results from this study showed that both physical load handling and listening comprehension adversely affected balance control. Physical load handling had a more deleterious effect on balance control under the listening comprehension condition vs. no-listening comprehension condition. Based on the findings from this study, interventions for the improvement of balance could be focused on avoiding exposures to physically demanding tasks and cognitively demanding tasks simultaneously. STATEMENT OF RELEVANCE: Findings from this study can aid in better understanding how humans maintain balance, especially when physical and cognitive loads are applied. Such information is useful for developing interventions to prevent fall incidents and injuries in occupational settings and daily activities.
Higher order balance control: Distinct effects between cognitive task and manual steadiness constraint on automatic postural responses.

PubMed

Coelho, Daniel Boari; Bourlinova, Catarina; Teixeira, Luis Augusto

2016-12-01

In the present experiment, we aimed to evaluate the interactive effect of performing a cognitive task simultaneously with a manual task requiring either high or low steadiness on automatic postural responses (APRs). Young volunteers performed the task of recovering upright balance following a mechanical perturbation provoked by unanticipatedly releasing a load pulling the participant's body backwards. The postural task was performed while holding a cylinder steadily on a tray. One group performed that task under high (cylinder's round side down) and another one under low (cylinder's flat side down) manual steadiness constraint. Those tasks were evaluated in the conditions of performing concurrently a cognitive numeric subtraction task and under no cognitive task. Analysis showed that performance of the cognitive task led to increased body and tray displacement, associated with higher displacement at the hip and upper trunk, and lower magnitude of activation of the GM muscle in response to the perturbation. Conversely, high manual steadiness constraint led to reduced tray velocity in association with lower values of trunk displacement, and decreased rotation amplitude at the ankle and hip joints. We found no interactions between the effects of the cognitive and manual tasks on APRs, suggesting that they were processed in parallel in the generation of responses for balance recovery. Modulation of postural responses from the manual and cognitive tasks indicates participation of higher order neural structures in the generation of APRs, with postural responses being affected by multiple mental processes occurring in parallel. Copyright © 2016 Elsevier B.V. All rights reserved.
Collectively loading an application in a parallel computer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.

Collectively loading an application in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: identifying, by a parallel computer control system, a subset of compute nodes in the parallel computer to execute a job; selecting, by the parallel computer control system, one of the subset of compute nodes in the parallel computer as a job leader compute node; retrieving, by the job leader compute node from computer memory, an application for executing the job; and broadcasting, by the job leader to the subset of compute nodes in the parallel computer, the application for executing the job.
Load Balancing Using Time Series Analysis for Soft Real Time Systems with Statistically Periodic Loads

NASA Technical Reports Server (NTRS)

Hailperin, Max

1993-01-01

This thesis provides design and analysis of techniques for global load balancing on ensemble architectures running soft-real-time object-oriented applications with statistically periodic loads. It focuses on estimating the instantaneous average load over all the processing elements. The major contribution is the use of explicit stochastic process models for both the loading and the averaging itself. These models are exploited via statistical time-series analysis and Bayesian inference to provide improved average load estimates, and thus to facilitate global load balancing. This thesis explains the distributed algorithms used and provides some optimality results. It also describes the algorithms' implementation and gives performance results from simulation. These results show that our techniques allow more accurate estimation of the global system loading, resulting in fewer object migrations than local methods. Our method is shown to provide superior performance, relative not only to static load-balancing schemes but also to many adaptive methods.
14 CFR 25.393 - Loads parallel to hinge line.

Code of Federal Regulations, 2010 CFR

2010-01-01

... designed for inertia loads acting parallel to the hinge line. (b) In the absence of more rational data, the inertia loads may be assumed to be equal to KW, where— (1) K=24 for vertical surfaces; (2) K=12 for...
14 CFR 25.393 - Loads parallel to hinge line.

Code of Federal Regulations, 2012 CFR

2012-01-01

... designed for inertia loads acting parallel to the hinge line. (b) In the absence of more rational data, the inertia loads may be assumed to be equal to KW, where— (1) K=24 for vertical surfaces; (2) K=12 for...
14 CFR 25.393 - Loads parallel to hinge line.

Code of Federal Regulations, 2011 CFR

2011-01-01

... designed for inertia loads acting parallel to the hinge line. (b) In the absence of more rational data, the inertia loads may be assumed to be equal to KW, where— (1) K=24 for vertical surfaces; (2) K=12 for...
14 CFR 25.393 - Loads parallel to hinge line.

Code of Federal Regulations, 2014 CFR

2014-01-01

... designed for inertia loads acting parallel to the hinge line. (b) In the absence of more rational data, the inertia loads may be assumed to be equal to KW, where— (1) K=24 for vertical surfaces; (2) K=12 for...
14 CFR 25.393 - Loads parallel to hinge line.

Code of Federal Regulations, 2013 CFR

2013-01-01

... designed for inertia loads acting parallel to the hinge line. (b) In the absence of more rational data, the inertia loads may be assumed to be equal to KW, where— (1) K=24 for vertical surfaces; (2) K=12 for...
40 CFR Table 33 to Subpart G of... - Saturation Factors

Code of Federal Regulations, 2011 CFR

2011-07-01

... service 0.60 Submerged loading: dedicated vapor balance service 1.00 Splash loading of a clean cargo tank 1.45 Splash loading: dedicated normal service 1.45 Splash loading: dedicated vapor balance service 1...

40 CFR Table 33 to Subpart G of... - Saturation Factors

Code of Federal Regulations, 2014 CFR

2014-07-01

... service 0.60 Submerged loading: dedicated vapor balance service 1.00 Splash loading of a clean cargo tank 1.45 Splash loading: dedicated normal service 1.45 Splash loading: dedicated vapor balance service 1...
40 CFR Table 33 to Subpart G of... - Saturation Factors

Code of Federal Regulations, 2012 CFR

2012-07-01

... service 0.60 Submerged loading: dedicated vapor balance service 1.00 Splash loading of a clean cargo tank 1.45 Splash loading: dedicated normal service 1.45 Splash loading: dedicated vapor balance service 1...
40 CFR Table 33 to Subpart G of... - Saturation Factors

Code of Federal Regulations, 2010 CFR

2010-07-01

... service 0.60 Submerged loading: dedicated vapor balance service 1.00 Splash loading of a clean cargo tank 1.45 Splash loading: dedicated normal service 1.45 Splash loading: dedicated vapor balance service 1...
40 CFR Table 33 to Subpart G of... - Saturation Factors

Code of Federal Regulations, 2013 CFR

2013-07-01

... service 0.60 Submerged loading: dedicated vapor balance service 1.00 Splash loading of a clean cargo tank 1.45 Splash loading: dedicated normal service 1.45 Splash loading: dedicated vapor balance service 1...
Expert Systems on Multiprocessor Architectures. Volume 3. Technical Reports

DTIC Science & Technology

1991-06-01

... choice of load balancing vs. load sharing [14]. While load balancing strives to keep all sites equally loaded, load sharing merely tries to prevent ... unnecessary idleness. Load balancing is appropriate to object-oriented real-time systems because real-time systems need to prevent long waits for ... ConClass ... an object replication ... in order to prevent a particular object from being ...
Novel models and algorithms of load balancing for variable-structured collaborative simulation under HLA/RTI

NASA Astrophysics Data System (ADS)

Yue, Yingchao; Fan, Wenhui; Xiao, Tianyuan; Ma, Cheng

2013-07-01

High level architecture (HLA) is the open standard in the collaborative simulation field. Scholars have been paying close attention to theoretical research on and engineering applications of collaborative simulation based on HLA/RTI, which extends HLA in various aspects like functionality and efficiency. However, related study on the load balancing problem of HLA collaborative simulation is insufficient. Without load balancing, collaborative simulation under HLA/RTI may encounter performance reduction or even fatal errors. In this paper, load balancing is further divided into static problems and dynamic problems. A multi-objective model is established and the randomness of model parameters is taken into consideration for static load balancing, which makes the model more credible. The Monte Carlo based optimization algorithm (MCOA) is devised to achieve static load balance. For dynamic load balancing, a new type of dynamic load balancing problem is put forward with regard to the variable-structured collaborative simulation under HLA/RTI. In order to minimize the influence on the running collaborative simulation, the ordinal optimization based algorithm (OOA) is devised to shorten the optimization time. Furthermore, the two algorithms are adopted in simulation experiments of different scenarios, which demonstrate their effectiveness and efficiency. An engineering experiment about collaborative simulation under HLA/RTI of high speed electricity multiple units (EMU) is also conducted to verify the credibility of the proposed models and the supportive utility of MCOA and OOA to practical engineering systems. The proposed research ensures compatibility with traditional HLA, enhances the ability to assign simulation loads onto computing units both statically and dynamically, improves the performance of the collaborative simulation system, and makes full use of the hardware resources.
High performance computing in biology: multimillion atom simulations of nanoscale systems

PubMed Central

Sanbonmatsu, K. Y.; Tung, C.-S.

2007-01-01

Computational methods have been used in biology for sequence analysis (bioinformatics), all-atom simulation (molecular dynamics and quantum calculations), and more recently for modeling biological networks (systems biology). Of these three techniques, all-atom simulation is currently the most computationally demanding, in terms of compute load, communication speed, and memory load. Breakthroughs in electrostatic force calculation and dynamic load balancing have enabled molecular dynamics simulations of large biomolecular complexes. Here, we report simulation results for the ribosome, using approximately 2.64 million atoms, the largest all-atom biomolecular simulation published to date. Several other nanoscale systems with different numbers of atoms were studied to measure the performance of the NAMD molecular dynamics simulation program on the Los Alamos National Laboratory Q Machine. We demonstrate that multimillion atom systems represent a 'sweet spot' for the NAMD code on large supercomputers. NAMD displays an unprecedented 85% parallel scaling efficiency for the ribosome system on 1024 CPUs. We also review recent targeted molecular dynamics simulations of the ribosome that prove useful for studying conformational changes of this large biomolecular complex in atomic detail. PMID:17187988
Load Balancing in Hypergraphs

NASA Astrophysics Data System (ADS)

Delgosha, Payam; Anantharam, Venkat

2018-03-01

Consider a simple locally finite hypergraph on a countable vertex set, where each edge represents one unit of load which should be distributed among the vertices defining the edge. An allocation of load is called balanced if load cannot be moved from a vertex to another that is carrying less load. We analyze the properties of balanced allocations of load. We extend the concept of balancedness from finite hypergraphs to their local weak limits in the sense of Benjamini and Schramm (Electron J Probab 6(23):13, 2001) and Aldous and Steele (in: Probability on discrete structures. Springer, Berlin, pp 1-72, 2004). To do this, we define a notion of unimodularity for hypergraphs which could be considered an extension of unimodularity in graphs. We give a variational formula for the balanced load distribution and, in particular, we characterize it in the special case of unimodular hypergraph Galton-Watson processes. Moreover, we prove the convergence of the maximum load under some conditions. Our work is an extension to hypergraphs of Anantharam and Salez (Ann Appl Probab 26(1):305-327, 2016), which considered load balancing in graphs, and is aimed at more comprehensively resolving conjectures of Hajek (IEEE Trans Inf Theory 36(6):1398-1414, 1990).
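The balancedness condition stated in the abstract above can be made concrete on a small finite hypergraph. The sketch below assumes the usual reading of the definition: an allocation is balanced if no edge places positive load on a vertex whose total load exceeds that of another vertex of the same edge. The data layout (a dict mapping each edge to its per-vertex shares, summing to one unit) and the function names are hypothetical choices made for illustration.

```python
def vertex_loads(allocation):
    """Total load on each vertex, summed over all edges containing it."""
    loads = {}
    for shares in allocation.values():
        for vertex, amount in shares.items():
            loads[vertex] = loads.get(vertex, 0.0) + amount
    return loads

def is_balanced(allocation, tol=1e-9):
    """True if no edge could shift load from a vertex to a less-loaded one."""
    loads = vertex_loads(allocation)
    for shares in allocation.values():
        for u, amount in shares.items():
            if amount <= tol:
                continue  # nothing on u from this edge, so nothing to move
            for v in shares:
                if loads[u] > loads[v] + tol:
                    return False  # this edge could move load from u to v
    return True

# Assumed example: edge e1 = {a, b, c}, edge e2 = {a}; each edge carries 1 unit.
balanced = {"e1": {"a": 0.0, "b": 0.5, "c": 0.5}, "e2": {"a": 1.0}}
even_split = {"e1": {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}, "e2": {"a": 1.0}}

print(is_balanced(balanced))    # e1 avoids the already-loaded vertex a
print(is_balanced(even_split))  # e1 still puts load on the heaviest vertex
```

In the first allocation vertex a carries more load than b and c, but only from the edge e2, which has no alternative; in the second, e1 could reduce the maximum by moving its share away from a.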
Scalable parallel communications

NASA Technical Reports Server (NTRS)

Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.

1992-01-01

Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulation studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines.
In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCPs running in parallel provide high bandwidth service to a single application); and (3) coarse-grain parallelism will be able to incorporate many future improvements from related work (e.g., reduced data movement, fast TCP, fine-grain parallelism) also with near linear speed-ups.
Formalization, equivalence and generalization of basic resonance electrical circuits

NASA Astrophysics Data System (ADS)

Penev, Dimitar; Arnaudov, Dimitar; Hinov, Nikolay

2017-12-01

This work presents basic resonance circuits that are used in resonance energy converters. The following resonant circuits are considered: serial, serial with parallel-loaded capacitor, parallel, and parallel with serial-loaded inductance. For the circuits under consideration, expressions are derived for the frequencies of the natural oscillations and for the equivalence of the active power delivered to the load. The mathematical expressions are plotted and verified using computer simulations. The results obtained are used in the model-based design of resonant energy converters with DC or AC output. This guarantees the output indicators of power electronic devices.
Knee Joint Kinetics in Relation to Commonly Prescribed Squat Loads and Depths

PubMed Central

Cotter, Joshua A.; Chaudhari, Ait M.; Jamison, Steve T.; Devor, Steven T.

2014-01-01

Controversy exists regarding the safety and performance benefits of performing the squat exercise to depths beyond 90° of knee flexion. Our aim was to compare the net peak external knee flexion moments (pEKFM) experienced over typical ranges of squat loads and depths. Sixteen recreationally trained males (n = 16; 22.7 ± 1.1 yrs; 85.4 ± 2.1 kg; 177.6 ± 0.96 cm; mean ± SEM) with no previous lower limb surgeries or other orthopedic issues and at least one year of consistent resistance training experience while utilizing the squat exercise performed single repetition squat trials in a random order at squat depths of above parallel, parallel, and below parallel. Less than one week before testing, one repetition maximum (1RM) values were found for each squat depth. Subsequent testing required subjects to perform squats at the three depths with three different loads: unloaded, 50% 1RM, and 85% 1RM (nine total trials). Force platform and kinematic data were collected to calculate pEKFM. To assess differences among loads and depths, a two-factor (load and depth) repeated-measures ANOVA with significance set at the P < 0.05 level was used. Squat 1RM significantly decreased 13.6% from the above parallel to parallel squat and another 3.6% from the parallel to the below parallel squat (P < 0.05). Net peak external knee flexion moments significantly increased as both squat depth and load were increased (P ≤ 0.02). Slopes of pEKFM were greater from unloaded to 50% 1RM than when progressing from 50% to 85% 1RM (P < 0.001). The results suggest that typical decreases in squat loads used with increasing depths are not enough to offset increases in pEKFM. PMID:23085977
Parallel Execution of Functional Mock-up Units in Buildings Modeling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ozmen, Ozgur; Nutaro, James J.; New, Joshua Ryan

2016-06-30

A Functional Mock-up Interface (FMI) defines a standardized interface to be used in computer simulations to develop complex cyber-physical systems. FMI implementation by a software modeling tool enables the creation of a simulation model that can be interconnected, or the creation of a software library called a Functional Mock-up Unit (FMU). This report describes an FMU wrapper implementation that imports FMUs into a C++ environment and uses an Euler solver that executes FMUs in parallel using Open Multi-Processing (OpenMP). The purpose of this report is to elucidate the runtime performance of the solver when a multi-component system is imported as a single FMU (for the whole system) or as multiple FMUs (for different groups of components as sub-systems). This performance comparison is conducted using two test cases: (1) a simple, multi-tank problem; and (2) a more realistic use case based on the Modelica Buildings Library. In both test cases, the performance gains are promising when each FMU consists of a large number of states and state events that are wrapped in a single FMU. Load balancing is demonstrated to be a critical factor in speeding up parallel execution of multiple FMUs.
Solvers for $\mathcal{O}(N)$ Electronic Structure in the Strong Scaling Limit

DOE PAGES

Bock, Nicolas; Challacombe, William M.; Kale, Laxmikant

2016-01-26

Here we present a hybrid OpenMP/Charm++ framework for solving the $\mathcal{O}(N)$ self-consistent-field eigenvalue problem with parallelism in the strong scaling regime, $P \gg N$, where $P$ is the number of cores, and $N$ is a measure of system size, i.e., the number of matrix rows/columns, basis functions, atoms, molecules, etc. This result is achieved with a nested approach to spectral projection and the sparse approximate matrix multiply [Bock and Challacombe, SIAM J. Sci. Comput., 35 (2013), pp. C72--C98], and involves a recursive, task-parallel algorithm, often employed by generalized $N$-Body solvers, to occlusion and culling of negligible products in the case of matrices with decay. Lastly, employing classic technologies associated with generalized $N$-Body solvers, including overdecomposition, recursive task parallelism, orderings that preserve locality, and persistence-based load balancing, we obtain scaling beyond hundreds of cores per molecule for small water clusters ($[\mathrm{H_2O}]_N$, $N \in \{30, 90, 150\}$, $P/N \approx \{819, 273, 164\}$) and find support for an increasingly strong scalability with increasing system size $N$.
SSME alternate turbopump (pump section) axial load analysis

NASA Technical Reports Server (NTRS)

Crease, G. A.; Rosello, A., Jr.; Fetfatsidis, A. K.

1989-01-01

A flow balancing computer program constructed to calculate the axial loads on the pump sections of the Space Shuttle Main Engine (SSME) alternate turbopumps (ATs) is described. The loads are used in turn to determine load balancing piston design requirements. The application of the program to the inlet section, inducer/impeller/stage, bearings, seals, labyrinth, damper, piston, face and corner, and stationary/rotating surfaces is indicated. Design analysis results are reported which show that the balancing piston designs are adequate and that performance and life will not be degraded by the turbopump's axial load characteristics.
Parallel Adaptive High-Order CFD Simulations Characterizing Cavity Acoustics for the Complete SOFIA Aircraft

NASA Technical Reports Server (NTRS)

Barad, Michael F.; Brehm, Christoph; Kiris, Cetin C.; Biswas, Rupak

2014-01-01

This paper presents one-of-a-kind MPI-parallel computational fluid dynamics simulations for the Stratospheric Observatory for Infrared Astronomy (SOFIA). SOFIA is an airborne, 2.5-meter infrared telescope mounted in an open cavity in the aft of a Boeing 747SP. These simulations focus on how the unsteady flow field inside and over the cavity interferes with the optical path and mounting of the telescope. A temporally fourth-order Runge-Kutta, and spatially fifth-order WENO-5Z scheme was used to perform implicit large eddy simulations. An immersed boundary method provides automated gridding for complex geometries and natural coupling to a block-structured Cartesian adaptive mesh refinement framework. Strong scaling studies using NASA's Pleiades supercomputer with up to 32,000 cores and 4 billion cells show excellent scaling. Dynamic load balancing based on execution time on individual AMR blocks addresses irregularities caused by the highly complex geometry. Limits to scaling beyond 32K cores are identified, and targeted code optimizations are discussed.
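Load balancing driven by measured per-block execution time, as mentioned in the abstract above, can be sketched with a generic greedy heuristic. The abstract does not say which algorithm the authors use; the longest-processing-time rule below is a swapped-in standard technique, and the block names, timings, and function names are hypothetical.

```python
import heapq

def assign_blocks(block_times, n_ranks):
    """Greedy assignment: the costliest remaining block goes to the least-loaded rank."""
    heap = [(0.0, rank) for rank in range(n_ranks)]  # (accumulated time, rank)
    heapq.heapify(heap)
    owner = {}
    by_cost = sorted(block_times.items(), key=lambda kv: kv[1], reverse=True)
    for block, seconds in by_cost:
        load, rank = heapq.heappop(heap)  # currently least-loaded rank
        owner[block] = rank
        heapq.heappush(heap, (load + seconds, rank))
    return owner

def rank_loads(block_times, owner, n_ranks):
    """Sum the measured block times per rank for a given ownership map."""
    loads = [0.0] * n_ranks
    for block, rank in owner.items():
        loads[rank] += block_times[block]
    return loads

# Hypothetical timings (seconds per step) measured on individual AMR blocks.
times = {"b0": 6.0, "b1": 5.0, "b2": 4.0, "b3": 3.0, "b4": 2.0, "b5": 1.5, "b6": 0.5}
owner = assign_blocks(times, n_ranks=3)
loads = rank_loads(times, owner, n_ranks=3)
print(owner)
print(loads)
```

The heuristic is not guaranteed to be optimal, but the gap between the most and least loaded rank never exceeds the cost of a single block, which is often sufficient when blocks are numerous and small relative to the per-rank total.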
Efficient iteration in data-parallel programs with irregular and dynamically distributed data structures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Littlefield, R.J.

1990-02-01

To implement an efficient data-parallel program on a non-shared memory MIMD multicomputer, data and computations must be properly partitioned to achieve good load balance and locality of reference. Programs with irregular data reference patterns often require irregular partitions. Although good partitions may be easy to determine, they can be difficult or impossible to implement in programming languages that provide only regular data distributions, such as blocked or cyclic arrays. We are developing Onyx, a programming system that provides a shared memory model of distributed data structures and extends the concept of data distribution to include irregular and dynamic distributions. This provides a powerful means to specify irregular partitions. Perhaps surprisingly, programs using it can also execute efficiently. In this paper, we describe and evaluate the Onyx implementation of a model problem that repeatedly executes an irregular but fixed data reference pattern. On an NCUBE hypercube, the speed of the Onyx implementation is comparable to that of carefully handwritten message-passing code.
Mapping a battlefield simulation onto message-passing parallel architectures

NASA Technical Reports Server (NTRS)

Nicol, David M.

1987-01-01

Perhaps the most critical problem in distributed simulation is that of mapping: without an effective mapping of workload to processors the speedup potential of parallel processing cannot be realized. Mapping a simulation onto a message-passing architecture is especially difficult when the computational workload dynamically changes as a function of time and space; this is exactly the situation faced by battlefield simulations. This paper studies an approach where the simulated battlefield domain is first partitioned into many regions of equal size; typically there are more regions than processors. The regions are then assigned to processors; a processor is responsible for performing all simulation activity associated with the regions. The assignment algorithm is quite simple and attempts to balance load by exploiting locality of workload intensity. The performance of this technique is studied on a simple battlefield simulation implemented on the Flex/32 multiprocessor. Measurements show that the proposed method achieves reasonable processor efficiencies. Furthermore, the method shows promise for use in dynamic remapping of the simulation.
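The idea of cutting the domain into more equal-size regions than processors and then exploiting locality of workload intensity can be illustrated by comparing two assignments. The abstract does not give the paper's assignment algorithm, so the wrapped (cyclic) mapping below is an assumed stand-in, and the grid size, processor layout, and workload function are hypothetical. Because neighbouring regions tend to have similar intensity, dealing them out cyclically spreads a hot spot across all processors.

```python
N_REGIONS = 8          # regions per side of the square domain (assumed)
PROC_ROWS, PROC_COLS = 2, 2  # processor grid (assumed)

def workload(i, j):
    """Hypothetical intensity: hottest at region (0, 0), decaying with distance."""
    return 1.0 / (1 + i + j)

def block_map(i, j):
    """Contiguous blocks: each processor owns one quadrant of the domain."""
    return (i // (N_REGIONS // PROC_ROWS), j // (N_REGIONS // PROC_COLS))

def wrapped_map(i, j):
    """Wrapped assignment: adjacent regions go to different processors."""
    return (i % PROC_ROWS, j % PROC_COLS)

def imbalance(assign):
    """Ratio of the heaviest processor load to the mean load (1.0 is perfect)."""
    loads = {}
    for i in range(N_REGIONS):
        for j in range(N_REGIONS):
            proc = assign(i, j)
            loads[proc] = loads.get(proc, 0.0) + workload(i, j)
    mean = sum(loads.values()) / (PROC_ROWS * PROC_COLS)
    return max(loads.values()) / mean

print("block imbalance:  ", imbalance(block_map))
print("wrapped imbalance:", imbalance(wrapped_map))
```

With this workload the wrapped mapping yields a noticeably lower imbalance than the block mapping, at the price of more inter-processor boundaries, a trade-off any such scheme has to weigh against communication cost.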
Π4U: A high performance computing framework for Bayesian uncertainty quantification of complex models

NASA Astrophysics Data System (ADS)

Hadjidoukas, P. E.; Angelikopoulos, P.; Papadimitriou, C.; Koumoutsakos, P.

2015-03-01

We present Π4U, an extensible framework, for non-intrusive Bayesian Uncertainty Quantification and Propagation (UQ+P) of complex and computationally demanding physical models, that can exploit massively parallel computer architectures. The framework incorporates Laplace asymptotic approximations as well as stochastic algorithms, along with distributed numerical differentiation and task-based parallelism for heterogeneous clusters. Sampling is based on the Transitional Markov Chain Monte Carlo (TMCMC) algorithm and its variants. The optimization tasks associated with the asymptotic approximations are treated via the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). A modified subset simulation method is used for posterior reliability measurements of rare events. The framework accommodates scheduling of multiple physical model evaluations based on an adaptive load balancing library and shows excellent scalability. In addition to the software framework, we also provide guidelines as to the applicability and efficiency of Bayesian tools when applied to computationally demanding physical models. Theoretical and computational developments are demonstrated with applications drawn from molecular dynamics, structural dynamics and granular flow.
An improved method for determining force balance calibration accuracy

NASA Technical Reports Server (NTRS)

Ferris, Alice T.

1993-01-01

The results of an improved statistical method used at Langley Research Center for determining and stating the accuracy of a force balance calibration are presented. The application of the method for initial loads, initial load determination, auxiliary loads, primary loads, and proof loads is described. The data analysis is briefly addressed.
Unstructured P2P Network Load Balance Strategy Based on Multilevel Partitioning of Hypergraph

NASA Astrophysics Data System (ADS)

Feng, Lv; Chunlin, Gao; Kaiyang, Ma

2017-05-01

With the rapid development of computer performance and distributed technology, the P2P-based resource sharing mode plays an important role on the Internet. As the number of P2P network users continues to increase, the highly dynamic characteristics of the system make it difficult to obtain the load of other nodes. Therefore, a dynamic load balance strategy based on hypergraphs is proposed in this article. The scheme develops from the idea of multilevel partitioning in hypergraph theory. It adopts optimized multilevel partitioning algorithms to partition the P2P network into several small areas, and assigns each area a supernode for the management and load transferring of the nodes in this area. When global scheduling is difficult to achieve, load balancing within a number of small ranges can be ensured first. Through node load balance in each small area, the whole network can achieve relative load balance. The experiments indicate that the load distribution of network nodes in our scheme is obviously more compact. It effectively solves the imbalance problems in P2P networks and also improves the scalability and bandwidth utilization of the system.

Efficient Machine Learning Approach for Optimizing Scientific Computing Applications on Emerging HPC Architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Arumugam, Kamesh

Efficient parallel implementation of scientific applications on multi-core CPUs with accelerators such as GPUs and Xeon Phis is challenging. It requires exploiting the data parallel architecture of the accelerator along with the vector pipelines of modern x86 CPU architectures, load balancing, and efficient memory transfer between different devices. It is relatively easy to meet these requirements for highly structured scientific applications. In contrast, a number of scientific and engineering applications are unstructured. Getting performance on accelerators for these applications is extremely challenging because many of these applications employ irregular algorithms which exhibit data-dependent control-flow and irregular memory accesses. Furthermore, these applications are often iterative with dependency between steps, thus making it hard to parallelize across steps. As a result, parallelism in these applications is often limited to a single step. Numerical simulation of charged particles beam dynamics is one such application where the distribution of work and memory access pattern at each time step is irregular. Applications with these properties tend to present significant branch and memory divergence, load imbalance between different processor cores, and poor compute and memory utilization. Prior research on parallelizing such irregular applications has focused on optimizing the irregular, data-dependent memory accesses and control-flow during a single step of the application independent of the other steps, with the assumption that these patterns are completely unpredictable. We observed that the structure of computation leading to control-flow divergence and irregular memory accesses in one step is similar to that in the next step. It is possible to predict this structure in the current step by observing the computation structure of previous steps.
In this dissertation, we present novel machine learning based optimization techniques to address the parallel implementation challenges of such irregular applications on different HPC architectures. In particular, we use supervised learning to predict the computation structure and use it to address the control-flow and memory access irregularities in the parallel implementation of such applications on GPUs, Xeon Phis, and heterogeneous architectures composed of multi-core CPUs with GPUs or Xeon Phis. We use numerical simulation of charged particles beam dynamics as a motivating example throughout the dissertation to present our new approach, though the techniques should be equally applicable to a wide range of irregular applications. The machine learning approach presented here uses predictive analytics and forecasting techniques to adaptively model and track the irregular memory access pattern at each time step of the simulation to anticipate the future memory access pattern. Access pattern forecasts can then be used to formulate optimization decisions during application execution which improve the performance of the application at a future time step based on the observations from earlier time steps. In heterogeneous architectures, forecasts can also be used to improve the memory performance and resource utilization of all the processing units to deliver a good aggregate performance. We used these optimization techniques and anticipation strategy to design a cache-aware, memory efficient parallel algorithm to address the irregularities in the parallel implementation of charged particles beam dynamics simulation on different HPC architectures. Experimental results using a diverse mix of HPC architectures show that our approach in using anticipation strategy is effective in maximizing data reuse, ensuring workload balance, minimizing branch and memory divergence, and in improving resource utilization.
A Universal Threshold for the Assessment of Load and Output Residuals of Strain-Gage Balance Data

NASA Technical Reports Server (NTRS)

Ulbrich, N.; Volden, T.

2017-01-01

A new universal residual threshold for the detection of load and gage output residual outliers of wind tunnel strain-gage balance data was developed. The threshold works with both the Iterative and Non-Iterative Methods that are used in the aerospace testing community to analyze and process balance data. It also supports all known load and gage output formats that are traditionally used to describe balance data. The threshold's definition is based on an empirical electrical constant. First, the constant is used to construct a threshold for the assessment of gage output residuals. Then, the related threshold for the assessment of load residuals is obtained by multiplying the empirical electrical constant with the sum of the absolute values of all first partial derivatives of a given load component. The empirical constant equals 2.5 microV/V for the assessment of balance calibration or check load data residuals. A value of 0.5 microV/V is recommended for the evaluation of repeat point residuals because, by design, the calculation of these residuals removes errors that are associated with the regression analysis of the data itself. Data from a calibration of a six-component force balance is used to illustrate the application of the new threshold definitions to real-world balance calibration data.
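The load residual threshold described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the partial derivatives are taken to be those of one load component with respect to each gage output (the abstract does not name the variables), and the derivative values are made up. Only the 2.5 and 0.5 microV/V constants come from the abstract.

```python
# Empirical electrical constants quoted in the abstract, in microV/V.
CALIBRATION_CONSTANT = 2.5   # calibration or check load residuals
REPEAT_POINT_CONSTANT = 0.5  # repeat point residuals


def load_residual_threshold(partials, constant=CALIBRATION_CONSTANT):
    """Constant times the sum of absolute first partial derivatives.

    `partials` is assumed to hold d(load)/d(gage output) values in
    load units per microV/V; this interpretation is an assumption.
    """
    return constant * sum(abs(p) for p in partials)


def is_outlier(residual, threshold):
    """Flag a residual whose magnitude exceeds the threshold."""
    return abs(residual) > threshold


# Hypothetical sensitivities of one load component to six gage outputs.
example_partials = [1.2, -0.05, 0.02, 0.0, -0.01, 0.03]
threshold = load_residual_threshold(example_partials)
repeat_threshold = load_residual_threshold(example_partials, REPEAT_POINT_CONSTANT)
```

Because the repeat point constant is five times smaller, the same set of derivatives gives a threshold that is five times tighter for repeat point residuals.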
Valiant load-balanced robust routing under hose model for WDM mesh networks

NASA Astrophysics Data System (ADS)

Zhang, Xiaoning; Li, Lemin; Wang, Sheng

2006-09-01

In this paper, we propose a Valiant Load-Balanced robust routing scheme for WDM mesh networks under the model of polyhedral uncertainty (i.e., the hose model), and the proposed routing scheme is implemented with a traffic grooming approach. Our objective is to maximize the hose model throughput. A mathematical formulation of Valiant Load-Balanced robust routing is presented and three fast heuristic algorithms are also proposed. For applying the Valiant Load-Balanced robust routing scheme to WDM mesh networks, a novel traffic-grooming algorithm called MHF (minimizing hop first) is proposed. We compare the three heuristic algorithms with the VPN tree under the hose model. Finally, we demonstrate in the simulation results that MHF with the Valiant Load-Balanced robust routing scheme outperforms the traditional traffic-grooming algorithm in terms of throughput for uniform and non-uniform traffic matrices under the hose model.
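A minimal sketch of the textbook two-phase Valiant load-balancing idea under the hose model follows. It is not the paper's formulation, its heuristics, or the MHF grooming algorithm, none of which the abstract spells out. The node count, hose bounds and split ratios are hypothetical.

```python
def vlb_link_capacity(ingress, egress, split):
    """Worst-case load on each logical link (i, j) under the hose model.

    Phase 1: node i sends a fraction split[j] of its traffic to
    intermediate node j, at most ingress[i] * split[j].
    Phase 2: intermediate node i forwards traffic destined for node j,
    at most egress[j] * split[i].
    """
    n = len(split)
    return {
        (i, j): ingress[i] * split[j] + egress[j] * split[i]
        for i in range(n)
        for j in range(n)
        if i != j
    }


# Hypothetical 4-node network: every node may send and receive up to
# 10 units, and traffic is split evenly over all four nodes.
hose_ingress = [10.0, 10.0, 10.0, 10.0]
hose_egress = [10.0, 10.0, 10.0, 10.0]
even_split = [0.25, 0.25, 0.25, 0.25]
capacities = vlb_link_capacity(hose_ingress, hose_egress, even_split)
```

With an even split, every logical link needs the same capacity regardless of which traffic matrix inside the hose bounds actually occurs, which is what makes the scheme robust.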
Environmental concept for engineering software on MIMD computers

NASA Technical Reports Server (NTRS)

Lopez, L. A.; Valimohamed, K.

1989-01-01

The issues related to developing an environment in which engineering systems can be implemented on MIMD machines are discussed. The problem is presented in terms of implementing the finite element method under such an environment. However, neither the concepts nor the prototype implementation environment are limited to this application. The topics discussed include: the ability to schedule and synchronize tasks efficiently; granularity of tasks; load balancing; and the use of a high level language to specify parallel constructs, manage data, and achieve portability. The objective of developing a virtual machine concept which incorporates solutions to the above issues leads to a design that can be mapped onto loosely coupled, tightly coupled, and hybrid systems.
Clinical correlates of between-limb synchronization of standing balance control and falls during inpatient stroke rehabilitation.

PubMed

Mansfield, Avril; Mochizuki, George; Inness, Elizabeth L; McIlroy, William E

2012-01-01

Stroke-related sensorimotor impairment potentially contributes to impaired balance. Balance measures that reveal underlying limb-specific control problems, such as a measure of the synchronization of both lower limbs to maintain standing balance, may be uniquely informative about poststroke balance control. This study aimed to determine the relationships between clinical measures of sensorimotor control, functional balance, and fall risk and between-limb synchronization of balance control. The authors conducted a retrospective chart review of 100 individuals with stroke admitted to inpatient rehabilitation. Force plate-based measures were obtained while standing on 2 force plates, including postural sway (root mean square of anteroposterior and mediolateral center of pressure [COP]), stance load asymmetry (percentage of body weight borne on the less-loaded limb), and between-limb synchronization (cross-correlation of the COP recordings under each foot). Clinical measures obtained were motor impairment (Chedoke-McMaster Stroke Assessment), plantar cutaneous sensation, functional balance (Berg Balance Scale), and falls experienced in rehabilitation. Synchronization was significantly related to motor impairment and prospective falls, even when controlling for other force plate-based measures of standing balance control (ie, postural sway and stance load symmetry). Between-limb COP synchronization for standing balance appears to be a uniquely important index of balance control, independent of postural sway and load symmetry during stance.
Performance of the Heavy Flavor Tracker (HFT) detector in star experiment at RHIC

NASA Astrophysics Data System (ADS)

Alruwaili, Manal

As technology advances, the number of processors is becoming massive. Current supercomputer processing power will be available on desktops in the next decade. For mass-scale application software development on massively parallel desktop computers, existing popular languages with large libraries have to be augmented with new constructs and paradigms that exploit massively parallel computing and distributed memory models while retaining user-friendliness. Currently available object-oriented languages for massively parallel computing, such as Chapel, X10 and UPC++, exploit distributed computing, data-parallel computing and thread-parallelism at the process level in the PGAS (Partitioned Global Address Space) memory model. However, they do not: 1) incorporate any extension for object distribution to exploit the PGAS model; 2) offer the flexibility of migrating or cloning an object between places to exploit load balancing; or 3) provide the programming paradigms that result from integrating data and thread-level parallelism with object distribution. In the proposed thesis, I compare different languages in the PGAS model; propose new constructs that extend C++ with object distribution and object migration; and integrate PGAS-based process constructs with these extensions on distributed objects, namely object cloning and object migration. A new paradigm, MIDD (Multiple Invocation Distributed Data), is also presented, in which different copies of the same class can be invoked and work on different elements of distributed data concurrently using remote method invocations. I present the new constructs, their grammar and their behavior. The new constructs are explained using simple programs that utilize them.
The Data Transfer Kit: A geometric rendezvous-based tool for multiphysics data transfer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Slattery, S. R.; Wilson, P. P. H.; Pawlowski, R. P.

2013-07-01

The Data Transfer Kit (DTK) is a software library designed to provide parallel data transfer services for arbitrary physics components based on the concept of geometric rendezvous. The rendezvous algorithm provides a means to geometrically correlate two geometric domains that may be arbitrarily decomposed in a parallel simulation. By repartitioning both domains such that they have the same geometric domain on each parallel process, efficient and load-balanced search operations and data transfer can be performed at a desirable algorithmic time complexity with low communication overhead relative to other types of mapping algorithms. With the increased development efforts in multiphysics simulation and other multiple mesh and geometry problems, generating parallel topology maps for transferring fields and other data between geometric domains is a common operation. The algorithms used to generate parallel topology maps based on the concept of geometric rendezvous as implemented in DTK are described with an example using a conjugate heat transfer calculation and thermal coupling with a neutronics code. In addition, we provide the results of initial scaling studies performed on the Jaguar Cray XK6 system at Oak Ridge National Laboratory for a worst-case-scenario problem in terms of algorithmic complexity that shows good scaling on O(1 x 10^4) cores for topology map generation and excellent scaling on O(1 x 10^5) cores for the data transfer operation with meshes of O(1 x 10^9) elements. (authors)
Apparatus and method for optimal phase balancing using dynamic programming with spatial consideration

DOEpatents

Robertazzi, Thomas G.; Skiena, Steven; Wang, Kai

2017-08-08

Provided are an apparatus and method for load-balancing of a three-phase electric power distribution system having a multi-phase feeder, including obtaining topology information of the feeder identifying supply points for customer loads and feeder sections between the supply points, obtaining customer information that includes peak customer load at each of the points between each of the feeder sections, performing a phase balancing analysis, and recommending phase assignment at the customer load supply points.
A novel communication mechanism based on node potential multi-path routing

NASA Astrophysics Data System (ADS)

Bu, Youjun; Zhang, Chuanhao; Jiang, YiMing; Zhang, Zhen

2016-10-01

As network scale grows rapidly and new network applications emerge frequently, bandwidth supply for today's Internet cannot keep up with the rapidly increasing requirements. Unfortunately, irrational use of network resources makes things worse. Actual networks deploy single-next-hop optimized paths for data transmission, but this "best effort" model leads to imbalanced use of network resources and often to local congestion. Multi-path routing, on the other hand, can use the aggregate bandwidth of multiple paths efficiently and improve network robustness, security, load balancing and quality of service. As a result, multi-path routing has attracted much attention in the routing and switching research fields, and many important ideas and solutions have been proposed. This paper focuses on implementing parallel transmission of multi-next-hop data, balancing network traffic and reducing congestion. It aims to explore the key technologies of multi-path communication networks, which could provide feasible academic support for subsequent applications of multi-path communication networking. It proposes a novel multi-path algorithm based on node potential in the network. The algorithm can make full use of network link resources and effectively balance their utilization.
Accelerating Dust Storm Simulation by Balancing Task Allocation in Parallel Computing Environment

NASA Astrophysics Data System (ADS)

Gui, Z.; Yang, C.; XIA, J.; Huang, Q.; YU, M.

2013-12-01

Dust storms have serious negative impacts on the environment, human health, and assets. Continuing global climate change has increased the frequency and intensity of dust storms in the past decades. To better understand and predict the distribution, intensity and structure of dust storms, a series of dust storm models have been developed, such as the Dust Regional Atmospheric Model (DREAM), the NMM meteorological module (NMM-dust) and the Chinese Unified Atmospheric Chemistry Environment for Dust (CUACE/Dust). The development and application of these models have contributed significantly to both scientific research and our daily life. However, dust storm simulation is a data- and computing-intensive process. Normally, a simulation of a single dust storm event may take several days or hours to run, which seriously impacts the timeliness of prediction and potential applications. To speed up the process, high performance computing is widely adopted. By partitioning a large study area into small subdomains according to their geographic location and executing them on different computing nodes in parallel, the computing performance can be significantly improved. Since spatiotemporal correlations exist in the geophysical process of dust storm simulation, each subdomain allocated to a node needs to communicate with geographically adjacent subdomains to exchange data. Inappropriate allocations may introduce imbalanced task loads and unnecessary communication among computing nodes. The task allocation method is therefore the key factor that may impact the feasibility of parallelization. The allocation algorithm needs to carefully weigh the computing cost and communication cost for each computing node to minimize total execution time and reduce overall communication cost for the entire system. This presentation introduces two algorithms for such allocation and compares them with an evenly distributed allocation method.
Specifically, 1) To obtain optimized solutions, a quadratic programming based modeling method is proposed. This algorithm performs well with a small number of computing tasks. However, its efficiency decreases significantly as the number of subdomains and computing nodes increases. 2) To compensate for the performance decrease on large-scale tasks, a K-Means clustering based algorithm is introduced. Instead of aiming for optimized solutions, this method obtains relatively good feasible solutions within an acceptable time. However, it may introduce imbalanced communication for nodes or node-isolated subdomains. This research shows that both algorithms have their own strengths and weaknesses for task allocation. A combination of the two algorithms is under study to obtain better performance. Keywords: Scheduling; Parallel Computing; Load Balance; Optimization; Cost Model
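The trade-off described in this abstract can be illustrated with a small sketch that clusters subdomains by location and counts how many neighbouring subdomain pairs end up on different nodes. This is plain Lloyd's K-means on a hypothetical 4 x 4 grid, not the authors' algorithm or cost model. The round-robin baseline is an assumption, since the abstract does not define its "evenly distributed" method.

```python
def kmeans_allocate(points, seeds, iterations=20):
    """Assign each subdomain centre to one of k nodes with Lloyd's algorithm."""
    centers = list(seeds)
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centre by squared Euclidean distance.
        labels = [
            min(range(len(centers)),
                key=lambda k: (p[0] - centers[k][0]) ** 2 + (p[1] - centers[k][1]) ** 2)
            for p in points
        ]
        # Update step: move each centre to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centre if a cluster is empty
                centers[k] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return labels


def cross_node_edges(labels, rows, cols):
    """Count adjacent subdomain pairs that sit on different nodes."""
    count = 0
    for r in range(rows):
        for c in range(cols):
            here = labels[r * cols + c]
            if c + 1 < cols and labels[r * cols + c + 1] != here:
                count += 1
            if r + 1 < rows and labels[(r + 1) * cols + c] != here:
                count += 1
    return count


ROWS, COLS, NODES = 4, 4, 4
grid = [(r, c) for r in range(ROWS) for c in range(COLS)]
corner_seeds = [(0, 0), (0, 3), (3, 0), (3, 3)]  # hand-picked starting centres

clustered = kmeans_allocate(grid, corner_seeds)
round_robin = [i % NODES for i in range(len(grid))]  # assumed baseline

clustered_cut = cross_node_edges(clustered, ROWS, COLS)
round_robin_cut = cross_node_edges(round_robin, ROWS, COLS)
```

On this toy grid both allocations give every node four subdomains, but the clustered one groups them into compact blocks, so fewer neighbouring pairs need to exchange data across nodes. On irregular domains K-means gives no such balance guarantee, which matches the weakness the abstract notes.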
Research on parallel load sharing principle of piezoelectric six-dimensional heavy force/torque sensor

NASA Astrophysics Data System (ADS)

Liu, Wei; Li, Ying-jun; Jia, Zhen-yuan; Zhang, Jun; Qian, Min

2011-01-01

In the working process of huge heavy-load manipulators, such as free forging machines, hydraulic die-forging presses, forging manipulators, heavy grasping manipulators and large displacement manipulators, measurement of six-dimensional heavy force/torque and real-time force feedback at the operation interface are the basis for realizing coordinated operation control and force compliance control. They are also an effective way to raise control accuracy and achieve highly efficient manufacturing. To solve the problem of dynamically measuring six-dimensional time-varying heavy loads in extreme manufacturing processes, a novel principle of parallel load sharing for six-dimensional heavy force/torque is put forward. The measuring principle of the six-dimensional force sensor is analyzed, and the spatial model is built and decoupled. The load sharing ratios are analyzed and calculated in the vertical and horizontal directions. The mapping relationship between the six-dimensional heavy force/torque value to be measured and the output force value is built. A finite element model of the parallel piezoelectric six-dimensional heavy force/torque sensor is set up, and its static characteristics are analyzed with ANSYS software. The main parameters that affect the load sharing ratio are analyzed. Load sharing experiments with different diameters of the parallel axis are designed. The results show that the six-dimensional heavy force/torque sensor has good linearity, with non-linearity errors of less than 1%. The parallel axis provides effective load sharing: the larger the diameter, the better the load sharing effect. The experimental results are in accordance with the FEM analysis. The sensor has the advantages of a large measuring range, good linearity, high inherent frequency, and high rigidity. It can be widely used in extreme environments for real-time accurate measurement of six-dimensional time-varying huge loads on manipulators.
Selecting boundary conditions in physiological strain analysis of the femur: Balanced loads, inertia relief method and follower load.

PubMed

Heyland, Mark; Trepczynski, Adam; Duda, Georg N; Zehn, Manfred; Schaser, Klaus-Dieter; Märdian, Sven

2015-12-01

Selection of boundary constraints may influence amount and distribution of loads. The purpose of this study is to analyze the potential of inertia relief and follower load to maintain the effects of musculoskeletal loads even under large deflections in patient specific finite element models of intact or fractured bone compared to empiric boundary constraints which have been shown to lead to physiological displacements and surface strains. The goal is to elucidate the use of boundary conditions in strain analyses of bones. Finite element models of the intact femur and a model of clinically relevant fracture stabilization by locking plate fixation were analyzed with normal walking loading conditions for different boundary conditions, specifically re-balanced loading, inertia relief and follower load. Peak principal cortex surface strains for different boundary conditions are consistent (maximum deviation 13.7%) except for inertia relief without force balancing (maximum deviation 108.4%). Influence of follower load on displacements increases with higher deflection in fracture model (from 3% to 7% for force balanced model). For load balanced models, follower load had only minor influence, though the effect increases strongly with higher deflection. Conventional constraints of fixed nodes in space should be carefully reconsidered because their type and position are challenging to justify and for their potential to introduce relevant non-physiological reaction forces. Inertia relief provides an alternative method which yields physiological strain results. Copyright © 2015 IPEM. Published by Elsevier Ltd. All rights reserved.
Accelerating population balance-Monte Carlo simulation for coagulation dynamics from the Markov jump model, stochastic algorithm and GPU parallel computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xu, Zuwei; Zhao, Haibo, E-mail: klinsmannzhb@163.com; Zheng, Chuguang

2015-01-15

This paper proposes a comprehensive framework for accelerating population balance-Monte Carlo (PBMC) simulation of particle coagulation dynamics. By combining a Markov jump model, a weighted majorant kernel and GPU (graphics processing unit) parallel computing, a significant gain in computational efficiency is achieved. The Markov jump model constructs a coagulation-rule matrix of differentially-weighted simulation particles, so as to capture the time evolution of the particle size distribution with low statistical noise over the full size range and to reduce the number of time loopings as far as possible. Here three coagulation rules are highlighted, and it is found that constructing an appropriate coagulation rule provides a route to a compromise between the accuracy and cost of PBMC methods. Further, in order to avoid double looping over all simulation particles when considering two-particle events (typically, particle coagulation), the weighted majorant kernel is introduced to estimate the maximum coagulation rates used for acceptance–rejection processes by single-looping over all particles; meanwhile, the mean time-step of a coagulation event is estimated by summing the coagulation kernels of rejected and accepted particle pairs. The computational load of these fast differentially-weighted PBMC simulations (based on the Markov jump model) is greatly reduced, becoming proportional to the number of simulation particles in a zero-dimensional system (single cell). Finally, for a spatially inhomogeneous multi-dimensional (multi-cell) simulation, the proposed fast PBMC is performed in each cell, and multiple cells are processed in parallel by the multiple cores of a GPU, which can run massively threaded data-parallel tasks to obtain a remarkable speedup ratio (compared with CPU computation, the speedup ratio of GPU parallel computing is as high as 200 in a case of 100 cells with 10 000 simulation particles per cell).
These acceleration approaches for PBMC are demonstrated in a physically realistic Brownian coagulation case. The computational accuracy is validated against the benchmark solution of the discrete-sectional method. The simulation results show that the comprehensive approach can attain a very favorable improvement in cost without sacrificing computational accuracy.
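The general idea of acceptance-rejection with a majorant kernel can be sketched as below. This is a generic illustration, not the paper's weighted majorant kernel or its differentially-weighted scheme. The additive kernel, the bound `2 * max(volumes)` and the particle volumes are assumptions chosen so that the upper bound is obviously valid and needs only one pass over the particles.

```python
import random


def sum_kernel(vi, vj):
    """Hypothetical coagulation kernel: proportional to the combined volume."""
    return vi + vj


def select_pair(volumes, rng):
    """Pick a coagulation pair by acceptance-rejection.

    A single loop over the particles gives an upper bound on the kernel,
    so no double loop over all pairs is needed.
    """
    k_max = 2.0 * max(volumes)  # majorant: v_i + v_j <= 2 * v_max
    n = len(volumes)
    while True:
        i = rng.randrange(n)
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1  # skip i so the two particles are distinct
        # Accept the candidate with probability kernel / majorant.
        if rng.random() * k_max <= sum_kernel(volumes[i], volumes[j]):
            return i, j


example_volumes = [1.0, 2.0, 4.0, 8.0]
first, second = select_pair(example_volumes, random.Random(0))
```

Pairs with a larger kernel value are accepted more often, so they coagulate proportionally more frequently, while the cost of finding the bound stays linear in the number of particles.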
Computer Science Techniques Applied to Parallel Atomistic Simulation

NASA Astrophysics Data System (ADS)

Nakano, Aiichiro

1998-03-01

Recent developments in parallel processing technology and multiresolution numerical algorithms have established large-scale molecular dynamics (MD) simulations as a new research mode for studying materials phenomena such as fracture. However, this requires large system sizes and long simulated times. We have developed: i) Space-time multiresolution schemes; ii) fuzzy-clustering approach to hierarchical dynamics; iii) wavelet-based adaptive curvilinear-coordinate load balancing; iv) multilevel preconditioned conjugate gradient method; and v) spacefilling-curve-based data compression for parallel I/O. Using these techniques, million-atom parallel MD simulations are performed for the oxidation dynamics of nanocrystalline Al. The simulations take into account the effect of dynamic charge transfer between Al and O using the electronegativity equalization scheme. The resulting long-range Coulomb interaction is calculated efficiently with the fast multipole method. Results for temperature and charge distributions, residual stresses, bond lengths and bond angles, and diffusivities of Al and O will be presented. The oxidation of nanocrystalline Al is elucidated through immersive visualization in virtual environments. A unique dual-degree education program at Louisiana State University will also be discussed in which students can obtain a Ph.D. in Physics & Astronomy and a M.S. from the Department of Computer Science in five years. This program fosters interdisciplinary research activities for interfacing High Performance Computing and Communications with large-scale atomistic simulations of advanced materials. This work was supported by NSF (CAREER Program), ARO, PRF, and Louisiana LEQSF.
Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications

NASA Technical Reports Server (NTRS)

Sun, Xian-He

1997-01-01

Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as Intel Paragon, IBM SP2, and Cray Origin2000, have successfully delivered high performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan is to 1) develop highly accurate parallel numerical algorithms, 2) conduct preliminary testing to verify the effectiveness and potential of these algorithms, and 3) incorporate newly developed algorithms into actual simulation packages. The work plan has been well achieved. Two highly accurate, efficient Poisson solvers have been developed and tested based on two different approaches: (1) adopting a mathematical geometry which has a better capacity to describe the fluid, and (2) using a compact scheme to gain high order accuracy in numerical discretization.
The previously developed Parallel Diagonal Dominant (PDD) algorithm and Reduced Parallel Diagonal Dominant (RPDD) algorithm have been carefully studied on different parallel platforms for different applications, and a NASA simulation code developed by Man M. Rai and his colleagues has been parallelized and implemented based on data dependency analysis. These achievements are addressed in detail in the paper.
Comparison of hiking stick use on lateral stability while balancing with and without a load.

PubMed

Jacobson, B H; Caldwell, B; Kulling, F A

1997-08-01

To compare the effect of hiking stick use on lateral stability while balancing with or without a load (15-kg internal frame backpack) under conditions of no stick, 1 stick, and 2 sticks, 15 volunteers aged 19 to 23 years (M = 21.7 yr.) were tested six separate times on a stability platform. During randomly ordered, 1-min. trials, the length of time (sec.) the subject maintained balance (+/-10 degrees of horizontal) and the number of deviations beyond 10 degrees were recorded simultaneously. Backpack and hiking sticks were individually adjusted for each subject. A 2 x 3 repeated factor analysis of variance indicated that subjects balanced significantly longer both with and without a load while using 2 hiking sticks than 1 or 0 sticks. Significantly fewer deviations beyond 10 degrees were found when subjects were without a load and using 1 or 2 sticks versus when they used none, and no significant difference in the number of deviations was found between 1 and 2 hiking sticks. When subjects were equipped with a load, significantly improved balance was found only between the 2 sticks and no sticks. Balance was significantly enhanced by using hiking sticks, and two sticks were more effective than one while carrying a load. An increase in maintenance of static balance may reduce the possibility of falling and injury while standing on loose alpine terrain.
Knee Kinetics during Squats of Varying Loads and Depths in Recreationally Trained Females.

PubMed

Flores, Victoria; Becker, James; Burkhardt, Eric; Cotter, Joshua

2018-03-06

The back squat exercise is typically practiced with varying squat depths and barbell loads. However, depth has been inconsistently defined, resulting in unclear safety precautions when squatting with loads. Additionally, females exhibit anatomical and kinematic differences to males which may predispose them to knee joint injuries. The purpose of this study was to characterize peak knee extensor moments (pKEMs) at three commonly practiced squat depths of above parallel, parallel, and full depth, and with three loads of 0% (unloaded), 50%, and 85% depth-specific one repetition maximum (1RM) in recreationally active females. Nineteen females (age, 25.1 ± 5.8 years; body mass, 62.5 ± 10.2 kg; height, 1.6 ± 0.10 m; mean ± SD) performed squats of randomized depth and load. Inverse dynamics were used to obtain pKEMs from three-dimensional knee kinematics. Depth and load had significant interaction effects on pKEMs (p = 0.014). Significantly greater pKEMs were observed at full depth compared to parallel depth with 50% 1RM load (p = 0.001, d = 0.615), and 85% 1RM load (p = 0.010, d = 0.714). Greater pKEMs were also observed at full depth compared to above parallel depth with 50% 1RM load (p = 0.003, d = 0.504). Results indicate that the effect of load on female pKEMs does not follow a progressively increasing pattern with either increasing depth or load. Therefore, when high knee loading is a concern, individuals must carefully consider both the depth of squat being performed and the relative load they are using.
Assessment of New Load Schedules for the Machine Calibration of a Force Balance

NASA Technical Reports Server (NTRS)

Ulbrich, N.; Gisler, R.; Kew, R.

2015-01-01

New load schedules for the machine calibration of a six-component force balance are currently being developed and evaluated at the NASA Ames Balance Calibration Laboratory. One of the proposed load schedules is discussed in the paper. It has a total of 2082 points that are distributed across 16 load series. Several criteria were applied to define the load schedule. It was decided, for example, to specify the calibration load set in force balance format as this approach greatly simplifies the definition of the lower and upper bounds of the load schedule. In addition, all loads are assumed to be applied in a calibration machine by using the one-factor-at-a-time approach. At first, all single-component loads are applied in six load series. Then, three two-component load series are applied. They consist of the load pairs (N1, N2), (S1, S2), and (RM, AF). Afterwards, four three-component load series are applied. They consist of the combinations (N1, N2, AF), (S1, S2, AF), (N1, N2, RM), and (S1, S2, RM). In the next step, one four-component load series is applied. It is the load combination (N1, N2, S1, S2). Finally, two five-component load series are applied. They are the load combination (N1, N2, S1, S2, AF) and (N1, N2, S1, S2, RM). The maximum difference between loads of two subsequent data points of the load schedule is limited to 33 % of capacity. This constraint helps avoid unwanted load "jumps" in the load schedule that can have a negative impact on the performance of a calibration machine. Only loadings of the single- and two-component load series are loaded to 100 % of capacity. This approach was selected because it keeps the total number of calibration points to a reasonable limit while still allowing for the application of some of the more complex load combinations. Data from two of NASA's force balances is used to illustrate important characteristics of the proposed 2082-point calibration load schedule.
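The 33 % step constraint mentioned in this abstract lends itself to a simple check. The sketch below is an illustration only: the two-component capacities and both example schedules are hypothetical, and the function is not part of any NASA tool.

```python
MAX_STEP_FRACTION = 0.33  # limit quoted in the abstract, as a fraction of capacity


def largest_step_fraction(schedule, capacity):
    """Largest load change between consecutive points, relative to capacity.

    `schedule` is a list of load points (one value per load component) and
    `capacity` holds the capacity of each component in the same units.
    """
    worst = 0.0
    for previous, current in zip(schedule, schedule[1:]):
        for p, c, cap in zip(previous, current, capacity):
            worst = max(worst, abs(c - p) / cap)
    return worst


# Hypothetical two-component balance and two candidate load series.
example_capacity = (1000.0, 500.0)
smooth_series = [(0.0, 0.0), (250.0, 0.0), (500.0, 0.0), (750.0, 0.0), (1000.0, 0.0)]
jumpy_series = [(0.0, 0.0), (500.0, 0.0), (1000.0, 0.0)]

smooth_step = largest_step_fraction(smooth_series, example_capacity)
jumpy_step = largest_step_fraction(jumpy_series, example_capacity)
```

A series passes the check when its largest step stays at or below `MAX_STEP_FRACTION`. Here the smooth series moves in 25 % increments and passes, while the jumpy series moves in 50 % increments and would be rejected.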
The effect of backpack weight on the standing posture and balance of schoolgirls with adolescent idiopathic scoliosis and normal controls.

PubMed

Chow, Daniel H K; Kwok, Monica L Y; Cheng, Jack C Y; Lao, Miko L M; Holmes, Andrew D; Au-Yang, Alexander; Yao, Fiona Y D; Wong, M S

2006-10-01

Concerns have been raised regarding the effect of carrying a backpack on adolescent posture and balance, but the effect of backpack loading combined with other factors affecting balance, such as adolescent idiopathic scoliosis (AIS), has not been determined. This study examines the effects of backpack load on the posture and balance of schoolgirls with AIS and normal controls. The standing posture of 26 schoolgirls with mild AIS (mean age 13, Cobb angle 10-25 degrees) and 20 age-matched normal schoolgirls were recorded without a backpack and while carrying a standard dual-strap backpack loaded at 7.5%, 10%, 12.5% and 15% of the subject's bodyweight (BW). Kinematics of the pelvis, trunk and head were recorded using a motion analysis system and centre of pressure (COP) data were recorded using a force platform. Reliable COP data could only be derived for 13 of the subjects with AIS. Increasing backpack load causes a significantly increased flexion of the trunk in relation to the pelvis and extension of the head in relation to the trunk, as well as increased antero-posterior range of COP motion. While backpack load appears to affect balance predominantly in the antero-posterior direction, differences between groups were more evident in the medio-lateral direction, with AIS subjects showing poor balance in this direction. Overall, carrying a backpack causes similar sagittal plane changes in posture and balance in both normal and AIS groups. Load size or subject group did not influence balance, but the additive effect of backpack carrying and AIS on postural control alters the risk of fall in this population. Therefore, load limit recommendations based on normal subjects should not be applicable to subjects with AIS.
Baby Carriage: Infants Walking with Loads

ERIC Educational Resources Information Center

Garciaguirre, Jessie S.; Adolph, Karen E.; Shrout, Patrick E.

2007-01-01

Maintaining balance is a central problem for new walkers. To examine how infants cope with the additional balance control problems induced by load carriage, 14-month-olds were loaded with 15% of their body weight in shoulder-packs. Both symmetrical and asymmetrical loads disrupted alternating gait patterns and caused less mature footfall patterns.…

77 FR 75697 - Petition for Waiver of Compliance

Federal Register 2010, 2011, 2012, 2013, 2014

2012-12-21

... wheel loads when a rail vehicle traverses a curve. With the right combination of speed, curvature, and... wheels will be equal, i.e., balanced. The curving speed corresponding to this balanced state is referred... the outer wheel load to increase and the inner wheel load to decrease. The manifestation of this load...
The Position of the Patella and Extensor Mechanism Affects Intraoperative Compartmental Loads During Total Knee Arthroplasty: A Pilot Study Using Intraoperative Sensing to Guide Soft Tissue Balance.

PubMed

Schnaser, Erik; Lee, Yuo-yu; Boettner, Friedrich; Gonzalez Della Valle, Alejandro

2015-08-01

The achievement of a well-balanced total knee arthroplasty is necessary for long-term success. We hypothesize that the dislocation of the patella during surgery affects the distribution of loads in the medial and lateral compartments. Intraoperative load sensors were used to record medial and lateral compartment loads in 56 well-balanced TKAs. Loads were recorded in full extension, relaxed extension, at 45 and 90° of flexion at full gravity-assisted flexion, with the patella in four different positions: dislocated (everted and not), located, and located and secured with two retinacular sutures. The loads in the lateral compartment in flexion were higher with a dislocated patella than with a located patella (P<0.001). A lateralized extensor mechanism artificially increases the lateral compartment loads in flexion during TKA surgery. Instruments that allow intraoperative soft tissue balance with the patella in a physiologic position are more likely to replicate postoperative compartment loads. II (prospective comparative study). Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Parameters affecting acetate concentrations during in-situ biological hydrogen methanation.

PubMed

Agneessens, Laura Mia; Ottosen, Lars Ditlev Mørck; Andersen, Martin; Berg Olesen, Christina; Feilberg, Anders; Kofoed, Michael Vedel Wegener

2018-06-01

Surplus electricity may be supplied to anaerobic digesters as H2 gas to upgrade the CH4 content of biogas. Acetate accumulation has been observed following H2 injections, but the parameters determining the degree of acetate accumulation are not well understood. The pathways involved during H2 consumption and acetate kinetics were evaluated in continuous lab reactors and parallel batch 13C experiments. Acetate accumulation increased during initial H2 injections as organic loading rate increased and CO2 levels decreased below 7%. The share of CH4 in H2 and 13C mass balances increased after repeated H2 injections, which corresponded with the increase of Methanomicrobiales observed via qPCR. The organic loading rate, the inorganic carbon level and level of methanogen adaption hence determine acetate kinetics during biomethanation of H2. The three identified parameters may form the base of a decision tool to assess acetate accumulation during H2 injections to an anaerobic digester. Copyright © 2018 Elsevier Ltd. All rights reserved.
Studying effects of non-equilibrium radiative transfer via HPC

DOE Office of Scientific and Technical Information (OSTI.GOV)

Holladay, Daniel

This report presents slides on Ph.D. Research Goals; Local Thermodynamic Equilibrium (LTE) Implications; Calculating an Opacity; Opacity: Pictographic Representation; Opacity: Pictographic Representation; Opacity: Pictographic Representation; Collisional Radiative Modeling; Radiative and Collisional Excitation; Photo and Electron Impact Ionization; Autoionization; The Rate Matrix; Example: Total Photoionization rate; The Rate Coefficients; inlinlte version 1.1; inlinlte: Verification; New capabilities: Rate Matrix – Flexibility; Memory Option Comparison; Improvements over previous DCA solver; Inter- and intra-node load balancing; Load Balance – Full Picture; Load Balance – Full Picture; Load Balance – Internode; Load Balance – Scaling; Description; Performance; xRAGE Simulation; Post-process @ 2hr; Post-process @ 4hr; Post-process @ 8hr; Takeaways; Performance for 1 realization; Motivation for QOI; Multigroup Er; Transport and NLTE large effects (1mm, 1keV); Transport large effect, NLTE lesser (1mm, 750eV); Blastwave Diagnostici – Description & Performance; Temperature Comparison; NLTE has effect on dynamics at wall; NLTE has lesser effect in the foam; Global Takeaways; The end.
Validation of a robotic balance system for investigations in the control of human standing balance.

PubMed

Luu, Billy L; Huryn, Thomas P; Van der Loos, H F Machiel; Croft, Elizabeth A; Blouin, Jean-Sébastien

2011-08-01

Previous studies have shown that human body sway during standing approximates the mechanics of an inverted pendulum pivoted at the ankle joints. In this study, a robotic balance system incorporating a Stewart platform base was developed to provide a new technique to investigate the neural mechanisms involved in standing balance. The robotic system, programmed with the mechanics of an inverted pendulum, controlled the motion of the body in response to a change in applied ankle torque. The ability of the robotic system to replicate the load properties of standing was validated by comparing the load stiffness generated when subjects balanced their own body to the robot's mechanical load programmed with a low (concentrated-mass model) or high (distributed-mass model) inertia. The results show that static load stiffness was not significantly (p > 0.05) different for standing and the robotic system. Dynamic load stiffness for the robotic system increased with the frequency of sway, as predicted by the mechanics of an inverted pendulum, with the higher inertia being accurately matched to the load properties of the human body. This robotic balance system accurately replicated the physical model of standing and represents a useful tool to simulate the dynamics of a standing person. © 2011 IEEE
Parallelization of the Physical-Space Statistical Analysis System (PSAS)

NASA Technical Reports Server (NTRS)

Larson, J. W.; Guo, J.; Lyster, P. M.

1999-01-01

Atmospheric data assimilation is a method of combining observations with model forecasts to produce a more accurate description of the atmosphere than the observations or forecast alone can provide. Data assimilation plays an increasingly important role in the study of climate and atmospheric chemistry. The NASA Data Assimilation Office (DAO) has developed the Goddard Earth Observing System Data Assimilation System (GEOS DAS) to create assimilated datasets. The core computational components of the GEOS DAS include the GEOS General Circulation Model (GCM) and the Physical-space Statistical Analysis System (PSAS). The need for timely validation of scientific enhancements to the data assimilation system poses computational demands that are best met by distributed parallel software. PSAS is implemented in Fortran 90 using object-based design principles. The analysis portions of the code solve two equations. The first of these is the "innovation" equation, which is solved on the unstructured observation grid using a preconditioned conjugate gradient (CG) method. The "analysis" equation is a transformation from the observation grid back to a structured grid, and is solved by a direct matrix-vector multiplication. Use of a factored-operator formulation reduces the computational complexity of both the CG solver and the matrix-vector multiplication, rendering the matrix-vector multiplications as a successive product of operators on a vector. Sparsity is introduced to these operators by partitioning the observations using an icosahedral decomposition scheme. PSAS builds a large (approx. 128MB) run-time database of parameters used in the calculation of these operators. Implementing a message passing parallel computing paradigm into an existing yet developing computational system as complex as PSAS is nontrivial. One of the technical challenges is balancing the requirements for computational reproducibility with the need for high performance. 
The problem of computational reproducibility is well known in the parallel computing community. It is a requirement that the parallel code perform calculations in a fashion that will yield identical results on different configurations of processing elements on the same platform. In some cases this problem can be solved by sacrificing performance. Meeting this requirement and still achieving high performance is very difficult. Topics to be discussed include: current PSAS design and parallelization strategy; reproducibility issues; load balance vs. database memory demands, possible solutions to these problems.
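The reproducibility problem described above comes from floating-point addition not being associative: a different distribution of data over processing elements changes the order of the partial sums. The sketch below is illustrative only and is not PSAS code. The function names are hypothetical, and gathering all values before summing is one simple performance-sacrificing remedy, not necessarily the one the authors chose.

```python
def ordered_sum(values):
    """Sum left to right with plain floating-point addition."""
    total = 0.0
    for v in values:
        total += v
    return total


def partitioned_sum(chunks):
    """Each chunk models the local data of one processing element:
    sum locally first, then combine the partial sums."""
    return ordered_sum([ordered_sum(chunk) for chunk in chunks])


def reproducible_sum(chunks):
    """Gather every value and sum in one fixed global order.
    The result no longer depends on the partitioning, at the cost
    of giving up the parallel local sums."""
    gathered = [v for chunk in chunks for v in chunk]
    return ordered_sum(gathered)


one_pe = [[0.1, 0.2, 0.3]]     # all data on one processing element
two_pes = [[0.1], [0.2, 0.3]]  # the same data split over two

print(partitioned_sum(one_pe) == partitioned_sum(two_pes))    # False
print(reproducible_sum(one_pe) == reproducible_sum(two_pes))  # True
```

The two partitioned results differ only in the last bit, but that is enough to break bitwise reproducibility across processor configurations.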
BowMapCL: Burrows-Wheeler Mapping on Multiple Heterogeneous Accelerators.

PubMed

Nogueira, David; Tomas, Pedro; Roma, Nuno

2016-01-01

The computational demand of exact-search procedures has pressed the exploitation of parallel processing accelerators to reduce the execution time of many applications. However, this often imposes strict restrictions in terms of the problem size and implementation efforts, mainly due to their possibly distinct architectures. To circumvent this limitation, a new exact-search alignment tool (BowMapCL) based on the Burrows-Wheeler Transform and FM-Index is presented. In contrast to other alternatives, BowMapCL is based on a unified implementation using OpenCL, allowing the exploitation of multiple and possibly different devices (e.g., NVIDIA, AMD/ATI, and Intel GPUs/APUs). Furthermore, to efficiently exploit such heterogeneous architectures, BowMapCL incorporates several techniques to promote its performance and scalability, including multiple buffering, work-queue task-distribution, and dynamic load-balancing, together with index partitioning, bit-encoding, and sampling. When compared with state-of-the-art tools, the attained results showed that BowMapCL (using a single GPU) is 2× to 7.5× faster than mainstream multi-threaded CPU BWT-based aligners, like Bowtie, BWA, and SOAP2; and up to 4× faster than the best performing state-of-the-art GPU implementations (namely, SOAP3 and HPG-BWT). When multiple and completely distinct devices are considered, BowMapCL efficiently scales the offered throughput, ensuring a convenient load-balance of the involved processing in the several distinct devices.
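Work-queue task distribution balances load across devices of different speeds because each device takes the next task as soon as it is free. The following is a small deterministic simulation of that idea, not BowMapCL's OpenCL scheduler. The function names, task costs, and device speeds are hypothetical.

```python
import heapq


def simulate_work_queue(task_costs, device_speeds):
    """Simulate devices pulling tasks from one shared queue.
    Returns (makespan, number of tasks processed by each device)."""
    # Heap entries: (time at which the device becomes free, device index).
    free_at = [(0.0, d) for d in range(len(device_speeds))]
    heapq.heapify(free_at)
    counts = [0] * len(device_speeds)
    makespan = 0.0
    for cost in task_costs:
        t, d = heapq.heappop(free_at)     # earliest-free device takes the task
        done = t + cost / device_speeds[d]
        counts[d] += 1
        makespan = max(makespan, done)
        heapq.heappush(free_at, (done, d))
    return makespan, counts


def static_round_robin_makespan(task_costs, device_speeds):
    """Baseline: task i is fixed to device i % n, regardless of speed."""
    busy = [0.0] * len(device_speeds)
    for i, cost in enumerate(task_costs):
        d = i % len(device_speeds)
        busy[d] += cost / device_speeds[d]
    return max(busy)


tasks = [6.0] * 12    # twelve equal tasks (hypothetical cost units)
speeds = [3.0, 1.0]   # device 0 is three times faster than device 1
print(simulate_work_queue(tasks, speeds))          # (18.0, [9, 3])
print(static_round_robin_makespan(tasks, speeds))  # 36.0
```

In this toy setup the shared queue hands the faster device three times as many tasks and halves the completion time relative to the fixed assignment.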
Improved Regression Analysis of Temperature-Dependent Strain-Gage Balance Calibration Data

NASA Technical Reports Server (NTRS)

Ulbrich, N.

2015-01-01

An improved approach is discussed that may be used to directly include first and second order temperature effects in the load prediction algorithm of a wind tunnel strain-gage balance. The improved approach was designed for the Iterative Method that fits strain-gage outputs as a function of calibration loads and uses a load iteration scheme during the wind tunnel test to predict loads from measured gage outputs. The improved approach assumes that the strain-gage balance is at a constant uniform temperature when it is calibrated and used. First, the method introduces a new independent variable for the regression analysis of the balance calibration data. The new variable is designed as the difference between the uniform temperature of the balance and a global reference temperature. This reference temperature should be the primary calibration temperature of the balance so that, if needed, a tare load iteration can be performed. Then, two temperature-dependent terms are included in the regression models of the gage outputs. They are the temperature difference itself and the square of the temperature difference. Simulated temperature-dependent data obtained from Triumph Aerospace's 2013 calibration of NASA's ARC-30K five component semi-span balance is used to illustrate the application of the improved approach.
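As a rough illustration of the two added terms, the regression model of a single gage output could be written as below. This is a sketch under assumptions: the symbols $R_k$ (output of gage $k$), $F_i$ (calibration loads), $T$, $T_{\mathrm{ref}}$, and the coefficients $a_{i,k}$, $b_{1,k}$, $b_{2,k}$ are introduced here for illustration and are not taken from the paper, and the dots stand for whatever higher-order load terms the chosen regression model contains.

```latex
\Delta T = T - T_{\mathrm{ref}}, \qquad
R_k = a_{0,k} + \sum_{i} a_{i,k}\, F_i + \dots
      + b_{1,k}\, \Delta T + b_{2,k}\, (\Delta T)^2
```

When the balance is at the reference temperature, $\Delta T = 0$ and both temperature terms vanish, so the model reduces to its usual load-only form.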
Can We Really "Feel" a Balanced Total Knee Arthroplasty?

PubMed

Elmallah, Randa K; Mistry, Jaydev B; Cherian, Jeffrey J; Chughtai, Morad; Bhave, Anil; Roche, Martin W; Mont, Michael A

2016-09-01

Balancing techniques in total knee arthroplasty are often based on surgeons' subjective judgment. However, newer technologies have allowed for objective measurements of soft tissue balancing. This study compared the use of sensor technology to the 30-year surgeon experience regarding (1) compartment loads, (2) soft tissue releases, and (3) component rotational alignments. Patients received either sensor-guided soft tissue balancing (n = 10) or manual gap balancing (n = 12). Wireless, intraoperative sensor tibial inserts were used to measure intracompartmental loads. The surgeon was blinded to values in the manual gap-balancing cohort. In the sensor cohort, the surgeon was unblinded, and implant trials were placed after normal releases were performed to guide further ligament releases after femoral and tibial resections, as needed. Load measurements were taken at 10°, 45°, and 90°. The sensor cohort had lower medial and lateral compartment loading at 10°, 45°, and 90°. The sensor group had lower mean differences in intercompartment loading at 10° (-5.6 vs -51.7 lbs), 45° (-9.8 vs -45.9 lbs), and 90° (-4.3 vs -27 lbs) compared to manually balanced patients. There were 10 additional soft tissue releases in the sensor cohort (2 initial ones before sensor use), compared to 2 releases in the gap-balanced cohort. In the gap-balanced cohort, tibial trays were positioned at a mean 9° external rotation, compared to a mean 1° internal rotation in the sensor-guided cohort. Sensor-balanced total knee arthroplasties provide objective feedback to perform releases and potentially improve knee balancing and rotational alignment. Future work may clarify whether these changes are beneficial for our patients. Copyright © 2016 Elsevier Inc. All rights reserved.
Migration impact on load balancing - an experience on Amoeba

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhu, W.; Socko, P.

1996-12-31

Load balancing has been studied extensively by simulation, and most of that research reported positive results. With the increasing availability of distributed systems, a few experiments have been carried out on different systems. These experimental studies depend either on task initiation or on task initiation plus task migration. In this paper, we present the results of an experimental study of load balancing using a centralized policy to manage the load on a set of processors, which was carried out on an Amoeba system consisting of a set of 386s linked by 10 Mbps Ethernet. On one hand, the results indicate the necessity of a load balancing facility for a distributed system. On the other hand, the results question the impact of using process migration to increase system performance under the configuration used in our experiments.
Distributing an executable job load file to compute nodes in a parallel computer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gooding, Thomas M.

Distributing an executable job load file to compute nodes in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: determining, by a compute node in the parallel computer, whether the compute node is participating in a job; determining, by the compute node in the parallel computer, whether a descendant compute node is participating in the job; responsive to determining that the compute node is participating in the job or that the descendant compute node is participating in the job, communicating, by the compute node to a parent compute node, an identification of a data communications link over which the compute node receives data from the parent compute node; constructing a class route for the job, wherein the class route identifies all compute nodes participating in the job; and broadcasting the executable load file for the job along the class route for the job.
Effects of partitioning and scheduling sparse matrix factorization on communication and load balance

NASA Technical Reports Server (NTRS)

Venugopal, Sesh; Naik, Vijay K.

1991-01-01

A block based, automatic partitioning and scheduling methodology is presented for sparse matrix factorization on distributed memory systems. Using experimental results, this technique is analyzed for communication and load imbalance overhead. To study the performance effects, these overheads were compared with those obtained from a straightforward 'wrap mapped' column assignment scheme. All experimental results were obtained using test sparse matrices from the Harwell-Boeing data set. The results show that there is a communication and load balance tradeoff. The block based method results in lower communication cost whereas the wrap mapped scheme gives better load balance.
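A "wrap mapped" column assignment gives column j to processor j mod p. The sketch below illustrates why such a cyclic mapping tends to balance load well. It is not the paper's method: the contiguous mapping used for comparison is a simplified stand-in rather than the block based partitioner described above, and the per-column work model (work shrinking linearly with the column index) is a hypothetical assumption.

```python
def wrap_map(n_cols, n_procs):
    """Wrap (cyclic) mapping: column j goes to processor j mod p."""
    return [j % n_procs for j in range(n_cols)]


def contiguous_map(n_cols, n_procs):
    """Contiguous mapping: consecutive columns go to the same processor."""
    size = -(-n_cols // n_procs)  # ceiling division
    return [j // size for j in range(n_cols)]


def processor_loads(owner, work, n_procs):
    """Total work assigned to each processor."""
    loads = [0] * n_procs
    for j, p in enumerate(owner):
        loads[p] += work[j]
    return loads


def imbalance(loads):
    """Maximum load divided by mean load (1.0 means perfect balance)."""
    return max(loads) / (sum(loads) / len(loads))


n_cols, n_procs = 8, 4
# Hypothetical work model: later columns need less work.
work = [n_cols - j for j in range(n_cols)]

wrap_loads = processor_loads(wrap_map(n_cols, n_procs), work, n_procs)
block_loads = processor_loads(contiguous_map(n_cols, n_procs), work, n_procs)
print(wrap_loads, imbalance(wrap_loads))
print(block_loads, imbalance(block_loads))
```

The wrap mapping spreads expensive and cheap columns over all processors, whereas grouping neighboring columns keeps related data together, which is the kind of locality that can reduce communication.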
Porting plasma physics simulation codes to modern computing architectures using the libmrc framework

NASA Astrophysics Data System (ADS)

Germaschewski, Kai; Abbott, Stephen

2015-11-01

Available computing power has continued to grow exponentially even after single-core performance saturated in the last decade. The increase has since been driven by more parallelism, both using more cores and having more parallelism in each core, e.g. in GPUs and Intel Xeon Phi. Adapting existing plasma physics codes is challenging, in particular as there is no single programming model that covers current and future architectures. We will introduce the open-source libmrc framework that has been used to modularize and port three plasma physics codes: The extended MHD code MRCv3 with implicit time integration and curvilinear grids; the OpenGGCM global magnetosphere model; and the particle-in-cell code PSC. libmrc consolidates basic functionality needed for simulations based on structured grids (I/O, load balancing, time integrators), and also introduces a parallel object model that makes it possible to maintain multiple implementations of computational kernels, on e.g. conventional processors and GPUs. It handles data layout conversions and enables us to port performance-critical parts of a code to a new architecture step-by-step, while the rest of the code can remain unchanged. We will show examples of the performance gains and some physics applications.
Tensor contraction engine: Abstraction and automated parallel implementation of configuration-interaction, coupled-cluster, and many-body perturbation theories

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hirata, So

2003-11-20

We develop a symbolic manipulation program and program generator (Tensor Contraction Engine or TCE) that automatically derives the working equations of a well-defined model of second-quantized many-electron theories and synthesizes efficient parallel computer programs on the basis of these equations. Provided an ansatz of a many-electron theory model, TCE performs valid contractions of creation and annihilation operators according to Wick's theorem, consolidates identical terms, and reduces the expressions into the form of multiple tensor contractions acted by permutation operators. Subsequently, it determines the binary contraction order for each multiple tensor contraction with the minimal operation and memory cost, factorizes common binary contractions (defines intermediate tensors), and identifies reusable intermediates. The resulting ordered list of binary tensor contractions, additions, and index permutations is translated into an optimized program that is combined with the NWChem and UTChem computational chemistry software packages. The programs synthesized by TCE take advantage of spin symmetry, Abelian point-group symmetry, and index permutation symmetry at every stage of calculations to minimize the number of arithmetic operations and storage requirement, adjust the peak local memory usage by index range tiling, and support parallel I/O interfaces and dynamic load balancing for parallel executions. We demonstrate the utility of TCE through automatic derivation and implementation of parallel programs for various models of configuration-interaction theory (CISD, CISDT, CISDTQ), many-body perturbation theory [MBPT(2), MBPT(3), MBPT(4)], and coupled-cluster theory (LCCD, CCD, LCCSD, CCSD, QCISD, CCSDT, and CCSDTQ).
Comparison of two tension-band fixation materials and techniques in transverse patella fractures: a biomechanical study.

PubMed

Rabalais, R David; Burger, Evalina; Lu, Yun; Mansour, Alfred; Baratta, Richard V

2008-02-01

This study compared the biomechanical properties of 2 tension-band techniques with stainless steel wire and ultra high molecular weight polyethylene (UHMWPE) cable in a patella fracture model. Transverse patella fractures were simulated in 8 cadaver knees and fixated with figure-of-8 and parallel wire configurations in combination with Kirschner wires. Identical configurations were tested with UHMWPE cable. Specimens were mounted to a testing apparatus and the quadriceps was used to extend the knees from 90 degrees to 0 degrees; 4 knees were tested under monotonic loading, and 4 knees were tested under cyclic loading. Under monotonic loading, average fracture gap was 0.50 and 0.57 mm for steel wire and UHMWPE cable, respectively, in the figure-of-8 construct compared with 0.16 and 0.04 mm, respectively, in the parallel wire construct. Under cyclic loading, average fracture gap was 1.45 and 1.66 mm for steel wire and UHMWPE cable, respectively, in the figure-of-8 construct compared with 0.45 and 0.60 mm, respectively, in the parallel wire construct. A statistically significant effect of technique was found, with the parallel wire construct performing better than the figure-of-8 construct in both loading models. There was no effect of material or interaction. In this biomechanical model, parallel wires performed better than the figure-of-8 configuration in both loading regimens, and UHMWPE cable performed similarly to 18-gauge steel wire.
A New Load Residual Threshold Definition for the Evaluation of Wind Tunnel Strain-Gage Balance Data

NASA Technical Reports Server (NTRS)

Ulbrich, N.; Volden, T.

2016-01-01

A new definition of a threshold for the detection of load residual outliers of wind tunnel strain-gage balance data was developed. The new threshold is defined as the product between the inverse of the absolute value of the primary gage sensitivity and an empirical limit of the electrical outputs of a strain-gage. The empirical limit of the outputs is 2.5 microV/V for either balance calibration or check load residuals. A reduced limit of 0.5 microV/V is recommended for the evaluation of differences between repeat load points because, by design, the calculation of these differences removes errors in the residuals that are associated with the regression analysis of the data itself. The definition of the new threshold and different methods for the determination of the primary gage sensitivity are discussed. In addition, calibration data of a six-component force balance and a five-component semi-span balance are used to illustrate the application of the proposed new threshold definition to different types of strain-gage balances. During the discussion of the force balance example it is also explained how the estimated maximum expected output of a balance gage can be used to better understand results of the application of the new threshold definition.
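The threshold definition stated above is a one-line formula: the empirical output limit divided by the absolute value of the primary gage sensitivity. The following is a minimal sketch of that arithmetic. The function names, the sensitivity value, its units (microV/V per unit load), and the sample residuals are hypothetical; only the 2.5 and 0.5 microV/V limits come from the abstract.

```python
def load_residual_threshold(primary_sensitivity, output_limit=2.5):
    """Threshold in load units: output_limit / |primary gage sensitivity|.

    primary_sensitivity: microV/V per unit load (assumed units)
    output_limit: empirical limit in microV/V (2.5 for calibration or
                  check load residuals, 0.5 for repeat-point differences)
    """
    if primary_sensitivity == 0:
        raise ValueError("primary gage sensitivity must be nonzero")
    return output_limit / abs(primary_sensitivity)


def flag_outliers(residuals, primary_sensitivity, output_limit=2.5):
    """Indices of load residuals whose magnitude exceeds the threshold."""
    threshold = load_residual_threshold(primary_sensitivity, output_limit)
    return [i for i, r in enumerate(residuals) if abs(r) > threshold]


sensitivity = -0.5  # hypothetical: microV/V per unit load
print(load_residual_threshold(sensitivity))         # 5.0 load units
print(load_residual_threshold(sensitivity, 0.5))    # 1.0 load units
print(flag_outliers([1.2, -6.0, 4.9, 5.1], sensitivity))  # [1, 3]
```

A more sensitive gage therefore gets a tighter load threshold, since the same electrical limit corresponds to a smaller load.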
Exploiting multi-scale parallelism for large scale numerical modelling of laser wakefield accelerators

NASA Astrophysics Data System (ADS)

Fonseca, R. A.; Vieira, J.; Fiuza, F.; Davidson, A.; Tsung, F. S.; Mori, W. B.; Silva, L. O.

2013-12-01

A new generation of laser wakefield accelerators (LWFA), supported by the extreme accelerating fields generated in the interaction of PW-Class lasers and underdense targets, promises the production of high quality electron beams in short distances for multiple applications. Achieving this goal will rely heavily on numerical modelling to further understand the underlying physics and identify optimal regimes, but large scale modelling of these scenarios is computationally heavy and requires the efficient use of state-of-the-art petascale supercomputing systems. We discuss the main difficulties involved in running these simulations and the new developments implemented in the OSIRIS framework to address these issues, ranging from multi-dimensional dynamic load balancing and hybrid distributed/shared memory parallelism to the vectorization of the PIC algorithm. We present the results of the OASCR Joule Metric program on the issue of large scale modelling of LWFA, demonstrating speedups of over 1 order of magnitude on the same hardware. Finally, scalability to over ~10^6 cores and sustained performance over ~2 PFlops is demonstrated, opening the way for large scale modelling of LWFA scenarios.
Modeling high-temperature superconductors and metallic alloys on the Intel iPSC/860

NASA Astrophysics Data System (ADS)

Geist, G. A.; Peyton, B. W.; Shelton, W. A.; Stocks, G. M.

Oak Ridge National Laboratory has embarked on several computational Grand Challenges, which require the close cooperation of physicists, mathematicians, and computer scientists. One of these projects is the determination of the material properties of alloys from first principles and, in particular, the electronic structure of high-temperature superconductors. While the present focus of the project is on superconductivity, the approach is general enough to permit study of other properties of metallic alloys such as strength and magnetic properties. This paper describes the progress to date on this project. We include a description of a self-consistent KKR-CPA method, parallelization of the model, and the incorporation of a dynamic load balancing scheme into the algorithm. We also describe the development and performance of a consolidated KKR-CPA code capable of running on CRAYs, workstations, and several parallel computers without source code modification. Performance of this code on the Intel iPSC/860 is also compared to a CRAY 2, CRAY YMP, and several workstations. Finally, some density of state calculations of two perovskite superconductors are given.
Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes

NASA Astrophysics Data System (ADS)

Guzman, Horacio V.; Junghans, Christoph; Kremer, Kurt; Stuehn, Torsten

2017-11-01

Multiscale and inhomogeneous molecular systems are challenging topics in the field of molecular simulation. In particular, modeling biological systems in the context of multiscale simulations and exploring material properties are driving a permanent development of new simulation methods and optimization algorithms. In computational terms, those methods require parallelization schemes that make productive use of computational resources for each simulation and from its genesis. Here, we introduce the heterogeneous domain decomposition approach, which is a combination of a heterogeneity-sensitive spatial domain decomposition with an a priori rearrangement of subdomain walls. Within this approach, the theoretical modeling and scaling laws for the force computation time are proposed and studied as a function of the number of particles and the spatial resolution ratio. We also show the capabilities of the new approach by comparing it to both static domain decomposition algorithms and dynamic load-balancing schemes. Specifically, two representative molecular systems have been simulated and compared to the heterogeneous domain decomposition proposed in this work. These two systems comprise an adaptive resolution simulation of a biomolecule solvated in water and a phase-separated binary Lennard-Jones fluid.
High-resolution multi-code implementation of unsteady Navier-Stokes flow solver based on paralleled overset adaptive mesh refinement and high-order low-dissipation hybrid schemes

NASA Astrophysics Data System (ADS)

Li, Gaohua; Fu, Xiang; Wang, Fuxin

2017-10-01

The low-dissipation high-order accurate hybrid up-winding/central scheme based on fifth-order weighted essentially non-oscillatory (WENO) and sixth-order central schemes, along with the Spalart-Allmaras (SA)-based delayed detached eddy simulation (DDES) turbulence model, and the flow feature-based adaptive mesh refinement (AMR), are implemented into a dual-mesh overset grid infrastructure with parallel computing capabilities, for the purpose of simulating vortex-dominated unsteady detached wake flows with high spatial resolutions. The overset grid assembly (OGA) process based on collision detection theory and an implicit hole-cutting algorithm achieves an automatic coupling for the near-body and off-body solvers, and a trial-and-error method is used for obtaining a globally balanced load distribution among the composed multiple codes. The results of flows over a high-Reynolds-number cylinder and a two-bladed helicopter rotor show that the combination of high-order hybrid scheme, advanced turbulence model, and overset adaptive mesh refinement can effectively enhance the spatial resolution for the simulation of turbulent wake eddies.

Parallel Computation of Unsteady Flows on a Network of Workstations

NASA Technical Reports Server (NTRS)

1997-01-01

Parallel computation of unsteady flows requires significant computational resources. The utilization of a network of workstations seems an efficient solution to the problem where large problems can be treated at a reasonable cost. This approach requires the solution of several problems: 1) the partitioning and distribution of the problem over a network of workstations, 2) efficient communication tools, 3) managing the system efficiently for a given problem. Of course, there is the question of the efficiency of any given numerical algorithm on such a computing system. The NPARC code was chosen as a sample for the application. For the explicit version of the NPARC code both two- and three-dimensional problems were studied. Again both steady and unsteady problems were investigated. The issues studied as a part of the research program were: 1) how to distribute the data between the workstations, 2) how to compute and how to communicate at each node efficiently, 3) how to balance the load distribution. In the following, a summary of these activities is presented. Details of the work have been presented and published as referenced.
Parallel Adaptive High-Order CFD Simulations Characterizing SOFIA Cavity Acoustics

NASA Technical Reports Server (NTRS)

Barad, Michael F.; Brehm, Christoph; Kiris, Cetin C.; Biswas, Rupak

2016-01-01

This paper presents large-scale MPI-parallel computational fluid dynamics simulations for the Stratospheric Observatory for Infrared Astronomy (SOFIA). SOFIA is an airborne, 2.5-meter infrared telescope mounted in an open cavity in the aft fuselage of a Boeing 747SP. These simulations focus on how the unsteady flow field inside and over the cavity interferes with the optical path and mounting structure of the telescope. A temporally fourth-order accurate Runge-Kutta scheme and a spatially fifth-order accurate WENO-5Z scheme were used to perform implicit large eddy simulations. An immersed boundary method provides automated gridding for complex geometries and natural coupling to a block-structured Cartesian adaptive mesh refinement framework. Strong scaling studies using NASA's Pleiades supercomputer with up to 32k CPU cores and 4 billion computational cells show excellent scaling. Dynamic load balancing based on execution time on individual AMR blocks addresses irregular numerical cost associated with blocks containing boundaries. Limits to scaling beyond 32k cores are identified, and targeted code optimizations are discussed.
Parallel Adaptive High-Order CFD Simulations Characterizing SOFIA Cavity Acoustics

NASA Technical Reports Server (NTRS)

Barad, Michael F.; Brehm, Christoph; Kiris, Cetin C.; Biswas, Rupak

2015-01-01

This paper presents large-scale MPI-parallel computational fluid dynamics simulations for the Stratospheric Observatory for Infrared Astronomy (SOFIA). SOFIA is an airborne, 2.5-meter infrared telescope mounted in an open cavity in the aft fuselage of a Boeing 747SP. These simulations focus on how the unsteady flow field inside and over the cavity interferes with the optical path and mounting structure of the telescope. A temporally fourth-order accurate Runge-Kutta scheme and a spatially fifth-order accurate WENO-5Z scheme were used to perform implicit large eddy simulations. An immersed boundary method provides automated gridding for complex geometries and natural coupling to a block-structured Cartesian adaptive mesh refinement framework. Strong scaling studies using NASA's Pleiades supercomputer with up to 32k CPU cores and 4 billion computational cells show excellent scaling. Dynamic load balancing based on execution time on individual AMR blocks addresses irregular numerical cost associated with blocks containing boundaries. Limits to scaling beyond 32k cores are identified, and targeted code optimizations are discussed.
Using Multithreading for the Automatic Load Balancing of 2D Adaptive Finite Element Meshes

NASA Technical Reports Server (NTRS)

Heber, Gerd; Biswas, Rupak; Thulasiraman, Parimala; Gao, Guang R.; Bailey, David H. (Technical Monitor)

1998-01-01

In this paper, we present a multi-threaded approach for the automatic load balancing of adaptive finite element (FE) meshes. The platform of our choice is the EARTH multi-threaded system, which offers sufficient capabilities to tackle this problem. We implement the adaption phase of FE applications on triangular meshes, and exploit the EARTH token mechanism to automatically balance the resulting irregular and highly nonuniform workload. We discuss the results of our experiments on EARTH-SP2, an implementation of EARTH on the IBM SP2, with different load balancing strategies that are built into the runtime system.
Using Multi-threading for the Automatic Load Balancing of 2D Adaptive Finite Element Meshes

NASA Technical Reports Server (NTRS)

Heber, Gerd; Biswas, Rupak; Thulasiraman, Parimala; Gao, Guang R.; Saini, Subhash (Technical Monitor)

1998-01-01

In this paper, we present a multi-threaded approach for the automatic load balancing of adaptive finite element (FE) meshes. The platform of our choice is the EARTH multi-threaded system, which offers sufficient capabilities to tackle this problem. We implement the adaption phase of FE applications on triangular meshes, and exploit the EARTH token mechanism to automatically balance the resulting irregular and highly nonuniform workload. We discuss the results of our experiments on EARTH-SP2, an implementation of EARTH on the IBM SP2, with different load balancing strategies that are built into the runtime system.
Multimode power processor

DOEpatents

O'Sullivan, G.A.; O'Sullivan, J.A.

1999-07-27

In one embodiment, a power processor which operates in three modes: an inverter mode wherein power is delivered from a battery to an AC power grid or load; a battery charger mode wherein the battery is charged by a generator; and a parallel mode wherein the generator supplies power to the AC power grid or load in parallel with the battery. In the parallel mode, the system adapts to arbitrary non-linear loads. The power processor may operate on a per-phase basis wherein the load may be synthetically transferred from one phase to another by way of a bumpless transfer which causes no interruption of power to the load when transferring energy sources. Voltage transients and frequency transients delivered to the load when switching between the generator and battery sources are minimized, thereby providing an uninterruptible power supply. The power processor may be used as part of a hybrid electrical power source system which may contain, in one embodiment, a photovoltaic array, diesel engine, and battery power sources. 31 figs.
Multimode power processor

DOEpatents

O'Sullivan, George A.; O'Sullivan, Joseph A.

1999-01-01

In one embodiment, a power processor which operates in three modes: an inverter mode wherein power is delivered from a battery to an AC power grid or load; a battery charger mode wherein the battery is charged by a generator; and a parallel mode wherein the generator supplies power to the AC power grid or load in parallel with the battery. In the parallel mode, the system adapts to arbitrary non-linear loads. The power processor may operate on a per-phase basis wherein the load may be synthetically transferred from one phase to another by way of a bumpless transfer which causes no interruption of power to the load when transferring energy sources. Voltage transients and frequency transients delivered to the load when switching between the generator and battery sources are minimized, thereby providing an uninterruptible power supply. The power processor may be used as part of a hybrid electrical power source system which may contain, in one embodiment, a photovoltaic array, diesel engine, and battery power sources.
Analysis of temperature changes on three-phase synchronous generator using infrared: comparison between balanced and unbalanced load

NASA Astrophysics Data System (ADS)

Amien, S.; Yoga, W.; Fahmi, F.

2018-02-01

Synchronous generators are a major component of electrical energy generating systems, and the load supplied by a generator is often unbalanced. This paper discusses the temperature of a synchronous generator under balanced and unbalanced load conditions and compares the measurement results for both states. Unbalanced loads can be caused by various asymmetric disturbances in the power system and by failures of load forecasting studies, so that the load distribution in each phase is not the same, causing excessive heating of the generator. Data were collected using an infrared thermometer and the resistance calculation method. In the comparison between resistive, inductive and capacitive loads, the highest temperature under balanced load occurred when the generator was loaded with a resistive load, where T = 31.9 °C and t = 65 minutes, while under unbalanced load the highest temperature occurred when the generator was loaded with a capacitive load, where T = 40.1 °C and t = 60 minutes. By understanding this behavior, we can maintain the generator for a longer operating life.
High-performance parallel analysis of coupled problems for aircraft propulsion

NASA Technical Reports Server (NTRS)

Felippa, C. A.; Farhat, C.; Chen, P.-S.; Gumaste, U.; Lesoinne, M.; Stern, P.

1995-01-01

This research program deals with the application of high-performance computing methods to the numerical simulation of complete jet engines. The program was initiated in 1993 by applying two-dimensional parallel aeroelastic codes to the interior gas flow problem of a by-pass jet engine. The fluid mesh generation, domain decomposition and solution capabilities were successfully tested. Attention was then focused on methodology for the partitioned analysis of the interaction of the gas flow with a flexible structure and with the fluid mesh motion driven by these structural displacements. The latter is treated by an ALE technique that models the fluid mesh motion as that of a fictitious mechanical network laid along the edges of near-field fluid elements. New partitioned analysis procedures to treat this coupled 3-component problem were developed in 1994. These procedures involved delayed corrections and subcycling, and have been successfully tested on several massively parallel computers. For the global steady-state axisymmetric analysis of a complete engine we have decided to use the NASA-sponsored ENG10 program, which uses a regular FV-multiblock-grid discretization in conjunction with circumferential averaging to include effects of blade forces, loss, combustor heat addition, blockage, bleeds and convective mixing. A load-balancing preprocessor for parallel versions of ENG10 has been developed. It is planned to use the steady-state global solution provided by ENG10 as input to a localized three-dimensional FSI analysis for engine regions where aeroelastic effects may be important.
A microscale three-dimensional urban energy balance model for studying surface temperatures

NASA Astrophysics Data System (ADS)

Krayenhoff, E. Scott; Voogt, James A.

2007-06-01

A microscale three-dimensional (3-D) urban energy balance model, Temperatures of Urban Facets in 3-D (TUF-3D), is developed to predict urban surface temperatures for a variety of surface geometries and properties, weather conditions, and solar angles. The surface is composed of plane-parallel facets: roofs, walls, and streets, which are further sub-divided into identical square patches, resulting in a 3-D raster-type model geometry. The model code is structured into radiation, conduction and convection sub-models. The radiation sub-model uses the radiosity approach and accounts for multiple reflections and shading of direct solar radiation. Conduction is solved by finite differencing of the heat conduction equation, and convection is modelled by empirically relating patch heat transfer coefficients to the momentum forcing and the building morphology. The radiation and conduction sub-models are tested individually against measurements, and the complete model is tested against full-scale urban surface temperature and energy balance observations. Modelled surface temperatures perform well at both the facet-average and the sub-facet scales given the precision of the observations and the uncertainties in the model inputs. The model has several potential applications, such as the calculation of radiative loads, and the investigation of effective thermal anisotropy (when combined with a sensor-view model).
Churchill: an ultra-fast, deterministic, highly scalable and balanced parallelization strategy for the discovery of human genetic variation in clinical and population-scale genomics.

PubMed

Kelly, Benjamin J; Fitch, James R; Hu, Yangqiu; Corsmeier, Donald J; Zhong, Huachun; Wetzel, Amy N; Nordquist, Russell D; Newsom, David L; White, Peter

2015-01-20

While advances in genome sequencing technology make population-scale genomics a possibility, current approaches for analysis of these data rely upon parallelization strategies that have limited scalability, complex implementation and lack reproducibility. Churchill, a balanced regional parallelization strategy, overcomes these challenges, fully automating the multiple steps required to go from raw sequencing reads to variant discovery. Through implementation of novel deterministic parallelization techniques, Churchill allows computationally efficient analysis of a high-depth whole genome sample in less than two hours. The method is highly scalable, enabling full analysis of the 1000 Genomes raw sequence dataset in a week using cloud resources. http://churchill.nchri.org/.
On delay adjustment for dynamic load balancing in distributed virtual environments.

PubMed

Deng, Yunhua; Lau, Rynson W H

2012-04-01

Distributed virtual environments (DVEs) have become very popular in recent years, due to the rapid growth of applications such as massively multiplayer online games (MMOGs). As the number of concurrent users increases, scalability becomes one of the major challenges in designing an interactive DVE system. One solution to this scalability problem is to adopt a multi-server architecture. While some methods focus on the quality of partitioning the load among the servers, others focus on the efficiency of the partitioning process itself. However, all these methods neglect the effect of network delay among the servers on the accuracy of the load balancing solutions. As we show in this paper, the change in the load of the servers due to network delay affects the performance of the load balancing algorithm. In this work, we conduct a formal analysis of this problem and discuss two efficient delay adjustment schemes to address it. Our experimental results show that our proposed schemes can significantly improve the performance of the load balancing algorithm with negligible computation overhead.
Model of load balancing using reliable algorithm with multi-agent system

NASA Astrophysics Data System (ADS)

Afriansyah, M. F.; Somantri, M.; Riyadi, M. A.

2017-04-01

Technology development grows in step with the number of internet users, which increases network traffic and, in turn, the load on the system. Using a reliable algorithm and mobile agents in distributed load balancing is a viable way to handle the load on a large-scale system. A mobile agent collects resource information and can migrate according to its given task. We propose a reliable load balancing algorithm using least time first byte (LFB) combined with information from the mobile agent. The methodology consisted of defining the system identification, requirements specification, network topology and system infrastructure design. The simulation used 1800 requests for 10 s from the user to the server, and the resulting data were taken for analysis. The simulation software was based on Apache JMeter, observing the response time and reliability of each server and comparing them with an existing method. The simulation results show that the LFB method with a mobile agent can balance load efficiently across all backend servers without bottlenecks, with a low risk of server overload, and reliably.
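
The selection rule named in this abstract can be illustrated with a minimal sketch. The abstract does not spell out how LFB is computed, so this assumes it means "route the next request to the backend with the smallest recently measured time to first byte"; the function name `pick_backend`, the server names, and the timings are all hypothetical, and the mobile-agent part is not modelled.

```python
def pick_backend(ttfb_ms):
    """Return the backend with the smallest measured time to first byte.

    ttfb_ms maps a backend name to its latest time-to-first-byte
    measurement in milliseconds (hypothetical data, for illustration only).
    """
    if not ttfb_ms:
        raise ValueError("no backends available")
    # min() over the keys, ordered by the measured latency of each key.
    return min(ttfb_ms, key=ttfb_ms.get)


# Illustrative measurements; in the abstract's setting these would come
# from the resource information gathered by the mobile agent.
measurements = {"server-a": 42.0, "server-b": 17.5, "server-c": 23.1}
chosen = pick_backend(measurements)
```

A real balancer would also need to refresh the measurements and handle unreachable servers; this sketch only shows the selection step.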
On the predictability of protein database search complexity and its relevance to optimization of distributed searches.

PubMed

Deciu, Cosmin; Sun, Jun; Wall, Mark A

2007-09-01

We discuss several aspects related to load balancing of database search jobs in a distributed computing environment, such as a Linux cluster. Load balancing is a technique for making the most of multiple computational resources, which is particularly relevant in environments in which the usage of such resources is very high. The particular case of the Sequest program is considered here, but the general methodology should apply to any similar database search program. We show how the runtimes for Sequest searches of tandem mass spectral data can be predicted from profiles of previous representative searches, and how this information can be used for better load balancing of novel data. A well-known heuristic load balancing method is shown to be applicable to this problem, and its performance is analyzed for a variety of search parameters.
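
To make concrete how predicted runtimes can drive load balancing, here is a minimal sketch of one widely used heuristic, longest-processing-time-first (LPT) greedy scheduling. The abstract does not name its heuristic, so LPT is an assumption chosen for illustration; `lpt_schedule` and the job names and runtimes are hypothetical.

```python
import heapq


def lpt_schedule(predicted_runtimes, n_workers):
    """Assign jobs to workers with the LPT greedy heuristic.

    predicted_runtimes maps a job name to its predicted runtime.
    Jobs are taken longest first; each one goes to the worker that
    currently has the smallest total predicted load.
    """
    # Min-heap of (current load, worker id) so the least-loaded worker pops first.
    heap = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment = {w: [] for w in range(n_workers)}
    jobs = sorted(predicted_runtimes.items(), key=lambda kv: kv[1], reverse=True)
    for job, runtime in jobs:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(job)
        heapq.heappush(heap, (load + runtime, worker))
    loads = {w: sum(predicted_runtimes[j] for j in js) for w, js in assignment.items()}
    return assignment, loads


# Hypothetical predicted search times (arbitrary units) for five jobs.
assignment, loads = lpt_schedule({"a": 7, "b": 5, "c": 4, "d": 3, "e": 1}, n_workers=2)
```

The quality of the resulting balance depends directly on how well the runtimes are predicted, which is the point the abstract makes about profiling previous searches.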
AMRZone: A Runtime AMR Data Sharing Framework For Scientific Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Wenzhao; Tang, Houjun; Harenberg, Steven

Frameworks that facilitate runtime data sharing across multiple applications are of great importance for scientific data analytics. Although existing frameworks work well over uniform mesh data, they cannot effectively handle adaptive mesh refinement (AMR) data. The challenges in constructing an AMR-capable framework include: (1) designing an architecture that facilitates online AMR data management; (2) achieving a load-balanced AMR data distribution for the data staging space at runtime; and (3) building an effective online index to support the unique spatial data retrieval requirements for AMR data. Towards addressing these challenges to support runtime AMR data sharing across scientific applications, we present the AMRZone framework. Experiments over real-world AMR datasets demonstrate AMRZone's effectiveness at achieving a balanced workload distribution, reading/writing large-scale datasets with thousands of parallel processes, and satisfying queries with spatial constraints. Moreover, AMRZone's performance and scalability are even comparable with existing state-of-the-art work when tested over uniform mesh data with up to 16384 cores; in the best case, our framework achieves a 46% performance improvement.
Vestibular control of standing balance is enhanced with increased cognitive load.

PubMed

McGeehan, Michael A; Woollacott, Marjorie H; Dalton, Brian H

2017-04-01

When cognitive load is elevated during a motor task, cortical inhibition and reaction time are increased; yet, standing balance control is often unchanged. This disconnect is likely explained by compensatory mechanisms within the balance system such as increased sensitivity of the vestibulomotor pathway. This study aimed to determine the effects of increased cognitive load on the vestibular control of standing balance. Participants stood blindfolded on a force plate with their head facing left and arms relaxed at their sides for two trials while exposed to continuous electrical vestibular stimulation (EVS). Participants either stood quietly or executed a cognitive task (double-digit arithmetic). Surface electromyography (EMG) and anterior-posterior ground-body forces (APF) were measured in order to evaluate vestibular-evoked balance responses in the frequency (coherence and gain) and time (cumulant density) domains. Total distance traveled for anterior-posterior center of pressure (COP) was assessed as a metric of balance variability. Despite similar distances traveled for COP, EVS-medial gastrocnemius (MG) EMG and EVS-APF coherence and EVS-TA EMG and EVS-MG EMG gain were elevated for multiple frequencies when standing with increased cognitive load. For the time domain, medium-latency peak amplitudes increased by 13-54% for EVS-APF and EVS-EMG relationships with the cognitive task compared to without. Peak short-latency amplitudes were unchanged. These results indicate that reliance on vestibular control of balance is enhanced when cognitive load is elevated. This augmented neural strategy may act to supplement divided cortical processing resources within the balance system and compensate for the acute neuromuscular modifications associated with increased cognitive demand.
Applications of New Surrogate Global Optimization Algorithms including Efficient Synchronous and Asynchronous Parallelism for Calibration of Expensive Nonlinear Geophysical Simulation Models.

NASA Astrophysics Data System (ADS)

Shoemaker, C. A.; Pang, M.; Akhtar, T.; Bindel, D.

2016-12-01

New parallel surrogate global optimization algorithms are developed and applied to objective functions that are expensive simulations (possibly with multiple local minima). The algorithms can be applied to most geophysical simulations, including those with nonlinear partial differential equations. The optimization does not require that the simulations be parallelized. Asynchronous (and synchronous) parallel execution is available in the optimization toolbox "pySOT". The parallel algorithms are modified from serial to eliminate fine-grained parallelism. The optimization is computed with the open source software pySOT, a Surrogate Global Optimization Toolbox that allows the user to pick the type of surrogate (or ensembles), the search procedure on the surrogate, and the type of parallelism (synchronous or asynchronous). pySOT also allows the user to develop new algorithms by modifying parts of the code. In the applications here, the objective function takes up to 30 minutes for one simulation, and serial optimization can take over 200 hours. Results from Yellowstone (NSF) and NCSS (Singapore) supercomputers are given for groundwater contaminant hydrology simulations with applications to model parameter estimation and decontamination management. All results are compared with alternatives. The first results are for optimization of pumping at many wells to reduce cost for decontamination of groundwater at a superfund site. The optimization runs with up to 128 processors. Superlinear speedup is obtained for up to 16 processors, and efficiency with 64 processors is over 80%. Each evaluation of the objective function requires the solution of nonlinear partial differential equations to describe the impact of spatially distributed pumping and model parameters on model predictions for the spatial and temporal distribution of groundwater contaminants. The second application uses an asynchronous parallel global optimization for groundwater quality model calibration.
The time for a single objective function evaluation varies unpredictably, so efficiency is improved with asynchronous parallel calculations to improve load balancing. The third application (done at NCSS) incorporates new global surrogate multi-objective parallel search algorithms into pySOT and applies it to a large watershed calibration problem.
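
The load-balancing benefit of asynchronous evaluation can be sketched with the Python standard library alone. This is not the pySOT API: `async_minimize` is a hypothetical helper, the fixed candidate list stands in for points that a surrogate-guided search would propose, and the toy objective replaces an expensive simulation. The idea shown is that a worker receives a new point as soon as it finishes, instead of waiting for a whole batch.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from itertools import islice


def async_minimize(objective, candidates, n_workers=4):
    """Evaluate candidates asynchronously and return (best_x, best_f).

    As soon as any evaluation finishes, the next candidate is submitted,
    so a slow evaluation never leaves the other workers idle.
    """
    it = iter(candidates)
    done_marker = object()  # sentinel signalling that no candidates remain
    best_x, best_f = None, float("inf")
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Start one evaluation per worker.
        running = {pool.submit(objective, x): x for x in islice(it, n_workers)}
        while running:
            finished, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for fut in finished:
                x = running.pop(fut)
                f = fut.result()
                if f < best_f:
                    best_x, best_f = x, f
                nxt = next(it, done_marker)
                if nxt is not done_marker:
                    running[pool.submit(objective, nxt)] = nxt
    return best_x, best_f


# Toy objective with a unique minimum at x = 3 (illustrative only).
best = async_minimize(lambda x: (x - 3) ** 2, range(10), n_workers=3)
```

For CPU-bound simulations one would typically use separate processes or nodes rather than threads; threads are used here only to keep the sketch self-contained.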
14 CFR 23.393 - Loads parallel to hinge line.

Code of Federal Regulations, 2011 CFR

2011-01-01

...) K=24 for vertical surfaces; (2) K=12 for horizontal surfaces; and (3) W=weight of the movable... 14 Aeronautics and Space 1 2011-01-01 2011-01-01 false Loads parallel to hinge line. 23.393 Section 23.393 Aeronautics and Space FEDERAL AVIATION ADMINISTRATION, DEPARTMENT OF TRANSPORTATION...
14 CFR 23.393 - Loads parallel to hinge line.

Code of Federal Regulations, 2013 CFR

2013-01-01

...) K=24 for vertical surfaces; (2) K=12 for horizontal surfaces; and (3) W=weight of the movable... 14 Aeronautics and Space 1 2013-01-01 2013-01-01 false Loads parallel to hinge line. 23.393 Section 23.393 Aeronautics and Space FEDERAL AVIATION ADMINISTRATION, DEPARTMENT OF TRANSPORTATION...
14 CFR 23.393 - Loads parallel to hinge line.

Code of Federal Regulations, 2014 CFR

2014-01-01

...) K=24 for vertical surfaces; (2) K=12 for horizontal surfaces; and (3) W=weight of the movable... 14 Aeronautics and Space 1 2014-01-01 2014-01-01 false Loads parallel to hinge line. 23.393 Section 23.393 Aeronautics and Space FEDERAL AVIATION ADMINISTRATION, DEPARTMENT OF TRANSPORTATION...

14 CFR 23.393 - Loads parallel to hinge line.

Code of Federal Regulations, 2012 CFR

2012-01-01

...) K=24 for vertical surfaces; (2) K=12 for horizontal surfaces; and (3) W=weight of the movable... 14 Aeronautics and Space 1 2012-01-01 2012-01-01 false Loads parallel to hinge line. 23.393 Section 23.393 Aeronautics and Space FEDERAL AVIATION ADMINISTRATION, DEPARTMENT OF TRANSPORTATION...
14 CFR 23.393 - Loads parallel to hinge line.

Code of Federal Regulations, 2010 CFR

2010-01-01

...) K=24 for vertical surfaces; (2) K=12 for horizontal surfaces; and (3) W=weight of the movable... 14 Aeronautics and Space 1 2010-01-01 2010-01-01 false Loads parallel to hinge line. 23.393 Section 23.393 Aeronautics and Space FEDERAL AVIATION ADMINISTRATION, DEPARTMENT OF TRANSPORTATION...
Application of Temperature Sensitivities During Iterative Strain-Gage Balance Calibration Analysis

NASA Technical Reports Server (NTRS)

Ulbrich, N.

2011-01-01

A new method is discussed that may be used to correct wind tunnel strain-gage balance load predictions for the influence of residual temperature effects at the location of the strain-gages. The method was designed for the iterative analysis technique that is used in the aerospace testing community to predict balance loads from strain-gage outputs during a wind tunnel test. The new method implicitly applies temperature corrections to the gage outputs during the load iteration process. Therefore, it can use uncorrected gage outputs directly as input for the load calculations. The new method is applied in several steps. First, balance calibration data is analyzed in the usual manner assuming that the balance temperature was kept constant during the calibration. Then, the temperature difference relative to the calibration temperature is introduced as a new independent variable for each strain-gage output. Therefore, sensors must exist near the strain-gages so that the required temperature differences can be measured during the wind tunnel test. In addition, the format of the regression coefficient matrix needs to be extended so that it can support the new independent variables. In the next step, the extended regression coefficient matrix of the original calibration data is modified by using the manufacturer specified temperature sensitivity of each strain-gage as the regression coefficient of the corresponding temperature difference variable. Finally, the modified regression coefficient matrix is converted to a data reduction matrix that the iterative analysis technique needs for the calculation of balance loads. Original calibration data and modified check load data of NASA's MC60D balance are used to illustrate the new method.
Prediction Interval Development for Wind-Tunnel Balance Check-Loading

NASA Technical Reports Server (NTRS)

Landman, Drew; Toro, Kenneth G.; Commo, Sean A.; Lynn, Keith C.

2014-01-01

Results from the Facility Analysis Verification and Operational Reliability project revealed a critical gap in capability in ground-based aeronautics research applications. Without a standardized process for check-loading the wind-tunnel balance or the model system, the quality of the aerodynamic force data collected varied significantly between facilities. A prediction interval is required in order to confirm a check-loading. The prediction interval provides an expected upper and lower bound on balance load prediction at a given confidence level. A method has been developed which accounts for sources of variability due to calibration and check-load application. The prediction interval method of calculation and a case study demonstrating its use are provided. Validation of the methods is demonstrated for the case study based on the probability of capture of confirmation points.
LBMR: Load-Balanced Multipath Routing for Wireless Data-Intensive Transmission in Real-Time Medical Monitoring.

PubMed

Tseng, Chinyang Henry

2016-05-31

In wireless networks, low-power Zigbee is an excellent network solution for wireless medical monitoring systems. Medical monitoring generally involves transmission of a large amount of data and easily causes bottleneck problems. Although Zigbee's AODV mesh routing provides extensible multi-hop data transmission to extend network coverage, it does not natively provide, and needs, some form of load balancing mechanism to avoid bottlenecks. To guarantee a more reliable multi-hop data transmission for life-critical medical applications, we have developed a multipath solution, called Load-Balanced Multipath Routing (LBMR), to replace Zigbee's routing mechanism. LBMR consists of three main parts: Layer Routing Construction (LRC), a Load Estimation Algorithm (LEA), and a Route Maintenance (RM) mechanism. LRC assigns nodes into different layers based on the node's distance to the medical data gateway. Nodes can have multiple next-hops delivering medical data toward the gateway. All neighboring layer-nodes exchange flow information containing current load, which is then used by the LEA to estimate future load of next-hops to the gateway. With LBMR, nodes can choose the neighbors with the least load as the next-hops and thus can achieve load balancing and avoid bottlenecks. Furthermore, RM can detect route failures in real-time and perform route redirection to ensure routing robustness. Since LRC and LEA prevent bottlenecks while RM ensures routing fault tolerance, LBMR provides a highly reliable routing service for medical monitoring. To evaluate these accomplishments, we compare LBMR with Zigbee's AODV and another multipath protocol, AOMDV. The simulation results demonstrate LBMR achieves better load balancing, fewer unreachable nodes, and better packet delivery ratio than either AODV or AOMDV.
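
The next-hop rule described in this abstract (pick the least-loaded neighbor that leads toward the gateway) can be sketched as follows. This is an illustration under stated assumptions, not the LBMR implementation: it assumes a smaller layer number means a node is closer to the gateway, that only neighbors in a lower layer are eligible, and that the LEA's output is available as a single estimated-load number per neighbor. `choose_next_hop` and the sample data are hypothetical.

```python
def choose_next_hop(my_layer, neighbors):
    """Pick the eligible neighbor with the smallest estimated load.

    neighbors maps a neighbor id to (layer, estimated_load).
    Assumption: a lower layer number is closer to the gateway, so only
    neighbors in a lower layer than ours are candidates.
    Returns None when no neighbor leads toward the gateway.
    """
    candidates = {
        node: load
        for node, (layer, load) in neighbors.items()
        if layer < my_layer
    }
    if not candidates:
        return None  # route maintenance would have to find another path
    return min(candidates, key=candidates.get)


# Hypothetical neighbor table for a node in layer 2.
table = {"n1": (1, 0.8), "n2": (1, 0.3), "n3": (2, 0.1)}
next_hop = choose_next_hop(2, table)
```

Here `n3` has the lowest load but sits in the same layer, so it is skipped and `n2` is selected.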
LBMR: Load-Balanced Multipath Routing for Wireless Data-Intensive Transmission in Real-Time Medical Monitoring

PubMed Central

Tseng, Chinyang Henry

2016-01-01

In wireless networks, low-power Zigbee is an excellent network solution for wireless medical monitoring systems. Medical monitoring generally involves transmission of a large amount of data and easily causes bottleneck problems. Although Zigbee’s AODV mesh routing provides extensible multi-hop data transmission to extend network coverage, it does not natively provide, and needs, some form of load balancing mechanism to avoid bottlenecks. To guarantee a more reliable multi-hop data transmission for life-critical medical applications, we have developed a multipath solution, called Load-Balanced Multipath Routing (LBMR), to replace Zigbee’s routing mechanism. LBMR consists of three main parts: Layer Routing Construction (LRC), a Load Estimation Algorithm (LEA), and a Route Maintenance (RM) mechanism. LRC assigns nodes into different layers based on the node’s distance to the medical data gateway. Nodes can have multiple next-hops delivering medical data toward the gateway. All neighboring layer-nodes exchange flow information containing current load, which is then used by the LEA to estimate future load of next-hops to the gateway. With LBMR, nodes can choose the neighbors with the least load as the next-hops and thus can achieve load balancing and avoid bottlenecks. Furthermore, RM can detect route failures in real-time and perform route redirection to ensure routing robustness. Since LRC and LEA prevent bottlenecks while RM ensures routing fault tolerance, LBMR provides a highly reliable routing service for medical monitoring. To evaluate these accomplishments, we compare LBMR with Zigbee’s AODV and another multipath protocol, AOMDV. The simulation results demonstrate LBMR achieves better load balancing, fewer unreachable nodes, and better packet delivery ratio than either AODV or AOMDV. PMID:27258297
In-Situ Load System for Calibrating and Validating Aerodynamic Properties of Scaled Aircraft in Ground-Based Aerospace Testing Applications

NASA Technical Reports Server (NTRS)

Lynn, Keith C. (Inventor); Acheson, Michael J. (Inventor); Commo, Sean A. (Inventor); Landman, Drew (Inventor)

2016-01-01

An In-Situ Load System for calibrating and validating aerodynamic properties of scaled aircraft in ground-based aerospace testing applications includes an assembly having upper and lower components that are pivotably interconnected. A test weight can be connected to the lower component to apply a known force to a force balance. The orientation of the force balance can be varied, and the measured forces from the force balance can be compared to applied loads at various orientations to thereby develop calibration factors.
Wind Tunnel Strain-Gage Balance Calibration Data Analysis Using a Weighted Least Squares Approach

NASA Technical Reports Server (NTRS)

Ulbrich, N.; Volden, T.

2017-01-01

A new approach is presented that uses a weighted least squares fit to analyze wind tunnel strain-gage balance calibration data. The weighted least squares fit is specifically designed to increase the influence of single-component loadings during the regression analysis. The weighted least squares fit also reduces the impact of calibration load schedule asymmetries on the predicted primary sensitivities of the balance gages. A weighting factor between zero and one is assigned to each calibration data point that depends on a simple count of its intentionally loaded load components or gages. The greater the number of a data point's intentionally loaded load components or gages is, the smaller its weighting factor becomes. The proposed approach is applicable to both the Iterative and Non-Iterative Methods that are used for the analysis of strain-gage balance calibration data in the aerospace testing community. The Iterative Method uses a reasonable estimate of the tare corrected load set as input for the determination of the weighting factors. The Non-Iterative Method, on the other hand, uses gage output differences relative to the natural zeros as input for the determination of the weighting factors. Machine calibration data of a six-component force balance is used to illustrate benefits of the proposed weighted least squares fit. In addition, a detailed derivation of the PRESS residuals associated with a weighted least squares fit is given in the appendices of the paper as this information could not be found in the literature. These PRESS residuals may be needed to evaluate the predictive capabilities of the final regression models that result from a weighted least squares fit of the balance calibration data.
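As a rough illustration of the weighting idea in the abstract above, the sketch below assigns each calibration point a weight that shrinks as more load components are intentionally loaded. The abstract does not give the actual formula, so the `1/k` rule, the zero tolerance, and the single-sensitivity fit through the origin are assumptions chosen only to show the mechanism.

```python
def weighting_factor(loads, tol=1e-12):
    """Weight in (0, 1] that decreases with the count of loaded components.

    Assumption: weight = 1 / (number of nonzero load components).
    """
    k = sum(1 for v in loads if abs(v) > tol)
    return 1.0 / k if k > 0 else 1.0  # assumed: an unloaded point gets weight 1


def weighted_slope(x, y, w):
    """Weighted least squares fit of y = a * x (toy one-term model)."""
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    den = sum(wi * xi * xi for wi, xi in zip(w, x))
    return num / den
```

Under this assumed rule a single-component loading keeps full weight 1, while a point with two loaded components is down-weighted to 0.5, so single-component loadings dominate the estimate of the primary sensitivity.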
Analysis of Sting Balance Calibration Data Using Optimized Regression Models

NASA Technical Reports Server (NTRS)

Ulbrich, N.; Bader, Jon B.

2010-01-01

Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm's two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm's term selection process.
Effect of armor and carrying load on body balance and leg muscle function.

PubMed

Park, Huiju; Branson, Donna; Kim, Seonyoung; Warren, Aric; Jacobson, Bert; Petrova, Adriana; Peksoz, Semra; Kamenidis, Panagiotis

2014-01-01

This study investigated the impact of weight and weight distribution of body armor and load carriage on static body balance and leg muscle function. A series of human performance tests were conducted with seven male, healthy, right-handed military students in seven garment conditions with varying weight and weight distributions. Static body balance was assessed by analyzing the trajectory of center of plantar pressure and symmetry of weight bearing in the feet. Leg muscle functions were assessed by analyzing the peak electromyography amplitude of four selected leg muscles during walking. Results of this study showed that uneven weight distribution of garment and load beyond an additional 9 kg impaired static body balance as evidenced by increased sway of center of plantar pressure and asymmetry of weight bearing in the feet. Added weight on non-dominant side of the body created greater impediment to static balance. Increased garment weight also elevated peak EMG amplitude in the rectus femoris to maintain body balance and in the medial gastrocnemius to increase propulsive force. Negative impacts on balance and leg muscle function with increased carrying loads, particularly with an uneven weight distribution, should be stressed to soldiers, designers, and sports enthusiasts. Copyright © 2013 Elsevier B.V. All rights reserved.
RNA SEQ Analysis Indicates that the AE3 Cl-/HCO3- Exchanger Contributes to Active Transport-Mediated CO2 Disposal in Heart.

PubMed

Vairamani, Kanimozhi; Wang, Hong-Sheng; Medvedovic, Mario; Lorenz, John N; Shull, Gary E

2017-08-04

Loss of the AE3 Cl-/HCO3- exchanger (Slc4a3) in mice causes an impaired cardiac force-frequency response and heart failure under some conditions but the mechanisms are not known. To better understand the functions of AE3, we performed RNA Seq analysis of AE3-null and wild-type mouse hearts and evaluated the data with respect to three hypotheses (CO2 disposal, facilitation of Na+-loading, and recovery from an alkaline load) that have been proposed for its physiological functions. Gene Ontology and PubMatrix analyses of differentially expressed genes revealed a hypoxia response and changes in vasodilation and angiogenesis genes that strongly support the CO2 disposal hypothesis. Differential expression of energy metabolism genes, which indicated increased glucose utilization and decreased fatty acid utilization, was consistent with adaptive responses to perturbations of O2/CO2 balance in AE3-null myocytes. Given that the myocardium is an obligate aerobic tissue and consumes large amounts of O2, the data suggest that loss of AE3, which has the potential to extrude CO2 in the form of HCO3-, impairs O2/CO2 balance in cardiac myocytes. These results support a model in which the AE3 Cl-/HCO3- exchanger, coupled with parallel Cl- and H+-extrusion mechanisms and extracellular carbonic anhydrase, is responsible for active transport-mediated disposal of CO2.
Deformation and fracture of explosion-welded Ti/Al plates: A synchrotron-based study

DOE Office of Scientific and Technical Information (OSTI.GOV)

E, J. C.; Huang, J. Y.; Bie, B. X.

Here, explosion-welded Ti/Al plates are characterized with energy dispersive spectroscopy and x-ray computed tomography, and exhibit a smooth, well-jointed interface. We perform dynamic and quasi-static uniaxial tension experiments on Ti/Al with the loading direction either perpendicular or parallel to the Ti/Al interface, using a mini split Hopkinson tension bar and a material testing system in conjunction with time-resolved synchrotron x-ray imaging. X-ray imaging and strain-field mapping reveal different deformation mechanisms responsible for anisotropic bulk-scale responses, including yield strength, ductility and rate sensitivity. Deformation and fracture are achieved predominantly in the Al layer for perpendicular loading, but both Ti and Al layers as well as the interface play a role for parallel loading. The rate sensitivity of Ti/Al follows those of the constituent metals. For perpendicular loading, a single deformation band develops in the Al layer under quasi-static loading, while multiple deformation bands nucleate simultaneously under dynamic loading, leading to a higher dynamic fracture strain. For parallel loading, the interface impedes the growth of deformation and results in increased ductility of Ti/Al under quasi-static loading, while interface fracture occurs under dynamic loading due to the disparity in Poisson's contraction.
Deformation and fracture of explosion-welded Ti/Al plates: A synchrotron-based study

DOE PAGES

E, J. C.; Huang, J. Y.; Bie, B. X.; ...

2016-08-02

Here, explosion-welded Ti/Al plates are characterized with energy dispersive spectroscopy and x-ray computed tomography, and exhibit a smooth, well-jointed interface. We perform dynamic and quasi-static uniaxial tension experiments on Ti/Al with the loading direction either perpendicular or parallel to the Ti/Al interface, using a mini split Hopkinson tension bar and a material testing system in conjunction with time-resolved synchrotron x-ray imaging. X-ray imaging and strain-field mapping reveal different deformation mechanisms responsible for anisotropic bulk-scale responses, including yield strength, ductility and rate sensitivity. Deformation and fracture are achieved predominantly in the Al layer for perpendicular loading, but both Ti and Al layers as well as the interface play a role for parallel loading. The rate sensitivity of Ti/Al follows those of the constituent metals. For perpendicular loading, a single deformation band develops in the Al layer under quasi-static loading, while multiple deformation bands nucleate simultaneously under dynamic loading, leading to a higher dynamic fracture strain. For parallel loading, the interface impedes the growth of deformation and results in increased ductility of Ti/Al under quasi-static loading, while interface fracture occurs under dynamic loading due to the disparity in Poisson's contraction.
Research on a Method of Geographical Information Service Load Balancing

NASA Astrophysics Data System (ADS)

Li, Heyuan; Li, Yongxing; Xue, Zhiyong; Feng, Tao

2018-05-01

With the development of geographical information service technologies, how to achieve intelligent scheduling and highly concurrent access to geographical information service resources based on load balancing is a focal point of current study. This paper presents a dynamic load balancing algorithm. In the algorithm, types of geographical information service are matched with the corresponding server group, then the RED algorithm is combined with a double-threshold method to judge the load state of each server node, and finally the service is scheduled by weighted probability over a certain period. Finally, an experimental system is built on a server cluster, which demonstrates the effectiveness of the method presented in this paper.
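One way to picture the combination of a double threshold with weighted probabilistic scheduling is sketched below. The abstract does not specify the thresholds, the weight function, or the fallback rule, so all of these are assumptions: a node below `low` gets full weight, a node above `high` gets none, and the weight falls linearly in between, loosely in the spirit of RED's linearly rising drop probability.

```python
import random


def node_weight(load, low=0.6, high=0.9):
    """Scheduling weight for a server with utilization `load` in [0, 1].

    Assumed thresholds: at or below `low` the node is lightly loaded,
    at or above `high` it is overloaded, and in between the weight
    decreases linearly.
    """
    if load <= low:
        return 1.0
    if load >= high:
        return 0.0
    return (high - load) / (high - low)


def pick_server(loads, rng=random):
    """Choose a server name from {name: load} by weighted probability."""
    names = list(loads)
    weights = [node_weight(loads[n]) for n in names]
    if sum(weights) == 0:
        # Assumed fallback: every node is overloaded, so take the least loaded.
        return min(names, key=lambda n: loads[n])
    return rng.choices(names, weights=weights, k=1)[0]
```

With loads `{"a": 0.95, "b": 0.2}`, server `a` has weight zero and every request goes to `b`; with both servers below the lower threshold, requests are spread evenly at random.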
Automatic force balance calibration system

NASA Technical Reports Server (NTRS)

Ferris, Alice T. (Inventor)

1995-01-01

A system for automatically calibrating force balances is provided. The invention uses a reference balance aligned with the balance being calibrated to provide superior accuracy while minimizing the time required to complete the calibration. The reference balance and the test balance are rigidly attached together with closely aligned moment centers. Loads placed on the system equally affect each balance, and the differences in the readings of the two balances can be used to generate the calibration matrix for the test balance. Since the accuracy of the test calibration is determined by the accuracy of the reference balance and current technology allows for reference balances to be calibrated to within ±0.05%, the entire system has an accuracy of ±0.2%. The entire apparatus is relatively small and can be mounted on a movable base for easy transport between test locations. The system can also accept a wide variety of reference balances, thus allowing calibration under diverse load and size requirements.
Automatic force balance calibration system

NASA Technical Reports Server (NTRS)

Ferris, Alice T. (Inventor)

1996-01-01

A system for automatically calibrating force balances is provided. The invention uses a reference balance aligned with the balance being calibrated to provide superior accuracy while minimizing the time required to complete the calibration. The reference balance and the test balance are rigidly attached together with closely aligned moment centers. Loads placed on the system equally affect each balance, and the differences in the readings of the two balances can be used to generate the calibration matrix for the test balance. Since the accuracy of the test calibration is determined by the accuracy of the reference balance and current technology allows for reference balances to be calibrated to within ±0.05%, the entire system has an accuracy of ±0.2%. The entire apparatus is relatively small and can be mounted on a movable base for easy transport between test locations. The system can also accept a wide variety of reference balances, thus allowing calibration under diverse load and size requirements.
Roads & loads : finding a balance.

DOT National Transportation Integrated Search

2006-01-01

Striking a balance between the needs of commerce to carry heavy loads on roads and the need to preserve the significant investment in our transportation infrastructure is a challenging process. There are compelling needs from both sides of the pictur...
Large-Scale Distributed Computational Fluid Dynamics on the Information Power Grid Using Globus

NASA Technical Reports Server (NTRS)

Barnard, Stephen; Biswas, Rupak; Saini, Subhash; VanderWijngaart, Robertus; Yarrow, Maurice; Zechtzer, Lou; Foster, Ian; Larsson, Olle

1999-01-01

This paper describes an experiment in which a large-scale scientific application developed for tightly-coupled parallel machines is adapted to the distributed execution environment of the Information Power Grid (IPG). A brief overview of the IPG and a description of the computational fluid dynamics (CFD) algorithm are given. The Globus metacomputing toolkit is used as the enabling device for the geographically-distributed computation. Modifications related to latency hiding and load balancing were required for an efficient implementation of the CFD application in the IPG environment. Performance results on a pair of SGI Origin 2000 machines indicate that real scientific applications can be effectively implemented on the IPG; however, a significant amount of continued effort is required to make such an environment useful and accessible to scientists and engineers.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Lusk, Ewing; Butler, Ralph; Pieper, Steven C.

Here, we take a historical approach to our presentation of self-scheduled task parallelism, a programming model with its origins in early irregular and nondeterministic computations encountered in automated theorem proving and logic programming. We show how an extremely simple task model has evolved into a system, asynchronous dynamic load balancing (ADLB), and a scalable implementation capable of supporting sophisticated applications on today’s (and tomorrow’s) largest supercomputers; and we illustrate the use of ADLB with a Green’s function Monte Carlo application, a modern, mature nuclear physics code in production use. Our lesson is that by surrendering a certain amount of generality and thus applicability, a minimal programming model (in terms of its basic concepts and the size of its application programmer interface) can achieve extreme scalability without introducing complexity.
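The self-scheduled task model mentioned in the abstract above can be sketched on a single machine with a shared work queue. This is only an analogy using Python's standard `queue` and `threading` modules, not the ADLB interface: the function name `run_self_scheduled`, the `process` callback returning `(result, new_tasks)`, and the use of `None` as a shutdown sentinel are all assumptions for illustration.

```python
import queue
import threading


def run_self_scheduled(initial_tasks, process, num_workers=4):
    """Workers repeatedly pull tasks from a shared pool until it drains.

    `process(task)` must return (result, new_tasks); any new tasks are put
    back into the pool, so the workload can grow irregularly at run time.
    `None` is reserved as a shutdown sentinel and cannot be a task.
    """
    tasks = queue.Queue()
    for t in initial_tasks:
        tasks.put(t)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()
            if item is None:          # sentinel: no more work
                tasks.task_done()
                return
            result, new_tasks = process(item)
            for t in new_tasks:       # enqueue children before finishing the parent
                tasks.put(t)
            with lock:
                results.append(result)
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for th in threads:
        th.start()
    tasks.join()                      # returns once every queued task is processed
    for _ in threads:
        tasks.put(None)
    for th in threads:
        th.join()
    return results
```

Because idle workers simply take the next available task, no static assignment of work to workers is needed, which is the sense in which load balance emerges from the model itself. Results arrive in completion order, so callers that need a fixed order should sort them.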
LAURA Users Manual: 5.3-48528

NASA Technical Reports Server (NTRS)

Mazaheri, Alireza; Gnoffo, Peter A.; Johnston, Christopher O.; Kleb, Bil

2010-01-01

This users manual provides in-depth information concerning installation and execution of LAURA, version 5. LAURA is a structured, multi-block, computational aerothermodynamic simulation code. Version 5 represents a major refactoring of the original Fortran 77 LAURA code toward a modular structure afforded by Fortran 95. The refactoring improved usability and maintainability by eliminating the requirement for problem-dependent re-compilations, providing more intuitive distribution of functionality, and simplifying interfaces required for multi-physics coupling. As a result, LAURA now shares gas-physics modules, MPI modules, and other low-level modules with the FUN3D unstructured-grid code. In addition to internal refactoring, several new features and capabilities have been added, e.g., a GNU-standard installation process, parallel load balancing, automatic trajectory point sequencing, free-energy minimization, and coupled ablation and flowfield radiation.

LAURA Users Manual: 5.5-64987

NASA Technical Reports Server (NTRS)

Mazaheri, Alireza; Gnoffo, Peter A.; Johnston, Christopher O.; Kleb, William L.

2013-01-01

This users manual provides in-depth information concerning installation and execution of LAURA, version 5. LAURA is a structured, multi-block, computational aerothermodynamic simulation code. Version 5 represents a major refactoring of the original Fortran 77 LAURA code toward a modular structure afforded by Fortran 95. The refactoring improved usability and maintainability by eliminating the requirement for problem dependent recompilations, providing more intuitive distribution of functionality, and simplifying interfaces required for multi-physics coupling. As a result, LAURA now shares gas-physics modules, MPI modules, and other low-level modules with the Fun3D unstructured-grid code. In addition to internal refactoring, several new features and capabilities have been added, e.g., a GNU standard installation process, parallel load balancing, automatic trajectory point sequencing, free-energy minimization, and coupled ablation and flowfield radiation.
LAURA Users Manual: 5.4-54166

NASA Technical Reports Server (NTRS)

Mazaheri, Alireza; Gnoffo, Peter A.; Johnston, Christopher O.; Kleb, Bil

2011-01-01

This users manual provides in-depth information concerning installation and execution of Laura, version 5. Laura is a structured, multi-block, computational aerothermodynamic simulation code. Version 5 represents a major refactoring of the original Fortran 77 Laura code toward a modular structure afforded by Fortran 95. The refactoring improved usability and maintainability by eliminating the requirement for problem dependent re-compilations, providing more intuitive distribution of functionality, and simplifying interfaces required for multi-physics coupling. As a result, Laura now shares gas-physics modules, MPI modules, and other low-level modules with the Fun3D unstructured-grid code. In addition to internal refactoring, several new features and capabilities have been added, e.g., a GNU-standard installation process, parallel load balancing, automatic trajectory point sequencing, free-energy minimization, and coupled ablation and flowfield radiation.
LAURA Users Manual: 5.2-43231

NASA Technical Reports Server (NTRS)

Mazaheri, Alireza; Gnoffo, Peter A.; Johnston, Christopher O.; Kleb, Bil

2009-01-01

This users manual provides in-depth information concerning installation and execution of LAURA, version 5. LAURA is a structured, multi-block, computational aerothermodynamic simulation code. Version 5 represents a major refactoring of the original Fortran 77 LAURA code toward a modular structure afforded by Fortran 95. The refactoring improved usability and maintainability by eliminating the requirement for problem-dependent re-compilations, providing more intuitive distribution of functionality, and simplifying interfaces required for multiphysics coupling. As a result, LAURA now shares gas-physics modules, MPI modules, and other low-level modules with the FUN3D unstructured-grid code. In addition to internal refactoring, several new features and capabilities have been added, e.g., a GNU-standard installation process, parallel load balancing, automatic trajectory point sequencing, free-energy minimization, and coupled ablation and flowfield radiation.
Self-Avoiding Walks over Adaptive Triangular Grids

NASA Technical Reports Server (NTRS)

Heber, Gerd; Biswas, Rupak; Gao, Guang R.; Saini, Subhash (Technical Monitor)

1998-01-01

In this paper, we present a new approach to constructing a "self-avoiding" walk through a triangular mesh. Unlike the popular approach of visiting mesh elements using space-filling curves, which is based on a geometric embedding, our approach is combinatorial in the sense that it uses the mesh connectivity only. We present an algorithm for constructing a self-avoiding walk which can be applied to any unstructured triangular mesh. The complexity of the algorithm is O(n log n), where n is the number of triangles in the mesh. We show that for hierarchical adaptive meshes, the algorithm can be easily parallelized by taking advantage of the regularity of the refinement rules. The proposed approach should be very useful in the run-time partitioning and load balancing of adaptive unstructured grids.
Laura Users Manual: 5.1-41601

NASA Technical Reports Server (NTRS)

Mazaheri, Alireza; Gnoffo, Peter A.; Johnston, Christopher O.; Kleb, Bil

2009-01-01

This users manual provides in-depth information concerning installation and execution of LAURA, version 5. LAURA is a structured, multi-block, computational aerothermodynamic simulation code. Version 5 represents a major refactoring of the original Fortran 77 LAURA code toward a modular structure afforded by Fortran 95. The refactoring improved usability and maintainability by eliminating the requirement for problem-dependent re-compilations, providing more intuitive distribution of functionality, and simplifying interfaces required for multiphysics coupling. As a result, LAURA now shares gas-physics modules, MPI modules, and other low-level modules with the FUN3D unstructured-grid code. In addition to internal refactoring, several new features and capabilities have been added, e.g., a GNU-standard installation process, parallel load balancing, automatic trajectory point sequencing, free-energy minimization, and coupled ablation and flowfield radiation.
Turbine exhaust diffuser with region of reduced flow area and outer boundary gas flow

DOEpatents

Orosa, John

2014-03-11

An exhaust diffuser system and method for a turbine engine. The outer boundary may include a region in which the outer boundary extends radially inwardly toward the hub structure and may direct at least a portion of an exhaust flow in the diffuser toward the hub structure. At least one gas jet is provided including a jet exit located on the outer boundary. The jet exit may discharge a flow of gas downstream substantially parallel to an inner surface of the outer boundary to direct a portion of the exhaust flow in the diffuser toward the outer boundary to effect a radially outward flow of at least a portion of the exhaust gas flow toward the outer boundary to balance an aerodynamic load between the outer and inner boundaries.
Shake Test Results and Dynamic Calibration Efforts for the Large Rotor Test Apparatus

NASA Technical Reports Server (NTRS)

Russell, Carl R.

2014-01-01

A shake test of the Large Rotor Test Apparatus (LRTA) was performed in an effort to enhance NASA's capability to measure dynamic hub loads for full-scale rotor tests. This paper documents the results of the shake test as well as efforts to calibrate the LRTA balance system to measure dynamic loads. Dynamic rotor loads are the primary source of vibration in helicopters and other rotorcraft, leading to passenger discomfort and damage due to fatigue of aircraft components. There are novel methods being developed to reduce rotor vibrations, but measuring the actual vibration reductions on full-scale rotors remains a challenge. In order to measure rotor forces on the LRTA, a balance system in the non-rotating frame is used. The forces at the balance can then be translated to the hub reference frame to measure the rotor loads. Because the LRTA has its own dynamic response, the balance system must be calibrated to include the natural frequencies of the test rig.
Dynamic balance abilities of collegiate men for the bench press.

PubMed

Piper, Timothy J; Radlo, Steven J; Smith, Thomas J; Woodward, Ryan W

2012-12-01

This study investigated the dynamic balance detection ability of college men for the bench press exercise. Thirty-five college men (mean ± SD: age = 22.4 ± 2.76 years, bench press experience = 8.3 ± 2.79 years, and estimated 1RM = 120.1 ± 21.8 kg) completed 1 repetition of the bench press for each of 3 bar loading arrangements. In a randomized fashion, subjects performed the bench press with a 20-kg barbell loaded with one of the following: a balanced load, one 20-kg plate on each side; an imbalanced asymmetrical load, one 20-kg plate on one side and a 20-kg plate plus a 1.25-kg plate on the other side; or an imbalanced asymmetrical center of mass, one 20-kg plate on one side and sixteen 1.25-kg plates on the other side. Subjects were blindfolded and wore ear protection throughout all testing to decrease the ability to otherwise detect loads. Binomial data analysis indicated that subjects correctly detected the imbalance of the imbalanced asymmetrical center of mass condition (p[correct detection] = 0.89, p < 0.01) but did not correctly detect the balanced condition (p[correct detection] = 0.46, p = 0.74) or the imbalanced asymmetrical condition (p[correct detection] = 0.60, p = 0.31). Although it appears that a substantial shift in the center of mass of plates leads to the detection of barbell imbalance, the minor change of adding 1.25 kg (2.5 lb) in the asymmetrical condition did not result in consistent detection. Our data indicate that the establishment of a biofeedback loop capable of determining balance detection was only realized under a high degree of imbalance. Although balance detection was not present in either the even or the slightly uneven loading condition, the inclusion of balance training for the upper body may be futile if exercises are unable to establish such a feedback loop and thus elicit an improvement of balance performance.
Split torque transmission load sharing

NASA Technical Reports Server (NTRS)

Krantz, T. L.; Rashidi, M.; Kish, J. G.

1992-01-01

Split torque transmissions are attractive alternatives to conventional planetary designs for helicopter transmissions. The split torque designs can offer lighter weight and fewer parts but have not been used extensively for lack of experience, especially with obtaining proper load sharing. Two split torque designs that use different load sharing methods have been studied. Precise indexing and alignment of the geartrain to produce acceptable load sharing has been demonstrated. An elastomeric torque splitter that has large torsional compliance and damping produces even better load sharing while reducing dynamic transmission error and noise. However, the elastomeric torque splitter as now configured is not capable over the full range of operating conditions of a fielded system. A thrust balancing load sharing device was evaluated. Friction forces that oppose the motion of the balance mechanism are significant. A static analysis suggests increasing the helix angle of the input pinion of the thrust balancing design. Also, dynamic analysis of this design predicts good load sharing and significant torsional response to accumulative pitch errors of the gears.
High-Performance Parallel Analysis of Coupled Problems for Aircraft Propulsion

NASA Technical Reports Server (NTRS)

Felippa, C. A.; Farhat, C.; Park, K. C.; Gumaste, U.; Chen, P.-S.; Lesoinne, M.; Stern, P.

1996-01-01

This research program dealt with the application of high-performance computing methods to the numerical simulation of complete jet engines. The program was initiated in January 1993 by applying two-dimensional parallel aeroelastic codes to the interior gas flow problem of a bypass jet engine. The fluid mesh generation, domain decomposition and solution capabilities were successfully tested. Attention was then focused on methodology for the partitioned analysis of the interaction of the gas flow with a flexible structure and with the fluid mesh motion driven by these structural displacements. The latter is treated by an ALE technique that models the fluid mesh motion as that of a fictitious mechanical network laid along the edges of near-field fluid elements. New partitioned analysis procedures to treat this coupled three-component problem were developed during 1994 and 1995. These procedures involved delayed corrections and subcycling, and have been successfully tested on several massively parallel computers, including the iPSC-860, Paragon XP/S and the IBM SP2. For the global steady-state axisymmetric analysis of a complete engine we have decided to use the NASA-sponsored ENG10 program, which uses a regular FV-multiblock-grid discretization in conjunction with circumferential averaging to include effects of blade forces, loss, combustor heat addition, blockage, bleeds and convective mixing. A load-balancing preprocessor for parallel versions of ENG10 was developed. During 1995 and 1996 we developed the capability for the first full 3D aeroelastic simulation of a multirow engine stage. This capability was tested on the IBM SP2 parallel supercomputer at NASA Ames. Benchmark results were presented at the 1996 Computational Aeroscience meeting.
Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets.

PubMed

Shrimankar, D D; Sathe, S R

2016-01-01

Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, OpenMP programs cannot be scaled beyond a single SMP node. Programs written in MPI can span more than a single SMP node, but that programming paradigm carries the overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that significant communication overhead is incurred even in OpenMP loop execution and that it increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and interesting for a large variety of input data files. We have developed our own load balancing and cache optimization technique for the message passing model. Our experimental results show that the techniques we developed give optimum performance of our parallel algorithm for various sizes of input parameters, such as sequence size and tile size, on a wide variety of multicore architectures.
Design of a real-time wind turbine simulator using a custom parallel architecture

NASA Technical Reports Server (NTRS)

Hoffman, John A.; Gluck, R.; Sridhar, S.

1995-01-01

The design of a new parallel-processing digital simulator is described. The new simulator has been developed specifically for analysis of wind energy systems in real time. The new processor has been named the Wind Energy System Time-domain simulator, version 3 (WEST-3). Like previous WEST versions, WEST-3 performs many computations in parallel. The modules in WEST-3 are pure digital processors, however. These digital processors can be programmed individually and operated in concert to achieve real-time simulation of wind turbine systems. Because of this programmability, WEST-3 is very much more flexible and general than its two predecessors. The design features of WEST-3 are described to show how the system produces high-speed solutions of nonlinear time-domain equations. WEST-3 has two very fast Computational Units (CU's) that use minicomputer technology plus special architectural features that make them many times faster than a microcomputer. These CU's are needed to perform the complex computations associated with the wind turbine rotor system in real time. The parallel architecture of the CU causes several tasks to be done in each cycle, including an IO operation and the combination of a multiply, add, and store. The WEST-3 simulator can be expanded at any time for additional computational power. This is possible because the CU's are interfaced to each other and to other portions of the simulation using special serial buses. These buses can be 'patched' together in essentially any configuration (in a manner very similar to the programming methods used in analog computation) to balance the input/output requirements. CU's can be added in any number to share a given computational load. This flexible bus feature is very different from many other parallel processors which usually have a throughput limit because of rigid bus architecture.
Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets

PubMed Central

Shrimankar, D. D.; Sathe, S. R.

2016-01-01

Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today’s supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. OpenMP programs, however, cannot be scaled beyond a single SMP node. Programs written in MPI can span more than one SMP node, but that programming paradigm carries the overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that significant communication overhead is incurred even in OpenMP loop execution and that it increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and interesting for a large variety of input data files. We have developed our own load balancing and cache optimization technique for the message passing model. Our experimental results show that our own developed techniques give optimum performance of our parallel algorithm for various sizes of input parameters, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868
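The abstract above does not spell out its tile-based method, so the following is only a minimal sketch of the general idea of tiling an alignment table, not the authors' algorithm. It assumes a Needleman-Wunsch global alignment score with hypothetical scoring parameters (`match`, `mismatch`, `gap`) and a hypothetical function name `nw_score_tiled`. The table is filled one tile at a time; tiles on the same anti-diagonal do not depend on each other, which is what would allow them to be assigned to different threads or processes.

```python
def nw_score_tiled(a, b, tile=4, match=1, mismatch=-1, gap=-1):
    """Global alignment score, filling the DP table tile by tile.

    All names and scoring values here are illustrative assumptions.
    """
    n, m = len(a), len(b)
    # H[i][j] = best score for aligning a[:i] with b[:j]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        H[i][0] = i * gap
    for j in range(1, m + 1):
        H[0][j] = j * gap

    tiles_i = (n + tile - 1) // tile  # number of tile rows
    tiles_j = (m + tile - 1) // tile  # number of tile columns

    # Tiles with ti + tj == d form one anti-diagonal. Each depends only on
    # tiles from earlier anti-diagonals, so the inner loop over ti could be
    # distributed across workers; it is kept serial in this sketch.
    for d in range(tiles_i + tiles_j - 1):
        for ti in range(max(0, d - tiles_j + 1), min(d, tiles_i - 1) + 1):
            tj = d - ti
            for i in range(ti * tile + 1, min((ti + 1) * tile, n) + 1):
                for j in range(tj * tile + 1, min((tj + 1) * tile, m) + 1):
                    s = match if a[i - 1] == b[j - 1] else mismatch
                    H[i][j] = max(H[i - 1][j - 1] + s,
                                  H[i - 1][j] + gap,
                                  H[i][j - 1] + gap)
    return H[n][m]
```

The tile size changes only the order in which cells are visited, not the score, so it can be tuned for cache behavior and for the amount of work per task without affecting the result.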
Scalable and balanced dynamic hybrid data assimilation

NASA Astrophysics Data System (ADS)

Kauranne, Tuomo; Amour, Idrissa; Gunia, Martin; Kallio, Kari; Lepistö, Ahti; Koponen, Sampsa

2017-04-01

Scalability of complex weather forecasting suites is dependent on the technical tools available for implementing highly parallel computational kernels, but to an equally large extent also on the dependence patterns between various components of the suite, such as observation processing, data assimilation and the forecast model. Scalability is a particular challenge for 4D variational assimilation methods that necessarily couple the forecast model into the assimilation process and subject this combination to an inherently serial quasi-Newton minimization process. Ensemble based assimilation methods are naturally more parallel, but large models force ensemble sizes to be small and that results in poor assimilation accuracy, somewhat akin to shooting with a shotgun in a million-dimensional space. The Variational Ensemble Kalman Filter (VEnKF) is an ensemble method that can attain the accuracy of 4D variational data assimilation with a small ensemble size. It achieves this by processing a Gaussian approximation of the current error covariance distribution, instead of a set of ensemble members, analogously to the Extended Kalman Filter EKF. Ensemble members are re-sampled every time a new set of observations is processed from a new approximation of that Gaussian distribution which makes VEnKF a dynamic assimilation method. After this a smoothing step is applied that turns VEnKF into a dynamic Variational Ensemble Kalman Smoother VEnKS. In this smoothing step, the same process is iterated with frequent re-sampling of the ensemble but now using past iterations as surrogate observations until the end result is a smooth and balanced model trajectory. In principle, VEnKF could suffer from similar scalability issues as 4D-Var. 
However, this can be avoided by isolating the forecast model completely from the minimization process by implementing the latter as a wrapper code whose only link to the model is calling for many parallel and totally independent model runs, all of them implemented as parallel model runs themselves. The only bottleneck in the process is the gathering and scattering of initial and final model state snapshots before and after the parallel runs which requires a very efficient and low-latency communication network. However, the volume of data communicated is small and the intervening minimization steps are only 3D-Var, which means their computational load is negligible compared with the fully parallel model runs. We present example results of scalable VEnKF with the 4D lake and shallow sea model COHERENS, assimilating simultaneously continuous in situ measurements in a single point and infrequent satellite images that cover a whole lake, with the fully scalable VEnKF.
Analysis and performance of paralleling circuits for modular inverter-converter systems

NASA Technical Reports Server (NTRS)

Birchenough, A. G.; Gourash, F.

1972-01-01

As part of a modular inverter-converter development program, control techniques were developed to provide load sharing among paralleled inverters or converters. An analysis of the requirements of paralleling circuits and a discussion of the circuits developed and their performance are included in this report. The current sharing was within 5.6 percent of rated-load current for the ac modules and 7.4 percent for the dc modules for an initial output voltage unbalance of 5 volts.
Multicoil resonance-based parallel array for smart wireless power delivery.

PubMed

Mirbozorgi, S A; Sawan, M; Gosselin, B

2013-01-01

This paper presents a novel resonance-based multicoil structure as a smart power surface to wirelessly power up apparatus such as mobile devices, animal headstages, and implanted devices. The proposed powering system is based on a 4-coil resonance-based inductive link, the resonance coil of which is formed by an array of several paralleled coils as a smart power transmitter. The power transmitter employs simple circuit connections and includes only one power driver circuit per multicoil resonance-based array, which enables higher power transfer efficiency and power delivery to the load. The power transmitted by the driver circuit is proportional to the load seen by the individual coil in the array. Thus, the transmitted power scales with respect to the load of the electric/electronic system to power up, and does not divide equally over all the parallel coils that form the array. Instead, only the loaded coils of the parallel array transmit a significant part of the total transmitted power to the receiver. Such adaptive behavior enables superior power, size and cost efficiency than other solutions since it does not need to use complex detection circuitry to find the location of the load. The performance of the proposed structure is verified by measurement results. Natural load detection and covering a 4 times bigger area than conventional topologies with a power transfer efficiency of 55% are the novelties of the presented paper.
Computational evaluation of load carriage effects on gait balance stability.

PubMed

Mummolo, Carlotta; Park, Sukyung; Mangialardi, Luigi; Kim, Joo H

2016-01-01

Evaluating the effects of load carriage on gait balance stability is important in various applications. However, their quantification has not been rigorously addressed in the current literature, partially due to the lack of relevant computational indices. The novel Dynamic Gait Measure (DGM) characterizes gait balance stability by quantifying the relative effects of inertia in terms of zero-moment point, ground projection of center of mass, and time-varying foot support region. In this study, the DGM is formulated in terms of the gait parameters that explicitly reflect the gait strategy of a given walking pattern and is used for computational evaluation of the distinct balance stability of loaded walking. The observed gait adaptations caused by load carriage (decreased single support duration, inertia effects, and step length) result in decreased DGM values (p < 0.0001), which indicate that loaded walking motions are more statically stable compared with the unloaded normal walking. Comparison of the DGM with other common gait stability indices (the maximum Floquet multiplier and the margin of stability) validates the unique characterization capability of the DGM, which is consistently informative of the presence of the added load.
Segmented surface coil resonator for in vivo EPR applications at 1.1GHz.

PubMed

Petryakov, Sergey; Samouilov, Alexandre; Chzhan-Roytenberg, Michael; Kesselring, Eric; Sun, Ziqi; Zweier, Jay L

2009-05-01

A four-loop segmented surface coil resonator (SSCR) with electronic frequency and coupling adjustments was constructed with 18mm aperture and loading capability suitable for in vivo Electron Paramagnetic Resonance (EPR) spectroscopy and imaging applications at L-band. Increased sample volume and loading capability were achieved by employing a multi-loop three-dimensional surface coil structure. Symmetrical design of the resonator with coupling to each loop resulted in high homogeneity of RF magnetic field. Parallel loops were coupled to the feeder cable via balancing circuitry containing varactor diodes for electronic coupling and tuning over a wide range of loading conditions. Manually adjusted high Q trimmer capacitors were used for initial tuning with subsequent tuning electronically controlled using varactor diodes. This design provides transparency and homogeneity of magnetic field modulation in the sample volume, while matching components are shielded to minimize interference with modulation and ambient RF fields. It can accommodate lossy samples up to 90% of its aperture with high homogeneity of RF and modulation magnetic fields and can function as a surface loop or a slice volume resonator. Along with an outer coaxial NMR surface coil, the SSCR enabled EPR/NMR co-imaging of paramagnetic probes in living rats to a depth of 20mm.
Segmented surface coil resonator for in vivo EPR applications at 1.1 GHz

PubMed Central

Petryakov, Sergey; Samouilov, Alexandre; Chzhan-Roytenberg, Michael; Kesselring, Eric; Sun, Ziqi; Zweier, Jay L.

2010-01-01

A four-loop segmented surface coil resonator (SSCR) with electronic frequency and coupling adjustments was constructed with 18 mm aperture and loading capability suitable for in vivo Electron Paramagnetic Resonance (EPR) spectroscopy and imaging applications at L-band. Increased sample volume and loading capability were achieved by employing a multi-loop three-dimensional surface coil structure. Symmetrical design of the resonator with coupling to each loop resulted in high homogeneity of RF magnetic field. Parallel loops were coupled to the feeder cable via balancing circuitry containing varactor diodes for electronic coupling and tuning over a wide range of loading conditions. Manually adjusted high Q trimmer capacitors were used for initial tuning with subsequent tuning electronically controlled using varactor diodes. This design provides transparency and homogeneity of magnetic field modulation in the sample volume, while matching components are shielded to minimize interference with modulation and ambient RF fields. It can accommodate lossy samples up to 90% of its aperture with high homogeneity of RF and modulation magnetic fields and can function as a surface loop or a slice volume resonator. Along with an outer coaxial NMR surface coil, the SSCR enabled EPR/NMR co-imaging of paramagnetic probes in living rats to a depth of 20 mm. PMID:19268615
A Universal Tare Load Prediction Algorithm for Strain-Gage Balance Calibration Data Analysis

NASA Technical Reports Server (NTRS)

Ulbrich, N.

2011-01-01

An algorithm is discussed that may be used to estimate tare loads of wind tunnel strain-gage balance calibration data. The algorithm was originally developed by R. Galway of IAR/NRC Canada and has been described in the literature for the iterative analysis technique. Basic ideas of Galway's algorithm, however, are universally applicable and work for both the iterative and the non-iterative analysis technique. A recent modification of Galway's algorithm is presented that improves the convergence behavior of the tare load prediction process if it is used in combination with the non-iterative analysis technique. The modified algorithm allows an analyst to use an alternate method for the calculation of intermediate non-linear tare load estimates whenever Galway's original approach does not lead to a convergence of the tare load iterations. It is also shown in detail how Galway's algorithm may be applied to the non-iterative analysis technique. Hand load data from the calibration of a six-component force balance is used to illustrate the application of the original and modified tare load prediction method. During the analysis of the data both the iterative and the non-iterative analysis technique were applied. Overall, predicted tare loads for combinations of the two tare load prediction methods and the two balance data analysis techniques showed excellent agreement as long as the tare load iterations converged. The modified algorithm, however, appears to have an advantage over the original algorithm when absolute voltage measurements of gage outputs are processed using the non-iterative analysis technique. In these situations only the modified algorithm converged because it uses an exact solution of the intermediate non-linear tare load estimate for the tare load iteration.

A comprehensive study of MPI parallelism in three-dimensional discrete element method (DEM) simulation of complex-shaped granular particles

NASA Astrophysics Data System (ADS)

Yan, Beichuan; Regueiro, Richard A.

2018-02-01

A three-dimensional (3D) DEM code for simulating complex-shaped granular particles is parallelized using message-passing interface (MPI). The concepts of link-block, ghost/border layer, and migration layer are put forward for design of the parallel algorithm, and theoretical scalability function of 3-D DEM scalability and memory usage is derived. Many performance-critical implementation details are managed optimally to achieve high performance and scalability, such as: minimizing communication overhead, maintaining dynamic load balance, handling particle migrations across block borders, transmitting C++ dynamic objects of particles between MPI processes efficiently, eliminating redundant contact information between adjacent MPI processes. The code executes on multiple US Department of Defense (DoD) supercomputers and tests up to 2048 compute nodes for simulating 10 million three-axis ellipsoidal particles. Performance analyses of the code including speedup, efficiency, scalability, and granularity across five orders of magnitude of simulation scale (number of particles) are provided, and they demonstrate high speedup and excellent scalability. It is also discovered that communication time is a decreasing function of the number of compute nodes in strong scaling measurements. The code's capability of simulating a large number of complex-shaped particles on modern supercomputers will be of value in both laboratory studies on micromechanical properties of granular materials and many realistic engineering applications involving granular materials.
Implementing Molecular Dynamics for Hybrid High Performance Computers - 1. Short Range Forces

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brown, W Michael; Wang, Peng; Plimpton, Steven J

The use of accelerators such as general-purpose graphics processing units (GPGPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high performance computers, machines with more than one type of floating-point processor, are now becoming more prevalent due to these advantages. In this work, we discuss several important issues in porting a large molecular dynamics code for use on parallel hybrid machines - 1) choosing a hybrid parallel decomposition that works on central processing units (CPUs) with distributed memory and accelerator cores with shared memory, 2) minimizing the amount of code that must be ported for efficient acceleration, 3) utilizing the available processing power from both many-core CPUs and accelerators, and 4) choosing a programming model for acceleration. We present our solution to each of these issues for short-range force calculation in the molecular dynamics package LAMMPS. We describe algorithms for efficient short range force calculation on hybrid high performance machines. We describe a new approach for dynamic load balancing of work between CPU and accelerator cores. We describe the Geryon library that allows a single code to compile with both CUDA and OpenCL for use on a variety of accelerators. Finally, we present results on a parallel test cluster containing 32 Fermi GPGPUs and 180 CPU cores.
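The abstract above mentions dynamic load balancing of work between CPU and accelerator cores without giving details. As a rough illustration of the general idea only (this is not the LAMMPS scheme, and `rebalance_split` and its parameters are hypothetical names), one common heuristic re-splits the work in proportion to the throughput each side showed in the previous step, so that both would finish at about the same time.

```python
def rebalance_split(frac_accel, t_cpu, t_accel):
    """Return a new accelerator work fraction from one step's timings.

    frac_accel: fraction of the work given to the accelerator last step
    t_cpu, t_accel: measured wall times of the CPU and accelerator parts
                    (same units, both positive)
    Hypothetical helper for illustration; assumes time scales linearly
    with the amount of work on each side.
    """
    if not 0.0 < frac_accel < 1.0:
        raise ValueError("frac_accel must be strictly between 0 and 1")
    if t_cpu <= 0 or t_accel <= 0:
        raise ValueError("timings must be positive")
    rate_cpu = (1.0 - frac_accel) / t_cpu   # work done per unit time on CPU
    rate_accel = frac_accel / t_accel       # work done per unit time on accelerator
    # Both sides finish together when work is split in proportion to rate.
    return rate_accel / (rate_cpu + rate_accel)


if __name__ == "__main__":
    # An even split took 4.0 s on the CPU and 1.0 s on the accelerator.
    print(rebalance_split(0.5, t_cpu=4.0, t_accel=1.0))
```

In practice such an update would usually be damped or averaged over several steps, since single-step timings are noisy.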
Improving Current Balance In Parallel MOSFET's

NASA Technical Reports Server (NTRS)

Niedra, Janis M.

1992-01-01

Simple circuit makes currents more nearly equal. Addition of diodes and adjustable-tap resistor increases operating range over which drain currents in two unmatched power MOSFET's brought more nearly into balance.
A location selection policy of live virtual machine migration for power saving and load balancing.

PubMed

Zhao, Jia; Ding, Yan; Xu, Gaochao; Hu, Liang; Dong, Yushuang; Fu, Xiaodong

2013-01-01

The green cloud data center has become a research hotspot of virtualized cloud computing architecture, and load balancing has also been one of the most important goals in cloud data centers. Since live virtual machine (VM) migration technology is widely used and studied in cloud computing, we have focused on location selection (migration policy) of live VM migration for power saving and load balancing. We propose a novel approach MOGA-LS, which is a heuristic and self-adaptive multiobjective optimization algorithm based on the improved genetic algorithm (GA). This paper has presented the specific design and implementation of MOGA-LS such as the design of the genetic operators, fitness values, and elitism. We have introduced the Pareto dominance theory and the simulated annealing (SA) idea into MOGA-LS and have presented the specific process to get the final solution, and thus, the whole approach achieves a long-term efficient optimization for power saving and load balancing. The experimental results demonstrate that MOGA-LS evidently reduces the total incremental power consumption and better protects the performance of VM migration and achieves the balancing of system load compared with the existing research. It makes the result of live VM migration more effective and meaningful.
A Location Selection Policy of Live Virtual Machine Migration for Power Saving and Load Balancing

PubMed Central

Xu, Gaochao; Hu, Liang; Dong, Yushuang; Fu, Xiaodong

2013-01-01

The green cloud data center has become a research hotspot of virtualized cloud computing architecture, and load balancing has also been one of the most important goals in cloud data centers. Since live virtual machine (VM) migration technology is widely used and studied in cloud computing, we have focused on location selection (migration policy) of live VM migration for power saving and load balancing. We propose a novel approach MOGA-LS, which is a heuristic and self-adaptive multiobjective optimization algorithm based on the improved genetic algorithm (GA). This paper has presented the specific design and implementation of MOGA-LS such as the design of the genetic operators, fitness values, and elitism. We have introduced the Pareto dominance theory and the simulated annealing (SA) idea into MOGA-LS and have presented the specific process to get the final solution, and thus, the whole approach achieves a long-term efficient optimization for power saving and load balancing. The experimental results demonstrate that MOGA-LS evidently reduces the total incremental power consumption and better protects the performance of VM migration and achieves the balancing of system load compared with the existing research. It makes the result of live VM migration more effective and meaningful. PMID:24348165
Impact of Groundwater Flow and Energy Load on Multiple Borehole Heat Exchangers.

PubMed

Dehkordi, S Emad; Schincariol, Robert A; Olofsson, Bo

2015-01-01

The effect of array configuration, that is, number, layout, and spacing, on the performance of multiple borehole heat exchangers (BHEs) is generally known under the assumption of fully conductive transport. The effect of groundwater flow on BHE performance is also well established, but most commonly for single BHEs. In multiple-BHE systems the effect of groundwater advection can be more complicated due to the induced thermal interference between the boreholes. To ascertain the influence of groundwater flow and borehole arrangement, this study investigates single- and multi-BHE systems of various configurations. Moreover, the influence of energy load balance is also examined. The results from corresponding cases with and without groundwater flow as well as balanced and unbalanced energy loads are cross-compared. The groundwater flux value, 10^-7 m/s, is chosen based on the findings of previous studies on groundwater flow interaction with BHEs and thermal response tests. It is observed that multi-BHE systems with balanced loads are less sensitive to array configuration attributes and groundwater flow, in the long-term. Conversely, multi-BHE systems with unbalanced loads are influenced by borehole array configuration as well as groundwater flow; these effects become more pronounced with time, unlike when the load is balanced. Groundwater flow has more influence on stabilizing loop temperatures, compared to array characteristics. Although borehole thermal energy storage (BTES) systems have a balanced energy load function, preliminary investigation on their efficiency shows a negative impact by groundwater which is due to their dependency on high temperature gradients between the boreholes and surroundings. © 2014, National Ground Water Association.
Changes in collagen fibril network organization and proteoglycan distribution in equine articular cartilage during maturation and growth

PubMed Central

Hyttinen, Mika M; Holopainen, Jaakko; René van Weeren, P; Firth, Elwyn C; Helminen, Heikki J; Brama, Pieter A J

2009-01-01

The aim of this study was to record growth-related changes in collagen network organization and proteoglycan distribution in intermittently peak-loaded and continuously lower-level-loaded articular cartilage. Cartilage from the proximal phalangeal bone of the equine metacarpophalangeal joint at birth, at 5, 11 and 18 months, and at 6–10 years of age was collected from two sites. Site 1, at the joint margin, is unloaded at slow gaits but is subjected to high-intensity loading during athletic activity; site 2 is a continuously but less intensively loaded site in the centre of the joint. The degree of collagen parallelism was determined with quantitative polarized light microscopy and the parallelism index for collagen fibrils was computed from the cartilage surface to the osteochondral junction. Concurrent changes in the proteoglycan distribution were quantified with digital densitometry. We found that the parallelism index increased significantly with age (up to 90%). At birth, site 2 exhibited a more organized collagen network than site 1. In adult horses this situation was reversed. The superficial and intermediate zones exhibited the greatest reorganization of collagen. Site 1 had a higher proteoglycan content than site 2 at birth but here too the situation was reversed in adult horses. We conclude that large changes in joint loading during growth and maturation in the period from birth to adulthood profoundly affect the architecture of the collagen network in equine cartilage. In addition, the distribution and content of proteoglycans are modified significantly by altered joint use. Intermittent peak-loading with shear seems to induce higher collagen parallelism and a lower proteoglycan content in cartilage than more constant weight-bearing. 
Therefore, we hypothesize that the formation of mature articular cartilage with a highly parallel collagen network and relatively low proteoglycan content in the peak-loaded area of a joint is needed to withstand intermittent stress and shear, whereas a constantly weight-bearing joint area benefits from lower collagen parallelism and a higher proteoglycan content. PMID:19732210
Lattice strains and load partitioning in bovine trabecular bone.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Akhtar, R.; Daymond, M. R.; Almer, J. D.

2012-02-01

Microdamage and failure mechanisms have been well characterized in bovine trabecular bone. However, little is known about how elastic strains develop in the apatite crystals of the trabecular struts and their relationship with different deformation mechanisms. In this study, wide-angle high-energy synchrotron X-ray diffraction has been used to determine bulk elastic strains under in situ compression. Dehydrated bone is compared to hydrated bone in terms of their response to load. During compression, load is initially borne by trabeculae aligned parallel to loading direction with non-parallel trabeculae deforming by bending. Ineffective load partitioning is noted in dehydrated bone whereas hydrated bone behaves like a plastically yielding foam.
Collaborating CPU and GPU for large-scale high-order CFD simulations with complex grids on the TianHe-1A supercomputer

NASA Astrophysics Data System (ADS)

Xu, Chuanfu; Deng, Xiaogang; Zhang, Lilun; Fang, Jianbin; Wang, Guangxue; Jiang, Yi; Cao, Wei; Che, Yonggang; Wang, Yongxian; Wang, Zhenghua; Liu, Wei; Cheng, Xinghua

2014-12-01

Programming and optimizing complex, real-world CFD codes on current many-core accelerated HPC systems is very challenging, especially when collaborating CPUs and accelerators to fully tap the potential of heterogeneous systems. In this paper, with a tri-level hybrid and heterogeneous programming model using MPI + OpenMP + CUDA, we port and optimize our high-order multi-block structured CFD software HOSTA on the GPU-accelerated TianHe-1A supercomputer. HOSTA adopts two self-developed high-order compact definite difference schemes WCNS and HDCS that can simulate flows with complex geometries. We present a dual-level parallelization scheme for efficient multi-block computation on GPUs and perform particular kernel optimizations for high-order CFD schemes. The GPU-only approach achieves a speedup of about 1.3 when comparing one Tesla M2050 GPU with two Xeon X5670 CPUs. To achieve a greater speedup, we collaborate CPU and GPU for HOSTA instead of using a naive GPU-only approach. We present a novel scheme to balance the loads between the store-poor GPU and the store-rich CPU. Taking CPU and GPU load balance into account, we improve the maximum simulation problem size per TianHe-1A node for HOSTA by 2.3×, meanwhile the collaborative approach can improve the performance by around 45% compared to the GPU-only approach. Further, to scale HOSTA on TianHe-1A, we propose a gather/scatter optimization to minimize PCI-e data transfer times for ghost and singularity data of 3D grid blocks, and overlap the collaborative computation and communication as far as possible using some advanced CUDA and MPI features. Scalability tests show that HOSTA can achieve a parallel efficiency of above 60% on 1024 TianHe-1A nodes. With our method, we have successfully simulated an EET high-lift airfoil configuration containing 800M cells and China's large civil airplane configuration containing 150M cells. 
To our best knowledge, those are the largest-scale CPU-GPU collaborative simulations that solve realistic CFD problems with both complex configurations and high-order schemes.
Collaborating CPU and GPU for large-scale high-order CFD simulations with complex grids on the TianHe-1A supercomputer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xu, Chuanfu, E-mail: xuchuanfu@nudt.edu.cn; Deng, Xiaogang; Zhang, Lilun

Programming and optimizing complex, real-world CFD codes on current many-core accelerated HPC systems is very challenging, especially when collaborating CPUs and accelerators to fully tap the potential of heterogeneous systems. In this paper, with a tri-level hybrid and heterogeneous programming model using MPI + OpenMP + CUDA, we port and optimize our high-order multi-block structured CFD software HOSTA on the GPU-accelerated TianHe-1A supercomputer. HOSTA adopts two self-developed high-order compact definite difference schemes WCNS and HDCS that can simulate flows with complex geometries. We present a dual-level parallelization scheme for efficient multi-block computation on GPUs and perform particular kernel optimizations for high-order CFD schemes. The GPU-only approach achieves a speedup of about 1.3 when comparing one Tesla M2050 GPU with two Xeon X5670 CPUs. To achieve a greater speedup, we collaborate CPU and GPU for HOSTA instead of using a naive GPU-only approach. We present a novel scheme to balance the loads between the store-poor GPU and the store-rich CPU. Taking CPU and GPU load balance into account, we improve the maximum simulation problem size per TianHe-1A node for HOSTA by 2.3×, meanwhile the collaborative approach can improve the performance by around 45% compared to the GPU-only approach. Further, to scale HOSTA on TianHe-1A, we propose a gather/scatter optimization to minimize PCI-e data transfer times for ghost and singularity data of 3D grid blocks, and overlap the collaborative computation and communication as far as possible using some advanced CUDA and MPI features. Scalability tests show that HOSTA can achieve a parallel efficiency of above 60% on 1024 TianHe-1A nodes. With our method, we have successfully simulated an EET high-lift airfoil configuration containing 800M cells and China's large civil airplane configuration containing 150M cells. 
To our best knowledge, those are the largest-scale CPU–GPU collaborative simulations that solve realistic CFD problems with both complex configurations and high-order schemes.
A calibration rig for multi-component internal strain gauge balance using the new design-of-experiment (DOE) approach

NASA Astrophysics Data System (ADS)

Nouri, N. M.; Mostafapour, K.; Kamran, M.

2018-02-01

In a closed water-tunnel circuit, a multi-component strain gauge force and moment sensor (also known as a balance) is generally used to measure hydrodynamic forces and moments acting on scaled models. These balances are periodically calibrated by static loading. Their performance and accuracy depend significantly on the rig and the method of calibration. In this research, a new calibration rig was designed and constructed to calibrate multi-component internal strain gauge balances. The calibration rig has six degrees of freedom and six different component-loading structures that can be applied separately and synchronously. The system was designed based on the applicability of formal experimental design techniques, using gravity for balance loading and balance positioning and alignment relative to gravity. To evaluate the calibration rig, a six-component internal balance developed by Iran University of Science and Technology was calibrated using response surface methodology. According to the results, the calibration rig met all design criteria. This rig provides the means by which various formal experimental design techniques can be implemented. The simplicity of the rig saves time and money in the design of experiments and in balance calibration while simultaneously increasing the accuracy of these activities.
Automatic load sharing in inverter modules

NASA Technical Reports Server (NTRS)

Nagano, S.

1979-01-01

Active feedback loads transistor equally with little power loss. Circuit is suitable for balancing modular inverters in spacecraft, computer power supplies, solar-electric power generators, and electric vehicles. Current-balancing circuit senses differences between collector current for power transistor and average value of load currents for all power transistors. Principle is effective not only in fixed duty-cycle inverters but also in converters operating at variable duty cycles.
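The sensing principle in this record, comparing each transistor's collector current with the average over all transistors, can be illustrated numerically. The following is only a minimal sketch of that comparison, not a model of the circuit; the function name `balance_errors` and the sample currents are hypothetical.

```python
def balance_errors(collector_currents):
    """Return each current's deviation from the average of all currents.

    Hypothetical helper: a positive value means that transistor carries
    more than its share, a negative value means it carries less.
    """
    if not collector_currents:
        raise ValueError("need at least one current reading")
    average = sum(collector_currents) / len(collector_currents)
    return [current - average for current in collector_currents]


# Illustrative readings in amperes (made-up values).
print(balance_errors([10.0, 12.0, 8.0, 10.0]))
```

A feedback loop would drive each of these deviations toward zero; how the circuit does so is not described in the abstract and is not modeled here.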
ICON-MIC: Implementing a CPU/MIC Collaboration Parallel Framework for ICON on Tianhe-2 Supercomputer.

PubMed

Wang, Zihao; Chen, Yu; Zhang, Jingrong; Li, Lun; Wan, Xiaohua; Liu, Zhiyong; Sun, Fei; Zhang, Fa

2018-03-01

Electron tomography (ET) is an important technique for studying the three-dimensional structures of the biological ultrastructure. Recently, ET has reached sub-nanometer resolution for investigating the native and conformational dynamics of macromolecular complexes by combining with the sub-tomogram averaging approach. Due to the limited sampling angles, ET reconstruction typically suffers from the "missing wedge" problem. Using a validation procedure, iterative compressed-sensing optimized nonuniform fast Fourier transform (NUFFT) reconstruction (ICON) demonstrates its power in restoring validated missing information for a low-signal-to-noise ratio biological ET dataset. However, the huge computational demand has become a bottleneck for the application of ICON. In this work, we implemented a parallel acceleration technology ICON-many integrated core (MIC) on Xeon Phi cards to address the huge computational demand of ICON. During this step, we parallelize the element-wise matrix operations and use the efficient summation of a matrix to reduce the cost of matrix computation. We also developed parallel versions of NUFFT on MIC to achieve a high acceleration of ICON by using more efficient fast Fourier transform (FFT) calculation. We then proposed a hybrid task allocation strategy (two-level load balancing) to improve the overall performance of ICON-MIC by making full use of the idle resources on Tianhe-2 supercomputer. Experimental results using two different datasets show that ICON-MIC has high accuracy in biological specimens under different noise levels and a significant acceleration, up to 13.3×, compared with the CPU version. Further, ICON-MIC has good scalability efficiency and overall performance on Tianhe-2 supercomputer.
Reconstruction of coded aperture images

NASA Technical Reports Server (NTRS)

Bielefeld, Michael J.; Yin, Lo I.

1987-01-01

Balanced correlation method and the Maximum Entropy Method (MEM) were implemented to reconstruct a laboratory X-ray source as imaged by a Uniformly Redundant Array (URA) system. Although the MEM method has advantages over the balanced correlation method, it is computationally time consuming because of the iterative nature of its solution. Massively Parallel Processing, with its parallel array structure is ideally suited for such computations. These preliminary results indicate that it is possible to use the MEM method in future coded-aperture experiments with the help of the MPP.
Calibration Variable Selection and Natural Zero Determination for Semispan and Canard Balances

NASA Technical Reports Server (NTRS)

Ulbrich, Norbert M.

2013-01-01

Independent calibration variables for the characterization of semispan and canard wind tunnel balances are discussed. It is shown that the variable selection for a semispan balance is determined by the location of the resultant normal and axial forces that act on the balance. These two forces are the first and second calibration variable. The pitching moment becomes the third calibration variable after the normal and axial forces are shifted to the pitch axis of the balance. Two geometric distances, i.e., the rolling and yawing moment arms, are the fourth and fifth calibration variable. They are traditionally substituted by corresponding moments to simplify the use of calibration data during a wind tunnel test. A canard balance is related to a semispan balance. It also only measures loads on one half of a lifting surface. However, the axial force and yawing moment are of no interest to users of a canard balance. Therefore, its calibration variable set is reduced to the normal force, pitching moment, and rolling moment. The combined load diagrams of the rolling and yawing moment for a semispan balance are discussed. They may be used to illustrate connections between the wind tunnel model geometry, the test section size, and the calibration load schedule. Then, methods are reviewed that may be used to obtain the natural zeros of a semispan or canard balance. In addition, characteristics of three semispan balance calibration rigs are discussed. Finally, basic requirements for a full characterization of a semispan balance are reviewed.
Research on the parallel load sharing principle of a novel self-decoupled piezoelectric six-dimensional force sensor.

PubMed

Li, Ying-Jun; Yang, Cong; Wang, Gui-Cong; Zhang, Hui; Cui, Huan-Yong; Zhang, Yong-Liang

2017-09-01

This paper presents a novel integrated piezoelectric six-dimensional force sensor which can realize dynamic measurement of multi-dimensional space load. Firstly, the composition of the sensor, the spatial layout of force-sensitive components, and measurement principle are analyzed and designed. There is no interference of piezoelectric six-dimensional force sensor in theoretical analysis. Based on the principle of actual work and deformation compatibility coherence, this paper deduces the parallel load sharing principle of the piezoelectric six-dimensional force sensor. The main effect factors which affect the load sharing ratio are obtained. The finite element model of the piezoelectric six-dimensional force sensor is established. In order to verify the load sharing principle of the sensor, a load sharing test device of piezoelectric force sensor is designed and fabricated. The load sharing experimental platform is set up. The experimental results are in accordance with the theoretical analysis and simulation results. The experiments show that the multi-dimensional and heavy force measurement can be realized by the parallel arrangement of the load sharing ring and the force sensitive element in the novel integrated piezoelectric six-dimensional force sensor. The ideal load sharing effect of the sensor can be achieved by appropriate size parameters. This paper has an important guide for the design of the force measuring device according to the load sharing mode. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
Evaluation of Potential Energy Loss Reduction and Savings for U. S. Army Electrical Distribution Systems

DTIC Science & Technology

1993-09-01

Different Size Transformers (Per Transformer ) 41 15 Additional Energy Losses for Mis-Sized Transformers (Per Transformer ) 42 16 Power System ...directly affects the amount of neutral line power loss in the system . Since most Army three-phase loads are distribution transformers spread out over a...61 Balancing Three-Phase Loads Balancing Feeder Circuit Loads Power Factor Correction Optimal Transformer Sizing Conductor Sizing Combined
Accelerating Climate and Weather Simulations through Hybrid Computing

NASA Technical Reports Server (NTRS)

Zhou, Shujia; Cruz, Carlos; Duffy, Daniel; Tucker, Robert; Purcell, Mark

2011-01-01

Unconventional multi- and many-core processors (e.g. IBM (R) Cell B.E.(TM) and NVIDIA (R) GPU) have emerged as effective accelerators in trial climate and weather simulations. Yet these climate and weather models typically run on parallel computers with conventional processors (e.g. Intel, AMD, and IBM) using Message Passing Interface. To address challenges involved in efficiently and easily connecting accelerators to parallel computers, we investigated using IBM's Dynamic Application Virtualization (TM) (IBM DAV) software in a prototype hybrid computing system with representative climate and weather model components. The hybrid system comprises two Intel blades and two IBM QS22 Cell B.E. blades, connected with both InfiniBand(R) (IB) and 1-Gigabit Ethernet. The system significantly accelerates a solar radiation model component by offloading compute-intensive calculations to the Cell blades. Systematic tests show that IBM DAV can seamlessly offload compute-intensive calculations from Intel blades to Cell B.E. blades in a scalable, load-balanced manner. However, noticeable communication overhead was observed, mainly due to IP over the IB protocol. Full utilization of IB Sockets Direct Protocol and the lower latency production version of IBM DAV will reduce this overhead.
Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Guzman, Horacio V.; Junghans, Christoph; Kremer, Kurt

Multiscale and inhomogeneous molecular systems are challenging topics in the field of molecular simulation. In particular, modeling biological systems in the context of multiscale simulations and exploring material properties are driving a permanent development of new simulation methods and optimization algorithms. In computational terms, those methods require parallelization schemes that make a productive use of computational resources for each simulation and from its genesis. Here, we introduce the heterogeneous domain decomposition approach, which is a combination of a heterogeneity-sensitive spatial domain decomposition with an a priori rearrangement of subdomain walls. Within this approach and paper, the theoretical modeling and scaling laws for the force computation time are proposed and studied as a function of the number of particles and the spatial resolution ratio. We also show the new approach capabilities, by comparing it to both static domain decomposition algorithms and dynamic load-balancing schemes. Specifically, two representative molecular systems have been simulated and compared to the heterogeneous domain decomposition proposed in this work. Finally, these two systems comprise an adaptive resolution simulation of a biomolecule solvated in water and a phase-separated binary Lennard-Jones fluid.
Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes

DOE PAGES

Guzman, Horacio V.; Junghans, Christoph; Kremer, Kurt; ...

2017-11-27

Multiscale and inhomogeneous molecular systems are challenging topics in the field of molecular simulation. In particular, modeling biological systems in the context of multiscale simulations and exploring material properties are driving a permanent development of new simulation methods and optimization algorithms. In computational terms, those methods require parallelization schemes that make a productive use of computational resources for each simulation and from its genesis. Here, we introduce the heterogeneous domain decomposition approach, which is a combination of a heterogeneity-sensitive spatial domain decomposition with an a priori rearrangement of subdomain walls. Within this approach and paper, the theoretical modeling and scaling laws for the force computation time are proposed and studied as a function of the number of particles and the spatial resolution ratio. We also show the new approach capabilities, by comparing it to both static domain decomposition algorithms and dynamic load-balancing schemes. Specifically, two representative molecular systems have been simulated and compared to the heterogeneous domain decomposition proposed in this work. Finally, these two systems comprise an adaptive resolution simulation of a biomolecule solvated in water and a phase-separated binary Lennard-Jones fluid.

A survey on investigating the need for intelligent power-aware load balanced routing protocols for handling critical links in MANETs.

PubMed

Sivakumar, B; Bhalaji, N; Sivakumar, D

2014-01-01

In mobile ad hoc networks connectivity is always an issue of concern. Due to dynamism in the behavior of mobile nodes, efficiency shall be achieved only with the assumption of good network infrastructure. Presence of critical links results in deterioration which should be detected in advance to retain the prevailing communication setup. This paper discusses a short survey on the specialized algorithms and protocols related to energy efficient load balancing for critical link detection in the recent literature. This paper also suggests a machine learning based hybrid power-aware approach for handling critical nodes via load balancing.
A Survey on Investigating the Need for Intelligent Power-Aware Load Balanced Routing Protocols for Handling Critical Links in MANETs

PubMed Central

Sivakumar, B.; Bhalaji, N.; Sivakumar, D.

2014-01-01

In mobile ad hoc networks connectivity is always an issue of concern. Due to dynamism in the behavior of mobile nodes, efficiency shall be achieved only with the assumption of good network infrastructure. Presence of critical links results in deterioration which should be detected in advance to retain the prevailing communication setup. This paper discusses a short survey on the specialized algorithms and protocols related to energy efficient load balancing for critical link detection in the recent literature. This paper also suggests a machine learning based hybrid power-aware approach for handling critical nodes via load balancing. PMID:24790546
GATECloud.net: a platform for large-scale, open-source text processing on the cloud.

PubMed

Tablan, Valentin; Roberts, Ian; Cunningham, Hamish; Bontcheva, Kalina

2013-01-28

Cloud computing is increasingly being regarded as a key enabler of the 'democratization of science', because on-demand, highly scalable cloud computing facilities enable researchers anywhere to carry out data-intensive experiments. In the context of natural language processing (NLP), algorithms tend to be complex, which makes their parallelization and deployment on cloud platforms a non-trivial task. This study presents a new, unique, cloud-based platform for large-scale NLP research: GATECloud.net. It enables researchers to carry out data-intensive NLP experiments by harnessing the vast, on-demand compute power of the Amazon cloud. Important infrastructural issues are dealt with by the platform, completely transparently for the researcher: load balancing, efficient data upload and storage, deployment on the virtual machines, security and fault tolerance. We also include a cost-benefit analysis and usage evaluation.
Electric Grid Expansion Planning with High Levels of Variable Generation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hadley, Stanton W.; You, Shutang; Shankar, Mallikarjun

2016-02-01

Renewables are taking a large proportion of generation capacity in U.S. power grids. As their randomness has increasing influence on power system operation, it is necessary to consider their impact on system expansion planning. To this end, this project studies the generation and transmission expansion co-optimization problem of the US Eastern Interconnection (EI) power grid with a high wind power penetration rate. In this project, the generation and transmission expansion problem for the EI system is modeled as a mixed-integer programming (MIP) problem. This study analyzed a time series creation method to capture the diversity of load and wind power across balancing regions in the EI system. The obtained time series can be easily introduced into the MIP co-optimization problem and then solved robustly through available MIP solvers. Simulation results show that the proposed time series generation method and the expansion co-optimization model can improve the expansion result significantly after considering the diversity of wind and load across EI regions. The improved expansion plan that combines generation and transmission will aid system planners and policy makers to maximize the social welfare. This study shows that modelling load and wind variations and diversities across balancing regions will produce significantly different expansion results compared with former studies. For example, if wind is modeled in more detail (by increasing the number of wind output levels) so that more wind blocks are considered in expansion planning, transmission expansion will be larger and the expansion timing will be earlier. Regarding generation expansion, more wind scenarios will slightly reduce wind generation expansion in the EI system and increase the expansion of other generation such as gas. Also, adopting detailed wind scenarios will reveal that it may be uneconomic to expand transmission networks for transmitting a large amount of wind power through a long distance in the EI system. Incorporating more details of renewables in expansion planning will inevitably increase the computational burden. Therefore, high performance computing (HPC) techniques are urgently needed for power system operation and planning optimization. As a scoping study task, this project tested some preliminary parallel computation techniques such as breaking down the simulation task into several sub-tasks based on chronology splitting or sample splitting, and then assigning these sub-tasks to different cores. Testing results show significant time reduction when a simulation task is split into several sub-tasks for parallel execution.
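The chronology splitting mentioned in this record, cutting a simulation horizon into time windows and handing each window to a separate worker, might look roughly like the sketch below. It is an illustration under stated assumptions and not the project's code: `split_chronologically`, `simulate_window` and `run_split` are hypothetical names, and the per-window "simulation" is a placeholder that only counts hours.

```python
from concurrent.futures import ThreadPoolExecutor


def split_chronologically(n_hours, n_subtasks):
    """Split [0, n_hours) into contiguous, near-equal (start, stop) windows."""
    if n_subtasks < 1:
        raise ValueError("need at least one sub-task")
    base, extra = divmod(n_hours, n_subtasks)
    windows, start = [], 0
    for index in range(n_subtasks):
        # The first `extra` windows absorb one leftover hour each.
        stop = start + base + (1 if index < extra else 0)
        windows.append((start, stop))
        start = stop
    return windows


def simulate_window(window):
    """Placeholder for one sub-task; a real run would simulate these hours."""
    start, stop = window
    return stop - start


def run_split(n_hours, n_subtasks):
    """Run every window on its own worker and collect results in time order."""
    windows = split_chronologically(n_hours, n_subtasks)
    with ThreadPoolExecutor(max_workers=n_subtasks) as pool:
        return list(pool.map(simulate_window, windows))


print(split_chronologically(10, 3))
print(run_split(8760, 4))
```

Threads are used here only to keep the sketch portable; for CPU-bound work in CPython, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface and spreads the windows over separate cores.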
A transient-enhanced NMOS low dropout voltage regulator with parallel feedback compensation

NASA Astrophysics Data System (ADS)

Han, Wang; Lin, Tan

2016-02-01

This paper presents a transient-enhanced NMOS low-dropout regulator (LDO) for portable applications with parallel feedback compensation. The parallel feedback structure adds a dynamic zero to get an adequate phase margin with a load current variation from 0 to 1 A. A class-AB error amplifier and a fast charging/discharging unit are adopted to enhance the transient performance. The proposed LDO has been implemented in a 0.35 μm BCD process. From experimental results, the regulator can operate with a minimum dropout voltage of 150 mV at a maximum 1 A load and IQ of 165 μA. Under the full range load current step, the voltage undershoot and overshoot of the proposed LDO are reduced to 38 mV and 27 mV respectively.
Effects of Wii balance board exercises on balance after posterior cruciate ligament reconstruction.

PubMed

Puh, Urška; Majcen, Nia; Hlebš, Sonja; Rugelj, Darja

2014-05-01

To establish the effects of training on Wii balance board (WBB) after posterior cruciate ligament (PCL) reconstruction on balance. Included patient injured her posterior cruciate ligament 22 months prior to the study. Training on WBB was performed 4 weeks, 6 times per week, 30-45 min per day. Center of pressure (CoP) sway during parallel and one-leg stance, and body weight distribution in parallel stance were measured. Additionally, measurements of joint range of motion and limb circumferences were taken before and after training. After training, the body weight was almost equally distributed on both legs. Decrease in CoP sway was most significant for one-leg stance with each leg on compliant surface with eyes open and closed. The knee joint range of motion increased and limb circumferences decreased. According to the results of this single case report, we might recommend the use of WBB for balance training after PCL reconstruction. Case series with no comparison group, Level IV.
A Dynamic Calibration Method for Experimental and Analytical Hub Load Comparison

NASA Technical Reports Server (NTRS)

Kreshock, Andrew R.; Thornburgh, Robert P.; Wilbur, Matthew L.

2017-01-01

This paper presents the results from an ongoing effort to produce improved correlation between analytical hub force and moment prediction and those measured during wind-tunnel testing on the Aeroelastic Rotor Experimental System (ARES), a conventional rotor testbed commonly used at the Langley Transonic Dynamics Tunnel (TDT). A frequency-dependent transformation between loads at the rotor hub and outputs of the testbed balance is produced from frequency response functions measured during vibration testing of the system. The resulting transformation is used as a dynamic calibration of the balance to transform hub loads predicted by comprehensive analysis into predicted balance outputs. In addition to detailing the transformation process, this paper also presents a set of wind-tunnel test cases, with comparisons between the measured balance outputs and transformed predictions from the comprehensive analysis code CAMRAD II. The modal response of the testbed is discussed and compared to a detailed finite-element model. Results reveal that the modal response of the testbed exhibits a number of characteristics that make accurate dynamic balance predictions challenging, even with the use of the balance transformation.
Active tower damping and pitch balancing - design, simulation and field test

NASA Astrophysics Data System (ADS)

Duckwitz, Daniel; Shan, Martin

2014-12-01

The tower is one of the major components in wind turbines with a contribution to the cost of energy of 8 to 12% [1]. In this overview the load situation of the tower will be described in terms of sources of loads, load components and fatigue contribution. Then two load reduction control schemes are described along with simulation and field test results. Pitch Balancing is described as a method to reduce aerodynamic asymmetry and the resulting fatigue loads. Active Tower Damping reduces the tower oscillations by applying appropriate pitch angle changes. A field test was conducted on an Areva M5000 wind turbine.
Clean vehicles as an enabler for a clean electricity grid

NASA Astrophysics Data System (ADS)

Coignard, Jonathan; Saxena, Samveg; Greenblatt, Jeffery; Wang, Dai

2018-05-01

California has issued ambitious targets to decarbonize transportation through the deployment of electric vehicles (EVs), and to decarbonize the electricity grid through the expansion of both renewable generation and energy storage. These parallel efforts can provide an untapped synergistic opportunity for clean transportation to be an enabler for a clean electricity grid. To quantify this potential, we forecast the hourly system-wide balancing problems arising out to 2025 as more renewables are deployed and load continues to grow. We then quantify the system-wide balancing benefits from EVs modulating the charging or discharging of their batteries to mitigate renewable intermittency, without compromising the mobility needs of drivers. Our results show that with its EV deployment target and with only one-way charging control of EVs, California can achieve much of the same benefit of its Storage Mandate for mitigating renewable intermittency, but at a small fraction of the cost. Moreover, EVs provide many times these benefits if two-way charging control becomes widely available. Thus, EVs support the state’s renewable integration targets while avoiding much of the tremendous capital investment of stationary storage that can instead be applied towards further deployment of clean vehicles.
Experimental verification of the role of electron pressure in fast magnetic reconnection with a guide field

DOE PAGES

Fox, W.; Sciortino, F.; v. Stechow, A.; ...

2017-03-21

We report detailed laboratory observations of the structure of a reconnection current sheet in a two-fluid plasma regime with a guide magnetic field. We observe and quantitatively analyze the quadrupolar electron pressure variation in the ion-diffusion region, as originally predicted by extended magnetohydrodynamics simulations. The projection of the electron pressure gradient parallel to the magnetic field contributes significantly to balancing the parallel electric field, and the resulting cross-field electron jets in the reconnection layer are diamagnetic in origin. Furthermore, these results demonstrate how parallel and perpendicular force balance are coupled in guide field reconnection and confirm basic theoretical models of the importance of electron pressure gradients for obtaining fast magnetic reconnection.
Dynamic Calibration of the NASA Ames Rotor Test Apparatus Steady/Dynamic Rotor Balance

NASA Technical Reports Server (NTRS)

Peterson, Randall L.; vanAken, Johannes M.

1996-01-01

The NASA Ames Rotor Test Apparatus was modified to include a Steady/Dynamic Rotor Balance. The dynamic calibration procedures and configurations are discussed. Random excitation was applied at the rotor hub, and vibratory force and moment responses were measured on the steady/dynamic rotor balance. Transfer functions were computed using the load cell data and the vibratory force and moment responses from the rotor balance. Calibration results showing the influence of frequency bandwidth, hub mass, rotor RPM, thrust preload, and dynamic loads through the stationary push rods are presented and discussed.
A variable acceleration calibration system

NASA Astrophysics Data System (ADS)

Johnson, Thomas H.

2011-12-01

A variable acceleration calibration system that applies loads using gravitational and centripetal acceleration serves as an alternative, efficient and cost effective method for calibrating internal wind tunnel force balances. Two proof-of-concept variable acceleration calibration systems are designed, fabricated and tested. The NASA UT-36 force balance served as the test balance for the calibration experiments. The variable acceleration calibration systems are shown to be capable of performing three component calibration experiments with an approximate applied load error on the order of 1% of the full scale calibration loads. Sources of error are identified using experimental design methods and a propagation of uncertainty analysis. Three types of uncertainty are identified for the systems and are attributed to prediction error, calibration error and pure error. Angular velocity uncertainty is shown to be the largest identified source of prediction error. The calibration uncertainties using a production variable acceleration based system are shown to be potentially equivalent to current methods. The production quality system can be realized using lighter materials and more precise instrumentation. Further research is needed to account for balance deflection, forcing effects due to vibration, and large tare loads. A gyroscope measurement technique is shown to be capable of resolving the balance deflection angle calculation. Long term research objectives include a demonstration of a six degree of freedom calibration, and a large capacity balance calibration.
A novel strategy for load balancing of distributed medical applications.

PubMed

Logeswaran, Rajasvaran; Chen, Li-Choo

2012-04-01

Current trends in medicine, specifically in the electronic handling of medical applications, ranging from digital imaging, paperless hospital administration and electronic medical records, telemedicine, to computer-aided diagnosis, creates a burden on the network. Distributed Service Architectures, such as Intelligent Network (IN), Telecommunication Information Networking Architecture (TINA) and Open Service Access (OSA), are able to meet this new challenge. Distribution enables computational tasks to be spread among multiple processors; hence, performance is an important issue. This paper proposes a novel approach in load balancing, the Random Sender Initiated Algorithm, for distribution of tasks among several nodes sharing the same computational object (CO) instances in Distributed Service Architectures. Simulations illustrate that the proposed algorithm produces better network performance than the benchmark load balancing algorithms-the Random Node Selection Algorithm and the Shortest Queue Algorithm, especially under medium and heavily loaded conditions.
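The two benchmark policies named in this record, Random Node Selection and Shortest Queue, are simple enough to sketch. The code below is an illustration only: it assumes a toy model in which tasks arrive one at a time and are never completed, and every function name is hypothetical. The proposed Random Sender Initiated Algorithm is not reproduced because the abstract does not describe its steps.

```python
import random


def random_node_selection(queue_lengths, rng):
    """Pick any node uniformly at random, ignoring its current load."""
    return rng.randrange(len(queue_lengths))


def shortest_queue(queue_lengths):
    """Pick the node with the fewest queued tasks (lowest index wins ties)."""
    return min(range(len(queue_lengths)), key=lambda node: queue_lengths[node])


def dispatch(n_tasks, n_nodes, policy):
    """Assign n_tasks one by one and return the final queue length per node."""
    queues = [0] * n_nodes
    for _ in range(n_tasks):
        queues[policy(queues)] += 1
    return queues


rng = random.Random(0)  # fixed seed so the random policy is repeatable
print(dispatch(100, 4, shortest_queue))
print(dispatch(100, 4, lambda queues: random_node_selection(queues, rng)))
```

In this toy model the shortest-queue policy spreads tasks evenly by construction, while the random policy generally leaves the queues uneven; a meaningful comparison would also need service times and network costs, which are outside the sketch.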
Accurate calculation of multispar cantilever and semicantilever wings with parallel webs under direct and indirect loading

NASA Technical Reports Server (NTRS)

Sanger, Eugen

1932-01-01

In the present report the computation is actually carried through for the case of parallel spars of equal resistance in bending without direct loading, including plotting of the influence lines; for other cases the method of calculation is explained. The development of large size airplanes can be speeded up by accurate methods of calculation such as this.
Comparison of Deterministic and Probabilistic Radial Distribution Systems Load Flow

NASA Astrophysics Data System (ADS)

Gupta, Atma Ram; Kumar, Ashwani

2017-12-01

Distribution system network today is facing the challenge of meeting increased load demands from the industrial, commercial and residential sectors. The pattern of load is highly dependent on consumer behavior and temporal factors such as season of the year, day of the week or time of the day. For deterministic radial distribution load flow studies load is taken as constant. But, load varies continually with a high degree of uncertainty. So, there is a need to model probable realistic load. Monte-Carlo Simulation is used to model the probable realistic load by generating random values of active and reactive power load from the mean and standard deviation of the load and for solving a Deterministic Radial Load Flow with these values. The probabilistic solution is reconstructed from deterministic data obtained for each simulation. The main contribution of the work is: Finding impact of probable realistic ZIP load modeling on balanced radial distribution load flow. Finding impact of probable realistic ZIP load modeling on unbalanced radial distribution load flow. Compare the voltage profile and losses with probable realistic ZIP load modeling for balanced and unbalanced radial distribution load flow.
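The Monte-Carlo step described in this record, drawing random load values from a mean and standard deviation and feeding each draw to a deterministic load flow, can be sketched as follows. This is a minimal illustration, not the authors' procedure: the normal distribution, the function names and the example coefficients are assumptions, and the load flow solver itself is omitted. The ZIP expression is the standard polynomial load model, with constant-impedance, constant-current and constant-power shares that sum to one.

```python
import random


def zip_load(p_nominal, voltage_pu, z_frac, i_frac, p_frac):
    """Voltage-dependent load from the ZIP model, voltage in per unit."""
    if abs(z_frac + i_frac + p_frac - 1.0) > 1e-9:
        raise ValueError("ZIP fractions must sum to 1")
    return p_nominal * (z_frac * voltage_pu ** 2 + i_frac * voltage_pu + p_frac)


def sample_loads(mean, std, n_samples, seed=0):
    """Draw nominal load values; a normal distribution is assumed here."""
    rng = random.Random(seed)
    return [rng.gauss(mean, std) for _ in range(n_samples)]


# Illustrative numbers only: 100 kW mean load, 5 kW standard deviation.
samples = sample_loads(mean=100.0, std=5.0, n_samples=1000)
# Each sample would be passed to a deterministic load flow; here the ZIP
# model is simply evaluated at a fixed bus voltage of 0.95 per unit.
loads_at_bus = [zip_load(p, 0.95, 0.3, 0.3, 0.4) for p in samples]
print(sum(loads_at_bus) / len(loads_at_bus))
```

Collecting the per-sample results, for example bus voltages and losses, then gives the probabilistic picture that the abstract says is reconstructed from the deterministic runs.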
Strain Gauge Balance Calibration and Data Reduction at NASA Langley Research Center

NASA Technical Reports Server (NTRS)

Ferris, A. T. Judy

1999-01-01

This paper will cover the standard force balance calibration and data reduction techniques used at Langley Research Center. It will cover balance axes definition, balance type, calibration instrumentation, traceability of standards to NIST, calibration loading procedures, balance calibration mathematical model, calibration data reduction techniques, balance accuracy reporting, and calibration frequency.
Load Balancing in Multi Cloud Computing Environment with Genetic Algorithm

NASA Astrophysics Data System (ADS)

Vhansure, Fularani; Deshmukh, Apurva; Sumathy, S.

2017-11-01

Cloud is a pool of resources that is available on a pay-per-use model. It provides services to a rapidly increasing number of users. Load balancing is an issue because the cloud cannot handle so many requests at a time. It is also known to be an NP-complete problem. In traditional systems, the functions consist of various parameter values that are maximised in order to achieve the best individual solutions. One challenge arises when there are many parameters of solutions in the system space. Another challenge is to optimize a function which is much more complex. In this paper, various techniques to handle load balancing virtually (VM) as well as physically (nodes) using a genetic algorithm are discussed.
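A genetic algorithm for this kind of problem can be sketched in a few lines. The version below is a generic illustration, not the technique from the paper: the chromosome encoding (one VM index per task), the fitness (the load of the busiest VM), the operators and all names and parameter values are assumptions chosen for brevity.

```python
import random


def makespan(assignment, task_costs, n_vms):
    """Load of the busiest VM when task i runs on VM assignment[i]."""
    loads = [0.0] * n_vms
    for task, vm in enumerate(assignment):
        loads[vm] += task_costs[task]
    return max(loads)


def genetic_balance(task_costs, n_vms, pop_size=30, generations=60, seed=0):
    """Search for a task-to-VM assignment with a small makespan."""
    rng = random.Random(seed)
    n_tasks = len(task_costs)  # the crossover below needs at least 2 tasks

    def fitness(assignment):
        return makespan(assignment, task_costs, n_vms)

    population = [[rng.randrange(n_vms) for _ in range(n_tasks)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]  # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_tasks)  # one-point crossover
            child = mother[:cut] + father[cut:]
            child[rng.randrange(n_tasks)] = rng.randrange(n_vms)  # mutation
            children.append(child)
        population = parents + children
    return min(population, key=fitness)


costs = [5, 3, 8, 2, 7, 4, 6, 1]  # made-up task costs
best = genetic_balance(costs, n_vms=3)
print(best, makespan(best, costs, 3))
```

Because the fitter half of each generation survives unchanged, the best makespan never gets worse from one generation to the next; the result is a heuristic answer with no optimality guarantee.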
Implementation of GAMMON - An efficient load balancing strategy for a local computer system

NASA Technical Reports Server (NTRS)

Baumgartner, Katherine M.; Kling, Ralph M.; Wah, Benjamin W.

1989-01-01

GAMMON (Global Allocation from Maximum to Minimum in cONstant time), an efficient load-balancing algorithm, is described. GAMMON uses the available broadcast capability of multiaccess networks to implement an efficient search technique for finding hosts with maximal and minimal loads. The search technique has an average overhead which is independent of the number of participating stations. The transition from the theoretical concept to a practical, reliable, and efficient implementation is described.
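The allocation idea behind the name, moving work from the maximally loaded host to the minimally loaded one, can be illustrated with a single balancing step. This sketch replaces GAMMON's broadcast-based search with a plain linear scan, so it does not reproduce the overhead property claimed in the abstract; `max_to_min_step` and the example loads are hypothetical.

```python
def max_to_min_step(loads):
    """Move one job from the most loaded host to the least loaded host.

    The extremes are found by scanning the list, which stands in for the
    broadcast search described in the abstract.
    """
    new_loads = list(loads)
    hosts = range(len(new_loads))
    busiest = max(hosts, key=lambda host: new_loads[host])
    idlest = min(hosts, key=lambda host: new_loads[host])
    # A move only helps when the gap is at least two jobs.
    if new_loads[busiest] - new_loads[idlest] >= 2:
        new_loads[busiest] -= 1
        new_loads[idlest] += 1
    return new_loads


print(max_to_min_step([5, 1, 3]))
```

Repeating the step until it no longer changes the list leaves every pair of hosts within one job of each other.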
Enhanced method of fast re-routing with load balancing in software-defined networks

NASA Astrophysics Data System (ADS)

Lemeshko, Oleksandr; Yeremenko, Oleksandra

2017-11-01

A two-level method of fast re-routing with load balancing in a software-defined network (SDN) is proposed. The novelty of the method consists, firstly, in the introduction of a two-level hierarchy for calculating the routing variables responsible for the formation of the primary and backup paths, and secondly, in ensuring a balanced load of the communication links of the network, which meets the requirements of the traffic engineering concept. The method provides implementation of link, node, path, and bandwidth protection schemes for fast re-routing in SDN. Separating the calculation of the primary (lower level) and backup (upper level) routes along two hierarchical levels, in accordance with the interaction prediction principle, made it possible to replace the initial, rather large, nonlinear optimization problem with the iterative solution of linear optimization problems of half the dimension. The analysis of the proposed method confirmed its efficiency and effectiveness in terms of obtaining optimal solutions for ensuring balanced load of communication links and implementing the required network element protection schemes for fast re-routing in SDN.
[Mapping Critical Loads of Heavy Metals for Soil Based on Different Environmental Effects].

PubMed

Shi, Ya-xing; Wu, Shao-hua; Zhou, Sheng-lu; Wang, Chun-hui; Chen, Hao

2015-12-01

China's rapid industrialization and urbanization are causing a growing problem of heavy metal pollution of soil, threatening the environment and human health. Prevention and management of heavy metal pollution have therefore become particularly important. Critical loads of heavy metals are an important management tool that can be used to prevent the occurrence of heavy metal pollution. Our study was based on three cases: status balance, water environmental effects and health risks. We used the steady-state mass balance equation to calculate the critical loads of Cd, Cu, Pb and Zn at different effect levels and analyzed the values and spatial variation of the critical loads. In addition, we used the annual input fluxes of heavy metals of the agro-ecosystem in the Yangtze River delta and China to estimate the proportion of area with exceedance of critical loads. The results demonstrated that the critical load value of Cd was the lowest, and the values of Cu and Zn were larger. There were spatial differences among the critical loads of the four elements in the study area: areas with lower critical loads mainly occurred in woodland, high-value areas were distributed in the east and southwest of the study area, and median and medium-high values mainly occurred in farmland. Comparing the input fluxes of heavy metals, we found that Pb and Zn exceeded the critical loads in more than 90% of the study area under the different environmental effects. The critical load exceedance of Cd mainly occurred under the status balance and the water environmental effect, while that of Cu occurred under the status balance and the water environmental effect, with a higher proportion of exceeded areas. The critical loads of heavy metals at different effect levels in this study could serve as a reference for effective control of heavy metal emissions and for preventing the occurrence of heavy metal pollution.

Maintaining Balance: The Increasing Role of Energy Storage for Renewable Integration

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stenclik, Derek; Denholm, Paul; Chalamala, Babu

For nearly a century, global power systems have focused on three key functions: generating, transmitting, and distributing electricity as a real-time commodity. Physics requires that electricity generation always be in real-time balance with load, despite variability in load on time scales ranging from subsecond disturbances to multiyear trends. With the increasing role of variable generation from wind and solar, the retirement of fossil-fuel-based generation, and a changing consumer demand profile, grid operators are using new methods to maintain this balance.
Traffic off-balancing algorithm for energy efficient networks

NASA Astrophysics Data System (ADS)

Kim, Junhyuk; Lee, Chankyun; Rhee, June-Koo Kevin

2011-12-01

The physical layer of a high-end network system uses multiple interface arrays. From a load-balancing perspective, a light load can be distributed across multiple interfaces. However, this can cause energy inefficiency because many interfaces are then poorly utilized. To tackle this energy inefficiency, a traffic off-balancing algorithm for traffic-adaptive interface sleep/awake control is investigated. As a reference model, 40G/100G Ethernet is investigated. We report that the suggested algorithm can achieve energy efficiency while satisfying the traffic transmission requirement.
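The abstract does not spell out the algorithm, so the sketch below only illustrates the general idea of off-balancing: instead of spreading a light load over every interface, wake just enough interfaces to carry it and let the rest sleep. The function name, the `max_utilization` headroom parameter, and the even split across awake interfaces are assumptions, not details from the paper.

```python
import math

def off_balance(total_load_gbps, num_interfaces, capacity_gbps, max_utilization=0.8):
    """Return the per-interface load; sleeping interfaces carry 0.0.

    Hypothetical policy: keep the fewest interfaces awake such that none
    exceeds max_utilization of its capacity (at least one stays awake).
    """
    usable = capacity_gbps * max_utilization
    if total_load_gbps < 0 or total_load_gbps > num_interfaces * usable:
        raise ValueError("load outside what the interface array can carry")
    active = max(1, math.ceil(total_load_gbps / usable))
    per_interface = total_load_gbps / active
    # Awake interfaces share the load evenly; the remainder are put to sleep.
    return [per_interface] * active + [0.0] * (num_interfaces - active)
```

For example, with ten 10 Gb/s interfaces and 12 Gb/s of traffic, plain load balancing would keep all ten interfaces awake at 12% utilization each, whereas this policy keeps two awake at 60% and puts eight to sleep.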
A 10 bit 200 MS/s pipeline ADC using loading-balanced architecture in 0.18 μm CMOS

NASA Astrophysics Data System (ADS)

Wang, Linfeng; Meng, Qiao; Zhi, Hao; Li, Fei

2017-07-01

A new loading-balanced architecture for a high-speed, low-power pipeline analog-to-digital converter (ADC) is presented in this paper. The proposed ADC uses SHA-less, op-amp-sharing, capacitor-sharing and capacitor-scaling techniques to reduce the die area and power consumption. A new capacitor-sharing scheme is proposed to cancel the extra reset phase of the feedback capacitors. The non-standard inter-stage gain increases the feedback factor of the first stage and makes it equal to that of the second stage, so that the load capacitance of the op-amp shared by the first and second stages is balanced. As for the fourth stage, the capacitor and op-amp no longer scale down. From the system's point of view, all load capacitors of the shared OTAs are balanced by employing a loading-balanced architecture. The die area and power consumption are optimized maximally. The ADC is implemented in a 0.18 μm 1P6M CMOS technology, and occupies a die area of 1.2 × 1.2 mm². The measurement results show a 55.58 dB signal-to-noise-and-distortion ratio (SNDR) and 62.97 dB spurious-free dynamic range (SFDR) with a 25 MHz input operating at a 200 MS/s sampling rate. The proposed ADC consumes 115 mW at 200 MS/s from a 1.8 V supply.
Parallelisation study of a three-dimensional environmental flow model

NASA Astrophysics Data System (ADS)

O'Donncha, Fearghal; Ragnoli, Emanuele; Suits, Frank

2014-03-01

There are many simulation codes in the geosciences that are serial and cannot take advantage of the parallel computational resources commonly available today. One model important for our work in coastal ocean current modelling is EFDC, a Fortran 77 code configured for optimal deployment on vector computers. In order to take advantage of our cache-based, blade computing system we restructured EFDC from serial to parallel, thereby allowing us to run existing models more quickly, and to simulate larger and more detailed models that were previously impractical. Since the source code for EFDC is extensive and involves detailed computation, it is important to do such a port in a manner that limits changes to the files, while achieving the desired speedup. We describe a parallelisation strategy involving surgical changes to the source files to minimise error-prone alteration of the underlying computations, while allowing load-balanced domain decomposition for efficient execution on a commodity cluster. The use of conjugate gradient posed particular challenges due to implicit non-local communication posing a hindrance to standard domain partitioning schemes; a number of techniques are discussed to address this in a feasible, computationally efficient manner. The parallel implementation demonstrates good scalability in combination with a novel domain partitioning scheme that specifically handles mixed water/land regions commonly found in coastal simulations. The approach presented here represents a practical methodology to rejuvenate legacy code on a commodity blade cluster with reasonable effort; our solution has direct application to other similar codes in the geosciences.
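The abstract does not describe the partitioning scheme itself. As a rough illustration of why mixed water/land grids need special handling, the sketch below splits a 2D grid into contiguous column strips so that each strip holds a similar number of water cells, rather than a similar number of columns. The function name, the 0/1 mask representation, and the strip-based decomposition are all assumptions for illustration, not the authors' scheme.

```python
def partition_columns(water_mask, num_parts):
    """Split grid columns into num_parts contiguous strips with roughly equal
    numbers of water cells.

    water_mask: list of rows, each a list of 0 (land) or 1 (water).
    Returns a list of (start_col, end_col) half-open column ranges.
    """
    if not water_mask or not water_mask[0]:
        raise ValueError("water_mask must be a non-empty grid")
    # Number of water cells in each column.
    col_counts = [sum(col) for col in zip(*water_mask)]
    total = sum(col_counts)
    bounds = [0]
    running = 0
    for j, count in enumerate(col_counts):
        running += count
        # Close the current strip once the cumulative water-cell count
        # reaches the next equal-share quota.
        while len(bounds) < num_parts and running >= total * len(bounds) / num_parts:
            bounds.append(j + 1)
    bounds.append(len(col_counts))
    return [(bounds[i], bounds[i + 1]) for i in range(num_parts)]
```

A split by equal column count would hand some processes mostly land columns with no computation; weighting by water cells keeps the active work per strip comparable, at the cost of strips of unequal width.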
Multiscale Multilevel Approach to Solution of Nanotechnology Problems

NASA Astrophysics Data System (ADS)

Polyakov, Sergey; Podryga, Viktoriia

2018-02-01

The paper is devoted to a multiscale multilevel approach for the solution of nanotechnology problems on supercomputer systems. The approach uses the combination of continuum mechanics models and the Newton dynamics for individual particles. This combination includes three scale levels: macroscopic, mesoscopic and microscopic. For gas-metal technical systems the following models are used. The quasihydrodynamic system of equations is used as a mathematical model at the macrolevel for gas and solid states. The system of Newton equations is used as a mathematical model at the meso- and microlevels; it is written for nanoparticles of the medium and larger particles moving in the medium. The numerical implementation of the approach is based on the method of splitting into physical processes. The quasihydrodynamic equations are solved by the finite volume method on grids of different types. The Newton equations of motion are solved by Verlet integration in each cell of the grid independently or in groups of connected cells. In the framework of the general methodology, four classes of algorithms and methods of their parallelization are provided. The parallelization uses the principles of geometric parallelism and the efficient partitioning of the computational domain. A special dynamic algorithm is used for load balancing the solvers. The developed approach was tested on the example of nitrogen outflow from a high-pressure balloon into a vacuum chamber through a micronozzle and a microchannel. The obtained results confirm the high efficiency of the developed methodology.
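The abstract names Verlet integration for the Newton equations but not the variant used. The sketch below shows the common velocity Verlet form for a single particle in one dimension. The function name, signature, and the harmonic-oscillator usage are assumptions for illustration only.

```python
def velocity_verlet(x, v, force, mass, dt, steps):
    """Advance one particle by `steps` time steps of size dt.

    force: callable returning the force at position x (assumed to depend on
    position only). Returns the final (position, velocity).
    """
    a = force(x) / mass
    for _ in range(steps):
        # Position update uses the current velocity and acceleration.
        x = x + v * dt + 0.5 * a * dt * dt
        # Velocity update averages the old and new accelerations.
        a_new = force(x) / mass
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
    return x, v
```

As a usage example, `velocity_verlet(1.0, 0.0, lambda x: -x, 1.0, 0.01, 1000)` integrates a unit harmonic oscillator to time 10. The position stays close to cos(10) and the total energy stays close to its initial value of 0.5, which reflects the good long-term energy behaviour that makes Verlet schemes popular in particle dynamics.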
Locality Aware Concurrent Start for Stencil Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shrestha, Sunil; Gao, Guang R.; Manzano Franco, Joseph B.

Stencil computations are at the heart of many physical simulations used in scientific codes. Thus, there exists a plethora of optimization efforts for this family of computations. Among these techniques, tiling techniques that allow concurrent start have proven to be very efficient in providing better performance for these critical kernels. Nevertheless, with many-core designs being the norm, these optimization techniques might not be able to fully exploit locality (both spatial and temporal) on multiple levels of the memory hierarchy without compromising parallelism. It is no longer true that the machine can be seen as a homogeneous collection of nodes with caches, main memory and an interconnect network. New architectural designs exhibit complex grouping of nodes, cores, threads, caches and memory connected by an ever evolving network-on-chip design. These new designs may benefit greatly from carefully crafted schedules and groupings that encourage parallel actors (i.e. threads, cores or nodes) to be aware of the computational history of other actors in close proximity. In this paper, we provide an efficient tiling technique that allows hierarchical concurrent start for memory hierarchy aware tile groups. Each execution schedule and tile shape exploit the available parallelism, load balance and locality present in the given applications. We demonstrate our technique on the Intel Xeon Phi architecture with selected and representative stencil kernels. We show improvement ranging from 5.58% to 31.17% over existing state-of-the-art techniques.
Low cost electronic ultracapacitor interface technique to provide load leveling of a battery for pulsed load or motor traction drive applications

DOEpatents

King, Robert Dean; DeDoncker, Rik Wivina Anna Adelson

1998-01-01

A battery load leveling arrangement for an electrically powered system in which battery loading is subject to intermittent high current loading utilizes a passive energy storage device and a diode connected in series with the storage device to conduct current from the storage device to the load when current demand forces a drop in battery voltage. A current limiting circuit is connected in parallel with the diode for recharging the passive energy storage device. The current limiting circuit functions to limit the average magnitude of recharge current supplied to the storage device. Various forms of current limiting circuits are disclosed, including a PTC resistor coupled in parallel with a fixed resistor. The current limit circuit may also include an SCR for switching regenerative braking current to the device when the system is connected to power an electric motor.
Neoclassical parallel flow calculation in the presence of external parallel momentum sources in Heliotron J

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nishioka, K.; Nakamura, Y.; Nishimura, S.

A moment approach to calculate neoclassical transport in non-axisymmetric torus plasmas composed of multiple ion species is extended to include the external parallel momentum sources due to unbalanced tangential neutral beam injections (NBIs). The momentum sources that are included in the parallel momentum balance are calculated from the collision operators of background particles with fast ions. This method is applied for the clarification of the physical mechanism of the neoclassical parallel ion flows and the multi-ion species effect on them in Heliotron J NBI plasmas. It is found that parallel ion flow can be determined by the balance between the parallel viscosity and the external momentum source in the region where the external source is much larger than the thermodynamic force driven source in the collisional plasmas. This is because the friction between C⁶⁺ and D⁺ prevents a large difference between C⁶⁺ and D⁺ flow velocities in such plasmas. The C⁶⁺ flow velocities, which are measured by the charge exchange recombination spectroscopy system, are numerically evaluated with this method. It is shown that the experimentally measured C⁶⁺ impurity flow velocities do not clearly contradict the neoclassical estimations, and the dependence of parallel flow velocities on the magnetic field ripples is consistent in both results.
Hierarchical Parallelism in Finite Difference Analysis of Heat Conduction

NASA Technical Reports Server (NTRS)

Padovan, Joseph; Krishna, Lala; Gute, Douglas

1997-01-01

Based on the concept of hierarchical parallelism, this research effort resulted in highly efficient parallel solution strategies for very large scale heat conduction problems. Overall, the method of hierarchical parallelism involves the partitioning of thermal models into several substructured levels wherein an optimal balance into various associated bandwidths is achieved. The details are described in this report. Overall, the report is organized into two parts. Part 1 describes the parallel modelling methodology and associated multilevel direct, iterative and mixed solution schemes. Part 2 establishes both the formal and computational properties of the scheme.
Evaluation of co-metabolic removal of trichloroethylene in a biotrickling filter under acidic conditions.

PubMed

Chheda, Dhawal; Sorial, George A

2017-07-01

This study investigated the removal of hydrophobic trichloroethylene (TCE) in the presence of methanol (co-metabolite) in a biotrickling filter, which was seeded with fungi at pH 4. Starvation was chosen as the biomass control strategy. Two systems, Biofilter I (methanol:TCE 70:30) and Biofilter II (methanol:TCE 80:20) were run in parallel, each with varying composition ratios. The TCE loading rates for both biofilters ranged from 3.22 to 12.88 g/m³/hr. Depending on the ratio, methanol concentrations varied from 4.08 to 27.95 g/m³/hr. The performance of the systems was evaluated and compared by calculating removal kinetics, carbon mass balance, efficiencies and elimination capacities. Methanol was observed to enhance TCE removal during the initial loading rate. However, methanol later inhibited TCE degradation above 6.44 g TCE/m³/hr (Biofilter I) and 3.22 g TCE/m³/hr (Biofilter II). Conversely, TCE did not impede methanol removal because over 95% methanol elimination was consistently achieved. Overall, Biofilter I was able to outperform Biofilter II due to its greater resistance towards methanol competition. Copyright © 2016. Published by Elsevier B.V.
Concurrent processing simulation of the space station

NASA Technical Reports Server (NTRS)

Gluck, R.; Hale, A. L.; Sunkel, John W.

1989-01-01

The development of a new capability for the time-domain simulation of multibody dynamic systems and its application to the study of large-angle rotational maneuvers of the Space Station is described. The effort was divided into three sequential tasks, which required significant advancements of the state of the art to accomplish. These were: (1) the development of an explicit mathematical model via symbol manipulation of a flexible, multibody dynamic system; (2) the development of a methodology for balancing the computational load of an explicit mathematical model for concurrent processing; and (3) the implementation and successful simulation of the above on a prototype Custom Architectured Parallel Processing System (CAPPS) containing eight processors. The throughput rate achieved by the CAPPS, operating at only 70 percent efficiency, was 3.9 times greater than that obtained sequentially by the IBM 3090 supercomputer simulating the same problem. More significantly, analysis of the results leads to the conclusion that the relative cost effectiveness of concurrent vs. sequential digital computation will grow substantially as the computational load is increased. This is a welcome development in an era when very complex and cumbersome mathematical models of large space vehicles must be used as substitutes for full-scale testing, which has become impractical.
Design of an Input-Parallel Output-Parallel LLC Resonant DC-DC Converter System for DC Microgrids

NASA Astrophysics Data System (ADS)

Juan, Y. L.; Chen, T. R.; Chang, H. M.; Wei, S. E.

2017-11-01

Compared with a centralized power system, a distributed modularized power system is composed of several power modules of lower power capacity that together provide sufficient total capacity for the load demand. The current stress of the power components in each module can therefore be reduced, and the flexibility of system setup is also enhanced. However, the parallel-connected power modules in the conventional system are usually controlled to share the power flow equally, which results in lower efficiency under low loading conditions. In this study, a modular power conversion system for a DC microgrid is developed with a 48 V dc low-voltage input and a 380 V dc high-voltage output. In the developed control strategy, the number of power modules enabled to share the power flow is decided according to the output power at lower load demand. Finally, three 350 W power modules are constructed and connected in parallel to set up a modular power conversion system. The experimental results show that, compared with the conventional system, the efficiency of the developed power system under light loading conditions is greatly improved. The modularized design of the power system can also decrease the ratio of power loss to system capacity.
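The abstract states that the number of enabled modules follows the output power but gives no switching thresholds. The sketch below assumes the simplest possible rule: enable the fewest 350 W modules whose combined rating covers the demand. The function name and this threshold rule are assumptions; the actual controller may well use hysteresis or efficiency-based switching points.

```python
import math

def modules_to_enable(output_power_w, module_rating_w=350.0, num_modules=3):
    """Return how many parallel modules to enable for a given output power.

    Hypothetical rule: the fewest modules whose total rating covers the
    demand, with at least one module always enabled.
    """
    if output_power_w < 0 or output_power_w > num_modules * module_rating_w:
        raise ValueError("output power outside the system capacity")
    return max(1, math.ceil(output_power_w / module_rating_w))
```

Under this rule a 100 W load runs on a single module instead of being split three ways at about 33 W each, which is the light-load situation where equal sharing is reported to be inefficient.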
ADHydro: A Parallel Implementation of a Large-scale High-Resolution Multi-Physics Distributed Water Resources Model Using the Charm++ Run Time System

NASA Astrophysics Data System (ADS)

Steinke, R. C.; Ogden, F. L.; Lai, W.; Moreno, H. A.; Pureza, L. G.

2014-12-01

Physics-based watershed models are useful tools for hydrologic studies, water resources management and economic analyses in the contexts of climate, land-use, and water-use changes. This poster presents a parallel implementation of a quasi 3-dimensional, physics-based, high-resolution, distributed water resources model suitable for simulating large watersheds in a massively parallel computing environment. Developing this model is one of the objectives of the NSF EPSCoR RII Track II CI-WATER project, which is joint between Wyoming and Utah EPSCoR jurisdictions. The model, which we call ADHydro, is aimed at simulating important processes in the Rocky Mountain west, including: rainfall and infiltration, snowfall and snowmelt in complex terrain, vegetation and evapotranspiration, soil heat flux and freezing, overland flow, channel flow, groundwater flow, water management and irrigation. Model forcing is provided by the Weather Research and Forecasting (WRF) model, and ADHydro is coupled with the NOAH-MP land-surface scheme for calculating fluxes between the land and atmosphere. The ADHydro implementation uses the Charm++ parallel run time system. Charm++ is based on location-transparent message passing between migratable C++ objects. Each object represents an entity in the model such as a mesh element. These objects can be migrated between processors or serialized to disk, allowing the Charm++ system to automatically provide capabilities such as load balancing and checkpointing. Objects interact with each other by passing messages that the Charm++ system routes to the correct destination object regardless of its current location. This poster discusses the algorithms, communication patterns, and caching strategies used to implement ADHydro with Charm++. The ADHydro model code will be released to the hydrologic community in late 2014.
Acceleration of Semiempirical QM/MM Methods through Message Passage Interface (MPI), Hybrid MPI/Open Multiprocessing, and Self-Consistent Field Accelerator Implementations.

PubMed

Ojeda-May, Pedro; Nam, Kwangho

2017-08-08

The strategy and implementation of scalable and efficient semiempirical (SE) QM/MM methods in CHARMM are described. The serial version of the code was first profiled to identify routines that required parallelization. Afterward, the code was parallelized and accelerated with three approaches. The first approach was the parallelization of the entire QM/MM routines, including the Fock matrix diagonalization routines, using the CHARMM message passage interface (MPI) machinery. In the second approach, two different self-consistent field (SCF) energy convergence accelerators were implemented using density and Fock matrices as targets for their extrapolations in the SCF procedure. In the third approach, the entire QM/MM and MM energy routines were accelerated by implementing the hybrid MPI/open multiprocessing (OpenMP) model in which both the task- and loop-level parallelization strategies were adopted to balance loads between different OpenMP threads. The present implementation was tested on two solvated enzyme systems (including <100 QM atoms) and an SN2 symmetric reaction in water. The MPI version exceeded existing SE QM methods in CHARMM, which include the SCC-DFTB and SQUANTUM methods, by at least 4-fold. The use of SCF convergence accelerators further accelerated the code by ∼12-35% depending on the size of the QM region and the number of CPU cores used. Although the MPI version displayed good scalability, the performance was diminished for large numbers of MPI processes due to the overhead associated with MPI communications between nodes. This issue was partially overcome by the hybrid MPI/OpenMP approach which displayed a better scalability for a larger number of CPU cores (up to 64 CPUs in the tested systems).
Visualization of Octree Adaptive Mesh Refinement (AMR) in Astrophysical Simulations

NASA Astrophysics Data System (ADS)

Labadens, M.; Chapon, D.; Pomaréde, D.; Teyssier, R.

2012-09-01

Computer simulations are important in current cosmological research. These simulations run in parallel on thousands of processors and produce huge amounts of data. Adaptive mesh refinement is used to reduce the computing cost while keeping good numerical accuracy in regions of interest. RAMSES is a cosmological code developed by the Commissariat à l'énergie atomique et aux énergies alternatives (English: Atomic Energy and Alternative Energies Commission) which uses Octree adaptive mesh refinement. Compared to grid-based AMR, the Octree AMR has the advantage of fitting the adaptive resolution of the grid very precisely to the local problem complexity. However, this specific octree data type needs specific software to be visualized, as generic visualization tools work on Cartesian grid data. This is why the PYMSES software has also been developed by our team. It relies on the Python scripting language to ensure modular and easy access for exploring these specific data. In order to take advantage of the High Performance Computer which runs the RAMSES simulation, it also uses MPI and multiprocessing to run some parallel code. We would like to present our PYMSES software in more detail, with some performance benchmarks. PYMSES currently has two visualization techniques which work directly on the AMR. The first one is a splatting technique, and the second one is a custom ray tracing technique. Both have their own advantages and drawbacks. We have also compared two parallel programming techniques: the Python multiprocessing library versus the use of MPI. The load balancing strategy has to be smartly defined in order to achieve a good speedup in our computation. Results obtained with this software are illustrated in the context of a massive, 9000-processor parallel simulation of a Milky Way-like galaxy.
Population-based learning of load balancing policies for a distributed computer system

NASA Technical Reports Server (NTRS)

Mehra, Pankaj; Wah, Benjamin W.

1993-01-01

Effective load-balancing policies use dynamic resource information to schedule tasks in a distributed computer system. We present a novel method for automatically learning such policies. At each site in our system, we use a comparator neural network to predict the relative speedup of an incoming task using only the resource-utilization patterns obtained prior to the task's arrival. Outputs of these comparator networks are broadcast periodically over the distributed system, and the resource schedulers at each site use these values to determine the best site for executing an incoming task. The delays incurred in propagating workload information and tasks from one site to another, as well as the dynamic and unpredictable nature of workloads in multiprogrammed multiprocessors, may cause the workload pattern at the time of execution to differ from patterns prevailing at the times of load-index computation and decision making. Our load-balancing policy accommodates this uncertainty by using certain tunable parameters. We present a population-based machine-learning algorithm that adjusts these parameters in order to achieve high average speedups with respect to local execution. Our results show that our load-balancing policy, when combined with the comparator neural network for workload characterization, is effective in exploiting idle resources in a distributed computer system.
Quantifying performance and effects of load carriage during a challenging balancing task using an array of wireless inertial sensors.

PubMed

Cain, Stephen M; McGinnis, Ryan S; Davidson, Steven P; Vitali, Rachel V; Perkins, Noel C; McLean, Scott G

2016-01-01

We utilize an array of wireless inertial measurement units (IMUs) to measure the movements of subjects (n=30) traversing an outdoor balance beam (zigzag and sloping) as quickly as possible both with and without load (20.5kg). Our objectives are: (1) to use IMU array data to calculate metrics that quantify performance (speed and stability) and (2) to investigate the effects of load on performance. We hypothesize that added load significantly decreases subject speed yet results in increased stability of subject movements. We propose and evaluate five performance metrics: (1) time to cross beam (less time=more speed), (2) percentage of total time spent in double support (more double support time=more stable), (3) stride duration (longer stride duration=more stable), (4) ratio of sacrum M-L to A-P acceleration (lower ratio=less lateral balance corrections=more stable), and (5) M-L torso range of motion (smaller range of motion=less balance corrections=more stable). We find that the total time to cross the beam increases with load (t=4.85, p<0.001). Stability metrics also change significantly with load, all indicating increased stability. In particular, double support time increases (t=6.04, p<0.001), stride duration increases (t=3.436, p=0.002), the ratio of sacrum acceleration RMS decreases (t=-5.56, p<0.001), and the M-L torso lean range of motion decreases (t=-2.82, p=0.009). Overall, the IMU array successfully measures subject movement and gait parameters that reveal the trade-off between speed and stability in this highly dynamic balance task. Copyright © 2015 Elsevier B.V. All rights reserved.
Treadmill Exercise with Increased Body Loading Enhances Post Flight Functional Performance

NASA Technical Reports Server (NTRS)

Bloomberg, J. J.; Batson, C. D.; Buxton, R. E.; Feiveson, A. H.; Kofman, I. S.; Laurie, S.; Lee, S. M. C.; Miller, C. A.; Mulavara, A. P.; Peters, B. T.;

2014-01-01

The goals of the Functional Task Test (FTT) study were to determine the effects of space flight on functional tests that are representative of high priority exploration mission tasks and to identify the key underlying physiological factors that contribute to decrements in performance. Ultimately this information will be used to assess performance risks and inform the design of countermeasures for exploration class missions. We have previously shown that for Shuttle, ISS and bed rest subjects functional tasks requiring a greater demand for dynamic control of postural equilibrium (i.e. fall recovery, seat egress/obstacle avoidance during walking, object translation, jump down) showed the greatest decrement in performance. Functional tests with reduced requirements for postural stability (i.e. hatch opening, ladder climb, manual manipulation of objects and tool use) showed little reduction in performance. These changes in functional performance were paralleled by similar decrements in sensorimotor tests designed to specifically assess postural equilibrium and dynamic gait control. The bed rest analog allows us to investigate the impact of axial body unloading in isolation on both functional tasks and on the underlying physiological factors that lead to decrements in performance and then compare them with the results obtained in our space flight study. These results indicate that body support unloading experienced during space flight plays a central role in postflight alteration of functional task performance. Given the importance of body-support loading we set out to determine if there is a relationship between the load experienced during inflight treadmill exercise (produced by a harness and bungee system) and postflight functional performance. ISS crewmembers (n=13) were tested using the FTT protocol before and after 6 months in space. Crewmembers were tested three times before flight, and on 1, 6, and 30 days after landing. 
To determine how differences in body-support loading experienced during inflight treadmill exercise impact postflight functional performance, the loading history for each subject during inflight treadmill (T2) exercise was correlated with postflight measures of performance. Crewmembers who walked on the treadmill with higher pull-down loads had less decrement in postflight postural stability and dynamic locomotor control than those subjects who exercised with lighter loads. These data point to the importance of providing significant body loading during inflight treadmill exercise. This and the addition of specific balance training may further mitigate decrements in critical mission tasks that require dynamic postural stability and mobility. Inflight treadmill exercise provides a multi-disciplinary platform to provide sensorimotor, aerobic and bone mechanical stimuli benefits. Forward work will focus on the development of an inflight training system that will integrate aerobic, resistive and balance training modalities into a single interdisciplinary countermeasure system for exploration class missions.

In vitro assessment of retention and resistance failure loads of two preparation designs for maxillary anterior teeth.

PubMed

Bintivanou, Aimilia; Pissiotis, Argirios; Michalakis, Konstantinos

2017-04-01

Parallel labiolingual walls and the preservation of the cingulum in anterior tooth preparations have been advocated. However, their contribution to retention and resistance form has not been evaluated. The purpose of this in vitro study was to evaluate the retention and resistance failure loads of 2 preparation designs for maxillary anterior teeth. Forty metal restorations were fabricated and paired with 40 cobalt-chromium prepared tooth analogs. Twenty of the specimens had parallel buccolingual walls at the cervical part (group PBLW; the control group), whereas the remaining 20 had converging buccolingual walls (group CBLW; the experimental group). The restorations were cemented to the tooth analogs with a resin-modified glass ionomer luting agent. Ten specimens from each group were subjected to tensile loading with a universal testing machine; the rest were subjected to compression loading until failure. Descriptive statistics and the independent t test (α=.05) were used to determine the effect of failure loads in the tested groups. The independent t test revealed statistically significant differences between the tested groups in tensile loading (P<.001) and in compressive loading (P<.001). The PBLW group presented a higher tensile failure load than the CBLW. On the contrary, the PBLW group presented a smaller compression failure load than the CBLW. Parallelism of the buccolingual axial walls in anterior maxillary teeth increased the retention form but decreased the resistance form. Copyright © 2016 Editorial Council for the Journal of Prosthetic Dentistry. Published by Elsevier Inc. All rights reserved.
Modeling the Elastic Modulus of 2D Woven CVI SiC Composites

NASA Technical Reports Server (NTRS)

Morscher, Gregory N.

2006-01-01

The use of fiber, interphase, CVI SiC minicomposites as structural elements for 2D-woven SiC fiber reinforced chemically vapor infiltrated (CVI) SiC matrix composites is demonstrated to be a viable approach to model the elastic modulus of these composite systems when tensile loaded in an orthogonal direction. The 0° (loading direction) and 90° (perpendicular to loading direction) oriented minicomposites as well as the open porosity and excess SiC associated with CVI SiC composites were all modeled as parallel elements using simple Rule of Mixtures techniques. Excellent agreement for a variety of 2D woven Hi-Nicalon™ fiber-reinforced and Sylramic-iBN reinforced CVI SiC matrix composites that differed in numbers of plies, constituent content, thickness, density, and number of woven tows in either direction (i.e., balanced weaves versus unbalanced weaves) was achieved. It was found that elastic modulus was not only dependent on constituent content, but also the degree to which 90° minicomposites carried load. This depended on the degree of interaction between 90° and 0° minicomposites which was quantified to some extent by composite density. The relationships developed here for elastic modulus only necessitated the knowledge of the fractional contents of fiber, interphase and CVI SiC as well as the tow size and shape. It was concluded that such relationships are fairly robust for orthogonally loaded 2D woven CVI SiC composite systems and can be implemented by ceramic matrix composite component modelers and designers for modeling the local stiffness in simple or complex parts fabricated with variable constituent contents.
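As an illustration of the parallel-element Rule of Mixtures named in the abstract above, the following is a minimal sketch in which the composite modulus is a volume-fraction-weighted sum of element moduli. The element list, the moduli, the fractions, and the load-sharing factor for the 90° minicomposites are all hypothetical values chosen for the example, not figures from the report.

```python
def parallel_rule_of_mixtures(elements):
    """Modulus of elements loaded in parallel (equal strain).

    Each element is (volume_fraction, modulus_gpa, load_share), where
    load_share in [0, 1] is an assumed factor for how fully the element
    carries load. Returns sum(fraction * load_share * modulus).
    """
    total_fraction = sum(fraction for fraction, _, _ in elements)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("volume fractions must sum to 1")
    return sum(fraction * share * modulus for fraction, modulus, share in elements)


# Hypothetical constituents: (fraction, modulus in GPa, load share).
example_elements = [
    (0.30, 300.0, 1.0),  # 0-degree minicomposites, fully loaded
    (0.30, 300.0, 0.5),  # 90-degree minicomposites, partially loaded (assumed)
    (0.30, 400.0, 1.0),  # excess CVI SiC
    (0.10, 0.0, 1.0),    # open porosity carries no load
]
composite_modulus = parallel_rule_of_mixtures(example_elements)
print(f"composite modulus: {composite_modulus:.1f} GPa")
```

Lowering the assumed load share of the 90° elements lowers the predicted modulus, which is the qualitative dependence the abstract describes.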

Inducer Hydrodynamic Forces in a Cavitating Environment

NASA Technical Reports Server (NTRS)

Skelley, Stephen E.

2004-01-01

Marshall Space Flight Center has developed and demonstrated a measurement device for sensing and resolving the hydrodynamic loads on fluid machinery. The device - a derivative of the six-component wind tunnel balance - senses the forces and moments on the rotating device through a weakened shaft section instrumented with a series of strain gauges. This rotating balance was designed to directly measure the steady and unsteady hydrodynamic loads on an inducer, thereby defining the amplitude and frequency content associated with operating in various cavitation modes. The rotating balance was calibrated statically using a dead-weight load system in order to generate the 6 x 12 calibration matrix later used to convert measured voltages to engineering units. Structural modeling suggested that the rotating assembly's first bending mode would be significantly reduced with the balance's inclusion. This reduction in structural stiffness was later confirmed experimentally with a hammer-impact test. This effect, coupled with the relatively large damping associated with the rotating balance waterproofing material, limited the device's bandwidth to approximately 50 Hertz. Other pre-test validations included sensing the test article rotating assembly built-in imbalance for two configurations and directly measuring the assembly mass and buoyancy while submerged under water. Both tests matched predictions and confirmed the device's sensitivity while stationary and rotating. The rotating balance was then demonstrated in a water test of a full-scale Space Shuttle Main Engine high-pressure liquid oxygen pump inducer. Experimental data were collected at scaled operating conditions at three flow coefficients across a range of cavitation numbers for the single inducer geometry and radial clearance. Two distinct cavitation modes were observed: symmetric tip vortex cavitation and alternate-blade cavitation.
Although previous experimental tests on the same inducer demonstrated two additional cavitation modes at lower inlet pressures, these conditions proved unreachable with the rotating balance installed due to the intense dynamic environment. The sensed radial load was less influenced by flow coefficient than by cavitation number or cavitation mode, although the flow coefficient range was relatively narrow. Transition from symmetric tip vortex to alternate-blade cavitation corresponded to changes in both radial load magnitude and radial load orientation relative to the inducer. Sensed moments indicated that the effective load center moved downstream during this change in cavitation mode. An occurrence of "higher-order cavitation" was also detected in both the stationary pressures and the rotating balance data, although the frequency of the phenomenon was well above the reliable bandwidth of the rotating balance. In summary, the experimental tests proved both the concept and the device's capability despite the limitations and confirmed that hydrodynamically-induced forces and moments develop in response to the unbalanced pressure field, which is, in turn, a product of the cavitation environment.
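The abstract above mentions a 6 x 12 calibration matrix that converts measured voltages to engineering units. A minimal sketch of that conversion follows. The abstract does not say what the twelve input terms are, so this example assumes, purely for illustration, six bridge voltages followed by their squares; the matrix values are likewise made up.

```python
def voltage_terms(voltages):
    """Build a 12-term vector from six bridge voltages.

    Assumption for this sketch only: the terms are the six voltages
    followed by their squares. The real term set is not given in the abstract.
    """
    if len(voltages) != 6:
        raise ValueError("expected six bridge voltages")
    return list(voltages) + [v * v for v in voltages]


def apply_calibration(matrix, voltages):
    """Multiply a 6 x 12 calibration matrix by the 12-term voltage vector."""
    if len(matrix) != 6 or any(len(row) != 12 for row in matrix):
        raise ValueError("calibration matrix must be 6 x 12")
    terms = voltage_terms(voltages)
    return [sum(c * t for c, t in zip(row, terms)) for row in matrix]


# Hypothetical matrix: a sensitivity of 2.0 per volt on each primary channel,
# with no interaction or nonlinear terms.
example_matrix = [[2.0 if j == i else 0.0 for j in range(12)] for i in range(6)]
example_loads = apply_calibration(example_matrix, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(example_loads)
```

Off-diagonal entries in a real matrix would capture interactions between channels, which the diagonal example deliberately omits.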
Towards a Better Distributed Framework for Learning Big Data

DTIC Science & Technology

2017-06-14

This work aimed at solving issues in distributed machine learning. The PI’s team proposed...communication load. Finally, the team proposed the parallel least-squares policy iteration (parallel LSPI) to parallelize reinforcement policy learning.
Effects of a dynamic balance training protocol on podalic support in older women. Pilot Study.

PubMed

Battaglia, Giuseppe; Bellafiore, Marianna; Bianco, Antonino; Paoli, Antonio; Palma, Antonio

2010-01-01

The foot provides the only direct contact with supporting surfaces and therefore plays an important role in all postural tasks. Changes in the musculoskeletal and neurological characteristics of the foot with advancing age can alter plantar loading patterns and postural balance. Several studies have reported that exercise training improves postural performance in elderly individuals. The aim of our study was to investigate the effectiveness of a dynamic balance training protocol performed for 5 weeks on the support surface, percentage distribution of load in both feet, and body balance performance in healthy elderly women. Ten subjects (68.67±5.50 yrs old; 28.17±3.35 BMI) were evaluated with a monopodalic performance test and baropodometric analyses before and after the training period. We found a significant improvement in balance unipedal performance times on left and right foot by 20.18% and 26.23% respectively (p<0.05). The support surface of the right foot significantly increased in response to the training protocol and, in particular, in both forefoot and rearfoot regions (p<0.05). In addition, before the training period, load distribution on the left foot was greater than on the right one; equal load redistribution was measured on both feet in response to exercise (p>0.05). The increased support surface and equal redistribution of body weight on both feet obtained in response to our training protocol may be postural adaptations sufficient to improve static balance in elderly women.
Heterogeneous Gossip

NASA Astrophysics Data System (ADS)

Frey, Davide; Guerraoui, Rachid; Kermarrec, Anne-Marie; Koldehofe, Boris; Mogensen, Martin; Monod, Maxime; Quéma, Vivien

Gossip-based information dissemination protocols are considered easy to deploy, scalable and resilient to network dynamics. Load-balancing is inherent in these protocols as the dissemination work is evenly spread among all nodes. Yet, large-scale distributed systems are usually heterogeneous with respect to network capabilities such as bandwidth. In practice, a blind load-balancing strategy might significantly hamper the performance of the gossip dissemination.
A nonrecursive 'Order N' preconditioned conjugate gradient/range space formulation of MDOF dynamics

NASA Technical Reports Server (NTRS)

Kurdila, A. J.; Menon, R.; Sunkel, John

1991-01-01

This paper addresses the requirements of present-day mechanical system simulations for algorithms that induce parallelism on a fine scale and for transient simulation methods which must be automatically load balancing for a wide collection of system topologies and hardware configurations. To this end, a combination range space/preconditioned conjugate gradient formulation of multidegree-of-freedom dynamics is developed, which, by employing regular ordering of the system connectivity graph, makes it possible to derive an extremely efficient preconditioner from the range space metric (as opposed to the system coefficient matrix). Because of the effectiveness of the preconditioner, the method can achieve performance rates that depend linearly on the number of substructures. The method, termed 'Order N', does not require the assembly of system mass or stiffness matrices, and is therefore amenable to implementation on workstations. Using this method, a 13-substructure model of the Space Station was constructed.
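For readers unfamiliar with the preconditioned conjugate gradient iteration that the abstract above builds on, here is a generic, matrix-free sketch. It is not the paper's formulation: a simple Jacobi (diagonal) preconditioner stands in for the range-space-metric preconditioner, and the 2 x 2 operator is a made-up example. The operator is passed as a function, so no system matrix is assembled inside the solver.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(apply_a, b, apply_m_inv, tol=1e-10, max_iter=200):
    """Solve A x = b for symmetric positive definite A.

    apply_a(v) returns A v; apply_m_inv(r) returns M^{-1} r for a
    preconditioner M. Both are callables, so A is never stored here.
    """
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x with x = 0
    z = apply_m_inv(r)
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:   # converged
            break
        ap = apply_a(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = apply_m_inv(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Made-up SPD operator [[4, 1], [1, 3]] applied without storing a matrix.
def example_apply_a(v):
    return [4.0 * v[0] + v[1], v[0] + 3.0 * v[1]]


def example_jacobi(r):
    return [r[0] / 4.0, r[1] / 3.0]


example_solution = pcg(example_apply_a, [1.0, 2.0], example_jacobi)
print(example_solution)
```

A better preconditioner reduces the iteration count; the abstract attributes the method's linear cost in the number of substructures to exactly that effect.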
A fully non-linear multi-species Fokker–Planck–Landau collision operator for simulation of fusion plasma

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hager, Robert, E-mail: rhager@pppl.gov; Yoon, E.S., E-mail: yoone@rpi.edu; Ku, S., E-mail: sku@pppl.gov

2016-06-15

Fusion edge plasmas can be far from thermal equilibrium and require the use of a non-linear collision operator for accurate numerical simulations. In this article, the non-linear single-species Fokker–Planck–Landau collision operator developed by Yoon and Chang (2014) [9] is generalized to include multiple particle species. The finite volume discretization used in this work naturally yields exact conservation of mass, momentum, and energy. The implementation of this new non-linear Fokker–Planck–Landau operator in the gyrokinetic particle-in-cell codes XGC1 and XGCa is described and results of a verification study are discussed. Finally, the numerical techniques that make our non-linear collision operator viable on high-performance computing systems are described, including specialized load balancing algorithms and nested OpenMP parallelization. The collision operator's good weak and strong scaling behavior is shown.
A fully non-linear multi-species Fokker–Planck–Landau collision operator for simulation of fusion plasma

DOE PAGES

Hager, Robert; Yoon, E. S.; Ku, S.; ...

2016-04-04

Fusion edge plasmas can be far from thermal equilibrium and require the use of a non-linear collision operator for accurate numerical simulations. The non-linear single-species Fokker–Planck–Landau collision operator developed by Yoon and Chang (2014) [9] is generalized to include multiple particle species. Moreover, the finite volume discretization used in this work naturally yields exact conservation of mass, momentum, and energy. The implementation of this new non-linear Fokker–Planck–Landau operator in the gyrokinetic particle-in-cell codes XGC1 and XGCa is described and results of a verification study are discussed. Finally, the numerical techniques that make our non-linear collision operator viable on high-performance computing systems are described, including specialized load balancing algorithms and nested OpenMP parallelization. As a result, the collision operator's good weak and strong scaling behavior is shown.
Accelerating Climate Simulations Through Hybrid Computing

NASA Technical Reports Server (NTRS)

Zhou, Shujia; Sinno, Scott; Cruz, Carlos; Purcell, Mark

2009-01-01

Unconventional multi-core processors (e.g., IBM Cell B/E and NVIDIA GPU) have emerged as accelerators in climate simulation. However, climate models typically run on parallel computers with conventional processors (e.g., Intel and AMD) using MPI. Connecting accelerators to this architecture efficiently and easily becomes a critical issue. When using MPI for connection, we identified two challenges: (1) an identical MPI implementation is required in both systems, and (2) existing MPI code must be modified to accommodate the accelerators. In response, we have extended and deployed IBM Dynamic Application Virtualization (DAV) in a hybrid computing prototype system (one blade with two Intel quad-core processors, two IBM QS22 Cell blades, connected with Infiniband), allowing for seamlessly offloading compute-intensive functions to remote, heterogeneous accelerators in a scalable, load-balanced manner. Currently, a climate solar radiation model running with multiple MPI processes has been offloaded to multiple Cell blades with approximately 10% network overhead.
Monte Carlo simulation of biomolecular systems with BIOMCSIM

NASA Astrophysics Data System (ADS)

Kamberaj, H.; Helms, V.

2001-12-01

A new Monte Carlo simulation program, BIOMCSIM, is presented that has been developed in particular to simulate the behaviour of biomolecular systems, leading to insights and understanding of their functions. The computational complexity in Monte Carlo simulations of high density systems, with large molecules like proteins immersed in a solvent medium, or when simulating the dynamics of water molecules in a protein cavity, is enormous. The program presented in this paper seeks to provide these desirable features putting special emphasis on simulations in grand canonical ensembles. It uses different biasing techniques to increase the convergence of simulations, and periodic load balancing in its parallel version, to maximally utilize the available computer power. In periodic systems, the long-ranged electrostatic interactions can be treated by Ewald summation. The program is modularly organized, and implemented using an ANSI C dialect, so as to enhance its modifiability. Its performance is demonstrated in benchmark applications for the proteins BPTI and Cytochrome c Oxidase.
Regularization with numerical extrapolation for finite and UV-divergent multi-loop integrals

NASA Astrophysics Data System (ADS)

de Doncker, E.; Yuasa, F.; Kato, K.; Ishikawa, T.; Kapenga, J.; Olagbemi, O.

2018-03-01

We give numerical integration results for Feynman loop diagrams such as those covered by Laporta (2000) and by Baikov and Chetyrkin (2010), and which may give rise to loop integrals with UV singularities. We explore automatic adaptive integration using multivariate techniques from the PARINT package for multivariate integration, as well as iterated integration with programs from the QUADPACK package, and a trapezoidal method based on a double exponential transformation. PARINT is layered over MPI (Message Passing Interface), and incorporates advanced parallel/distributed techniques including load balancing among processes that may be distributed over a cluster or a network/grid of nodes. Results are included for 2-loop vertex and box diagrams and for sets of 2-, 3- and 4-loop self-energy diagrams with or without UV terms. Numerical regularization of integrals with singular terms is achieved by linear and non-linear extrapolation methods.
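The "trapezoidal method based on a double exponential transformation" mentioned in the abstract above is commonly known as tanh-sinh quadrature. The sketch below is a generic one-dimensional version with a fixed step, not the authors' code; the step size `h` and truncation `t_max` are arbitrary choices for the example. The substitution x = tanh((π/2) sinh t) pushes the endpoints to infinity so that a plain trapezoidal sum converges quickly, even with integrable endpoint singularities.

```python
import math


def tanh_sinh(f, a, b, h=0.125, t_max=3.0):
    """Approximate the integral of f over (a, b) by double exponential quadrature.

    Uses x = tanh((pi/2) * sinh(t)) on (-1, 1), mapped linearly to (a, b),
    and a trapezoidal sum over t in [-t_max, t_max] with step h.
    """
    half_width = 0.5 * (b - a)
    midpoint = 0.5 * (a + b)
    n = int(round(t_max / h))
    total = 0.0
    for k in range(-n, n + 1):
        t = k * h
        u = 0.5 * math.pi * math.sinh(t)
        x = math.tanh(u)                                   # node in (-1, 1)
        weight = 0.5 * math.pi * math.cosh(t) / math.cosh(u) ** 2
        total += weight * f(midpoint + half_width * x)
    return half_width * h * total


smooth_value = tanh_sinh(math.exp, 0.0, 1.0)      # exact value is e - 1
singular_value = tanh_sinh(math.log, 0.0, 1.0)    # exact value is -1
print(smooth_value, singular_value)
```

The second call integrates a function that is unbounded at the left endpoint, the kind of behaviour that makes this transformation attractive for singular integrands.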
Cartesian Off-Body Grid Adaption for Viscous Time-Accurate Flow Simulation

NASA Technical Reports Server (NTRS)

Buning, Pieter G.; Pulliam, Thomas H.

2011-01-01

An improved solution adaption capability has been implemented in the OVERFLOW overset grid CFD code. Building on the Cartesian off-body approach inherent in OVERFLOW and the original adaptive refinement method developed by Meakin, the new scheme provides for automated creation of multiple levels of finer Cartesian grids. Refinement can be based on the undivided second-difference of the flow solution variables, or on a specific flow quantity such as vorticity. Coupled with load-balancing and an in-memory solution interpolation procedure, the adaption process provides very good performance for time-accurate simulations on parallel compute platforms. A method of using refined, thin body-fitted grids combined with adaption in the off-body grids is presented, which maximizes the part of the domain subject to adaption. Two- and three-dimensional examples are used to illustrate the effectiveness and performance of the adaption scheme.
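To make the "undivided second-difference" refinement criterion in the abstract above concrete, here is a one-dimensional sketch. It is not OVERFLOW's implementation: the abstract gives no details of normalization or thresholds, so the threshold and the sample data are hypothetical. "Undivided" means the difference is not divided by the grid spacing squared, so the sensor shrinks as the grid is refined.

```python
def undivided_second_difference(q):
    """Return |q[i-1] - 2 q[i] + q[i+1]| for each interior point i."""
    return [abs(q[i - 1] - 2.0 * q[i] + q[i + 1]) for i in range(1, len(q) - 1)]


def flag_for_refinement(q, threshold):
    """Indices of interior points whose sensor exceeds the threshold."""
    sensor = undivided_second_difference(q)
    return [i for i, value in enumerate(sensor, start=1) if value > threshold]


# Hypothetical solution variable with a sharp jump between points 2 and 3.
example_q = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
example_flags = flag_for_refinement(example_q, threshold=0.5)
print(example_flags)
```

Only the points adjacent to the jump are flagged; smooth regions give a zero sensor and are left on the coarse grid.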
Balancing Authority Cooperation Concepts to Reduce Variable Generation Integration Costs in the Western Interconnection: Consolidating Balancing Authorities and Sharing Balancing Reserves

DOE Office of Scientific and Technical Information (OSTI.GOV)

Samaan, Nader A.; Makarov, Yuri V.; Nguyen, Tony B.

2017-05-07

The study described in this chapter demonstrates the benefits of BA consolidation with the help of a detailed WECC system model and advanced methodology, which is also described in this chapter. The study aims to determine the potential savings in production cost and reduction in balancing reserve requirements in the WECC system. The study has found that effective use of the diversity in load and variable generation over a wide area can indeed help to achieve significant savings. The implementation cost for the consolidation was beyond the scope of this study. The analysis was performed for two different scenarios of VG penetration: 11% (8% wind and 3% solar) and 33% (24% wind and 9% solar) of WECC projected energy demand in 2020. In analysis of balancing reserves, the objective was to determine the reduction in balancing reserve requirements due to BA consolidation, in terms of required capacity and ramp-rates. Hour-ahead and 10-minute ahead forecast errors for load, wind, and solar were simulated. In addition, 1-minute resolution load, wind and solar data were used to derive balancing reserve requirements, i.e., load-following and regulation requirements, for each individual BA and for the consolidated BA (CBA). The reduction in balancing reserves was determined by calculating the difference between total reserve requirements that need to be carried by different BAs if they operate individually, and reserve requirements that need to be carried by the CBA. The study results show that the consolidated WECC system would have about a 50% overall reduction in balancing reserves for the 11% penetration scenario and a 65% reduction for the 33% penetration scenario in comparison with total reserve requirements that need to be carried by different BAs if they operate individually.
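The reserve-reduction calculation described in the abstract above (sum of individual BA requirements minus the consolidated BA requirement) can be sketched as follows. The abstract does not define how a requirement is derived from the imbalance data, so this example assumes, for illustration only, that a BA's requirement is its largest absolute imbalance; the two imbalance series are invented numbers.

```python
def reserve_requirement(imbalance_mw):
    """Assumed requirement: the largest absolute imbalance in the series."""
    return max(abs(value) for value in imbalance_mw)


def consolidation_savings(imbalances_by_ba):
    """Compare individually operated BAs with one consolidated BA (CBA).

    imbalances_by_ba maps a BA name to its imbalance time series (MW);
    all series must have the same length. Returns (individual_total,
    consolidated, reduction_fraction).
    """
    series = list(imbalances_by_ba.values())
    individual_total = sum(reserve_requirement(s) for s in series)
    combined = [sum(step) for step in zip(*series)]   # imbalances net out
    consolidated = reserve_requirement(combined)
    reduction = (individual_total - consolidated) / individual_total
    return individual_total, consolidated, reduction


# Invented imbalance series for two hypothetical BAs.
example_result = consolidation_savings({
    "BA1": [10.0, -5.0, 3.0],
    "BA2": [-8.0, 6.0, 2.0],
})
print(example_result)
```

The saving comes entirely from diversity: when one area is short while another is long, the imbalances cancel inside the consolidated area instead of each being covered separately.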
Photogrammetric Deflection Measurements for the Tiltrotor Test Rig (TTR) Multi-Component Rotor Balance Calibration

NASA Technical Reports Server (NTRS)

Solis, Eduardo; Meyn, Larry

2016-01-01

Calibrating the internal, multi-component balance mounted in the Tiltrotor Test Rig (TTR) required photogrammetric measurements to determine the location and orientation of forces applied to the balance. The TTR, with the balance and calibration hardware attached, was mounted in a custom calibration stand. Calibration loads were applied using eleven hydraulic actuators, operating in tension only, that were attached to the forward frame of the calibration stand and the TTR calibration hardware via linkages with in-line load cells. Before the linkages were installed, photogrammetry was used to determine the location of the linkage attachment points on the forward frame and on the TTR calibration hardware. Photogrammetric measurements were used to determine the displacement of the linkage attachment points on the TTR due to deflection of the hardware under applied loads. These measurements represent the first photogrammetric deflection measurements to be made to support 6-component rotor balance calibration. This paper describes the design of the TTR and the calibration hardware, and presents the development, set-up and use of the photogrammetry system, along with some selected measurement results.
Engineered Fibrin Gels for Parallel Stimulation of Mesenchymal Stem Cell Proangiogenic and Osteogenic Potential

PubMed Central

Murphy, Kaitlin C.; Hughbanks, Marissa L.; Binder, Bernard Y.K.; Vissers, Caroline B.; Leach, J. Kent

2014-01-01

Mesenchymal stem/stromal cells (MSCs) are under examination for use in cell therapies to repair bone defects resulting from trauma or disease. MSCs secrete proangiogenic cues and can be induced to differentiate into bone-forming osteoblasts, yet there is limited evidence that these events can be achieved in parallel. Manipulation of the cell delivery vehicle properties represents a candidate approach for directing MSC function in bone healing. We hypothesized that the biophysical properties of a fibrin gel could simultaneously regulate the proangiogenic and osteogenic potential of entrapped MSCs. Fibrin gels were formed by supplementation with NaCl (1.2, 2.3, and 3.9% w/v) to modulate gel biophysical properties without altering protein concentrations. MSCs entrapped in 1.2% w/v NaCl gels were the most proangiogenic in vitro, yet cells in 3.9% w/v gels exhibited the greatest osteogenic response. Compared to the other groups, MSCs entrapped in 2.3% w/v gels provided the best balance between proangiogenic potential, osteogenic potential, and gel contractility. The contribution of MSCs to bone repair was then examined when deployed in 2.3% w/v NaCl gels and implanted into an irradiated orthotopic bone defect. Compared to acellular gels after 3 weeks of implantation, defects treated with MSC-loaded fibrin gels exhibited significant increases in vessel density, early osteogenesis, superior morphology, and increased cellularity of repair tissue. Defects treated with MSC-loaded gels exhibited increased bone formation after 12 weeks compared to blank gels. These results confirm that fibrin gel properties can be modulated to simultaneously promote both the proangiogenic and osteogenic potential of MSCs, and fibrin gels modified by supplementation with NaCl are promising carriers for MSCs to stimulate bone repair in vivo. PMID:25527322
Waiting can be an optimal conservation strategy, even in a crisis discipline.

PubMed

Iacona, Gwenllian D; Possingham, Hugh P; Bode, Michael

2017-09-26

Biodiversity conservation projects confront immediate and escalating threats with limited funding. Conservation theory suggests that the best response to the species extinction crisis is to spend money as soon as it becomes available, and this is often an explicit constraint placed on funding. We use a general dynamic model of a conservation landscape to show that this decision to "front-load" project spending can be suboptimal if a delay allows managers to use resources more strategically. Our model demonstrates the existence of temporal efficiencies in conservation management, which parallel the spatial efficiencies identified by systematic conservation planning. The optimal timing of decisions balances the rate of biodiversity decline (e.g., the relaxation of extinction debts, or the progress of climate change) against the rate at which spending appreciates in value (e.g., through interest, learning, or capacity building). We contrast the benefits of acting and waiting in two ecosystems where restoration can mitigate forest bird extinction debts: South Australia's Mount Lofty Ranges and Paraguay's Atlantic Forest. In both cases, conservation outcomes cannot be maximized by front-loading spending, and the optimal solution recommends substantial delays before managers undertake conservation actions. Surprisingly, these delays allow superior conservation benefits to be achieved, in less time than front-loading. Our analyses provide an intuitive and mechanistic rationale for strategic delay, which contrasts with the orthodoxy of front-loaded spending for conservation actions. Our results illustrate the conservation efficiencies that could be achieved if decision makers choose when to spend their limited resources, as opposed to just where to spend them.
Impact of Cognitive Loading on Postural Control in Parkinson’s Disease With Freezing of Gait

PubMed Central

Buated, Wannipat; Lolekha, Praween; Hidaka, Shohei; Fujinami, Tsutomu

2016-01-01

Objective: To assess standing balance in Parkinson’s disease (PD) patients with and without freezing of gait (FOG) during cognitive loading. Method: A balance assessment with cognitive loading, reading (RE) and counting backward (CB), was performed by the Nintendo Wii Fit in 60 PD patients (Hoehn and Yahr stages 1-3) at Thammasat University Hospital, Thailand. The participants were grouped into FOG and non-FOG according to the Freezing of Gait–Questionnaire (FOG-Q) scores. The center of pressure (CoP) in terms of path length (PL), sway area (SA), root mean square (RMS), medio-lateral (ML), and antero-posterior (AP) were analyzed. Results: Significant increases of PL were observed in both groups of PD patients during cognitive loading (p < .001). Meanwhile, the increased differences of PL during cognitive loading in PD-FOG were larger than in PD-non-FOG. The ML displacement during counting backward was significantly increased in PD-FOG (p = .012). Conclusion: Cognitive loading influenced standing balance and postural sway of PD patients. The effects were more prominent in PD-FOG. These findings represent the interactions between cognitive function, postural control, and FOG in PD. PMID:28680941
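Two of the center-of-pressure measures named in the abstract above, path length and RMS, have standard textbook definitions that can be sketched in a few lines. The study's own processing pipeline is not described, so the definitions used here (summed point-to-point distance, and RMS of displacement about the mean along one axis) and the sample trace are assumptions for illustration.

```python
import math


def path_length(cop_points):
    """Total distance travelled by the CoP, from consecutive (ml, ap) samples."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(cop_points, cop_points[1:])
    )


def rms_displacement(values):
    """Root mean square of displacement about the mean along one axis."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


# Invented CoP trace in centimetres: (medio-lateral, antero-posterior).
example_trace = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
example_pl = path_length(example_trace)
example_ml_rms = rms_displacement([ml for ml, _ in example_trace])
print(example_pl, example_ml_rms)
```

A longer path over the same recording time indicates more sway, which is why PL is reported as increasing under cognitive load.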
A Targeted Approach to Ligament Balancing Using Kinetic Sensors.

PubMed

Gustke, Kenneth A; Golladay, Gregory J; Roche, Martin W; Elson, Leah C; Anderson, Christopher R

2017-07-01

Currently, soft-tissue imbalance contributes to several of the foremost reasons for revision following primary TKA, including instability, stiffness, and aseptic loosening. In order to decrease the incidence of soft-tissue imbalance, intraoperative sensors were developed to provide real-time, quantitative load data within the knee. This study examines the intraoperative data of a group of multicenter patients to determine how targeted ligament releases affect intra-articular loading, and to understand which types of releases are necessary to achieve quantified ligament balance. A group of 129 patients received sensor-assisted TKA, as part of a multicenter study. Medial and lateral loading data were collected pre-release, during any sequential releases, and post-release. All data were collected at 10°, 45°, and 90° during range of motion testing. Ligament release type, release technique type, and resultant loading were collected. Loading across the joint decreased, overall, and became more symmetrical after releases were performed. On average, between 2 and 3 corrections were made (up to 8) in order to achieve ligament balance. The ligament release type and subsequent quantified change in loading were in agreement with historical, qualified sources. Objective data from sensor output may assist surgeons in decreasing loading variability and, thereby, decreasing ligament imbalance and its associated complications. Copyright © 2017 Elsevier Inc. All rights reserved.
CFTLB: a novel cross-layer fault tolerant and load balancing protocol for WMN

NASA Astrophysics Data System (ADS)

Krishnaveni, N. N.; Chitra, K.

2017-12-01

A wireless mesh network (WMN) forms a wireless backbone framework for multi-hop transmission among the routers and clients in the extensible coverage area. To improve the throughput of WMNs with multiple gateways (GWs), several issues related to GW selection, load balancing and frequent link failures due to the presence of dynamic obstacles and channel interference should be addressed. This paper presents a novel cross-layer fault tolerant and load balancing (CFTLB) protocol to overcome these issues in WMNs. Initially, the neighbour GWs are searched and the channel load is calculated. The GW with the least channel load, estimated when a new node arrives, is selected. The proposed algorithm finds alternate GWs and calculates the channel availability under high loading scenarios. If the current load in the GW is high, another GW is found and its channel availability is calculated. In addition, the protocol initiates channel switching and establishes communication with the mesh client effectively. The use of a hashing technique in the proposed CFTLB verifies the status of the packets and achieves better performance in terms of router average throughput, throughput, average channel access time and lower end-to-end delay, communication overhead and average data loss in the channel compared to the existing protocols.
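The gateway-selection step described in the abstract above (pick the least-loaded neighbour gateway, and switch away when the current gateway's load is high) can be sketched as below. This is a simplified illustration, not the CFTLB protocol: the load values, the gateway names, and the `high_load` threshold are hypothetical, and fault tolerance, channel switching, and hashing are omitted.

```python
def least_loaded_gateway(channel_load):
    """Return the gateway name with the smallest channel load."""
    if not channel_load:
        raise ValueError("no neighbour gateways found")
    return min(channel_load, key=channel_load.get)


def choose_gateway(current, channel_load, high_load=0.8):
    """Keep the current gateway unless it is missing or heavily loaded.

    channel_load maps gateway name to a load in [0, 1]; high_load is an
    assumed threshold above which an alternate gateway is sought.
    """
    if current in channel_load and channel_load[current] < high_load:
        return current
    return least_loaded_gateway(channel_load)


# Hypothetical neighbour gateways and their measured channel loads.
example_loads = {"GW1": 0.9, "GW2": 0.4, "GW3": 0.6}
print(choose_gateway("GW1", example_loads))   # GW1 is overloaded, so switch
print(choose_gateway("GW3", example_loads))   # GW3 is acceptable, so stay
print(choose_gateway(None, example_loads))    # new node: pick least loaded
```

Staying on an acceptable gateway rather than always chasing the minimum is a design choice in this sketch to avoid unnecessary switching; the abstract does not say how the protocol itself handles that trade-off.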
Single-Vector Calibration of Wind-Tunnel Force Balances

NASA Technical Reports Server (NTRS)

Parker, P. A.; DeLoach, R.

2003-01-01

An improved method of calibrating a wind-tunnel force balance involves the use of a unique load application system integrated with formal experimental design methodology. The Single-Vector Force Balance Calibration System (SVS) overcomes the productivity and accuracy limitations of prior calibration methods. A force balance is a complex structural spring element instrumented with strain gauges for measuring three orthogonal components of aerodynamic force (normal, axial, and side force) and three orthogonal components of aerodynamic torque (rolling, pitching, and yawing moments). Force balances remain the state-of-the-art instruments that provide these measurements on a scale model of an aircraft during wind tunnel testing. Ideally, each electrical channel of the balance would respond only to its respective component of load, and it would have no response to other components of load. This is not entirely possible even though balance designs are optimized to minimize these undesirable interaction effects. Ultimately, a calibration experiment is performed to obtain the necessary data to generate a mathematical model and determine the force measurement accuracy. In order to set the independent variables of applied load for the calibration experiment, a high-precision mechanical system is required. Manual deadweight systems have been in use at Langley Research Center (LaRC) since the 1940s. These simple methodologies produce high confidence results, but the process is mechanically complex and labor-intensive, requiring three to four weeks to complete. Over the past decade, automated balance calibration systems have been developed. In general, these systems were designed to automate the tedious manual calibration process, resulting in an even more complex system which deteriorates load application quality.
The current calibration approach relies on a one-factor-at-a-time (OFAT) methodology, where each independent variable is incremented individually throughout its full-scale range, while all other variables are held at a constant magnitude. This OFAT approach has been widely accepted because of its inherent simplicity and intuitive appeal to the balance engineer. LaRC has been conducting research in a "modern design of experiments" (MDOE) approach to force balance calibration. Formal experimental design techniques provide an integrated view to the entire calibration process covering all three major aspects of an experiment; the design of the experiment, the execution of the experiment, and the statistical analyses of the data. In order to overcome the weaknesses in the available mechanical systems and to apply formal experimental techniques, a new mechanical system was required. The SVS enables the complete calibration of a six-component force balance with a series of single force vectors.
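The OFAT procedure described in this abstract can be illustrated with a short sketch. The component names, the normalized load levels, and the function name `ofat_schedule` are assumptions made for illustration; they are not taken from the record.

```python
def ofat_schedule(components, levels, hold=0.0):
    """Build a one-factor-at-a-time load schedule.

    Each component in turn is stepped through `levels` while every
    other component is held at the constant value `hold`.
    """
    points = []
    for varied in components:
        for level in levels:
            point = {name: hold for name in components}
            point[varied] = level
            points.append(point)
    return points


# Hypothetical six-component balance, loads normalized to full scale.
COMPONENTS = ["normal", "axial", "side", "roll", "pitch", "yaw"]
LEVELS = [-1.0, -0.5, 0.0, 0.5, 1.0]

schedule = ofat_schedule(COMPONENTS, LEVELS)
print(len(schedule), "load points")
```

Every point in such a schedule has at most one nonzero component, which is the defining property of OFAT as the abstract describes it.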
Maintaining Balance: The Increasing Role of Energy Storage for Renewable Integration

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stenclik, Derek; Denholm, Paul; Chalamala, Babu

For nearly a century, global power systems have focused on three key functions: to generate, transmit, and distribute electricity as a real-time commodity. Physics requires that electricity generation always be in real-time balance with load, despite variability in load on timescales ranging from sub-second disturbances to multi-year trends. With the increasing role of variable generation from wind and solar, retirements of fossil fuel-based generation, and a changing consumer demand profile, grid operators are using new methods to maintain this balance.

Maintaining Balance: The Increasing Role of Energy Storage for Renewable Integration

DOE PAGES

Stenclik, Derek; Denholm, Paul; Chalamala, Babu

2017-10-17

For nearly a century, global power systems have focused on three key functions: to generate, transmit, and distribute electricity as a real-time commodity. Physics requires that electricity generation always be in real-time balance with load, despite variability in load on timescales ranging from sub-second disturbances to multi-year trends. With the increasing role of variable generation from wind and solar, retirements of fossil fuel-based generation, and a changing consumer demand profile, grid operators are using new methods to maintain this balance.
Running accuracy analysis of a 3-RRR parallel kinematic machine considering the deformations of the links

NASA Astrophysics Data System (ADS)

Wang, Liping; Jiang, Yao; Li, Tiemin

2014-09-01

Parallel kinematic machines have drawn considerable attention and have been widely used in some special fields. However, high precision is still one of the challenges when they are used for advanced machine tools. One of the main reasons is that the kinematic chains of parallel kinematic machines are composed of elongated links that can easily suffer deformations, especially at high speeds and under heavy loads. A 3-RRR parallel kinematic machine is taken as a study object for investigating its accuracy with the consideration of the deformations of its links during the motion process. Based on the dynamic model constructed by the Newton-Euler method, all the inertia loads and constraint forces of the links are computed and their deformations are derived. Then the kinematic errors of the machine are derived with the consideration of the deformations of the links. Through further derivation, the accuracy of the machine is given in a simple explicit expression, which will be helpful to increase the calculating speed. The accuracy of this machine when following a selected circle path is simulated. The influences of magnitude of the maximum acceleration and external loads on the running accuracy of the machine are investigated. The results show that the external loads will deteriorate the accuracy of the machine tremendously when their direction coincides with the direction of the worst stiffness of the machine. The proposed method provides a solution for predicting the running accuracy of the parallel kinematic machines and can also be used in their design optimization as well as selection of suitable running parameters.
A network flow model for load balancing in circuit-switched multicomputers

NASA Technical Reports Server (NTRS)

Bokhari, Shahid H.

1990-01-01

In multicomputers that utilize circuit switching or wormhole routing, communication overhead depends largely on link contention - the variation due to distance between nodes is negligible. This has a major impact on the load balancing problem. In this case, there are some nodes with excess load (sources) and others with deficit load (sinks) and it is required to find a matching of sources to sinks that avoids contention. The problem is made complex by the hardwired routing on currently available machines: the user can control only which nodes communicate but not how the messages are routed. Network flow models of message flow in the mesh and the hypercube were developed to solve this problem. The crucial property of these models is the correspondence between minimum cost flows and correctly routed messages. To solve a given load balancing problem, a minimum cost flow algorithm is applied to the network. This permits one to determine efficiently a maximum contention free matching of sources to sinks which, in turn, tells one how much of the given imbalance can be eliminated without contention.
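The idea of matching sources to sinks through a flow computation can be sketched as follows. This is a simplified illustration, not the paper's model: it uses plain maximum flow (Edmonds-Karp) rather than minimum cost flow, it ignores the hardwired routing constraint, and it assumes each link can carry one unit of load without contention. The 2x2 mesh, the node labels, and the function name `max_flow` are hypothetical.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity graph."""
    residual = {}
    for u, neighbours in capacity.items():
        for v, cap in neighbours.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + cap
            residual[v].setdefault(u, 0)  # reverse edge
    residual.setdefault(source, {})
    residual.setdefault(sink, {})

    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total

        # Find the bottleneck along the path, then push flow through it.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical 2x2 mesh: node 0 has 3 units of excess load, node 3 lacks 3.
mesh = {
    "S": {0: 3},            # super-source feeds the overloaded node
    0: {1: 1, 2: 1},
    1: {0: 1, 3: 1},
    2: {0: 1, 3: 1},
    3: {1: 1, 2: 1, "T": 3},  # underloaded node drains to the super-sink
}
movable = max_flow(mesh, "S", "T")
print("load units movable without sharing a link:", movable)
```

In this toy case only two of the three excess units can be moved, because node 0 has just two outgoing links. That mirrors the abstract's point that the flow solution tells one how much of the imbalance can be eliminated without contention.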
Selective randomized load balancing and mesh networks with changing demands

NASA Astrophysics Data System (ADS)

Shepherd, F. B.; Winzer, P. J.

2006-05-01

We consider the problem of building cost-effective networks that are robust to dynamic changes in demand patterns. We compare several architectures using demand-oblivious routing strategies. Traditional approaches include single-hop architectures based on a (static or dynamic) circuit-switched core infrastructure and multihop (packet-switched) architectures based on point-to-point circuits in the core. To address demand uncertainty, we seek minimum cost networks that can carry the class of hose demand matrices. Apart from shortest-path routing, Valiant's randomized load balancing (RLB), and virtual private network (VPN) tree routing, we propose a third, highly attractive approach: selective randomized load balancing (SRLB). This is a blend of dual-hop hub routing and randomized load balancing that combines the advantages of both architectures in terms of network cost, delay, and delay jitter. In particular, we give empirical analyses for the cost (in terms of transport and switching equipment) for the discussed architectures, based on three representative carrier networks. Of these three networks, SRLB maintains the resilience properties of RLB while achieving significant cost reduction over all other architectures, including RLB and multihop Internet protocol/multiprotocol label switching (IP/MPLS) networks using VPN-tree routing.
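Valiant's randomized load balancing, one of the baselines named in this abstract, can be sketched in a few lines. The sketch assumes a full logical mesh and replaces the random choice of intermediate node with its expected value, an equal split over all nodes. The demand matrix and the function name `rlb_link_loads` are illustrative assumptions, and SRLB itself is not modelled.

```python
def rlb_link_loads(demand):
    """Expected link loads under two-hop randomized load balancing.

    demand[s][d] is the traffic from node s to node d. Each flow is
    split equally over all n nodes as intermediates: hop 1 goes
    s -> k, hop 2 goes k -> d. A hop is skipped when it would start
    and end at the same node.
    """
    n = len(demand)
    load = [[0.0] * n for _ in range(n)]
    for s in range(n):
        for d in range(n):
            if s == d:
                continue
            share = demand[s][d] / n
            for k in range(n):
                if k != s:
                    load[s][k] += share  # first hop
                if k != d:
                    load[k][d] += share  # second hop
    return load


# Hypothetical 3-node demand matrix (arbitrary traffic units).
demand = [
    [0, 6, 0],
    [0, 0, 3],
    [3, 0, 0],
]
loads = rlb_link_loads(demand)
print(loads[0][1])  # direct routing would place all 6 units on this link
```

With this split, the load on link i -> j depends only on the total traffic leaving i and the total traffic entering j, which is why the scheme is oblivious to how demand is distributed within those totals.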
Regression Analysis and Calibration Recommendations for the Characterization of Balance Temperature Effects

NASA Technical Reports Server (NTRS)

Ulbrich, N.; Volden, T.

2018-01-01

Analysis and use of temperature-dependent wind tunnel strain-gage balance calibration data are discussed in the paper. First, three different methods are presented and compared that may be used to process temperature-dependent strain-gage balance data. The first method uses an extended set of independent variables in order to process the data and predict balance loads. The second method applies an extended load iteration equation during the analysis of balance calibration data. The third method uses temperature-dependent sensitivities for the data analysis. Physical interpretations of the most important temperature-dependent regression model terms are provided that relate temperature compensation imperfections and the temperature-dependent nature of the gage factor to sets of regression model terms. Finally, balance calibration recommendations are listed so that temperature-dependent calibration data can be obtained and successfully processed using the reviewed analysis methods.
A real-time MPEG software decoder using a portable message-passing library

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kwong, Man Kam; Tang, P.T. Peter; Lin, Biquan

1995-12-31

We present a real-time MPEG software decoder that uses message-passing libraries such as MPL, p4 and MPI. The parallel MPEG decoder currently runs on the IBM SP system but can be easily ported to other parallel machines. This paper discusses our parallel MPEG decoding algorithm as well as the parallel programming environment in which it runs. Several technical issues are discussed, including balancing of decoding speed, memory limitation, I/O capacities, and optimization of MPEG decoding components. This project shows that a real-time portable software MPEG decoder is feasible on a general-purpose parallel machine.
LDMOS Channel Thermometer Based on a Thermal Resistance Sensor for Balancing Temperature in Monolithic Power ICs.

PubMed

Lin, Tingyou; Ho, Yingchieh; Su, Chauchin

2017-06-15

This paper presents a method of thermal balancing for monolithic power integrated circuits (ICs). An on-chip temperature monitoring sensor that consists of a poly resistor strip in each of multiple parallel metal-oxide-semiconductor field-effect transistor (MOSFET) banks is developed. A temperature-to-frequency converter (TFC) is proposed to quantize on-chip temperature. A pulse-width-modulation (PWM) methodology is developed to balance the channel temperature based on the quantization. The modulated PWM pulses control the hottest MOSFET bank to reduce its power dissipation and heat generation. A test chip with eight parallel MOSFET banks is fabricated in TSMC 0.25 μm HV BCD processes, and the total area is 900 × 914 μm². The proposed thermal balancing system reduces the maximal temperature variation among the eight banks from 9.5 °C to 2.8 °C at 1.5 W dissipation. As a result, the proposed system improves the lifetime of a power MOSFET by 20%.
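The control idea in this abstract, throttling whichever bank is currently hottest, can be sketched in software. This is only an illustration of the selection step: the real system is an on-chip circuit, and the duty-cycle step size, the temperature readings, and the function name `throttle_hottest` are assumptions.

```python
def throttle_hottest(temperatures, duty_cycles, step=0.25, min_duty=0.0):
    """Reduce the PWM duty cycle of the hottest bank by one step.

    Returns the index of the throttled bank and the new duty cycles.
    The input lists are left unchanged.
    """
    hottest = max(range(len(temperatures)), key=lambda i: temperatures[i])
    updated = list(duty_cycles)
    updated[hottest] = max(min_duty, updated[hottest] - step)
    return hottest, updated


# Hypothetical quantized readings for eight banks, in degrees Celsius.
readings = [61.0, 63.5, 70.5, 66.0, 64.0, 62.5, 61.5, 61.0]
duties = [1.0] * 8

bank, duties = throttle_hottest(readings, duties)
print("throttled bank", bank, "new duty cycles", duties)
```

Repeating this step each time the temperatures are re-quantized lowers the dissipation of whichever bank leads at that moment, which is the balancing behavior the abstract describes.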
LDMOS Channel Thermometer Based on a Thermal Resistance Sensor for Balancing Temperature in Monolithic Power ICs

PubMed Central

Lin, Tingyou; Ho, Yingchieh; Su, Chauchin

2017-01-01

This paper presents a method of thermal balancing for monolithic power integrated circuits (ICs). An on-chip temperature monitoring sensor that consists of a poly resistor strip in each of multiple parallel metal-oxide-semiconductor field-effect transistor (MOSFET) banks is developed. A temperature-to-frequency converter (TFC) is proposed to quantize on-chip temperature. A pulse-width-modulation (PWM) methodology is developed to balance the channel temperature based on the quantization. The modulated PWM pulses control the hottest MOSFET bank to reduce its power dissipation and heat generation. A test chip with eight parallel MOSFET banks is fabricated in TSMC 0.25 μm HV BCD processes, and the total area is 900 × 914 μm². The proposed thermal balancing system reduces the maximal temperature variation among the eight banks from 9.5 °C to 2.8 °C at 1.5 W dissipation. As a result, the proposed system improves the lifetime of a power MOSFET by 20%. PMID:28617346
Scalable Performance Measurement and Analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gamblin, Todd

2009-01-01

Concurrency levels in large-scale, distributed-memory supercomputers are rising exponentially. Modern machines may contain 100,000 or more microprocessor cores, and the largest of these, IBM's Blue Gene/L, contains over 200,000 cores. Future systems are expected to support millions of concurrent tasks. In this dissertation, we focus on efficient techniques for measuring and analyzing the performance of applications running on very large parallel machines. Tuning the performance of large-scale applications can be a subtle and time-consuming task because application developers must measure and interpret data from many independent processes. While the volume of the raw data scales linearly with the number of tasks in the running system, the number of tasks is growing exponentially, and data for even small systems quickly becomes unmanageable. Transporting performance data from so many processes over a network can perturb application performance and make measurements inaccurate, and storing such data would require a prohibitive amount of space. Moreover, even if it were stored, analyzing the data would be extremely time-consuming. In this dissertation, we present novel methods for reducing performance data volume. The first draws on multi-scale wavelet techniques from signal processing to compress systemwide, time-varying load-balance data. The second uses statistical sampling to select a small subset of running processes to generate low-volume traces. A third approach combines sampling and wavelet compression to stratify performance data adaptively at run-time and to reduce further the cost of sampled tracing. We have integrated these approaches into Libra, a toolset for scalable load-balance analysis. We present Libra and show how it can be used to analyze data from large scientific applications scalably.
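The first method mentioned here, wavelet compression of load-balance data, can be illustrated with a minimal Haar transform. The abstract does not say which wavelet or thresholding rule Libra uses, so the Haar basis, the threshold, the sample loads, and all function names below are assumptions chosen for simplicity.

```python
def haar_forward(values):
    """Full Haar decomposition of a list whose length is a power of two.

    Returns [overall mean, coarsest detail, ..., finest details].
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    data = list(values)
    details = []
    while len(data) > 1:
        averages = [(data[i] + data[i + 1]) / 2 for i in range(0, len(data), 2)]
        differences = [(data[i] - data[i + 1]) / 2 for i in range(0, len(data), 2)]
        details = differences + details  # coarser levels go in front
        data = averages
    return data + details


def haar_inverse(coeffs):
    """Invert haar_forward."""
    data = coeffs[:1]
    position = 1
    while position < len(coeffs):
        differences = coeffs[position:position + len(data)]
        expanded = []
        for average, difference in zip(data, differences):
            expanded += [average + difference, average - difference]
        position += len(data)
        data = expanded
    return data


def compress(coeffs, threshold):
    """Zero out coefficients whose magnitude is below the threshold."""
    return [c if abs(c) >= threshold else 0.0 for c in coeffs]


# Hypothetical per-process load (seconds of work) across eight processes.
loads = [4.0, 4.0, 4.0, 4.0, 8.0, 8.0, 8.0, 8.0]
kept = compress(haar_forward(loads), threshold=0.5)
nonzero = sum(1 for c in kept if c != 0.0)
restored = haar_inverse(kept)
print(nonzero, "coefficients retained out of", len(loads))
```

For this step-shaped load profile two coefficients reproduce all eight values, which shows why smooth or blocky load distributions compress well in a wavelet basis.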
Scalable, High-performance 3D Imaging Software Platform: System Architecture and Application to Virtual Colonoscopy

PubMed Central

Yoshida, Hiroyuki; Wu, Yin; Cai, Wenli; Brett, Bevin

2013-01-01

One of the key challenges in three-dimensional (3D) medical imaging is to enable the fast turn-around time that is often required for interactive or real-time response. This inevitably requires not only high computational power but also high memory bandwidth, due to the massive amount of data that needs to be processed. In this work, we have developed a software platform that is designed to support high-performance 3D medical image processing for a wide range of applications using increasingly available and affordable commodity computing systems: multi-core, clusters, and cloud computing systems. To achieve scalable, high-performance computing, our platform (1) employs size-adaptive, distributable block volumes as a core data structure for efficient parallelization of a wide range of 3D image processing algorithms; (2) supports task scheduling for efficient load distribution and balancing; and (3) consists of layered parallel software libraries that allow a wide range of medical applications to share the same functionalities. We evaluated the performance of our platform by applying it to an electronic cleansing system in virtual colonoscopy, with initial experimental results showing a 10 times performance improvement on an 8-core workstation over the original sequential implementation of the system. PMID:23366803
Monitoring dynamic loads on wind tunnel force balances

NASA Technical Reports Server (NTRS)

Ferris, Alice T.; White, William C.

1989-01-01

Two devices have been developed at NASA Langley to monitor the dynamic loads incurred during wind-tunnel testing. The Balance Dynamic Display Unit (BDDU) displays and monitors the combined static and dynamic forces and moments in the orthogonal axes. The Balance Critical Point Analyzer scales and sums each normalized signal from the BDDU to obtain combined dynamic and static signals that represent the dynamic loads at predefined high-stress points. The display of each instrument multiplexes six analog signals so that each channel is displayed sequentially as one-sixth of the horizontal axis on a single oscilloscope trace. This display format permits the operator to quickly and easily monitor the combined static and dynamic level of up to six channels at the same time.
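The display layout described here, six channels sharing one trace with each occupying one-sixth of the horizontal axis, can be mimicked digitally. The original instruments are analog hardware; the sample values, the slot length, and the function name `multiplex_trace` are assumptions for illustration.

```python
def multiplex_trace(channels, samples_per_slot):
    """Lay several channels side by side on one normalized trace.

    Channel i occupies the horizontal interval [i/n, (i+1)/n), where
    n is the number of channels. Returns a list of (x, y) points.
    """
    count = len(channels)
    trace = []
    for index, samples in enumerate(channels):
        if len(samples) < samples_per_slot:
            raise ValueError("channel %d has too few samples" % index)
        for k in range(samples_per_slot):
            x = (index + k / samples_per_slot) / count
            trace.append((x, samples[k]))
    return trace


# Hypothetical signals: channel c holds the constant level c.
channels = [[float(c)] * 4 for c in range(6)]
trace = multiplex_trace(channels, samples_per_slot=4)
print(trace[12])  # first point of the fourth channel, at mid-screen
```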
Selenium mass balance in the Great Salt Lake, Utah

USGS Publications Warehouse

Diaz, X.; Johnson, W.P.; Naftz, D.L.

2009-01-01

A mass balance for Se in the south arm of the Great Salt Lake was developed for September 2006 to August 2007 of monitoring for Se loads and removal flows. The combined removal flows (sedimentation and volatilization) totaled a geometric mean value of 2079 kg Se/yr, with the estimated low value being 1255 kg Se/yr and an estimated high value of 3143 kg Se/yr at the 68% confidence level. The total (particulates + dissolved) loads (via runoff) were about 1560 kg Se/yr, for which the error is expected to be ± 15% for the measured loads. Comparison of volatilization to sedimentation flux demonstrates that volatilization rather than sedimentation is likely the major mechanism of selenium removal from the Great Salt Lake. The measured loss flows balance (within the range of uncertainties), and possibly surpass, the measured annual loads. Concentration histories were modeled using a simple mass balance, which indicated that no significant change in Se concentration was expected during the period of study. Surprisingly, the measured total Se concentration increased during the period of the study, indicating that the removal processes operate at their low estimated rates, and/or there are unmeasured selenium loads entering the lake. The selenium concentration trajectories were compared to those of other trace metals to assess the significance of selenium concentration trends. © 2008 Elsevier B.V.
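The arithmetic behind the statement that losses balance, and possibly surpass, the loads can be laid out explicitly. The figures are the ones quoted in the abstract; the sketch ignores the ± 15% uncertainty on the load for simplicity, and the variable and function names are illustrative.

```python
# Figures quoted in the abstract, in kg Se per year.
LOAD = 1560            # total (particulate + dissolved) load via runoff
REMOVAL_MEAN = 2079    # geometric mean of combined removal flows
REMOVAL_LOW = 1255     # low removal estimate (68% confidence level)
REMOVAL_HIGH = 3143    # high removal estimate (68% confidence level)


def net_accumulation(load, removal):
    """Net change in lake Se inventory: positive means accumulation."""
    return load - removal


net_central = net_accumulation(LOAD, REMOVAL_MEAN)
net_range = (
    net_accumulation(LOAD, REMOVAL_HIGH),  # most negative case
    net_accumulation(LOAD, REMOVAL_LOW),   # most positive case
)
print("central estimate:", net_central, "kg Se/yr")
print("range:", net_range, "kg Se/yr")
```

The central estimate is a net loss of 519 kg Se/yr, but the range runs from a loss of 1583 to a gain of 305 kg Se/yr. Because the range includes zero and positive values, the observed concentration increase is compatible with removal near its low estimate, as the abstract notes.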
A comparison of parallel and diverging screw angles in the stability of locked plate constructs.

PubMed

Wähnert, D; Windolf, M; Brianza, S; Rothstock, S; Radtke, R; Brighenti, V; Schwieger, K

2011-09-01

We investigated the static and cyclical strength of parallel and angulated locking plate screws using rigid polyurethane foam (0.32 g/cm³) and bovine cancellous bone blocks. Custom-made stainless steel plates with two conically threaded screw holes with different angulations (parallel, 10° and 20° divergent) and 5 mm self-tapping locking screws underwent pull-out and cyclical pull and bending tests. The bovine cancellous blocks were only subjected to static pull-out testing. We also performed finite element analysis for the static pull-out test of the parallel and 20° configurations. In both the foam model and the bovine cancellous bone, the pull-out force was significantly highest for the parallel constructs. In the finite element analysis there was 47% more damage in the 20° divergent constructs than in the parallel configuration. Under cyclical loading, the mean number of cycles to failure was significantly higher for the parallel group, followed by the 10° and 20° divergent configurations. In our laboratory setting we clearly showed the biomechanical disadvantage of a diverging locking screw angle under static and cyclical loading.
Microchannel cross load array with dense parallel input

DOEpatents

Swierkowski, Stefan P.

2004-04-06

An architecture or layout for microchannel arrays using T or Cross (+) loading for electrophoresis or other injection and separation chemistry that are performed in microfluidic configurations. This architecture enables a very dense layout of arrays of functionally identical shaped channels and it also solves the problem of simultaneously enabling efficient parallel shapes and biasing of the input wells, waste wells, and bias wells at the input end of the separation columns. One T load architecture uses circular holes with common rows, but not columns, which allows the flow paths for each channel to be identical in shape, using multiple mirror image pieces. Another T load architecture enables the access hole array to be formed on a biaxial, collinear grid suitable for EDM micromachining (square holes), with common rows and columns.
A Bayesian approach to infer nitrogen loading rates from crop and land-use types surrounding private wells in the Central Valley, California

NASA Astrophysics Data System (ADS)

Ransom, Katherine M.; Bell, Andrew M.; Barber, Quinn E.; Kourakos, George; Harter, Thomas

2018-05-01

This study is focused on nitrogen loading from a wide variety of crop and land-use types in the Central Valley, California, USA, an intensively farmed region with high agricultural crop diversity. Nitrogen loading rates for several crop types have been measured based on field-scale experiments, and recent research has calculated nitrogen loading rates for crops throughout the Central Valley based on a mass balance approach. However, research is lacking to infer nitrogen loading rates for the broad diversity of crop and land-use types directly from groundwater nitrate measurements. Relating groundwater nitrate measurements to specific crops must account for the uncertainty about and multiplicity in contributing crops (and other land uses) to individual well measurements, and for the variability of nitrogen loading within farms and from farm to farm for the same crop type. In this study, we developed a Bayesian regression model that allowed us to estimate land-use-specific groundwater nitrogen loading rate probability distributions for 15 crop and land-use groups based on a database of recent nitrate measurements from 2149 private wells in the Central Valley. The water and natural, rice, and alfalfa and pasture groups had the lowest median estimated nitrogen loading rates, each with a median estimate below 5 kg N ha⁻¹ yr⁻¹. Confined animal feeding operations (dairies) and citrus and subtropical crops had the greatest median estimated nitrogen loading rates at approximately 269 and 65 kg N ha⁻¹ yr⁻¹, respectively. In general, our probability-based estimates compare favorably with previous direct measurements and with mass-balance-based estimates of nitrogen loading. Nitrogen mass-balance-based estimates are larger than our groundwater nitrate derived estimates for manured and nonmanured forage, nuts, cotton, tree fruit, and rice crops.
These discrepancies are thought to be due to groundwater age mixing, dilution from infiltrating river water, or denitrification between the time when nitrogen leaves the root zone (point of reference for mass-balance-derived loading) and the time and location of groundwater measurement.
Wind Energy Management System Integration Project Incorporating Wind Generation and Load Forecast Uncertainties into Power Grid Operations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Makarov, Yuri V.; Huang, Zhenyu; Etingov, Pavel V.

2010-09-01

The power system balancing process, which includes the scheduling, real time dispatch (load following) and regulation processes, is traditionally based on deterministic models. Since the conventional generation needs time to be committed and dispatched to a desired megawatt level, the scheduling and load following processes use load and wind power production forecasts to achieve future balance between the conventional generation and energy storage on the one side, and system load, intermittent resources (such as wind and solar generation) and scheduled interchange on the other side. Although in real life the forecasting procedures imply some uncertainty around the load and wind forecasts (caused by forecast errors), only their mean values are actually used in the generation dispatch and commitment procedures. Since the actual load and intermittent generation can deviate from their forecasts, it becomes increasingly unclear (especially with the increasing penetration of renewable resources) whether the system would actually be able to meet the conventional generation requirements within the look-ahead horizon, what additional balancing efforts would be needed as we get closer to real time, and what additional costs would be incurred by those needs. In order to improve the system control performance characteristics, maintain system reliability, and minimize expenses related to the system balancing functions, it becomes necessary to incorporate the predicted uncertainty ranges into the scheduling, load following, and, to some extent, the regulation processes. It is also important to address the uncertainty problem comprehensively, by including all sources of uncertainty (load, intermittent generation, generators’ forced outages, etc.) into consideration. All aspects of uncertainty such as the imbalance size (which is the same as capacity needed to mitigate the imbalance) and generation ramping requirement must be taken into account.
The latter unique features make this work a significant step forward toward the objective of incorporating wind, solar, load, and other uncertainties into power system operations. In this report, a new methodology to predict the uncertainty ranges for the required balancing capacity, ramping capability and ramp duration is presented. Uncertainties created by system load forecast errors, wind and solar forecast errors, and generation forced outages are taken into account. The uncertainty ranges are evaluated for different confidence levels of having the actual generation requirements within the corresponding limits. The methodology helps to identify the system balancing reserve requirement based on desired system performance levels, identify system “breaking points”, where the generation system becomes unable to follow the generation requirement curve with the user-specified probability level, and determine the time remaining to these potential events. The approach includes three stages: statistical and actual data acquisition, statistical analysis of retrospective information, and prediction of future grid balancing requirements for specified time horizons and confidence intervals. Assessment of the capacity and ramping requirements is performed using a specially developed probabilistic algorithm based on a histogram analysis incorporating all sources of uncertainty and parameters of a continuous (wind forecast and load forecast errors) and discrete (forced generator outages and failures to start up) nature. Preliminary simulations using California Independent System Operator (California ISO) real life data have shown the effectiveness of the proposed approach. A tool developed based on the new methodology described in this report will be integrated with the California ISO systems. Contractual work is currently in place to integrate the tool with the AREVA EMS system.
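The notion of an uncertainty range evaluated at a chosen confidence level can be illustrated with empirical quantiles of historical forecast errors. This is a generic sketch, not the report's probabilistic algorithm: it handles a single continuous error source, and the synthetic error sample, the rounding rule, and the function name `uncertainty_range` are assumptions.

```python
def uncertainty_range(errors, confidence):
    """Central range that contains roughly `confidence` of the errors.

    An equal number of samples is discarded from each tail of the
    sorted sample; the extremes of what remains form the range.
    """
    if not 0.0 < confidence <= 1.0:
        raise ValueError("confidence must be in (0, 1]")
    ordered = sorted(errors)
    drop = round((1.0 - confidence) / 2.0 * len(ordered))
    if 2 * drop >= len(ordered):
        raise ValueError("too few samples for this confidence level")
    return ordered[drop], ordered[len(ordered) - 1 - drop]


# Synthetic history of balancing errors in MW (actual minus forecast).
history = list(range(-50, 50))

low90, high90 = uncertainty_range(history, 0.90)
low50, high50 = uncertainty_range(history, 0.50)
print("90% range:", (low90, high90), "MW")
print("50% range:", (low50, high50), "MW")
```

A higher confidence level widens the range, so the balancing capacity to be held in reserve grows with the reliability target. That is the trade-off the report's methodology is meant to quantify.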
Control Allocation with Load Balancing

NASA Technical Reports Server (NTRS)

Bodson, Marc; Frost, Susan A.

2009-01-01

Next generation aircraft with a large number of actuators will require advanced control allocation methods to compute the actuator commands needed to follow desired trajectories while respecting system constraints. Previously, algorithms were proposed to minimize the l1 or l2 norms of the tracking error and of the actuator deflections. The paper discusses the alternative choice of the l∞ norm, or sup norm. Minimization of the control effort translates into the minimization of the maximum actuator deflection (min-max optimization). The paper shows how the problem can be solved effectively by converting it into a linear program and solving it using a simplex algorithm. Properties of the algorithm are also investigated through examples. In particular, the min-max criterion results in a type of load balancing, where the load is the desired command and the algorithm balances this load among various actuators. In illustrative examples, the solution using the l∞ norm also results in better robustness to failures and lower sensitivity to nonlinearities.
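The load-balancing effect of the min-max criterion can be seen in the simplest special case: one commanded quantity, no actuator limits, and exact tracking. In that case the optimum has a closed form and no linear program is needed. The paper itself treats the general problem with a simplex solver; the effectiveness values and the function name `minmax_allocation` are illustrative assumptions.

```python
def minmax_allocation(effectiveness, command):
    """Minimize max|u_i| subject to sum(b_i * u_i) == command.

    Since |command| <= max|u_i| * sum|b_i|, the smallest achievable
    peak deflection is |command| / sum|b_i|. It is reached when every
    actuator with nonzero effectiveness deflects by that amount in
    the direction of its own effectiveness.
    """
    total = sum(abs(b) for b in effectiveness)
    if total == 0:
        raise ValueError("no actuator has any effectiveness")
    level = command / total
    signs = [(b > 0) - (b < 0) for b in effectiveness]
    return [level * s for s in signs]


# Hypothetical effectiveness of three actuators on one moment axis.
b = [1.0, 2.0, -1.0]
u = minmax_allocation(b, command=2.0)
produced = sum(bi * ui for bi, ui in zip(b, u))
print("deflections:", u, "produced command:", produced)
```

All three actuators end up at the same deflection magnitude, 0.5. A minimum l2 norm solution for the same data would instead deflect the most effective actuator by about 0.67, so the sup-norm choice spreads the command more evenly across actuators.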
Effect of synthetic surfaces and vegetation in urban areas on human energy balance and comfort

Treesearch

Thomas F. Stark; David R. Miller

1977-01-01

The thermal balance of a standard man was quantified for a variety of urban and rural summer daytime microclimates. The resulting net heat-load data were correlated with the relative amounts of vegetation and synthetic materials at each site. By extrapolating these results, it is possible to estimate the expected heat load of a proposed development before it is built...
Uncertainty analysis on simple mass balance model to calculate critical loads for soil acidity

Treesearch

Harbin Li; Steven G. McNulty

2007-01-01

Simple mass balance equations (SMBE) of critical acid loads (CAL) in forest soil were developed to assess potential risks of air pollutants to ecosystems. However, to apply SMBE reliably at large scales, SMBE must be tested for adequacy and uncertainty. Our goal was to provide a detailed analysis of uncertainty in SMBE so that sound strategies for scaling up CAL...
A Planar Quasi-Static Constraint Mode Tire Model

DTIC Science & Technology

2015-07-10

strikes a balance between simple tire models that lack the fidelity to make accurate chassis load predictions and computationally intensive models that... strikes a balance between heuristic tire models (such as a linear point-follower) that lack the fidelity to make accurate chassis load predictions... Rui Ma; John B. Ferris. UNCLASSIFIED: Distribution Statement A. Cleared for public release.

Application of model predictive control for optimal operation of wind turbines

NASA Astrophysics Data System (ADS)

Yuan, Yuan; Cao, Pei; Tang, J.

2017-04-01

For large-scale wind turbines, reducing maintenance cost is a major challenge. Model predictive control (MPC) is a promising approach to deal with multiple conflicting objectives using the weighted sum approach. In this research, the model predictive control method is applied to a wind turbine to find an optimal balance between multiple objectives, such as the energy capture, loads on turbine components, and the pitch actuator usage. The actuator constraints are integrated into the objective function at the control design stage. The analysis is carried out in both the partial load region and the full load region, and the performance is compared with that of a baseline gain scheduling PID controller. The application of this strategy achieves an enhanced balance of component loads, average power, and actuator usage in the partial load region.
clubber: removing the bioinformatics bottleneck in big data analyses.

PubMed

Miller, Maximilian; Zhu, Chengsheng; Bromberg, Yana

2017-06-13

With the advent of modern day high-throughput technologies, the bottleneck in biological discovery has shifted from the cost of doing experiments to that of analyzing results. clubber is our automated cluster-load balancing system developed for optimizing these "big data" analyses. Its plug-and-play framework encourages re-use of existing solutions for bioinformatics problems. clubber's goals are to reduce computation times and to facilitate use of cluster computing. The first goal is achieved by automating the balance of parallel submissions across available high performance computing (HPC) resources. Notably, the latter can be added on demand, including cloud-based resources, and/or featuring heterogeneous environments. The second goal of making HPCs user-friendly is facilitated by an interactive web interface and a RESTful API, allowing for job monitoring and result retrieval. We used clubber to speed up our pipeline for annotating molecular functionality of metagenomes. Here, we analyzed the Deepwater Horizon oil-spill study data to quantitatively show that the beach sands have not yet entirely recovered. Further, our analysis of the CAMI-challenge data revealed that microbiome taxonomic shifts do not necessarily correlate with functional shifts. These examples (21 metagenomes processed in 172 min) clearly illustrate the importance of clubber in the everyday computational biology environment.
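Balancing parallel submissions across several computing resources can be sketched with a greedy least-loaded rule. The abstract does not describe clubber's actual scheduling algorithm, so this is a generic illustration only; the job names, cost estimates, cluster names, and the function name `distribute_jobs` are hypothetical.

```python
import heapq


def distribute_jobs(job_costs, clusters):
    """Assign each job to whichever cluster currently has the least work.

    Jobs are handled from most to least expensive, a common greedy
    heuristic for evening out total load. Returns {cluster: [jobs]}.
    """
    heap = [(0.0, name) for name in clusters]
    heapq.heapify(heap)
    assignment = {name: [] for name in clusters}
    for job, cost in sorted(job_costs.items(), key=lambda item: -item[1]):
        load, name = heapq.heappop(heap)
        assignment[name].append(job)
        heapq.heappush(heap, (load + cost, name))
    return assignment


# Hypothetical jobs with estimated run times in minutes.
jobs = {"a": 5, "b": 4, "c": 3, "d": 2, "e": 2}
plan = distribute_jobs(jobs, ["local", "cloud"])
totals = {name: sum(jobs[j] for j in assigned) for name, assigned in plan.items()}
print(plan, totals)
```

Adding a resource on demand, as the abstract mentions, would correspond here to passing a longer `clusters` list on the next call.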
clubber: removing the bioinformatics bottleneck in big data analyses

PubMed Central

Miller, Maximilian; Zhu, Chengsheng; Bromberg, Yana

2018-01-01

With the advent of modern day high-throughput technologies, the bottleneck in biological discovery has shifted from the cost of doing experiments to that of analyzing results. clubber is our automated cluster-load balancing system developed for optimizing these “big data” analyses. Its plug-and-play framework encourages re-use of existing solutions for bioinformatics problems. clubber’s goals are to reduce computation times and to facilitate use of cluster computing. The first goal is achieved by automating the balance of parallel submissions across available high performance computing (HPC) resources. Notably, the latter can be added on demand, including cloud-based resources, and/or featuring heterogeneous environments. The second goal of making HPCs user-friendly is facilitated by an interactive web interface and a RESTful API, allowing for job monitoring and result retrieval. We used clubber to speed up our pipeline for annotating molecular functionality of metagenomes. Here, we analyzed the Deepwater Horizon oil-spill study data to quantitatively show that the beach sands have not yet entirely recovered. Further, our analysis of the CAMI-challenge data revealed that microbiome taxonomic shifts do not necessarily correlate with functional shifts. These examples (21 metagenomes processed in 172 min) clearly illustrate the importance of clubber in the everyday computational biology environment. PMID:28609295
Implementation of a flexible and scalable particle-in-cell method for massively parallel computations in the mantle convection code ASPECT

NASA Astrophysics Data System (ADS)

Gassmöller, Rene; Bangerth, Wolfgang

2016-04-01

Particle-in-cell methods have a long history and many applications in geodynamic modelling of mantle convection, lithospheric deformation and crustal dynamics. They are primarily used to track material information, the strain a material has undergone, the pressure-temperature history a certain material region has experienced, or the amount of volatiles or partial melt present in a region. However, their efficient parallel implementation - in particular combined with adaptive finite-element meshes - is complicated due to the complex communication patterns and frequent reassignment of particles to cells. Consequently, many current scientific software packages accomplish this efficient implementation by specifically designing particle methods for a single purpose, like the advection of scalar material properties that do not evolve over time (e.g., for chemical heterogeneities). Design choices for particle integration, data storage, and parallel communication are then optimized for this single purpose, making the code relatively rigid to changing requirements. Here, we present the implementation of a flexible, scalable and efficient particle-in-cell method for massively parallel finite-element codes with adaptively changing meshes. Using a modular plugin structure, we allow maximum flexibility of the generation of particles, the carried tracer properties, the advection and output algorithms, and the projection of properties to the finite-element mesh. We present scaling tests ranging up to tens of thousands of cores and tens of billions of particles. Additionally, we discuss efficient load-balancing strategies for particles in adaptive meshes with their strengths and weaknesses, local particle-transfer between parallel subdomains utilizing existing communication patterns from the finite element mesh, and the use of established parallel output algorithms like the HDF5 library. Finally, we show some relevant particle application cases, compare our implementation to a modern advection-field approach, and demonstrate under which conditions which method is more efficient. We implemented the presented methods in ASPECT (aspect.dealii.org), a freely available open-source community code for geodynamic simulations. The structure of the particle code is highly modular, and segregated from the PDE solver, and can thus be easily transferred to other programs, or adapted for various application cases.
Graphics applications utilizing parallel processing

NASA Technical Reports Server (NTRS)

Rice, John R.

1990-01-01

The results are presented of research conducted to develop a parallel graphic application algorithm to depict the numerical solution of the 1-D wave equation, the vibrating string. The research was conducted on a Flexible Flex/32 multiprocessor and a Sequent Balance 21000 multiprocessor. The wave equation is implemented using the finite difference method. The synchronization issues that arose from the parallel implementation and the strategies used to alleviate the effects of the synchronization overhead are discussed.
The Plasma Simulation Code: A modern particle-in-cell code with patch-based load-balancing

NASA Astrophysics Data System (ADS)

Germaschewski, Kai; Fox, William; Abbott, Stephen; Ahmadi, Narges; Maynard, Kristofor; Wang, Liang; Ruhl, Hartmut; Bhattacharjee, Amitava

2016-08-01

This work describes the Plasma Simulation Code (PSC), an explicit, electromagnetic particle-in-cell code with support for different order particle shape functions. We review the basic components of the particle-in-cell method as well as the computational architecture of the PSC code that allows support for modular algorithms and data structure in the code. We then describe and analyze in detail a distinguishing feature of PSC: patch-based load balancing using space-filling curves which is shown to lead to major efficiency gains over unbalanced methods and a previously used simpler balancing method.
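The idea of balancing patches along a space-filling curve can be illustrated with a minimal sketch. It is not taken from PSC: the abstract does not say which curve or which splitting rule is used, so a Morton (Z-order) curve and a simple midpoint rule are assumed here for brevity, and the names `morton_key` and `partition_patches` are hypothetical. Patches are ordered along the curve and cut into contiguous chunks of roughly equal estimated load, so each rank receives spatially nearby patches.

```python
from typing import Dict, List, Tuple

Patch = Tuple[int, int]


def morton_key(ix: int, iy: int, bits: int = 16) -> int:
    """Interleave the bits of (ix, iy) to get a Z-order (Morton) index."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)
        key |= ((iy >> b) & 1) << (2 * b + 1)
    return key


def partition_patches(loads: Dict[Patch, float], n_ranks: int) -> List[List[Patch]]:
    """Cut the curve-ordered patch list into contiguous chunks of similar load.

    `loads` maps a patch index (ix, iy) to an estimated cost, for example
    its particle count. The total load is assumed to be positive.
    """
    ordered = sorted(loads, key=lambda p: morton_key(*p))
    total = sum(loads.values())
    parts: List[List[Patch]] = [[] for _ in range(n_ranks)]
    running = 0.0
    for patch in ordered:
        # Assign the patch to the rank whose share of the total load
        # contains the midpoint of this patch's load interval.
        mid = running + 0.5 * loads[patch]
        rank = min(int(mid * n_ranks / total), n_ranks - 1)
        parts[rank].append(patch)
        running += loads[patch]
    return parts


if __name__ == "__main__":
    # 4 x 4 patches; one heavily loaded patch in the corner.
    loads = {(ix, iy): 1.0 for ix in range(4) for iy in range(4)}
    loads[(0, 0)] = 13.0
    for rank, part in enumerate(partition_patches(loads, 4)):
        print(rank, sum(loads[p] for p in part), part)
```

Because ranks always own a contiguous stretch of the curve, rebalancing only moves patches between neighbouring stretches; a Hilbert curve would give more compact subdomains at the cost of a longer key function.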
Ketorolac Administration Attenuates Retinal Ganglion Cell Death After Axonal Injury.

PubMed

Nadal-Nicolás, Francisco M; Rodriguez-Villagra, Esther; Bravo-Osuna, Irene; Sobrado-Calvo, Paloma; Molina-Martínez, Irene; Villegas-Pérez, Maria Paz; Vidal-Sanz, Manuel; Agudo-Barriuso, Marta; Herrero-Vanrell, Rocío

2016-03-01

To assess the neuroprotective effects of ketorolac administration, in solution or delivered from biodegradable microspheres, on the survival of axotomized retinal ganglion cells (RGCs). Retinas were treated intravitreally with a single injection of tromethamine ketorolac solution and/or with ketorolac-loaded poly(D,L-lactide-co-glycolide) (PLGA) microspheres. Ketorolac treatments were administered either 1 week before optic nerve crush (pre-ONC) or right after the ONC (simultaneous). In all cases, animals were euthanized 7 days after the ONC. As control, nonloaded microspheres or vehicle (balanced salt solution, BSS) were administered in parallel groups. All retinas were dissected as flat mounts; RGCs were immunodetected with brain-specific homeobox/POU domain protein 3A (Brn3a), and their number was automatically quantified. The percentage of Brn3a+RGCs was 36% to 41% in all control groups (ONC with or without BSS or nonloaded microparticles). Ketorolac solution administered pre-ONC resulted in 63% survival of RGCs, while simultaneous administration promoted a 53% survival. Ketorolac-loaded microspheres were not as efficient as ketorolac solution (43% and 42% of RGC survival pre-ONC or simultaneous, respectively). The combination of ketorolac solution and ketorolac-loaded microspheres did not have an additive effect (54% and 55% survival pre-ONC and simultaneous delivery, respectively). Treatment with the nonsteroidal anti-inflammatory drug ketorolac delays RGC death triggered by a traumatic axonal insult. Pretreatment seems to elicit a better output than simultaneous administration of ketorolac solution. This may be taken into account when performing procedures resulting in RGC axonal injury.
Distributed multitasking ITS with PVM

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fan, W.C.; Halbleib, J.A. Sr.

1995-12-31

Advances in computer hardware and communication software have made it possible to perform parallel-processing computing on a collection of desktop workstations. For many applications, multitasking on a cluster of high-performance workstations has achieved performance comparable to or better than that on a traditional supercomputer. From the point of view of cost-effectiveness, it also allows users to exploit available but unused computational resources and thus achieve a higher performance-to-cost ratio. Monte Carlo calculations are inherently parallelizable because the individual particle trajectories can be generated independently with minimum need for interprocessor communication. Furthermore, the number of particle histories that can be generated in a given amount of wall-clock time is nearly proportional to the number of processors in the cluster. This is an important fact because the inherent statistical uncertainty in any Monte Carlo result decreases as the number of histories increases. For these reasons, researchers have expended considerable effort to take advantage of different parallel architectures for a variety of Monte Carlo radiation transport codes, often with excellent results. The initial interest in this work was sparked by the multitasking capability of the MCNP code on a cluster of workstations using the Parallel Virtual Machine (PVM) software. On a 16-machine IBM RS/6000 cluster, it has been demonstrated that MCNP runs ten times as fast as on a single-processor CRAY YMP. In this paper, we summarize the implementation of a similar multitasking capability for the coupled electron/photon transport code system, the Integrated TIGER Series (ITS), and the evaluation of two load-balancing schemes for homogeneous and heterogeneous networks.
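The abstract does not describe the two load-balancing schemes, so the following is only a generic sketch of one plausible static scheme for a heterogeneous cluster: histories are divided in proportion to an assumed relative speed for each workstation. The function name `split_histories` and the speed figures are hypothetical. Since independent histories are the unit of work, and the statistical uncertainty of a Monte Carlo estimate falls roughly as 1/sqrt(N) with the number of histories N, the aim is for all nodes to finish at about the same time.

```python
from typing import List


def split_histories(total: int, speeds: List[float]) -> List[int]:
    """Assign `total` particle histories to nodes in proportion to speed.

    `speeds` holds assumed relative throughputs (histories per second).
    Largest-remainder rounding keeps the counts integral while making
    sure they add up to `total` exactly.
    """
    speed_sum = sum(speeds)
    exact = [total * s / speed_sum for s in speeds]
    counts = [int(x) for x in exact]
    leftover = total - sum(counts)
    # Give the remaining histories to the nodes with the largest
    # fractional parts.
    by_remainder = sorted(
        range(len(speeds)), key=lambda i: exact[i] - counts[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # Two slower workstations and one that is twice as fast.
    print(split_histories(1000, [1.0, 1.0, 2.0]))
```

On a homogeneous network the same function reduces to an even split. A dynamic alternative, in which a master hands out small batches on request, tolerates nodes whose speed changes during the run but costs more communication.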
Distributed multitasking ITS with PVM

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fan, W.C.; Halbleib, J.A. Sr.

1995-02-01

Advances in computer hardware and communication software have made it possible to perform parallel-processing computing on a collection of desktop workstations. For many applications, multitasking on a cluster of high-performance workstations has achieved performance comparable to or better than that on a traditional supercomputer. From the point of view of cost-effectiveness, it also allows users to exploit available but unused computational resources, and thus achieve a higher performance-to-cost ratio. Monte Carlo calculations are inherently parallelizable because the individual particle trajectories can be generated independently with minimum need for interprocessor communication. Furthermore, the number of particle histories that can be generated in a given amount of wall-clock time is nearly proportional to the number of processors in the cluster. This is an important fact because the inherent statistical uncertainty in any Monte Carlo result decreases as the number of histories increases. For these reasons, researchers have expended considerable effort to take advantage of different parallel architectures for a variety of Monte Carlo radiation transport codes, often with excellent results. The initial interest in this work was sparked by the multitasking capability of MCNP on a cluster of workstations using the Parallel Virtual Machine (PVM) software. On a 16-machine IBM RS/6000 cluster, it has been demonstrated that MCNP runs ten times as fast as on a single-processor CRAY YMP. In this paper, we summarize the implementation of a similar multitasking capability for the coupled electron/photon transport code system, the Integrated TIGER Series (ITS), and the evaluation of two load balancing schemes for homogeneous and heterogeneous networks.
Heat loads on poloidal and toroidal edges of castellated plasma-facing components in COMPASS

NASA Astrophysics Data System (ADS)

Dejarnac, R.; Corre, Y.; Vondracek, P.; Gaspar, J.; Gauthier, E.; Gunn, J. P.; Komm, M.; Gardarein, J.-L.; Horacek, J.; Hron, M.; Matejicek, J.; Pitts, R. A.; Panek, R.

2018-06-01

Dedicated experiments have been performed in the COMPASS tokamak to thoroughly study the power deposition processes occurring on poloidal and toroidal edges of castellated plasma-facing components in tokamaks during steady-state L-mode conditions. Surface temperatures measured by a high resolution infra-red camera are compared with reconstructed synthetic data from a 2D thermal model using heat flux profiles derived from both the optical approximation and 2D particle-in-cell (PIC) simulations. In the case of poloidal leading edges, when the contribution from local radiation is taken into account, the parallel heat flux deduced from unperturbed, upstream measurements is fully consistent with the observed temperature increase at the leading edges of various heights, respecting power balance assuming simple projection of the parallel flux density. Smoothing of the heat flux deposition profile due to finite ion Larmor radius predicted by the PIC simulations is found to be weak and the power deposition on misaligned poloidal edges is better described by the optical approximation. This is consistent with an electron-dominated regime associated with a non-ambipolar parallel current flow. In the case of toroidal gap edges, the different contributions of the total incoming flux along the gap have been observed experimentally for the first time. They confirm the results of recent numerical studies performed for ITER showing that in specific cases the heat deposition does not necessarily follow the optical approximation. Indeed, ions can spiral onto the magnetically shadowed toroidal edge. Particle-in-cell simulations emphasize again the role played by local non-ambipolarity in the deposition pattern.
Mesh-free data transfer algorithms for partitioned multiphysics problems: Conservation, accuracy, and parallelism

DOE PAGES

Slattery, Stuart R.

2015-12-02

In this study we analyze and extend mesh-free algorithms for three-dimensional data transfer problems in partitioned multiphysics simulations. We first provide a direct comparison between a mesh-based weighted residual method using the common-refinement scheme and two mesh-free algorithms leveraging compactly supported radial basis functions: one using a spline interpolation and one using a moving least square reconstruction. Through the comparison we assess both the conservation and accuracy of the data transfer obtained from each of the methods. We do so for a varying set of geometries with and without curvature and sharp features and for functions with and without smoothness and with varying gradients. Our results show that the mesh-based and mesh-free algorithms are complementary with cases where each was demonstrated to perform better than the other. We then focus on the mesh-free methods by developing a set of algorithms to parallelize them based on sparse linear algebra techniques. This includes a discussion of fast parallel radius searching in point clouds and restructuring the interpolation algorithms to leverage data structures and linear algebra services designed for large distributed computing environments. The scalability of our new algorithms is demonstrated on a leadership class computing facility using a set of basic scaling studies. Finally, these scaling studies show that for problems with reasonable load balance, our new algorithms for both spline interpolation and moving least square reconstruction demonstrate both strong and weak scalability using more than 100,000 MPI processes with billions of degrees of freedom in the data transfer operation.
Biomechanical comparison of orthogonal versus parallel double plating systems in intraarticular distal humerus fractures.

PubMed

Atalar, Ata C; Tunalı, Onur; Erşen, Ali; Kapıcıoğlu, Mehmet; Sağlam, Yavuz; Demirhan, Mehmet S

2017-01-01

In intraarticular distal humerus fractures, internal fixation with double plates is the gold standard treatment. However the optimal plate configuration is not clear in the literature. The aim of this study was to compare the biomechanical stability of the parallel and the orthogonal anatomical locking plating systems in intraarticular distal humerus fractures in artificial humerus models. Intraarticular distal humerus fracture (AO13-C2) with 5 mm metaphyseal defect was created in sixteen artificial humeral models. Models were fixed with either orthogonal or parallel plating systems with locking screws (Acumed elbow plating systems). Both systems were tested for their stiffness with loads in axial compression, varus, valgus, anterior and posterior bending. Then plastic deformation after cyclic loading in posterior bending and load to failure in posterior bending were tested. The failure mechanisms of all the samples were observed. Stiffness values in every direction were not significantly different among the orthogonal and the parallel plating groups. There was no statistical difference between the two groups in plastic deformation values (0.31 mm-0.29 mm) and load to failure tests in posterior bending (372.4 N-379.7 N). In the orthogonal plating system most of the failures occurred due to the proximal shaft fracture, whereas in the parallel plating system failure occurred due to the shift of the most distal screw in proximal fragment. Our study showed that both plating systems had similar biomechanical stabilities when anatomic plates with distal locking screws were used in intraarticular distal humerus fractures in artificial humerus models. Copyright © 2016 Turkish Association of Orthopaedics and Traumatology. Production and hosting by Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Gooding, Thomas M.

Distributing an executable job load file to compute nodes in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: determining, by a compute node in the parallel computer, whether the compute node is participating in a job; determining, by the compute node in the parallel computer, whether a descendant compute node is participating in the job; responsive to determining that the compute node is participating in the job or that the descendant compute node is participating in the job, communicating, by the compute node to a parent compute node, an identification of a data communications link over which the compute node receives data from the parent compute node; constructing a class route for the job, wherein the class route identifies all compute nodes participating in the job; and broadcasting the executable load file for the job along the class route for the job.
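The participation test in this abstract can be sketched as a bottom-up walk over a tree of compute nodes. This is an illustration only, not the patented implementation: the recursion stands in for the messages each node would send to its parent, and the tree layout, node numbers and the name `nodes_on_route` are hypothetical. A node is placed on the route if it takes part in the job or if any descendant does, which is the condition under which the abstract has it report its link to the parent.

```python
from typing import Dict, Hashable, List, Set


def nodes_on_route(
    children: Dict[Hashable, List[Hashable]],
    root: Hashable,
    participating: Set[Hashable],
) -> Set[Hashable]:
    """Return the nodes that must carry the broadcast of the load file.

    `children` maps each node to its child nodes in the tree network;
    `participating` is the set of nodes assigned to the job.
    """
    on_route: Set[Hashable] = set()

    def visit(node: Hashable) -> bool:
        # Visit every child (no short-circuit) so each subtree is checked.
        below = [visit(child) for child in children.get(node, [])]
        if node in participating or any(below):
            on_route.add(node)
            return True
        return False

    visit(root)
    return on_route


if __name__ == "__main__":
    tree = {0: [1, 2], 1: [3, 4], 2: [5]}
    # Only node 4 runs the job; nodes 0 and 1 relay the file to it.
    print(sorted(nodes_on_route(tree, 0, {4})))
```

Subtrees with no participating node (node 2 and node 5 above) are left off the route, so the load file is never sent down links that lead to idle nodes.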
NERC Policy 10: Measurement of two generation and load balancing IOS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Spicer, P.J.; Galow, G.G.

1999-11-01

Policy 10 will describe specific standards and metrics for most of the reliability functions described in the Interconnected Operations Services Working Group (IOS WG) report. The purpose of this paper is to discuss, in detail, the proposed metrics for two generation and load balancing IOSs: Regulation; Load Following. For purposes of this paper, metrics include both measurement and performance evaluation. The measurement methods discussed are included in the current draft of the proposed Policy 10. The performance evaluation method discussed is offered by the authors for consideration by the IOS ITF (Implementation Task Force) for inclusion into Policy 10.
Development of parallel algorithms for electrical power management in space applications

NASA Technical Reports Server (NTRS)

Berry, Frederick C.

1989-01-01

The application of parallel techniques for electrical power system analysis is discussed. The Newton-Raphson method of load flow analysis was used along with the decomposition-coordination technique to perform load flow analysis. The decomposition-coordination technique enables tasks to be performed in parallel by partitioning the electrical power system into independent local problems. Each independent local problem represents a portion of the total electrical power system on which a load flow analysis can be performed. The load flow analysis is performed on these partitioned elements by using the Newton-Raphson load flow method. These independent local problems will produce results for voltage and power which can then be passed to the coordinator portion of the solution procedure. The coordinator problem uses the results of the local problems to determine if any correction is needed on the local problems. The coordinator problem is also solved by an iterative method much like the local problem. The iterative method for the coordination problem will also be the Newton-Raphson method. Therefore, each iteration at the coordination level will result in new values for the local problems. The local problems will have to be solved again along with the coordinator problem until some convergence conditions are met.
Scalable Parallel Algorithms for Multidimensional Digital Signal Processing

DTIC Science & Technology

1991-12-31

Proceedings, San Diego, CA, August 1989, pp. 132-146. 53 [13] A. L. Gorin, L. Auslander, and A. Silberger. Balanced computation of 2D transforms on a tree...Speech, Signal Processing. ASSP-34, Oct. 1986, pp. 1301-1309. [24] A. Norton and A. Silberger. Parallelization and performance analysis of the Cooley-Tukey
Advanced Electric Distribution, Switching, and Conversion Technology for Power Control

NASA Technical Reports Server (NTRS)

Soltis, James V.

1998-01-01

The Electrical Power Control Unit currently under development by Sundstrand Aerospace for use on the Fluids Combustion Facility of the International Space Station is the precursor of modular power distribution and conversion concepts for future spacecraft and aircraft applications. This unit combines modular current-limiting flexible remote power controllers and paralleled power converters into one package. Each unit includes three 1-kW, current-limiting power converter modules designed for a variable-ratio load sharing capability. The flexible remote power controllers can be used in parallel to match load requirements and can be programmed for an initial ON or OFF state on powerup. The unit contains an integral cold plate. The modularity and hybridization of the Electrical Power Control Unit sets the course for future spacecraft electrical power systems, both large and small. In such systems, the basic hybridized converter and flexible remote power controller building blocks could be configured to match power distribution and conversion capabilities to load requirements. In addition, the flexible remote power controllers could be configured in assemblies to feed multiple individual loads and could be used in parallel to meet the specific current requirements of each of those loads. Ultimately, the Electrical Power Control Unit design concept could evolve to a common switch module hybrid, or family of hybrids, for both converter and switchgear applications. By assembling hybrids of a common current rating and voltage class in parallel, researchers could readily adapt these units for multiple applications. The Electrical Power Control Unit concept has the potential to be scaled to larger and smaller ratings for both small and large spacecraft and for aircraft where high-power density, remote power controllers or power converters are required and a common replacement part is desired for multiples of a base current rating.
Transport aircraft loading and balancing system: Using a CLIPS expert system for military aircraft load planning

NASA Technical Reports Server (NTRS)

Richardson, J.; Labbe, M.; Belala, Y.; Leduc, Vincent

1994-01-01

The requirement for improving aircraft utilization and responsiveness in airlift operations has been recognized for quite some time by the Canadian Forces. To date, the utilization of scarce airlift resources has been planned mainly through the employment of manpower-intensive manual methods in combination with the expertise of highly qualified personnel. In this paper, we address the problem of facilitating the load planning process for military cargo aircraft through the development of a computer-based system. We introduce TALBAS (Transport Aircraft Loading and BAlancing System), a knowledge-based system designed to assist personnel involved in preparing valid load plans for the C130 Hercules aircraft. The main features of this system, which are accessible through a convivial graphical user interface, consist of the automatic generation of valid cargo arrangements given a list of items to be transported, the user definition of load plans, and the automatic validation of such load plans.
Balancing exploration, uncertainty and computational demands in many objective reservoir optimization

NASA Astrophysics Data System (ADS)

Zatarain Salazar, Jazmin; Reed, Patrick M.; Quinn, Julianne D.; Giuliani, Matteo; Castelletti, Andrea

2017-11-01

Reservoir operations are central to our ability to manage river basin systems serving conflicting multi-sectoral demands under increasingly uncertain futures. These challenges motivate the need for new solution strategies capable of effectively and efficiently discovering the multi-sectoral tradeoffs that are inherent to alternative reservoir operation policies. Evolutionary many-objective direct policy search (EMODPS) is gaining importance in this context due to its capability of addressing multiple objectives and its flexibility in incorporating multiple sources of uncertainties. This simulation-optimization framework has high potential for addressing the complexities of water resources management, and it can benefit from current advances in parallel computing and meta-heuristics. This study contributes a diagnostic assessment of state-of-the-art parallel strategies for the auto-adaptive Borg Multi Objective Evolutionary Algorithm (MOEA) to support EMODPS. Our analysis focuses on the Lower Susquehanna River Basin (LSRB) system where multiple sectoral demands from hydropower production, urban water supply, recreation and environmental flows need to be balanced. Using EMODPS with different parallel configurations of the Borg MOEA, we optimize operating policies over different size ensembles of synthetic streamflows and evaporation rates. As we increase the ensemble size, we increase the statistical fidelity of our objective function evaluations at the cost of higher computational demands. This study demonstrates how to overcome the mathematical and computational barriers associated with capturing uncertainties in stochastic multiobjective reservoir control optimization, where parallel algorithmic search serves to reduce the wall-clock time in discovering high quality representations of key operational tradeoffs. Our results show that emerging self-adaptive parallelization schemes exploiting cooperative search populations are crucial. Such strategies provide a promising new set of tools for effectively balancing exploration, uncertainty, and computational demands when using EMODPS.
Exploration of a Permanent Magnet Synchronous Generator with Compensated Reactance Windings in Parallel Rod Configuration

NASA Astrophysics Data System (ADS)

Lyan, Oleg; Jankunas, Valdas; Guseinoviene, Eleonora; Pašilis, Aleksas; Senulis, Audrius; Knolis, Audrius; Kurt, Erol

2018-02-01

In this study, a permanent magnet synchronous generator (PMSG) topology with compensated reactance windings in parallel rod configuration is proposed to reduce the armature reactance X_L and to achieve higher efficiency of PMSG. The PMSG was designed using iron-cored bifilar coil topology to overcome problems of market-dominant rotary type generators. Often the problem is a comparatively high armature reactance X_L, which is usually bigger than armature resistance R_a. Therefore, the topology is proposed to partially compensate or negligibly reduce the PMSG reactance. The study was performed by using finite element method (FEM) analysis and experimental investigation. FEM analysis was used to investigate magnetic field flux distribution and density in PMSG. The PMSG experimental analyses of no-load losses and electromotive force versus frequency (i.e., speed) were performed. Also terminal voltage, power output and efficiency relation with load current at different frequencies have been evaluated. The reactance of PMSG has low value and a linear relation with operating frequency. The low reactance gives a small variation of efficiency (from 90% to 95%) in a wide range of load (from 3 A to 10 A) and operation frequency (from 44 Hz to 114 Hz). The comparison of PMSG characteristics with parallel and series winding connection showed insignificant power variation. The research results showed that compensated reactance winding in parallel rod configuration in PMSG design provides lower reactance and therefore, higher efficiency under wider load and frequency variation.

Towards optimizing server performance in an educational MMORPG for teaching computer programming

NASA Astrophysics Data System (ADS)

Malliarakis, Christos; Satratzemi, Maya; Xinogalos, Stelios

2013-10-01

Web-based games have become significantly popular during the last few years. This is due to the gradual increase of internet speed, which has led to the ongoing development of multiplayer games and, more importantly, the emergence of the Massively Multiplayer Online Role Playing Games (MMORPG) field. In parallel, similar technologies called educational games have started to be developed in order to be put into practice in various educational contexts, resulting in the field of Game Based Learning. However, these technologies require significant amounts of resources, such as bandwidth, RAM and CPU capacity. These amounts may be even larger in an educational MMORPG that supports computer programming education, due to the usual inclusion of a compiler and the constant client/server data transmissions that occur during program coding, possibly leading to technical issues that could cause malfunctions during learning. Thus, determining the elements that affect the overall load on game resources is essential so that server administrators can configure them and ensure the proper operation of educational games during computer programming education. In this paper, we propose a new methodology for monitoring and optimizing load balancing, so that the resources essential for the creation and proper execution of an educational MMORPG for computer programming can be foreseen and provided without overloading the system.
Muscle activation timing and balance response in chronic lower back pain patients with associated radiculopathy.

PubMed

Frost, Lydia R; Brown, Stephen H M

2016-02-01

Patients with chronic low back pain and associated radiculopathy present with neuromuscular symptoms both in their lower back and down their leg; however, investigations of muscle activation have so far been isolated to the lower back. During balance perturbations, it is necessary that lower limb muscles activate with proper timing and sequencing along with the lower back musculature to efficiently regain balance control. Patients with chronic low back pain and radiculopathy and matched controls completed a series of balance perturbations (rapid bilateral arm raise, unanticipated and anticipated sudden loading, and rapid rise to toe). Muscle activation timing and sequencing as well as kinetic response to the perturbations were analyzed. Patients had significantly delayed lower limb muscle activation in rapid arm raise trials as compared to controls. In sudden loading trials, muscle activation timing was not delayed in patients; however, some differences in posterior chain muscle activation sequencing were present. Patients demonstrated less anterior-posterior movement in unanticipated sudden loading trials, and greater medial-lateral movement in rise to toe trials. Patients with low back pain and radiculopathy demonstrated some significant differences from control participants in terms of muscle activation timing, sequencing, and overall balance control. The presence of differences between patients and controls, specifically in the lower limb, indicates that radiculopathy may play a role in altering balance control in these patients. Copyright © 2015 Elsevier Ltd. All rights reserved.
High-Temperature (1000 F) Magnetic Thrust Bearing Test Rig Completed and Operational

NASA Technical Reports Server (NTRS)

Montague, Gerald T.

2005-01-01

Large axial loads are induced on the rolling element bearings of a gas turbine. To extend bearing life, designers use pneumatic balance pistons to reduce the axial load on the bearings. A magnetic thrust bearing could replace the balance pistons to further reduce the axial load. To investigate this option, the U.S. Army Research Laboratory, the NASA Glenn Research Center, and Texas A&M University designed and fabricated a 7-in.-diameter magnetic thrust bearing to operate at 1000 F and 30,000 rpm, with a 1000-lb load capacity. This research was funded through a NASA Space Technology Transfer Act with Allison Advanced Development Company under the Ultra-Efficient Engine Technology (UEET) Intelligent Propulsion Systems Foundation Technology project.
A sampling and classification item selection approach with content balancing.

PubMed

Chen, Pei-Hua

2015-03-01

Existing automated test assembly methods typically employ constrained combinatorial optimization. Constructing forms sequentially based on an optimization approach usually results in unparallel forms and requires heuristic modifications. Methods based on a random search approach have the major advantage of producing parallel forms sequentially without further adjustment. This study incorporated a flexible content-balancing element into the statistical perspective item selection method of the cell-only method (Chen et al. in Educational and Psychological Measurement, 72(6), 933-953, 2012). The new method was compared with a sequential interitem distance weighted deviation model (IID WDM) (Swanson & Stocking in Applied Psychological Measurement, 17(2), 151-166, 1993), a simultaneous IID WDM, and a big-shadow-test mixed integer programming (BST MIP) method to construct multiple parallel forms based on matching a reference form item-by-item. The results showed that the cell-only method with content balancing and the sequential and simultaneous versions of IID WDM yielded results comparable to those obtained using the BST MIP method. The cell-only method with content balancing is computationally less intensive than the sequential and simultaneous versions of IID WDM.
Development of a planar-type high sensitivity metallic contaminant detector

NASA Astrophysics Data System (ADS)

Okabe, Shunsuke; Sasada, Ichiro

2017-05-01

Metallic contaminant detectors based on the balanced coil system are widely used in the food industry. In the balanced coil system, an excitation coil and two identical pickup coils are arranged so that the magnetic couplings of the pickup coils to the excitation coil cancel each other when no metallic contaminants are present. In a conventional system, the excitation coil and the pickup coils are planar and parallel, so the magnetic coupling is strong even if there is no metallic contaminant. Such strong magnetic coupling makes the balancing procedure tedious. In this paper, we introduce a new coil system in which the pickup coils are set orthogonal to the excitation coil, making the magnetic coupling much smaller than in the conventional counterpart. The pickup coils are equipped with thin magnetic cores and placed inside the excitation coil, parallel to the excitation coil plane. The balancing method consists of two steps: one is geometrical and the other is digital processing, including down conversion. Experiments are carried out to show the detection capability for ferromagnetic contaminants and non-magnetic contaminants.
Inverse Force Determination on a Small Scale Launch Vehicle Model Using a Dynamic Balance

NASA Technical Reports Server (NTRS)

Ngo, Christina L.; Powell, Jessica M.; Ross, James C.

2017-01-01

A launch vehicle can experience large unsteady aerodynamic forces in the transonic regime that, while usually only lasting for tens of seconds during launch, could be devastating if structural components and electronic hardware are not designed to account for them. These aerodynamic loads are difficult to measure experimentally and even harder to estimate computationally. The current method for estimating buffet loads is through the use of a few hundred unsteady pressure transducers and wind tunnel testing. Even with a large number of point measurements, the computed integrated load is not an accurate enough representation of the total load caused by buffeting. This paper discusses an attempt at using a dynamic balance to experimentally determine buffet loads on a generic scale hammerhead launch vehicle model tested at NASA Ames Research Center's 11' x 11' transonic wind tunnel. To use a dynamic balance, the structural characteristics of the model needed to be identified so that the natural modal response could be removed from the aerodynamic forces. A finite element model was created on a simplified version of the model to evaluate the natural modes of the balance flexures, assist in model design, and to compare to experimental data. Several modal tests were conducted on the model in two different configurations to check for nonlinearity, and to estimate the dynamic characteristics of the model. The experimental results were used in an inverse force determination technique with a pseudo-inverse frequency response function. Due to the nonlinearity, the model not being axisymmetric, and inconsistent data between the two shake tests from different mounting configurations, it was difficult to create a frequency response matrix that satisfied all input and output conditions for the wind tunnel configuration to accurately predict unsteady aerodynamic loads.
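The inverse force determination step mentioned in this abstract can be written out in generic form. The following is a textbook sketch, not the authors' formulation: the symbols $X$, $H$, $F$ and the full-column-rank condition are assumptions introduced here for illustration.

```latex
% Measured responses X (m channels) relate to unknown forces F (n inputs)
% through the frequency response matrix H, one frequency at a time:
X(\omega) = H(\omega)\, F(\omega), \qquad H(\omega) \in \mathbb{C}^{m \times n},\ m \ge n
% Least-squares force estimate using the pseudo-inverse of H:
\hat{F}(\omega) = H^{+}(\omega)\, X(\omega)
% When H has full column rank, the pseudo-inverse has the closed form
H^{+}(\omega) = \left( H^{\mathsf{H}}(\omega)\, H(\omega) \right)^{-1} H^{\mathsf{H}}(\omega)
```

Under these assumptions the estimate is only as good as the measured $H$, which is consistent with the difficulty the abstract reports when the two shake tests gave inconsistent data.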
A Single-Vector Force Calibration Method Featuring the Modern Design of Experiments

NASA Technical Reports Server (NTRS)

Parker, P. A.; Morton, M.; Draper, N.; Line, W.

2001-01-01

This paper proposes a new concept in force balance calibration. An overview of the state-of-the-art in force balance calibration is provided with emphasis on both the load application system and the experimental design philosophy. Limitations of current systems are detailed in the areas of data quality and productivity. A unique calibration loading system integrated with formal experimental design techniques has been developed and designated as the Single-Vector Balance Calibration System (SVS). This new concept addresses the limitations of current systems. The development of a quadratic and cubic calibration design is presented. Results from experimental testing are compared and contrasted with conventional calibration systems. Analyses of data are provided that demonstrate the feasibility of this concept and provide new insights into balance calibration.
A design procedure for the phase-controlled parallel-loaded resonant inverter

NASA Technical Reports Server (NTRS)

King, Roger J.

1989-01-01

High-frequency-link power conversion and distribution based on a resonant inverter (RI) has been recently proposed. The design of several topologies is reviewed, and a simple approximate design procedure is developed for the phase-controlled parallel-loaded RI. This design procedure seeks to ensure the benefits of resonant conversion and is verified by data from a laboratory 2.5 kVA, 20-kHz converter. A simple phasor analysis is introduced as a useful approximation for design purposes. The load is considered to be a linear impedance (or an ac current sink). The design procedure is verified using a 2.5-kVA 20-kHz RI. Also obtained are predictable worst-case ratings for each component of the resonant tank circuit and the inverter switches. For a given load VA requirement, below-resonance operation is found to result in a significantly lower tank VA requirement. Under transient conditions such as load short-circuit, a reversal of the expected commutation sequence is possible.
Transport in the plateau regime in a tokamak pedestal

DOE Office of Scientific and Technical Information (OSTI.GOV)

Seol, J.; Shaing, K. C.

In a tokamak H-mode, a strong E × B flow shear is generated during the L-H transition. Turbulence in a pedestal is suppressed significantly by this E × B flow shear. In this case, neoclassical transport may become important. The neoclassical fluxes are calculated in the plateau regime with the parallel plasma flow using their kinetic definitions. In an axisymmetric tokamak, the neoclassical particle fluxes can be decomposed into the banana-plateau flux and the Pfirsch-Schlueter flux. The banana-plateau particle flux is driven by the parallel viscous force and the Pfirsch-Schlueter flux by the poloidal variation of the friction force. The combined quantity of the radial electric field and the parallel flow is determined by the flux surface averaged parallel momentum balance equation rather than requiring the ambipolarity of the total particle fluxes. In this process, the Pfirsch-Schlueter flux does not appear in the flux surface averaged parallel momentum equation. Only the banana-plateau flux is used to determine the parallel flow in the form of the flux surface averaged parallel viscosity. The heat flux, obtained using the solution of the parallel momentum balance equation, decreases exponentially in the presence of sonic M_p without any enhancement over that in the standard neoclassical theory. Here, M_p is a combination of the poloidal E × B flow and the parallel mass flow. The neoclassical bootstrap current in the plateau regime is presented. It indicates that the neoclassical bootstrap current also is related only to the banana-plateau fluxes. Finally, transport fluxes are calculated when M_p is large enough to make the parallel electron viscosity comparable with the parallel ion viscosity. It is found that the bootstrap current has a finite value regardless of the magnitude of M_p.
Ramping and Uncertainty Prediction Tool - Analysis and Visualization of Wind Generation Impact on Electrical Grid

DOE Office of Scientific and Technical Information (OSTI.GOV)

Etingov, Pavel; Makarov, Yuri; Subbarao, Kris

RUT software is designed for use by the Balancing Authorities to predict and display additional requirements caused by the variability and uncertainty in load and generation. The prediction is made for the next operating hours as well as for the next day. The tool predicts possible deficiencies in generation capability and ramping capability. This deficiency of balancing resources can cause serious risks to power system stability and also impact real-time market energy prices. The tool dynamically and adaptively correlates changing system conditions with the additional balancing needs triggered by the interplay between forecasted and actual load and output of variable resources. The assessment is performed using a specially developed probabilistic algorithm incorporating multiple sources of uncertainty including wind, solar and load forecast errors. The tool evaluates required generation for a worst case scenario, with a user-specified confidence level.
Spatiotemporal Domain Decomposition for Massive Parallel Computation of Space-Time Kernel Density

NASA Astrophysics Data System (ADS)

Hohl, A.; Delmelle, E. M.; Tang, W.

2015-07-01

Accelerated processing capabilities are deemed critical when conducting analysis on spatiotemporal datasets of increasing size, diversity and availability. High-performance parallel computing offers the capacity to solve computationally demanding problems in a limited timeframe, but likewise poses the challenge of preventing processing inefficiency due to workload imbalance between computing resources. Therefore, when designing new algorithms capable of implementing parallel strategies, careful spatiotemporal domain decomposition is necessary to account for heterogeneity in the data. In this study, we perform octree-based adaptive decomposition of the spatiotemporal domain for parallel computation of space-time kernel density. In order to avoid edge effects near subdomain boundaries, we establish spatiotemporal buffers to include adjacent data points that are within the spatial and temporal kernel bandwidths. Then, we quantify the computational intensity of each subdomain to balance workloads among processors. We illustrate the benefits of our methodology using a space-time epidemiological dataset of Dengue fever, an infectious vector-borne disease that poses a severe threat to communities in tropical climates. Our parallel implementation of kernel density reaches substantial speedup compared to sequential processing, and achieves high levels of workload balance among processors due to great accuracy in quantifying computational intensity. Our approach is portable to other space-time analytical tests.
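The workload-balancing step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which assignment rule the authors use, so the greedy "heaviest subdomain to least-loaded processor" rule below is an assumption, and the function name `assign_subdomains` and the intensity values are hypothetical.

```python
import heapq


def assign_subdomains(intensities, n_procs):
    """Greedy assignment (an assumed rule, not the authors' method).

    Subdomains are visited from highest to lowest estimated computational
    intensity, and each one goes to the processor with the smallest load so far.
    """
    # Min-heap of (accumulated load, processor id).
    heap = [(0.0, p) for p in range(n_procs)]
    heapq.heapify(heap)
    assignment = {p: [] for p in range(n_procs)}

    order = sorted(range(len(intensities)), key=lambda i: -intensities[i])
    for idx in order:
        load, proc = heapq.heappop(heap)
        assignment[proc].append(idx)
        heapq.heappush(heap, (load + intensities[idx], proc))

    loads = {p: sum(intensities[i] for i in assignment[p]) for p in assignment}
    return assignment, loads


# Hypothetical intensity estimates for five subdomains, split over two processors.
assignment, loads = assign_subdomains([8, 7, 6, 5, 4], 2)
print(assignment, loads)
```

The quality of such a split depends entirely on how well the intensity estimates predict actual run time, which is the point the abstract makes about accurate quantification.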
Development of the NTF-117S Semi-Span Balance

NASA Technical Reports Server (NTRS)

Lynn, Keith C.

2010-01-01

A new high-capacity semi-span force and moment balance has recently been developed for use at the National Transonic Facility at the NASA Langley Research Center. This new semi-span balance provides the NTF a new measurement capability that will support testing of semi-span test models at transonic high-lift testing regimes. Future testing utilizing this new balance capability will include active circulation control and propulsion simulation testing of semi-span transonic wing models. The NTF has recently implemented a new high-pressure air delivery station that will provide both high and low mass flow pressure lines that are routed out to the semi-span models via a set of high/low pressure bellows that are indirectly linked to the metric end of the NTF-117S balance. A new check-load stand is currently being developed to provide the NTF with an in-house capability that will allow for performing check-loads on the NTF-117S balance in order to determine the pressure tare effects on the overall performance of the balance. An experimental design is being developed that will allow for experimentally assessing the static pressure tare effects on the balance performance.
Load Balancing in Cloud Computing Environment Using Improved Weighted Round Robin Algorithm for Nonpreemptive Dependent Tasks.

PubMed

Devi, D Chitra; Uthariaraj, V Rhymend

2016-01-01

Cloud computing uses scheduling and load balancing to migrate tasks to underutilized VMs so that resources are shared effectively. The scheduling of nonpreemptive tasks in the cloud computing environment is an irrecoverable restraint, and hence such tasks have to be assigned to the most appropriate VMs at the initial placement itself. In practice, the arriving jobs consist of multiple interdependent tasks, and they may execute the independent tasks in multiple VMs or in multiple cores of the same VM. Also, the jobs arrive during the run time of the server at varying random intervals under various load conditions. The participating heterogeneous resources are managed by allocating the tasks to appropriate resources through static or dynamic scheduling, which makes cloud computing more efficient and thus improves user satisfaction. The objective of this work is to introduce and evaluate the proposed scheduling and load balancing algorithm by considering the capabilities of each virtual machine (VM), the task length of each requested job, and the interdependency of multiple tasks. The performance of the proposed algorithm is studied by comparing it with existing methods.
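For readers unfamiliar with the baseline, plain weighted round robin can be sketched as follows. This is not the improved algorithm of the paper, which also accounts for task length and task interdependency. It is the generic "smooth" variant, and the VM names and weights are hypothetical stand-ins for VM capabilities.

```python
def smooth_weighted_round_robin(weights, n_tasks):
    """Return the VM chosen for each of n_tasks consecutive tasks.

    weights maps a VM name to a positive integer weight (assumed here to
    stand for processing capability). Over every window of sum(weights)
    tasks, each VM receives a number of tasks equal to its weight.
    """
    current = {vm: 0 for vm in weights}
    total = sum(weights.values())
    schedule = []
    for _ in range(n_tasks):
        # Every VM earns credit in proportion to its weight.
        for vm, weight in weights.items():
            current[vm] += weight
        # The VM with the most credit takes the task and pays back the total.
        chosen = max(current, key=current.get)
        current[chosen] -= total
        schedule.append(chosen)
    return schedule


# Hypothetical pool: one large VM and two small ones.
schedule = smooth_weighted_round_robin({"vm_a": 5, "vm_b": 1, "vm_c": 1}, 7)
print(schedule)
```

Because this baseline looks only at static weights, it cannot react to long tasks or to dependencies between tasks, which are the factors the abstract says the proposed algorithm considers.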
Load Balancing in Cloud Computing Environment Using Improved Weighted Round Robin Algorithm for Nonpreemptive Dependent Tasks

PubMed Central

Devi, D. Chitra; Uthariaraj, V. Rhymend

2016-01-01

Cloud computing uses scheduling and load balancing to migrate tasks to underutilized VMs so that resources are shared effectively. The scheduling of nonpreemptive tasks in the cloud computing environment is an irrecoverable restraint, and hence such tasks have to be assigned to the most appropriate VMs at the initial placement itself. In practice, the arriving jobs consist of multiple interdependent tasks, and they may execute the independent tasks in multiple VMs or in multiple cores of the same VM. Also, the jobs arrive during the run time of the server at varying random intervals under various load conditions. The participating heterogeneous resources are managed by allocating the tasks to appropriate resources through static or dynamic scheduling, which makes cloud computing more efficient and thus improves user satisfaction. The objective of this work is to introduce and evaluate the proposed scheduling and load balancing algorithm by considering the capabilities of each virtual machine (VM), the task length of each requested job, and the interdependency of multiple tasks. The performance of the proposed algorithm is studied by comparing it with existing methods. PMID:26955656
Biomechanical Comparison of Parallel and Crossed Suture Repair for Longitudinal Meniscus Tears.

PubMed

Milchteim, Charles; Branch, Eric A; Maughon, Ty; Hughey, Jay; Anz, Adam W

2016-04-01

Longitudinal meniscus tears are commonly encountered in clinical practice. Meniscus repair devices have been previously tested and presented; however, prior studies have not evaluated repair construct designs head to head. This study compared a new-generation meniscus repair device, SpeedCinch, with a similar established device, Fast-Fix 360, and a parallel repair construct to a crossed construct. Both devices utilize self-adjusting No. 2-0 ultra-high molecular weight polyethylene (UHMWPE) and 2 polyether ether ketone (PEEK) anchors. Crossed suture repair constructs have higher failure loads and stiffness compared with simple parallel constructs. The newer repair device would exhibit similar performance to an established device. Controlled laboratory study. Sutures were placed in an open fashion into the body and posterior horn regions of the medial and lateral menisci in 16 cadaveric knees. Evaluation of 2 repair devices and 2 repair constructs created 4 groups: 2 parallel vertical sutures created with the Fast-Fix 360 (2PFF), 2 crossed vertical sutures created with the Fast-Fix 360 (2XFF), 2 parallel vertical sutures created with the SpeedCinch (2PSC), and 2 crossed vertical sutures created with the SpeedCinch (2XSC). After open placement of the repair construct, each meniscus was explanted and tested to failure on a uniaxial material testing machine. All data were checked for normality of distribution, and 1-way analysis of variance by ranks was chosen to evaluate for statistical significance of maximum failure load and stiffness between groups. Statistical significance was defined as P < .05. The mean maximum failure loads ± 95% CI (range) were 89.6 ± 16.3 N (125.7-47.8 N) (2PFF), 72.1 ± 11.7 N (103.4-47.6 N) (2XFF), 71.9 ± 15.5 N (109.4-41.3 N) (2PSC), and 79.5 ± 25.4 N (119.1-30.9 N) (2XSC). Interconstruct comparison revealed no statistical difference between all 4 constructs regarding maximum failure loads (P = .49). 
Stiffness values were also similar, with no statistical difference on comparison (P = .28). Both devices in the current study had similar failure load and stiffness when 2 vertical or 2 crossed sutures were tested in cadaveric human menisci. Simple parallel vertical sutures perform similarly to crossed suture patterns at the time of implantation.
Effects of Deployment on Musculoskeletal and Physiological Characteristics and Balance.

PubMed

Nagai, Takashi; Abt, John P; Sell, Timothy C; Keenan, Karen A; McGrail, Mark A; Smalley, Brian W; Lephart, Scott M

2016-09-01

Despite many nonbattle injuries reported during deployment, few studies have been conducted to evaluate the effects of deployment on musculoskeletal and physiological characteristics and balance. A total of 35 active duty U.S. Army Soldiers participated in laboratory testing before and after deployment to Afghanistan. The following measures were obtained for each Soldier: shoulder, trunk, hip, knee, and ankle strength and range of motion (ROM), balance, body composition, aerobic capacity, and anaerobic power/capacity. Additionally, Soldiers were asked about their physical activity and load carriage. Paired t tests or Wilcoxon tests with an α = 0.05 set a priori were used for statistical analyses. Shoulder external rotation ROM, torso rotation ROM, ankle dorsiflexion ROM, torso rotation strength, and anaerobic power significantly increased following deployment (p < 0.05). Shoulder extension ROM, shoulder external rotation strength, and eyes-closed balance (p < 0.05) were significantly worse following deployment. The majority of Soldiers (85%) engaged in physical activity. In addition, 58% of Soldiers reported regularly carrying a load (22 kg average). The deployment-related changes in musculoskeletal and physiological characteristics and balance as well as physical activity and load carriage during deployment may assist with proper preparation with the intent to optimize tactical readiness and mitigate injury risk. Reprint & Copyright © 2016 Association of Military Surgeons of the U.S.
Integrated configurable equipment selection and line balancing for mass production with serial-parallel machining systems

NASA Astrophysics Data System (ADS)

Battaïa, Olga; Dolgui, Alexandre; Guschinsky, Nikolai; Levin, Genrikh

2014-10-01

Solving equipment selection and line balancing problems together allows better line configurations to be reached and avoids local optimal solutions. This article considers jointly these two decision problems for mass production lines with serial-parallel workplaces. This study was motivated by the design of production lines based on machines with rotary or mobile tables. Nevertheless, the results are more general and can be applied to assembly and production lines with similar structures. The designers' objectives and the constraints are studied in order to suggest a relevant mathematical model and an efficient optimization approach to solve it. A real case study is used to validate the model and the developed approach.
Large-deflection theory for end compression of long rectangular plates rigidly clamped along two edges

NASA Technical Reports Server (NTRS)

Levy, Samuel; Krupen, Philip

1943-01-01

The von Karman equations for flat plates are solved beyond the buckling load up to edge strains equal to eight times the buckling strain, for the extreme case of rigid clamping along the edges parallel to the load. Deflections, bending stresses, and membrane stresses are given as a function of end compressive load. The theoretical values of effective width are compared with the values derived for simple support along the edges parallel to the load. The increase in effective width due to rigid clamping drops from about 20 percent near the buckling strain to about 8 percent at an edge strain equal to eight times the buckling strain. Experimental values of effective width in the elastic range reported in NACA Technical Note No. 684 are between the theoretical curves for the extremes of simple support and rigid clamping.
Characterization and Modeling of a Control Moment Gyroscope

DTIC Science & Technology

2015-03-26

parallel, and angular directions [16]. The rotor is powered by a brushless DC motor rated to 557.9 mN-m (4.938 in-lbf) [4]. The motor has Hall effect ...mass balance installed on rotor housing Gimbal Balancing Test Procedures. To evaluate the effectiveness of the mass balance, the gimbal was tested...in which the rotor is running The vehicle-level model test (Section 4.9) predicts the effects of CMG gear lash on overall vehicle performance. Gear
A compact submicrosecond, high current generator

NASA Astrophysics Data System (ADS)

Kovalchuk, B. M.; Kharlov, A. V.; Zorin, V. B.; Zherlitsyn, A. A.

2009-08-01

A pulsed current generator was developed for experiments with current-carrying pulsed plasma. The main parts of the generator are the capacitor bank, low-inductance current driving lines, and the central load part. The generator consists of four identical sections, connected in parallel to one load. The capacitor bank is assembled from 24 capacitor blocks (100 kV, 80 nF), connected in parallel. It stores 9.6 kJ at 100 kV charging voltage. Each capacitor block incorporates a multigap spark switch, which is able to commutate through six parallel channels. The switches operate in dry air at atmospheric pressure. The generator was tested with an inductive load and a liner load. With a 17.5 nH inductive load and 100 kV of charging voltage, it provides 650 kA of current amplitude with a 390 ns rise time, with 0.6 Ω damping resistors in the discharge circuit of each capacitor block. The net generator inductance without a load was optimized to be as low as 15 nH, which results in extremely low impedance of the generator (~0.08 Ω). It ensures effective energy coupling with a low-impedance load such as a Z pinch. The generator operates reliably without any adjustments in the 70-100 kV range of charging voltage. Jitter in the delay between the output pulse and the triggering pulse is less than 5 ns at 70-100 kV charging voltage. Operation and handling are very simple, because no oil or purified gases are required for the generator. The generator has dimensions of 5.24 × 1.2 × 0.18 m³ and a total weight of about 1400 kg, making it a simple, robust, and cost-effective apparatus.
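The figures quoted in this abstract can be cross-checked with a short calculation. The sketch below treats the generator as a single lossless series LC circuit, which is an assumption made here. It ignores the 0.6 Ω damping resistors and any switch dynamics, so the peak current it gives is only an upper bound.

```python
import math

# Values taken from the abstract.
N_BLOCKS = 24        # capacitor blocks in parallel
C_BLOCK = 80e-9      # farads per block
V_CHARGE = 100e3     # volts
L_GEN = 15e-9        # henries, generator inductance without a load
L_LOAD = 17.5e-9     # henries, inductive test load

# Capacitances in parallel add up.
c_total = N_BLOCKS * C_BLOCK

# Stored energy E = C V^2 / 2.
energy = 0.5 * c_total * V_CHARGE ** 2

# Characteristic impedance of the unloaded generator, sqrt(L / C).
z_gen = math.sqrt(L_GEN / c_total)

# Lossless LC discharge into the inductive load (assumed model):
# the current peaks after a quarter period, at V / sqrt(L / C).
l_total = L_GEN + L_LOAD
quarter_period = (math.pi / 2) * math.sqrt(l_total * c_total)
i_peak_undamped = V_CHARGE / math.sqrt(l_total / c_total)

print(f"stored energy     : {energy / 1e3:.2f} kJ")
print(f"generator Z       : {z_gen:.3f} ohm")
print(f"quarter period    : {quarter_period * 1e9:.0f} ns")
print(f"undamped peak I   : {i_peak_undamped / 1e3:.0f} kA")
```

Under this idealization the stored energy comes out at 9.6 kJ and the quarter period close to the reported 390 ns rise time. The undamped peak current exceeds the reported 650 kA, as expected once the damping resistors are neglected.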
