A mixed parallel strategy for the solution of coupled multi-scale problems at finite strains
NASA Astrophysics Data System (ADS)
Lopes, I. A. Rodrigues; Pires, F. M. Andrade; Reis, F. J. P.
2018-02-01
A mixed parallel strategy for the solution of homogenization-based multi-scale constitutive problems undergoing finite strains is proposed. The approach aims to reduce the computational time and memory requirements of non-linear coupled simulations that use finite element discretization at both scales (FE^2). In the first level of the algorithm, a non-conforming domain decomposition technique, based on the FETI method combined with a mortar discretization at the interface of macroscopic subdomains, is employed. A master-slave scheme, which distributes tasks by macroscopic element and adopts dynamic scheduling, is then used for each macroscopic subdomain composing the second level of the algorithm. This strategy allows the parallelization of FE^2 simulations in computers with either shared memory or distributed memory architectures. The proposed strategy preserves the quadratic rates of asymptotic convergence that characterize the Newton-Raphson scheme. Several examples are presented to demonstrate the robustness and efficiency of the proposed parallel strategy.
NASA Astrophysics Data System (ADS)
Li, Gen; Tang, Chun-An; Liang, Zheng-Zhao
2017-01-01
Multi-scale high-resolution modeling of rock failure process is a powerful means in modern rock mechanics studies to reveal the complex failure mechanism and to evaluate engineering risks. However, multi-scale continuous modeling of rock, from deformation, damage to failure, has raised high requirements on the design, implementation scheme and computation capacity of the numerical software system. This study is aimed at developing the parallel finite element procedure, a parallel rock failure process analysis (RFPA) simulator that is capable of modeling the whole trans-scale failure process of rock. Based on the statistical meso-damage mechanical method, the RFPA simulator is able to construct heterogeneous rock models with multiple mechanical properties, deal with and represent the trans-scale propagation of cracks, in which the stress and strain fields are solved for the damage evolution analysis of representative volume element by the parallel finite element method (FEM) solver. This paper describes the theoretical basis of the approach and provides the details of the parallel implementation on a Windows - Linux interactive platform. A numerical model is built to test the parallel performance of FEM solver. Numerical simulations are then carried out on a laboratory-scale uniaxial compression test, and field-scale net fracture spacing and engineering-scale rock slope examples, respectively. The simulation results indicate that relatively high speedup and computation efficiency can be achieved by the parallel FEM solver with a reasonable boot process. In laboratory-scale simulation, the well-known physical phenomena, such as the macroscopic fracture pattern and stress-strain responses, can be reproduced. In field-scale simulation, the formation process of net fracture spacing from initiation, propagation to saturation can be revealed completely. In engineering-scale simulation, the whole progressive failure process of the rock slope can be well modeled. It is shown that the parallel FE simulator developed in this study is an efficient tool for modeling the whole trans-scale failure process of rock from meso- to engineering-scale.
A Parallel Finite Set Statistical Simulator for Multi-Target Detection and Tracking
NASA Astrophysics Data System (ADS)
Hussein, I.; MacMillan, R.
2014-09-01
Finite Set Statistics (FISST) is a powerful Bayesian inference tool for the joint detection, classification and tracking of multi-target environments. FISST is capable of handling phenomena such as clutter, misdetections, and target birth and decay. Implicit within the approach are solutions to the data association and target label-tracking problems. Finally, FISST provides generalized information measures that can be used for sensor allocation across different types of tasks such as: searching for new targets, and classification and tracking of known targets. These FISST capabilities have been demonstrated on several small-scale illustrative examples. However, for implementation in a large-scale system as in the Space Situational Awareness problem, these capabilities require a lot of computational power. In this paper, we implement FISST in a parallel environment for the joint detection and tracking of multi-target systems. In this implementation, false alarms and misdetections will be modeled. Target birth and decay will not be modeled in the present paper. We will demonstrate the success of the method for as many targets as we possibly can in a desktop parallel environment. Performance measures will include: number of targets in the simulation, certainty of detected target tracks, computational time as a function of clutter returns and number of targets, among other factors.
2013-01-01
Based Micropolar Single Crystal Plasticity: Comparison of Multi - and Single Criterion Theories. J. Mech. Phys. Solids 2011, 59, 398–422. ALE3D ...element boundaries in a multi -step constitutive evaluation (Becker, 2011). The results showed the desired effects of smoothing the deformation field...Implementation The model was implemented in the large-scale parallel, explicit finite element code ALE3D (2012). The crystal plasticity
A Dynamic Finite Element Method for Simulating the Physics of Faults Systems
NASA Astrophysics Data System (ADS)
Saez, E.; Mora, P.; Gross, L.; Weatherley, D.
2004-12-01
We introduce a dynamic Finite Element method using a novel high level scripting language to describe the physical equations, boundary conditions and time integration scheme. The library we use is the parallel Finley library: a finite element kernel library, designed for solving large-scale problems. It is incorporated as a differential equation solver into a more general library called escript, based on the scripting language Python. This library has been developed to facilitate the rapid development of 3D parallel codes, and is optimised for the Australian Computational Earth Systems Simulator Major National Research Facility (ACcESS MNRF) supercomputer, a 208 processor SGI Altix with a peak performance of 1.1 TFlops. Using the scripting approach we obtain a parallel FE code able to take advantage of the computational efficiency of the Altix 3700. We consider faults as material discontinuities (the displacement, velocity, and acceleration fields are discontinuous at the fault), with elastic behavior. The stress continuity at the fault is achieved naturally through the expression of the fault interactions in the weak formulation. The elasticity problem is solved explicitly in time, using the Saint Verlat scheme. Finally, we specify a suitable frictional constitutive relation and numerical scheme to simulate fault behaviour. Our model is based on previous work on modelling fault friction and multi-fault systems using lattice solid-like models. We adapt the 2D model for simulating the dynamics of parallel fault systems described to the Finite-Element method. The approach uses a frictional relation along faults that is slip and slip-rate dependent, and the numerical integration approach introduced by Mora and Place in the lattice solid model. In order to illustrate the new Finite Element model, single and multi-fault simulation examples are presented.
Development of the US3D Code for Advanced Compressible and Reacting Flow Simulations
NASA Technical Reports Server (NTRS)
Candler, Graham V.; Johnson, Heath B.; Nompelis, Ioannis; Subbareddy, Pramod K.; Drayna, Travis W.; Gidzak, Vladimyr; Barnhardt, Michael D.
2015-01-01
Aerothermodynamics and hypersonic flows involve complex multi-disciplinary physics, including finite-rate gas-phase kinetics, finite-rate internal energy relaxation, gas-surface interactions with finite-rate oxidation and sublimation, transition to turbulence, large-scale unsteadiness, shock-boundary layer interactions, fluid-structure interactions, and thermal protection system ablation and thermal response. Many of the flows have a large range of length and time scales, requiring large computational grids, implicit time integration, and large solution run times. The University of Minnesota NASA US3D code was designed for the simulation of these complex, highly-coupled flows. It has many of the features of the well-established DPLR code, but uses unstructured grids and has many advanced numerical capabilities and physical models for multi-physics problems. The main capabilities of the code are described, the physical modeling approaches are discussed, the different types of numerical flux functions and time integration approaches are outlined, and the parallelization strategy is overviewed. Comparisons between US3D and the NASA DPLR code are presented, and several advanced simulations are presented to illustrate some of novel features of the code.
A scalable parallel black oil simulator on distributed memory parallel computers
NASA Astrophysics Data System (ADS)
Wang, Kun; Liu, Hui; Chen, Zhangxin
2015-11-01
This paper presents our work on developing a parallel black oil simulator for distributed memory computers based on our in-house parallel platform. The parallel simulator is designed to overcome the performance issues of common simulators that are implemented for personal computers and workstations. The finite difference method is applied to discretize the black oil model. In addition, some advanced techniques are employed to strengthen the robustness and parallel scalability of the simulator, including an inexact Newton method, matrix decoupling methods, and algebraic multigrid methods. A new multi-stage preconditioner is proposed to accelerate the solution of linear systems from the Newton methods. Numerical experiments show that our simulator is scalable and efficient, and is capable of simulating extremely large-scale black oil problems with tens of millions of grid blocks using thousands of MPI processes on parallel computers.
OWL: A scalable Monte Carlo simulation suite for finite-temperature study of materials
NASA Astrophysics Data System (ADS)
Li, Ying Wai; Yuk, Simuck F.; Cooper, Valentino R.; Eisenbach, Markus; Odbadrakh, Khorgolkhuu
The OWL suite is a simulation package for performing large-scale Monte Carlo simulations. Its object-oriented, modular design enables it to interface with various external packages for energy evaluations. It is therefore applicable to study the finite-temperature properties for a wide range of systems: from simple classical spin models to materials where the energy is evaluated by ab initio methods. This scheme not only allows for the study of thermodynamic properties based on first-principles statistical mechanics, it also provides a means for massive, multi-level parallelism to fully exploit the capacity of modern heterogeneous computer architectures. We will demonstrate how improved strong and weak scaling is achieved by employing novel, parallel and scalable Monte Carlo algorithms, as well as the applications of OWL to a few selected frontier materials research problems. This research was supported by the Office of Science of the Department of Energy under contract DE-AC05-00OR22725.
Self-balanced modulation and magnetic rebalancing method for parallel multilevel inverters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Hui; Shi, Yanjun
A self-balanced modulation method and a closed-loop magnetic flux rebalancing control method for parallel multilevel inverters. The combination of the two methods provides for balancing of the magnetic flux of the inter-cell transformers (ICTs) of the parallel multilevel inverters without deteriorating the quality of the output voltage. In various embodiments a parallel multi-level inverter modulator is provide including a multi-channel comparator to generate a multiplexed digitized ideal waveform for a parallel multi-level inverter and a finite state machine (FSM) module coupled to the parallel multi-channel comparator, the FSM module to receive the multiplexed digitized ideal waveform and to generate amore » pulse width modulated gate-drive signal for each switching device of the parallel multi-level inverter. The system and method provides for optimization of the output voltage spectrum without influence the magnetic balancing.« less
Array-based Hierarchical Mesh Generation in Parallel
Ray, Navamita; Grindeanu, Iulian; Zhao, Xinglin; ...
2015-11-03
In this paper, we describe an array-based hierarchical mesh generation capability through uniform refinement of unstructured meshes for efficient solution of PDE's using finite element methods and multigrid solvers. A multi-degree, multi-dimensional and multi-level framework is designed to generate the nested hierarchies from an initial mesh that can be used for a number of purposes such as multi-level methods to generating large meshes. The capability is developed under the parallel mesh framework “Mesh Oriented dAtaBase” a.k.a MOAB. We describe the underlying data structures and algorithms to generate such hierarchies and present numerical results for computational efficiency and mesh quality. Inmore » conclusion, we also present results to demonstrate the applicability of the developed capability to a multigrid finite-element solver.« less
Alvioli, M.; Baum, R.L.
2016-01-01
We describe a parallel implementation of TRIGRS, the Transient Rainfall Infiltration and Grid-Based Regional Slope-Stability Model for the timing and distribution of rainfall-induced shallow landslides. We have parallelized the four time-demanding execution modes of TRIGRS, namely both the saturated and unsaturated model with finite and infinite soil depth options, within the Message Passing Interface framework. In addition to new features of the code, we outline details of the parallel implementation and show the performance gain with respect to the serial code. Results are obtained both on commercial hardware and on a high-performance multi-node machine, showing the different limits of applicability of the new code. We also discuss the implications for the application of the model on large-scale areas and as a tool for real-time landslide hazard monitoring.
Simulation of Hypervelocity Impact on Aluminum-Nextel-Kevlar Orbital Debris Shields
NASA Technical Reports Server (NTRS)
Fahrenthold, Eric P.
2000-01-01
An improved hybrid particle-finite element method has been developed for hypervelocity impact simulation. The method combines the general contact-impact capabilities of particle codes with the true Lagrangian kinematics of large strain finite element formulations. Unlike some alternative schemes which couple Lagrangian finite element models with smooth particle hydrodynamics, the present formulation makes no use of slidelines or penalty forces. The method has been implemented in a parallel, three dimensional computer code. Simulations of three dimensional orbital debris impact problems using this parallel hybrid particle-finite element code, show good agreement with experiment and good speedup in parallel computation. The simulations included single and multi-plate shields as well as aluminum and composite shielding materials. at an impact velocity of eleven kilometers per second.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Norman, Matthew R
2014-01-01
The novel ADER-DT time discretization is applied to two-dimensional transport in a quadrature-free, WENO- and FCT-limited, Finite-Volume context. Emphasis is placed on (1) the serial and parallel computational properties of ADER-DT and this framework and (2) the flexibility of ADER-DT and this framework in efficiently balancing accuracy with other constraints important to transport applications. This study demonstrates a range of choices for the user when approaching their specific application while maintaining good parallel properties. In this method, genuine multi-dimensionality, single-step and single-stage time stepping, strict positivity, and a flexible range of limiting are all achieved with only one parallel synchronizationmore » and data exchange per time step. In terms of parallel data transfers per simulated time interval, this improves upon multi-stage time stepping and post-hoc filtering techniques such as hyperdiffusion. This method is evaluated with standard transport test cases over a range of limiting options to demonstrate quantitatively and qualitatively what a user should expect when employing this method in their application.« less
A new parallel-vector finite element analysis software on distributed-memory computers
NASA Technical Reports Server (NTRS)
Qin, Jiangning; Nguyen, Duc T.
1993-01-01
A new parallel-vector finite element analysis software package MPFEA (Massively Parallel-vector Finite Element Analysis) is developed for large-scale structural analysis on massively parallel computers with distributed-memory. MPFEA is designed for parallel generation and assembly of the global finite element stiffness matrices as well as parallel solution of the simultaneous linear equations, since these are often the major time-consuming parts of a finite element analysis. Block-skyline storage scheme along with vector-unrolling techniques are used to enhance the vector performance. Communications among processors are carried out concurrently with arithmetic operations to reduce the total execution time. Numerical results on the Intel iPSC/860 computers (such as the Intel Gamma with 128 processors and the Intel Touchstone Delta with 512 processors) are presented, including an aircraft structure and some very large truss structures, to demonstrate the efficiency and accuracy of MPFEA.
A parallel algorithm for generation and assembly of finite element stiffness and mass matrices
NASA Technical Reports Server (NTRS)
Storaasli, O. O.; Carmona, E. A.; Nguyen, D. T.; Baddourah, M. A.
1991-01-01
A new algorithm is proposed for parallel generation and assembly of the finite element stiffness and mass matrices. The proposed assembly algorithm is based on a node-by-node approach rather than the more conventional element-by-element approach. The new algorithm's generality and computation speed-up when using multiple processors are demonstrated for several practical applications on multi-processor Cray Y-MP and Cray 2 supercomputers.
NASA Technical Reports Server (NTRS)
Nguyen, D. T.; Watson, Willie R. (Technical Monitor)
2005-01-01
The overall objectives of this research work are to formulate and validate efficient parallel algorithms, and to efficiently design/implement computer software for solving large-scale acoustic problems, arised from the unified frameworks of the finite element procedures. The adopted parallel Finite Element (FE) Domain Decomposition (DD) procedures should fully take advantages of multiple processing capabilities offered by most modern high performance computing platforms for efficient parallel computation. To achieve this objective. the formulation needs to integrate efficient sparse (and dense) assembly techniques, hybrid (or mixed) direct and iterative equation solvers, proper pre-conditioned strategies, unrolling strategies, and effective processors' communicating schemes. Finally, the numerical performance of the developed parallel finite element procedures will be evaluated by solving series of structural, and acoustic (symmetrical and un-symmetrical) problems (in different computing platforms). Comparisons with existing "commercialized" and/or "public domain" software are also included, whenever possible.
Vectorial finite elements for solving the radiative transfer equation
NASA Astrophysics Data System (ADS)
Badri, M. A.; Jolivet, P.; Rousseau, B.; Le Corre, S.; Digonnet, H.; Favennec, Y.
2018-06-01
The discrete ordinate method coupled with the finite element method is often used for the spatio-angular discretization of the radiative transfer equation. In this paper we attempt to improve upon such a discretization technique. Instead of using standard finite elements, we reformulate the radiative transfer equation using vectorial finite elements. In comparison to standard finite elements, this reformulation yields faster timings for the linear system assemblies, as well as for the solution phase when using scattering media. The proposed vectorial finite element discretization for solving the radiative transfer equation is cross-validated against a benchmark problem available in literature. In addition, we have used the method of manufactured solutions to verify the order of accuracy for our discretization technique within different absorbing, scattering, and emitting media. For solving large problems of radiation on parallel computers, the vectorial finite element method is parallelized using domain decomposition. The proposed domain decomposition method scales on large number of processes, and its performance is unaffected by the changes in optical thickness of the medium. Our parallel solver is used to solve a large scale radiative transfer problem of the Kelvin-cell radiation.
NASA Astrophysics Data System (ADS)
Weiss, Chester J.
2013-08-01
An essential element for computational hypothesis testing, data inversion and experiment design for electromagnetic geophysics is a robust forward solver, capable of easily and quickly evaluating the electromagnetic response of arbitrary geologic structure. The usefulness of such a solver hinges on the balance among competing desires like ease of use, speed of forward calculation, scalability to large problems or compute clusters, parsimonious use of memory access, accuracy and by necessity, the ability to faithfully accommodate a broad range of geologic scenarios over extremes in length scale and frequency content. This is indeed a tall order. The present study addresses recent progress toward the development of a forward solver with these properties. Based on the Lorenz-gauged Helmholtz decomposition, a new finite volume solution over Cartesian model domains endowed with complex-valued electrical properties is shown to be stable over the frequency range 10-2-1010 Hz and range 10-3-105 m in length scale. Benchmark examples are drawn from magnetotellurics, exploration geophysics, geotechnical mapping and laboratory-scale analysis, showing excellent agreement with reference analytic solutions. Computational efficiency is achieved through use of a matrix-free implementation of the quasi-minimum-residual (QMR) iterative solver, which eliminates explicit storage of finite volume matrix elements in favor of "on the fly" computation as needed by the iterative Krylov sequence. Further efficiency is achieved through sparse coupling matrices between the vector and scalar potentials whose non-zero elements arise only in those parts of the model domain where the conductivity gradient is non-zero. Multi-thread parallelization in the QMR solver through OpenMP pragmas is used to reduce the computational cost of its most expensive step: the single matrix-vector product at each iteration. High-level MPI communicators farm independent processes to available compute nodes for simultaneous computation of multi-frequency or multi-transmitter responses.
NASA Astrophysics Data System (ADS)
Rodrigues, Manuel J.; Fernandes, David E.; Silveirinha, Mário G.; Falcão, Gabriel
2018-01-01
This work introduces a parallel computing framework to characterize the propagation of electron waves in graphene-based nanostructures. The electron wave dynamics is modeled using both "microscopic" and effective medium formalisms and the numerical solution of the two-dimensional massless Dirac equation is determined using a Finite-Difference Time-Domain scheme. The propagation of electron waves in graphene superlattices with localized scattering centers is studied, and the role of the symmetry of the microscopic potential in the electron velocity is discussed. The computational methodologies target the parallel capabilities of heterogeneous multi-core CPU and multi-GPU environments and are built with the OpenCL parallel programming framework which provides a portable, vendor agnostic and high throughput-performance solution. The proposed heterogeneous multi-GPU implementation achieves speedup ratios up to 75x when compared to multi-thread and multi-core CPU execution, reducing simulation times from several hours to a couple of minutes.
NASA Technical Reports Server (NTRS)
Dongarra, Jack (Editor); Messina, Paul (Editor); Sorensen, Danny C. (Editor); Voigt, Robert G. (Editor)
1990-01-01
Attention is given to such topics as an evaluation of block algorithm variants in LAPACK and presents a large-grain parallel sparse system solver, a multiprocessor method for the solution of the generalized Eigenvalue problem on an interval, and a parallel QR algorithm for iterative subspace methods on the CM2. A discussion of numerical methods includes the topics of asynchronous numerical solutions of PDEs on parallel computers, parallel homotopy curve tracking on a hypercube, and solving Navier-Stokes equations on the Cedar Multi-Cluster system. A section on differential equations includes a discussion of a six-color procedure for the parallel solution of elliptic systems using the finite quadtree structure, data parallel algorithms for the finite element method, and domain decomposition methods in aerodynamics. Topics dealing with massively parallel computing include hypercube vs. 2-dimensional meshes and massively parallel computation of conservation laws. Performance and tools are also discussed.
Multi-scale Methods in Quantum Field Theory
NASA Astrophysics Data System (ADS)
Polyzou, W. N.; Michlin, Tracie; Bulut, Fatih
2018-05-01
Daubechies wavelets are used to make an exact multi-scale decomposition of quantum fields. For reactions that involve a finite energy that take place in a finite volume, the number of relevant quantum mechanical degrees of freedom is finite. The wavelet decomposition has natural resolution and volume truncations that can be used to isolate the relevant degrees of freedom. The application of flow equation methods to construct effective theories that decouple coarse and fine scale degrees of freedom is examined.
SAPNEW: Parallel finite element code for thin shell structures on the Alliant FX/80
NASA Astrophysics Data System (ADS)
Kamat, Manohar P.; Watson, Brian C.
1992-02-01
The results of a research activity aimed at providing a finite element capability for analyzing turbo-machinery bladed-disk assemblies in a vector/parallel processing environment are summarized. Analysis of aircraft turbofan engines is very computationally intensive. The performance limit of modern day computers with a single processing unit was estimated at 3 billions of floating point operations per second (3 gigaflops). In view of this limit of a sequential unit, performance rates higher than 3 gigaflops can be achieved only through vectorization and/or parallelization as on Alliant FX/80. Accordingly, the efforts of this critically needed research were geared towards developing and evaluating parallel finite element methods for static and vibration analysis. A special purpose code, named with the acronym SAPNEW, performs static and eigen analysis of multi-degree-of-freedom blade models built-up from flat thin shell elements.
SAPNEW: Parallel finite element code for thin shell structures on the Alliant FX/80
NASA Technical Reports Server (NTRS)
Kamat, Manohar P.; Watson, Brian C.
1992-01-01
The results of a research activity aimed at providing a finite element capability for analyzing turbo-machinery bladed-disk assemblies in a vector/parallel processing environment are summarized. Analysis of aircraft turbofan engines is very computationally intensive. The performance limit of modern day computers with a single processing unit was estimated at 3 billions of floating point operations per second (3 gigaflops). In view of this limit of a sequential unit, performance rates higher than 3 gigaflops can be achieved only through vectorization and/or parallelization as on Alliant FX/80. Accordingly, the efforts of this critically needed research were geared towards developing and evaluating parallel finite element methods for static and vibration analysis. A special purpose code, named with the acronym SAPNEW, performs static and eigen analysis of multi-degree-of-freedom blade models built-up from flat thin shell elements.
NASA Astrophysics Data System (ADS)
Asgari, Somayyeh; Granpayeh, Nosrat
2017-06-01
Two parallel graphene sheet waveguides and a graphene cylindrical resonator between them is proposed, analyzed, and simulated numerically by using the finite-difference time-domain method. One end of each graphene waveguide is the input and output port. The resonance and the prominent mid-infrared band-pass filtering effect are achieved. The transmittance spectrum is tuned by varying the radius of the graphene cylindrical resonator, the dielectric inside it, and also the chemical potential of graphene utilizing gate voltage. Simulation results are in good agreement with theoretical calculations. As an application, a multi/demultiplexer is proposed and analyzed. Our studies demonstrate that graphene based ultra-compact, nano-scale devices can be designed for optical processing and photonic integrated devices.
NASA Astrophysics Data System (ADS)
Maneva, Y. G.; Araneda, J. A.; Poedts, S.
2014-12-01
We consider parametric instabilities of finite-amplitude large-scale Alfven waves in a low-beta collisionless multi-species plasma, consisting of fluid electrons, kinetic protons and a drifting population of minor ions. Complementary to many theoretical studies, relying on fluid or multi-fluid approach, in this work we present the solutions of the parametric instability dispersion relation, including kinetic effects in the parallel direction, along the ambient magnetic field. This provides us with the opportunity to predict the importance of some wave-particle interactions like Landau damping of the daughter ion-acoustic waves for the given pump wave and plasma conditions. We apply the dispersion relation to plasma parameters, typical for low-beta collisionless solar wind close to the Sun. We compare the analytical solutions to the linear stage of hybrid numerical simulations and discuss the application of the model to the problems of preferential heating and differential acceleration of minor ions in the solar corona and the fast solar wind. The results of this study provide tools for prediction and interpretation of the magnetic field and particles data as expected from the future Solar Orbiter and Solar Probe Plus missions.
Settgast, Randolph R.; Fu, Pengcheng; Walsh, Stuart D. C.; ...
2016-09-18
This study describes a fully coupled finite element/finite volume approach for simulating field-scale hydraulically driven fractures in three dimensions, using massively parallel computing platforms. The proposed method is capable of capturing realistic representations of local heterogeneities, layering and natural fracture networks in a reservoir. A detailed description of the numerical implementation is provided, along with numerical studies comparing the model with both analytical solutions and experimental results. The results demonstrate the effectiveness of the proposed method for modeling large-scale problems involving hydraulically driven fractures in three dimensions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Settgast, Randolph R.; Fu, Pengcheng; Walsh, Stuart D. C.
This study describes a fully coupled finite element/finite volume approach for simulating field-scale hydraulically driven fractures in three dimensions, using massively parallel computing platforms. The proposed method is capable of capturing realistic representations of local heterogeneities, layering and natural fracture networks in a reservoir. A detailed description of the numerical implementation is provided, along with numerical studies comparing the model with both analytical solutions and experimental results. The results demonstrate the effectiveness of the proposed method for modeling large-scale problems involving hydraulically driven fractures in three dimensions.
Hierarchial parallel computer architecture defined by computational multidisciplinary mechanics
NASA Technical Reports Server (NTRS)
Padovan, Joe; Gute, Doug; Johnson, Keith
1989-01-01
The goal is to develop an architecture for parallel processors enabling optimal handling of multi-disciplinary computation of fluid-solid simulations employing finite element and difference schemes. The goals, philosphical and modeling directions, static and dynamic poly trees, example problems, interpolative reduction, the impact on solvers are shown in viewgraph form.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Li; He, Ya-Ling; Kang, Qinjun
2013-12-15
A coupled (hybrid) simulation strategy spatially combining the finite volume method (FVM) and the lattice Boltzmann method (LBM), called CFVLBM, is developed to simulate coupled multi-scale multi-physicochemical processes. In the CFVLBM, computational domain of multi-scale problems is divided into two sub-domains, i.e., an open, free fluid region and a region filled with porous materials. The FVM and LBM are used for these two regions, respectively, with information exchanged at the interface between the two sub-domains. A general reconstruction operator (RO) is proposed to derive the distribution functions in the LBM from the corresponding macro scalar, the governing equation of whichmore » obeys the convection–diffusion equation. The CFVLBM and the RO are validated in several typical physicochemical problems and then are applied to simulate complex multi-scale coupled fluid flow, heat transfer, mass transport, and chemical reaction in a wall-coated micro reactor. The maximum ratio of the grid size between the FVM and LBM regions is explored and discussed. -- Highlights: •A coupled simulation strategy for simulating multi-scale phenomena is developed. •Finite volume method and lattice Boltzmann method are coupled. •A reconstruction operator is derived to transfer information at the sub-domains interface. •Coupled multi-scale multiple physicochemical processes in micro reactor are simulated. •Techniques to save computational resources and improve the efficiency are discussed.« less
Methods for High-Order Multi-Scale and Stochastic Problems Analysis, Algorithms, and Applications
2016-10-17
finite volume schemes, discontinuous Galerkin finite element method, and related methods, for solving computational fluid dynamics (CFD) problems and...approximation for finite element methods. (3) The development of methods of simulation and analysis for the study of large scale stochastic systems of...laws, finite element method, Bernstein-Bezier finite elements , weakly interacting particle systems, accelerated Monte Carlo, stochastic networks 16
Processor farming in two-level analysis of historical bridge
NASA Astrophysics Data System (ADS)
Krejčí, T.; Kruis, J.; Koudelka, T.; Šejnoha, M.
2017-11-01
This contribution presents a processor farming method in connection with a multi-scale analysis. In this method, each macro-scopic integration point or each finite element is connected with a certain meso-scopic problem represented by an appropriate representative volume element (RVE). The solution of a meso-scale problem provides then effective parameters needed on the macro-scale. Such an analysis is suitable for parallel computing because the meso-scale problems can be distributed among many processors. The application of the processor farming method to a real world masonry structure is illustrated by an analysis of Charles bridge in Prague. The three-dimensional numerical model simulates the coupled heat and moisture transfer of one half of arch No. 3. and it is a part of a complex hygro-thermo-mechanical analysis which has been developed to determine the influence of climatic loading on the current state of the bridge.
Scalable domain decomposition solvers for stochastic PDEs in high performance computing
Desai, Ajit; Khalil, Mohammad; Pettit, Chris; ...
2017-09-21
Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolutionmore » in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.« less
Scalable domain decomposition solvers for stochastic PDEs in high performance computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Desai, Ajit; Khalil, Mohammad; Pettit, Chris
Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolutionmore » in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.« less
Array-based, parallel hierarchical mesh refinement algorithms for unstructured meshes
Ray, Navamita; Grindeanu, Iulian; Zhao, Xinglin; ...
2016-08-18
In this paper, we describe an array-based hierarchical mesh refinement capability through uniform refinement of unstructured meshes for efficient solution of PDE's using finite element methods and multigrid solvers. A multi-degree, multi-dimensional and multi-level framework is designed to generate the nested hierarchies from an initial coarse mesh that can be used for a variety of purposes such as in multigrid solvers/preconditioners, to do solution convergence and verification studies and to improve overall parallel efficiency by decreasing I/O bandwidth requirements (by loading smaller meshes and in memory refinement). We also describe a high-order boundary reconstruction capability that can be used tomore » project the new points after refinement using high-order approximations instead of linear projection in order to minimize and provide more control on geometrical errors introduced by curved boundaries.The capability is developed under the parallel unstructured mesh framework "Mesh Oriented dAtaBase" (MOAB Tautges et al. (2004)). We describe the underlying data structures and algorithms to generate such hierarchies in parallel and present numerical results for computational efficiency and effect on mesh quality. Furthermore, we also present results to demonstrate the applicability of the developed capability to study convergence properties of different point projection schemes for various mesh hierarchies and to a multigrid finite-element solver for elliptic problems.« less
NASA Astrophysics Data System (ADS)
Casadei, F.; Ruzzene, M.
2011-04-01
This work illustrates the possibility to extend the field of application of the Multi-Scale Finite Element Method (MsFEM) to structural mechanics problems that involve localized geometrical discontinuities like cracks or notches. The main idea is to construct finite elements with an arbitrary number of edge nodes that describe the actual geometry of the damage with shape functions that are defined as local solutions of the differential operator of the specific problem according to the MsFEM approach. The small scale information are then brought to the large scale model through the coupling of the global system matrices that are assembled using classical finite element procedures. The efficiency of the method is demonstrated through selected numerical examples that constitute classical problems of great interest to the structural health monitoring community.
Efficient partitioning and assignment on programs for multiprocessor execution
NASA Technical Reports Server (NTRS)
Standley, Hilda M.
1993-01-01
The general problem studied is that of segmenting or partitioning programs for distribution across a multiprocessor system. Efficient partitioning and the assignment of program elements are of great importance since the time consumed in this overhead activity may easily dominate the computation, effectively eliminating any gains made by the use of the parallelism. In this study, the partitioning of sequentially structured programs (written in FORTRAN) is evaluated. Heuristics, developed for similar applications are examined. Finally, a model for queueing networks with finite queues is developed which may be used to analyze multiprocessor system architectures with a shared memory approach to the problem of partitioning. The properties of sequentially written programs form obstacles to large scale (at the procedure or subroutine level) parallelization. Data dependencies of even the minutest nature, reflecting the sequential development of the program, severely limit parallelism. The design of heuristic algorithms is tied to the experience gained in the parallel splitting. Parallelism obtained through the physical separation of data has seen some success, especially at the data element level. Data parallelism on a grander scale requires models that accurately reflect the effects of blocking caused by finite queues. A model for the approximation of the performance of finite queueing networks is developed. This model makes use of the decomposition approach combined with the efficiency of product form solutions.
SAPNEW: Parallel finite element code for thin shell structures on the Alliant FX-80
NASA Astrophysics Data System (ADS)
Kamat, Manohar P.; Watson, Brian C.
1992-11-01
The finite element method has proven to be an invaluable tool for analysis and design of complex, high performance systems, such as bladed-disk assemblies in aircraft turbofan engines. However, as the problem size increase, the computation time required by conventional computers can be prohibitively high. Parallel processing computers provide the means to overcome these computation time limits. This report summarizes the results of a research activity aimed at providing a finite element capability for analyzing turbomachinery bladed-disk assemblies in a vector/parallel processing environment. A special purpose code, named with the acronym SAPNEW, has been developed to perform static and eigen analysis of multi-degree-of-freedom blade models built-up from flat thin shell elements. SAPNEW provides a stand alone capability for static and eigen analysis on the Alliant FX/80, a parallel processing computer. A preprocessor, named with the acronym NTOS, has been developed to accept NASTRAN input decks and convert them to the SAPNEW format to make SAPNEW more readily used by researchers at NASA Lewis Research Center.
On nonlinear finite element analysis in single-, multi- and parallel-processors
NASA Technical Reports Server (NTRS)
Utku, S.; Melosh, R.; Islam, M.; Salama, M.
1982-01-01
Numerical solution of nonlinear equilibrium problems of structures by means of Newton-Raphson type iterations is reviewed. Each step of the iteration is shown to correspond to the solution of a linear problem, therefore the feasibility of the finite element method for nonlinear analysis is established. Organization and flow of data for various types of digital computers, such as single-processor/single-level memory, single-processor/two-level-memory, vector-processor/two-level-memory, and parallel-processors, with and without sub-structuring (i.e. partitioning) are given. The effect of the relative costs of computation, memory and data transfer on substructuring is shown. The idea of assigning comparable size substructures to parallel processors is exploited. Under Cholesky type factorization schemes, the efficiency of parallel processing is shown to decrease due to the occasional shared data, just as that due to the shared facilities.
NASA Astrophysics Data System (ADS)
Ma, Sangback
In this paper we compare various parallel preconditioners such as Point-SSOR (Symmetric Successive OverRelaxation), ILU(0) (Incomplete LU) in the Wavefront ordering, ILU(0) in the Multi-color ordering, Multi-Color Block SOR (Successive OverRelaxation), SPAI (SParse Approximate Inverse) and pARMS (Parallel Algebraic Recursive Multilevel Solver) for solving large sparse linear systems arising from two-dimensional PDE (Partial Differential Equation)s on structured grids. Point-SSOR is well-known, and ILU(0) is one of the most popular preconditioner, but it is inherently serial. ILU(0) in the Wavefront ordering maximizes the parallelism in the natural order, but the lengths of the wave-fronts are often nonuniform. ILU(0) in the Multi-color ordering is a simple way of achieving a parallelism of the order N, where N is the order of the matrix, but its convergence rate often deteriorates as compared to that of natural ordering. We have chosen the Multi-Color Block SOR preconditioner combined with direct sparse matrix solver, since for the Laplacian matrix the SOR method is known to have a nondeteriorating rate of convergence when used with the Multi-Color ordering. By using block version we expect to minimize the interprocessor communications. SPAI computes the sparse approximate inverse directly by least squares method. Finally, ARMS is a preconditioner recursively exploiting the concept of independent sets and pARMS is the parallel version of ARMS. Experiments were conducted for the Finite Difference and Finite Element discretizations of five two-dimensional PDEs with large meshsizes up to a million on an IBM p595 machine with distributed memory. Our matrices are real positive, i. e., their real parts of the eigenvalues are positive. We have used GMRES(m) as our outer iterative method, so that the convergence of GMRES(m) for our test matrices are mathematically guaranteed. Interprocessor communications were done using MPI (Message Passing Interface) primitives. The results show that in general ILU(0) in the Multi-Color ordering ahd ILU(0) in the Wavefront ordering outperform the other methods but for symmetric and nearly symmetric 5-point matrices Multi-Color Block SOR gives the best performance, except for a few cases with a small number of processors.
Preferential Heating of Oxygen 5+ Ions by Finite-Amplitude Oblique Alfven Waves
NASA Technical Reports Server (NTRS)
Maneva, Yana G.; Vinas, Adolfo; Araneda, Jamie; Poedts, Stefaan
2016-01-01
Minor ions in the fast solar wind are known to have higher temperatures and to flow faster than protons in the interplanetary space. In this study we combine previous research on parametric instability theory and 2.5D hybrid simulations to study the onset of preferential heating of Oxygen 5+ ions by large-scale finite-amplitude Alfven waves in the collisionless fast solar wind. We consider initially non-drifting isotropic multi-species plasma, consisting of isothermal massless fluid electrons, kinetic protons and kinetic Oxygen 5+ ions. The external energy source for the plasma heating and energization are oblique monochromatic Alfven-cyclotron waves. The waves have been created by rotating the direction of initial parallel pump, which is a solution of the multi-fluid plasma dispersion relation. We consider propagation angles theta less than or equal to 30 deg. The obliquely propagating Alfven pump waves lead to strong diffusion in the ion phase space, resulting in highly anisotropic heavy ion velocity distribution functions and proton beams. We discuss the application of the model to the problems of preferential heating of minor ions in the solar corona and the fast solar wind.
A framework for grand scale parallelization of the combined finite discrete element method in 2d
NASA Astrophysics Data System (ADS)
Lei, Z.; Rougier, E.; Knight, E. E.; Munjiza, A.
2014-09-01
Within the context of rock mechanics, the Combined Finite-Discrete Element Method (FDEM) has been applied to many complex industrial problems such as block caving, deep mining techniques (tunneling, pillar strength, etc.), rock blasting, seismic wave propagation, packing problems, dam stability, rock slope stability, rock mass strength characterization problems, etc. The reality is that most of these were accomplished in a 2D and/or single processor realm. In this work a hardware independent FDEM parallelization framework has been developed using the Virtual Parallel Machine for FDEM, (V-FDEM). With V-FDEM, a parallel FDEM software can be adapted to different parallel architecture systems ranging from just a few to thousands of cores.
From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation
Blazewicz, Marek; Hinder, Ian; Koppelman, David M.; ...
2013-01-01
Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization ismore » based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms.« less
Efficient Geometric Sound Propagation Using Visibility Culling
NASA Astrophysics Data System (ADS)
Chandak, Anish
2011-07-01
Simulating propagation of sound can improve the sense of realism in interactive applications such as video games and can lead to better designs in engineering applications such as architectural acoustics. In this thesis, we present geometric sound propagation techniques which are faster than prior methods and map well to upcoming parallel multi-core CPUs. We model specular reflections by using the image-source method and model finite-edge diffraction by using the well-known Biot-Tolstoy-Medwin (BTM) model. We accelerate the computation of specular reflections by applying novel visibility algorithms, FastV and AD-Frustum, which compute visibility from a point. We accelerate finite-edge diffraction modeling by applying a novel visibility algorithm which computes visibility from a region. Our visibility algorithms are based on frustum tracing and exploit recent advances in fast ray-hierarchy intersections, data-parallel computations, and scalable, multi-core algorithms. The AD-Frustum algorithm adapts its computation to the scene complexity and allows small errors in computing specular reflection paths for higher computational efficiency. FastV and our visibility algorithm from a region are general, object-space, conservative visibility algorithms that together significantly reduce the number of image sources compared to other techniques while preserving the same accuracy. Our geometric propagation algorithms are an order of magnitude faster than prior approaches for modeling specular reflections and two to ten times faster for modeling finite-edge diffraction. Our algorithms are interactive, scale almost linearly on multi-core CPUs, and can handle large, complex, and dynamic scenes. We also compare the accuracy of our sound propagation algorithms with other methods. Once sound propagation is performed, it is desirable to listen to the propagated sound in interactive and engineering applications. We can generate smooth, artifact-free output audio signals by applying efficient audio-processing algorithms. We also present the first efficient audio-processing algorithm for scenarios with simultaneously moving source and moving receiver (MS-MR) which incurs less than 25% overhead compared to static source and moving receiver (SS-MR) or moving source and static receiver (MS-SR) scenario.
Multi-thread parallel algorithm for reconstructing 3D large-scale porous structures
NASA Astrophysics Data System (ADS)
Ju, Yang; Huang, Yaohui; Zheng, Jiangtao; Qian, Xu; Xie, Heping; Zhao, Xi
2017-04-01
Geomaterials inherently contain many discontinuous, multi-scale, geometrically irregular pores, forming a complex porous structure that governs their mechanical and transport properties. The development of an efficient reconstruction method for representing porous structures can significantly contribute toward providing a better understanding of the governing effects of porous structures on the properties of porous materials. In order to improve the efficiency of reconstructing large-scale porous structures, a multi-thread parallel scheme was incorporated into the simulated annealing reconstruction method. In the method, four correlation functions, which include the two-point probability function, the linear-path functions for the pore phase and the solid phase, and the fractal system function for the solid phase, were employed for better reproduction of the complex well-connected porous structures. In addition, a random sphere packing method and a self-developed pre-conditioning method were incorporated to cast the initial reconstructed model and select independent interchanging pairs for parallel multi-thread calculation, respectively. The accuracy of the proposed algorithm was evaluated by examining the similarity between the reconstructed structure and a prototype in terms of their geometrical, topological, and mechanical properties. Comparisons of the reconstruction efficiency of porous models with various scales indicated that the parallel multi-thread scheme significantly shortened the execution time for reconstruction of a large-scale well-connected porous model compared to a sequential single-thread procedure.
Na, Okpin; Cai, Xiao-Chuan; Xi, Yunping
2017-01-01
The prediction of the chloride-induced corrosion is very important because of the durable life of concrete structure. To simulate more realistic durability performance of concrete structures, complex scientific methods and more accurate material models are needed. In order to predict the robust results of corrosion initiation time and to describe the thin layer from concrete surface to reinforcement, a large number of fine meshes are also used. The purpose of this study is to suggest more realistic physical model regarding coupled hygro-chemo transport and to implement the model with parallel finite element algorithm. Furthermore, microclimate model with environmental humidity and seasonal temperature is adopted. As a result, the prediction model of chloride diffusion under unsaturated condition was developed with parallel algorithms and was applied to the existing bridge to validate the model with multi-boundary condition. As the number of processors increased, the computational time decreased until the number of processors became optimized. Then, the computational time increased because the communication time between the processors increased. The framework of present model can be extended to simulate the multi-species de-icing salts ingress into non-saturated concrete structures in future work. PMID:28772714
Wakefield Computations for the CLIC PETS using the Parallel Finite Element Time-Domain Code T3P
DOE Office of Scientific and Technical Information (OSTI.GOV)
Candel, A; Kabel, A.; Lee, L.
In recent years, SLAC's Advanced Computations Department (ACD) has developed the high-performance parallel 3D electromagnetic time-domain code, T3P, for simulations of wakefields and transients in complex accelerator structures. T3P is based on advanced higher-order Finite Element methods on unstructured grids with quadratic surface approximation. Optimized for large-scale parallel processing on leadership supercomputing facilities, T3P allows simulations of realistic 3D structures with unprecedented accuracy, aiding the design of the next generation of accelerator facilities. Applications to the Compact Linear Collider (CLIC) Power Extraction and Transfer Structure (PETS) are presented.
Towards large scale multi-target tracking
NASA Astrophysics Data System (ADS)
Vo, Ba-Ngu; Vo, Ba-Tuong; Reuter, Stephan; Lam, Quang; Dietmayer, Klaus
2014-06-01
Multi-target tracking is intrinsically an NP-hard problem and the complexity of multi-target tracking solutions usually do not scale gracefully with problem size. Multi-target tracking for on-line applications involving a large number of targets is extremely challenging. This article demonstrates the capability of the random finite set approach to provide large scale multi-target tracking algorithms. In particular it is shown that an approximate filter known as the labeled multi-Bernoulli filter can simultaneously track one thousand five hundred targets in clutter on a standard laptop computer.
DOE Office of Scientific and Technical Information (OSTI.GOV)
McGhee, J.M.; Roberts, R.M.; Morel, J.E.
1997-06-01
A spherical harmonics research code (DANTE) has been developed which is compatible with parallel computer architectures. DANTE provides 3-D, multi-material, deterministic, transport capabilities using an arbitrary finite element mesh. The linearized Boltzmann transport equation is solved in a second order self-adjoint form utilizing a Galerkin finite element spatial differencing scheme. The core solver utilizes a preconditioned conjugate gradient algorithm. Other distinguishing features of the code include options for discrete-ordinates and simplified spherical harmonics angular differencing, an exact Marshak boundary treatment for arbitrarily oriented boundary faces, in-line matrix construction techniques to minimize memory consumption, and an effective diffusion based preconditioner formore » scattering dominated problems. Algorithm efficiency is demonstrated for a massively parallel SIMD architecture (CM-5), and compatibility with MPP multiprocessor platforms or workstation clusters is anticipated.« less
The Parallel System for Integrating Impact Models and Sectors (pSIMS)
NASA Technical Reports Server (NTRS)
Elliott, Joshua; Kelly, David; Chryssanthacopoulos, James; Glotter, Michael; Jhunjhnuwala, Kanika; Best, Neil; Wilde, Michael; Foster, Ian
2014-01-01
We present a framework for massively parallel climate impact simulations: the parallel System for Integrating Impact Models and Sectors (pSIMS). This framework comprises a) tools for ingesting and converting large amounts of data to a versatile datatype based on a common geospatial grid; b) tools for translating this datatype into custom formats for site-based models; c) a scalable parallel framework for performing large ensemble simulations, using any one of a number of different impacts models, on clusters, supercomputers, distributed grids, or clouds; d) tools and data standards for reformatting outputs to common datatypes for analysis and visualization; and e) methodologies for aggregating these datatypes to arbitrary spatial scales such as administrative and environmental demarcations. By automating many time-consuming and error-prone aspects of large-scale climate impacts studies, pSIMS accelerates computational research, encourages model intercomparison, and enhances reproducibility of simulation results. We present the pSIMS design and use example assessments to demonstrate its multi-model, multi-scale, and multi-sector versatility.
Toward GEOS-6, A Global Cloud System Resolving Atmospheric Model
NASA Technical Reports Server (NTRS)
Putman, William M.
2010-01-01
NASA is committed to observing and understanding the weather and climate of our home planet through the use of multi-scale modeling systems and space-based observations. Global climate models have evolved to take advantage of the influx of multi- and many-core computing technologies and the availability of large clusters of multi-core microprocessors. GEOS-6 is a next-generation cloud system resolving atmospheric model that will place NASA at the forefront of scientific exploration of our atmosphere and climate. Model simulations with GEOS-6 will produce a realistic representation of our atmosphere on the scale of typical satellite observations, bringing a visual comprehension of model results to a new level among the climate enthusiasts. In preparation for GEOS-6, the agency's flagship Earth System Modeling Framework [JDl] has been enhanced to support cutting-edge high-resolution global climate and weather simulations. Improvements include a cubed-sphere grid that exposes parallelism; a non-hydrostatic finite volume dynamical core, and algorithm designed for co-processor technologies, among others. GEOS-6 represents a fundamental advancement in the capability of global Earth system models. The ability to directly compare global simulations at the resolution of spaceborne satellite images will lead to algorithm improvements and better utilization of space-based observations within the GOES data assimilation system
Chen, Ning; Yu, Dejie; Xia, Baizhan; Liu, Jian; Ma, Zhengdong
2017-04-01
This paper presents a homogenization-based interval analysis method for the prediction of coupled structural-acoustic systems involving periodical composites and multi-scale uncertain-but-bounded parameters. In the structural-acoustic system, the macro plate structure is assumed to be composed of a periodically uniform microstructure. The equivalent macro material properties of the microstructure are computed using the homogenization method. By integrating the first-order Taylor expansion interval analysis method with the homogenization-based finite element method, a homogenization-based interval finite element method (HIFEM) is developed to solve a periodical composite structural-acoustic system with multi-scale uncertain-but-bounded parameters. The corresponding formulations of the HIFEM are deduced. A subinterval technique is also introduced into the HIFEM for higher accuracy. Numerical examples of a hexahedral box and an automobile passenger compartment are given to demonstrate the efficiency of the presented method for a periodical composite structural-acoustic system with multi-scale uncertain-but-bounded parameters.
NASA Astrophysics Data System (ADS)
Kenway, Gaetan K. W.
This thesis presents new tools and techniques developed to address the challenging problem of high-fidelity aerostructural optimization with respect to large numbers of design variables. A new mesh-movement scheme is developed that is both computationally efficient and sufficiently robust to accommodate large geometric design changes and aerostructural deformations. A fully coupled Newton-Krylov method is presented that accelerates the convergence of aerostructural systems and provides a 20% performance improvement over the traditional nonlinear block Gauss-Seidel approach and can handle more exible structures. A coupled adjoint method is used that efficiently computes derivatives for a gradient-based optimization algorithm. The implementation uses only machine accurate derivative techniques and is verified to yield fully consistent derivatives by comparing against the complex step method. The fully-coupled large-scale coupled adjoint solution method is shown to have 30% better performance than the segregated approach. The parallel scalability of the coupled adjoint technique is demonstrated on an Euler Computational Fluid Dynamics (CFD) model with more than 80 million state variables coupled to a detailed structural finite-element model of the wing with more than 1 million degrees of freedom. Multi-point high-fidelity aerostructural optimizations of a long-range wide-body, transonic transport aircraft configuration are performed using the developed techniques. The aerostructural analysis employs Euler CFD with a 2 million cell mesh and a structural finite element model with 300 000 DOF. Two design optimization problems are solved: one where takeoff gross weight is minimized, and another where fuel burn is minimized. Each optimization uses a multi-point formulation with 5 cruise conditions and 2 maneuver conditions. The optimization problems have 476 design variables are optimal results are obtained within 36 hours of wall time using 435 processors. The TOGW minimization results in a 4.2% reduction in TOGW with a 6.6% fuel burn reduction, while the fuel burn optimization resulted in a 11.2% fuel burn reduction with no change to the takeoff gross weight.
A parallel finite element simulator for ion transport through three-dimensional ion channel systems.
Tu, Bin; Chen, Minxin; Xie, Yan; Zhang, Linbo; Eisenberg, Bob; Lu, Benzhuo
2013-09-15
A parallel finite element simulator, ichannel, is developed for ion transport through three-dimensional ion channel systems that consist of protein and membrane. The coordinates of heavy atoms of the protein are taken from the Protein Data Bank and the membrane is represented as a slab. The simulator contains two components: a parallel adaptive finite element solver for a set of Poisson-Nernst-Planck (PNP) equations that describe the electrodiffusion process of ion transport, and a mesh generation tool chain for ion channel systems, which is an essential component for the finite element computations. The finite element method has advantages in modeling irregular geometries and complex boundary conditions. We have built a tool chain to get the surface and volume mesh for ion channel systems, which consists of a set of mesh generation tools. The adaptive finite element solver in our simulator is implemented using the parallel adaptive finite element package Parallel Hierarchical Grid (PHG) developed by one of the authors, which provides the capability of doing large scale parallel computations with high parallel efficiency and the flexibility of choosing high order elements to achieve high order accuracy. The simulator is applied to a real transmembrane protein, the gramicidin A (gA) channel protein, to calculate the electrostatic potential, ion concentrations and I - V curve, with which both primitive and transformed PNP equations are studied and their numerical performances are compared. To further validate the method, we also apply the simulator to two other ion channel systems, the voltage dependent anion channel (VDAC) and α-Hemolysin (α-HL). The simulation results agree well with Brownian dynamics (BD) simulation results and experimental results. Moreover, because ionic finite size effects can be included in PNP model now, we also perform simulations using a size-modified PNP (SMPNP) model on VDAC and α-HL. It is shown that the size effects in SMPNP can effectively lead to reduced current in the channel, and the results are closer to BD simulation results. Copyright © 2013 Wiley Periodicals, Inc.
Multi-Scale Modeling of an Integrated 3D Braided Composite with Applications to Helicopter Arm
NASA Astrophysics Data System (ADS)
Zhang, Diantang; Chen, Li; Sun, Ying; Zhang, Yifan; Qian, Kun
2017-10-01
A study is conducted with the aim of developing multi-scale analytical method for designing the composite helicopter arm with three-dimensional (3D) five-directional braided structure. Based on the analysis of 3D braided microstructure, the multi-scale finite element modeling is developed. Finite element analysis on the load capacity of 3D five-directional braided composites helicopter arm is carried out using the software ABAQUS/Standard. The influences of the braiding angle and loading condition on the stress and strain distribution of the helicopter arm are simulated. The results show that the proposed multi-scale method is capable of accurately predicting the mechanical properties of 3D braided composites, validated by the comparison the stress-strain curves of meso-scale RVCs. Furthermore, it is found that the braiding angle is an important factor affecting the mechanical properties of 3D five-directional braided composite helicopter arm. Based on the optimized structure parameters, the nearly net-shaped composite helicopter arm is fabricated using a novel resin transfer mould (RTM) process.
NASA Technical Reports Server (NTRS)
Nguyen, Duc T.; Storaasli, Olaf O.; Qin, Jiangning; Qamar, Ramzi
1994-01-01
An automatic differentiation tool (ADIFOR) is incorporated into a finite element based structural analysis program for shape and non-shape design sensitivity analysis of structural systems. The entire analysis and sensitivity procedures are parallelized and vectorized for high performance computation. Small scale examples to verify the accuracy of the proposed program and a medium scale example to demonstrate the parallel vector performance on multiple CRAY C90 processors are included.
Parallel Finite Element Domain Decomposition for Structural/Acoustic Analysis
NASA Technical Reports Server (NTRS)
Nguyen, Duc T.; Tungkahotara, Siroj; Watson, Willie R.; Rajan, Subramaniam D.
2005-01-01
A domain decomposition (DD) formulation for solving sparse linear systems of equations resulting from finite element analysis is presented. The formulation incorporates mixed direct and iterative equation solving strategics and other novel algorithmic ideas that are optimized to take advantage of sparsity and exploit modern computer architecture, such as memory and parallel computing. The most time consuming part of the formulation is identified and the critical roles of direct sparse and iterative solvers within the framework of the formulation are discussed. Experiments on several computer platforms using several complex test matrices are conducted using software based on the formulation. Small-scale structural examples are used to validate thc steps in the formulation and large-scale (l,000,000+ unknowns) duct acoustic examples are used to evaluate the ORIGIN 2000 processors, and a duster of 6 PCs (running under the Windows environment). Statistics show that the formulation is efficient in both sequential and parallel computing environmental and that the formulation is significantly faster and consumes less memory than that based on one of the best available commercialized parallel sparse solvers.
Hybrid Parallelization of Adaptive MHD-Kinetic Module in Multi-Scale Fluid-Kinetic Simulation Suite
Borovikov, Sergey; Heerikhuisen, Jacob; Pogorelov, Nikolai
2013-04-01
The Multi-Scale Fluid-Kinetic Simulation Suite has a computational tool set for solving partially ionized flows. In this paper we focus on recent developments of the kinetic module which solves the Boltzmann equation using the Monte-Carlo method. The module has been recently redesigned to utilize intra-node hybrid parallelization. We describe in detail the redesign process, implementation issues, and modifications made to the code. Finally, we conduct a performance analysis.
Hierarchical Parallelism in Finite Difference Analysis of Heat Conduction
NASA Technical Reports Server (NTRS)
Padovan, Joseph; Krishna, Lala; Gute, Douglas
1997-01-01
Based on the concept of hierarchical parallelism, this research effort resulted in highly efficient parallel solution strategies for very large scale heat conduction problems. Overall, the method of hierarchical parallelism involves the partitioning of thermal models into several substructured levels wherein an optimal balance into various associated bandwidths is achieved. The details are described in this report. Overall, the report is organized into two parts. Part 1 describes the parallel modelling methodology and associated multilevel direct, iterative and mixed solution schemes. Part 2 establishes both the formal and computational properties of the scheme.
Highly efficient spatial data filtering in parallel using the opensource library CPPPO
NASA Astrophysics Data System (ADS)
Municchi, Federico; Goniva, Christoph; Radl, Stefan
2016-10-01
CPPPO is a compilation of parallel data processing routines developed with the aim to create a library for "scale bridging" (i.e. connecting different scales by mean of closure models) in a multi-scale approach. CPPPO features a number of parallel filtering algorithms designed for use with structured and unstructured Eulerian meshes, as well as Lagrangian data sets. In addition, data can be processed on the fly, allowing the collection of relevant statistics without saving individual snapshots of the simulation state. Our library is provided with an interface to the widely-used CFD solver OpenFOAM®, and can be easily connected to any other software package via interface modules. Also, we introduce a novel, extremely efficient approach to parallel data filtering, and show that our algorithms scale super-linearly on multi-core clusters. Furthermore, we provide a guideline for choosing the optimal Eulerian cell selection algorithm depending on the number of CPU cores used. Finally, we demonstrate the accuracy and the parallel scalability of CPPPO in a showcase focusing on heat and mass transfer from a dense bed of particles.
Viscous-enstrophy scaling law for Navier-Stokes reconnection
NASA Astrophysics Data System (ADS)
Kerr, Robert M.
2017-11-01
Simulations of perturbed, helical trefoil vortex knots and anti-parallel vortices find ν-independent collapse of temporally scaled (√{ ν} Z) - 1 / 2, Z enstrophy, between when the loops first touch at tΓ, and when reconnection ends at tx for the viscosity ν varying by 256. Due to mathematical bounds upon higher-order norms, this collapse requires that the domain increase as ν decreases, possibly to allow large-scale negative helicity to grow as compensation for small-scale positive helicity and enstrophy growth. This mechanism could be a step towards explaining how smooth solutions of the Navier-Stokes can generate finite-energy dissipation in a finite time as ν -> 0 .
Multi-Scale Modeling of Liquid Phase Sintering Affected by Gravity: Preliminary Analysis
NASA Technical Reports Server (NTRS)
Olevsky, Eugene; German, Randall M.
2012-01-01
A multi-scale simulation concept taking into account impact of gravity on liquid phase sintering is described. The gravity influence can be included at both the micro- and macro-scales. At the micro-scale, the diffusion mass-transport is directionally modified in the framework of kinetic Monte-Carlo simulations to include the impact of gravity. The micro-scale simulations can provide the values of the constitutive parameters for macroscopic sintering simulations. At the macro-scale, we are attempting to embed a continuum model of sintering into a finite-element framework that includes the gravity forces and substrate friction. If successful, the finite elements analysis will enable predictions relevant to space-based processing, including size and shape and property predictions. Model experiments are underway to support the models via extraction of viscosity moduli versus composition, particle size, heating rate, temperature and time.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Candel, Arno; Li, Z.; Ng, C.
The Compact Linear Collider (CLIC) provides a path to a multi-TeV accelerator to explore the energy frontier of High Energy Physics. Its novel two-beam accelerator concept envisions rf power transfer to the accelerating structures from a separate high-current decelerator beam line consisting of power extraction and transfer structures (PETS). It is critical to numerically verify the fundamental and higher-order mode properties in and between the two beam lines with high accuracy and confidence. To solve these large-scale problems, SLAC's parallel finite element electromagnetic code suite ACE3P is employed. Using curvilinear conformal meshes and higher-order finite element vector basis functions, unprecedentedmore » accuracy and computational efficiency are achieved, enabling high-fidelity modeling of complex detuned structures such as the CLIC TD24 accelerating structure. In this paper, time-domain simulations of wakefield coupling effects in the combined system of PETS and the TD24 structures are presented. The results will help to identify potential issues and provide new insights on the design, leading to further improvements on the novel CLIC two-beam accelerator scheme.« less
NASA Astrophysics Data System (ADS)
Yang, Liping; Zhang, Lei; He, Jiansen; Tu, Chuanyi; Li, Shengtai; Wang, Xin; Wang, Linghua
2018-03-01
Multi-order structure functions in the solar wind are reported to display a monofractal scaling when sampled parallel to the local magnetic field and a multifractal scaling when measured perpendicularly. Whether and to what extent will the scaling anisotropy be weakened by the enhancement of turbulence amplitude relative to the background magnetic strength? In this study, based on two runs of the magnetohydrodynamic (MHD) turbulence simulation with different relative levels of turbulence amplitude, we investigate and compare the scaling of multi-order magnetic structure functions and magnetic probability distribution functions (PDFs) as well as their dependence on the direction of the local field. The numerical results show that for the case of large-amplitude MHD turbulence, the multi-order structure functions display a multifractal scaling at all angles to the local magnetic field, with PDFs deviating significantly from the Gaussian distribution and a flatness larger than 3 at all angles. In contrast, for the case of small-amplitude MHD turbulence, the multi-order structure functions and PDFs have different features in the quasi-parallel and quasi-perpendicular directions: a monofractal scaling and Gaussian-like distribution in the former, and a conversion of a monofractal scaling and Gaussian-like distribution into a multifractal scaling and non-Gaussian tail distribution in the latter. These results hint that when intermittencies are abundant and intense, the multifractal scaling in the structure functions can appear even if it is in the quasi-parallel direction; otherwise, the monofractal scaling in the structure functions remains even if it is in the quasi-perpendicular direction.
Parallel Computation of the Jacobian Matrix for Nonlinear Equation Solvers Using MATLAB
NASA Technical Reports Server (NTRS)
Rose, Geoffrey K.; Nguyen, Duc T.; Newman, Brett A.
2017-01-01
Demonstrating speedup for parallel code on a multicore shared memory PC can be challenging in MATLAB due to underlying parallel operations that are often opaque to the user. This can limit potential for improvement of serial code even for the so-called embarrassingly parallel applications. One such application is the computation of the Jacobian matrix inherent to most nonlinear equation solvers. Computation of this matrix represents the primary bottleneck in nonlinear solver speed such that commercial finite element (FE) and multi-body-dynamic (MBD) codes attempt to minimize computations. A timing study using MATLAB's Parallel Computing Toolbox was performed for numerical computation of the Jacobian. Several approaches for implementing parallel code were investigated while only the single program multiple data (spmd) method using composite objects provided positive results. Parallel code speedup is demonstrated but the goal of linear speedup through the addition of processors was not achieved due to PC architecture.
Parallel Multi-Step/Multi-Rate Integration of Two-Time Scale Dynamic Systems
NASA Technical Reports Server (NTRS)
Chang, Johnny T.; Ploen, Scott R.; Sohl, Garett. A,; Martin, Bryan J.
2004-01-01
Increasing demands on the fidelity of simulations for real-time and high-fidelity simulations are stressing the capacity of modern processors. New integration techniques are required that provide maximum efficiency for systems that are parallelizable. However many current techniques make assumptions that are at odds with non-cascadable systems. A new serial multi-step/multi-rate integration algorithm for dual-timescale continuous state systems is presented which applies to these systems, and is extended to a parallel multi-step/multi-rate algorithm. The superior performance of both algorithms is demonstrated through a representative example.
Efficient parallelization of analytic bond-order potentials for large-scale atomistic simulations
NASA Astrophysics Data System (ADS)
Teijeiro, C.; Hammerschmidt, T.; Drautz, R.; Sutmann, G.
2016-07-01
Analytic bond-order potentials (BOPs) provide a way to compute atomistic properties with controllable accuracy. For large-scale computations of heterogeneous compounds at the atomistic level, both the computational efficiency and memory demand of BOP implementations have to be optimized. Since the evaluation of BOPs is a local operation within a finite environment, the parallelization concepts known from short-range interacting particle simulations can be applied to improve the performance of these simulations. In this work, several efficient parallelization methods for BOPs that use three-dimensional domain decomposition schemes are described. The schemes are implemented into the bond-order potential code BOPfox, and their performance is measured in a series of benchmarks. Systems of up to several millions of atoms are simulated on a high performance computing system, and parallel scaling is demonstrated for up to thousands of processors.
Parallel goal-oriented adaptive finite element modeling for 3D electromagnetic exploration
NASA Astrophysics Data System (ADS)
Zhang, Y.; Key, K.; Ovall, J.; Holst, M.
2014-12-01
We present a parallel goal-oriented adaptive finite element method for accurate and efficient electromagnetic (EM) modeling of complex 3D structures. An unstructured tetrahedral mesh allows this approach to accommodate arbitrarily complex 3D conductivity variations and a priori known boundaries. The total electric field is approximated by the lowest order linear curl-conforming shape functions and the discretized finite element equations are solved by a sparse LU factorization. Accuracy of the finite element solution is achieved through adaptive mesh refinement that is performed iteratively until the solution converges to the desired accuracy tolerance. Refinement is guided by a goal-oriented error estimator that uses a dual-weighted residual method to optimize the mesh for accurate EM responses at the locations of the EM receivers. As a result, the mesh refinement is highly efficient since it only targets the elements where the inaccuracy of the solution corrupts the response at the possibly distant locations of the EM receivers. We compare the accuracy and efficiency of two approaches for estimating the primary residual error required at the core of this method: one uses local element and inter-element residuals and the other relies on solving a global residual system using a hierarchical basis. For computational efficiency our method follows the Bank-Holst algorithm for parallelization, where solutions are computed in subdomains of the original model. To resolve the load-balancing problem, this approach applies a spectral bisection method to divide the entire model into subdomains that have approximately equal error and the same number of receivers. The finite element solutions are then computed in parallel with each subdomain carrying out goal-oriented adaptive mesh refinement independently. We validate the newly developed algorithm by comparison with controlled-source EM solutions for 1D layered models and with 2D results from our earlier 2D goal oriented adaptive refinement code named MARE2DEM. We demonstrate the performance and parallel scaling of this algorithm on a medium-scale computing cluster with a marine controlled-source EM example that includes a 3D array of receivers located over a 3D model that includes significant seafloor bathymetry variations and a heterogeneous subsurface.
Multi-Scale Computational Modeling of Two-Phased Metal Using GMC Method
NASA Technical Reports Server (NTRS)
Moghaddam, Masoud Ghorbani; Achuthan, A.; Bednacyk, B. A.; Arnold, S. M.; Pineda, E. J.
2014-01-01
A multi-scale computational model for determining plastic behavior in two-phased CMSX-4 Ni-based superalloys is developed on a finite element analysis (FEA) framework employing crystal plasticity constitutive model that can capture the microstructural scale stress field. The generalized method of cells (GMC) micromechanics model is used for homogenizing the local field quantities. At first, GMC as stand-alone is validated by analyzing a repeating unit cell (RUC) as a two-phased sample with 72.9% volume fraction of gamma'-precipitate in the gamma-matrix phase and comparing the results with those predicted by finite element analysis (FEA) models incorporating the same crystal plasticity constitutive model. The global stress-strain behavior and the local field quantity distributions predicted by GMC demonstrated good agreement with FEA. High computational saving, at the expense of some accuracy in the components of local tensor field quantities, was obtained with GMC. Finally, the capability of the developed multi-scale model linking FEA and GMC to solve real life sized structures is demonstrated by analyzing an engine disc component and determining the microstructural scale details of the field quantities.
Programming Probabilistic Structural Analysis for Parallel Processing Computer
NASA Technical Reports Server (NTRS)
Sues, Robert H.; Chen, Heh-Chyun; Twisdale, Lawrence A.; Chamis, Christos C.; Murthy, Pappu L. N.
1991-01-01
The ultimate goal of this research program is to make Probabilistic Structural Analysis (PSA) computationally efficient and hence practical for the design environment by achieving large scale parallelism. The paper identifies the multiple levels of parallelism in PSA, identifies methodologies for exploiting this parallelism, describes the development of a parallel stochastic finite element code, and presents results of two example applications. It is demonstrated that speeds within five percent of those theoretically possible can be achieved. A special-purpose numerical technique, the stochastic preconditioned conjugate gradient method, is also presented and demonstrated to be extremely efficient for certain classes of PSA problems.
NASA Astrophysics Data System (ADS)
Kumari, Komal; Donzis, Diego
2017-11-01
Highly resolved computational simulations on massively parallel machines are critical in understanding the physics of a vast number of complex phenomena in nature governed by partial differential equations. Simulations at extreme levels of parallelism present many challenges with communication between processing elements (PEs) being a major bottleneck. In order to fully exploit the computational power of exascale machines one needs to devise numerical schemes that relax global synchronizations across PEs. This asynchronous computations, however, have a degrading effect on the accuracy of standard numerical schemes.We have developed asynchrony-tolerant (AT) schemes that maintain order of accuracy despite relaxed communications. We show, analytically and numerically, that these schemes retain their numerical properties with multi-step higher order temporal Runge-Kutta schemes. We also show that for a range of optimized parameters,the computation time and error for AT schemes is less than their synchronous counterpart. Stability of the AT schemes which depends upon history and random nature of delays, are also discussed. Support from NSF is gratefully acknowledged.
A class of hybrid finite element methods for electromagnetics: A review
NASA Technical Reports Server (NTRS)
Volakis, J. L.; Chatterjee, A.; Gong, J.
1993-01-01
Integral equation methods have generally been the workhorse for antenna and scattering computations. In the case of antennas, they continue to be the prominent computational approach, but for scattering applications the requirement for large-scale computations has turned researchers' attention to near neighbor methods such as the finite element method, which has low O(N) storage requirements and is readily adaptable in modeling complex geometrical features and material inhomogeneities. In this paper, we review three hybrid finite element methods for simulating composite scatterers, conformal microstrip antennas, and finite periodic arrays. Specifically, we discuss the finite element method and its application to electromagnetic problems when combined with the boundary integral, absorbing boundary conditions, and artificial absorbers for terminating the mesh. Particular attention is given to large-scale simulations, methods, and solvers for achieving low memory requirements and code performance on parallel computing architectures.
Multi-scale Slip Inversion Based on Simultaneous Spatial and Temporal Domain Wavelet Transform
NASA Astrophysics Data System (ADS)
Liu, W.; Yao, H.; Yang, H. Y.
2017-12-01
Finite fault inversion is a widely used method to study earthquake rupture processes. Some previous studies have proposed different methods to implement finite fault inversion, including time-domain, frequency-domain, and wavelet-domain methods. Many previous studies have found that different frequency bands show different characteristics of the seismic rupture (e.g., Wang and Mori, 2011; Yao et al., 2011, 2013; Uchide et al., 2013; Yin et al., 2017). Generally, lower frequency waveforms correspond to larger-scale rupture characteristics while higher frequency data are representative of smaller-scale ones. Therefore, multi-scale analysis can help us understand the earthquake rupture process thoroughly from larger scale to smaller scale. By the use of wavelet transform, the wavelet-domain methods can analyze both the time and frequency information of signals in different scales. Traditional wavelet-domain methods (e.g., Ji et al., 2002) implement finite fault inversion with both lower and higher frequency signals together to recover larger-scale and smaller-scale characteristics of the rupture process simultaneously. Here we propose an alternative strategy with a two-step procedure, i.e., firstly constraining the larger-scale characteristics with lower frequency signals, and then resolving the smaller-scale ones with higher frequency signals. We have designed some synthetic tests to testify our strategy and compare it with the traditional one. We also have applied our strategy to study the 2015 Gorkha Nepal earthquake using tele-seismic waveforms. Both the traditional method and our two-step strategy only analyze the data in different temporal scales (i.e., different frequency bands), while the spatial distribution of model parameters also shows multi-scale characteristics. A more sophisticated strategy is to transfer the slip model into different spatial scales, and then analyze the smooth slip distribution (larger scales) with lower frequency data firstly and more detailed slip distribution (smaller scales) with higher frequency data subsequently. We are now implementing the slip inversion using both spatial and temporal domain wavelets. This multi-scale analysis can help us better understand frequency-dependent rupture characteristics of large earthquakes.
The influence of the self-consistent mode structure on the Coriolis pinch effect
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peeters, A. G.; Camenen, Y.; Casson, F. J.
This paper discusses the effect of the mode structure on the Coriolis pinch effect [A. G. Peeters, C. Angioni, and D. Strintzi, Phys. Rev. Lett. 98, 265003 (2007)]. It is shown that the Coriolis drift effect can be compensated for by a finite parallel wave vector, resulting in a reduced momentum pinch velocity. Gyrokinetic simulations in full toroidal geometry reveal that parallel dynamics effectively removes the Coriolis pinch for the case of adiabatic electrons, while the compensation due to the parallel dynamics is incomplete for the case of kinetic electrons, resulting in a finite pinch velocity. The finite flux inmore » the case of kinetic electrons is interpreted to be related to the electron trapping, which prevents a strong asymmetry in the electrostatic potential with respect to the low field side position. The physics picture developed here leads to the discovery and explanation of two unexpected effects: First the pinch velocity scales with the trapped particle fraction (root of the inverse aspect ratio), and second there is no strong collisionality dependence. The latter is related to the role of the trapped electrons, which retain some symmetry in the eigenmode, but play no role in the perturbed parallel velocity.« less
A Dual Super-Element Domain Decomposition Approach for Parallel Nonlinear Finite Element Analysis
NASA Astrophysics Data System (ADS)
Jokhio, G. A.; Izzuddin, B. A.
2015-05-01
This article presents a new domain decomposition method for nonlinear finite element analysis introducing the concept of dual partition super-elements. The method extends ideas from the displacement frame method and is ideally suited for parallel nonlinear static/dynamic analysis of structural systems. In the new method, domain decomposition is realized by replacing one or more subdomains in a "parent system," each with a placeholder super-element, where the subdomains are processed separately as "child partitions," each wrapped by a dual super-element along the partition boundary. The analysis of the overall system, including the satisfaction of equilibrium and compatibility at all partition boundaries, is realized through direct communication between all pairs of placeholder and dual super-elements. The proposed method has particular advantages for matrix solution methods based on the frontal scheme, and can be readily implemented for existing finite element analysis programs to achieve parallelization on distributed memory systems with minimal intervention, thus overcoming memory bottlenecks typically faced in the analysis of large-scale problems. Several examples are presented in this article which demonstrate the computational benefits of the proposed parallel domain decomposition approach and its applicability to the nonlinear structural analysis of realistic structural systems.
NASA Astrophysics Data System (ADS)
Ji, X.; Shen, C.
2017-12-01
Flood inundation presents substantial societal hazards and also changes biogeochemistry for systems like the Amazon. It is often expensive to simulate high-resolution flood inundation and propagation in a long-term watershed-scale model. Due to the Courant-Friedrichs-Lewy (CFL) restriction, high resolution and large local flow velocity both demand prohibitively small time steps even for parallel codes. Here we develop a parallel surface-subsurface process-based model enhanced by multi-resolution meshes that are adaptively switched on or off. The high-resolution overland flow meshes are enabled only when the flood wave invades to floodplains. This model applies semi-implicit, semi-Lagrangian (SISL) scheme in solving dynamic wave equations, and with the assistant of the multi-mesh method, it also adaptively chooses the dynamic wave equation only in the area of deep inundation. Therefore, the model achieves a balance between accuracy and computational cost.
Multi-scale modelling of elastic moduli of trabecular bone
Hamed, Elham; Jasiuk, Iwona; Yoo, Andrew; Lee, YikHan; Liszka, Tadeusz
2012-01-01
We model trabecular bone as a nanocomposite material with hierarchical structure and predict its elastic properties at different structural scales. The analysis involves a bottom-up multi-scale approach, starting with nanoscale (mineralized collagen fibril) and moving up the scales to sub-microscale (single lamella), microscale (single trabecula) and mesoscale (trabecular bone) levels. Continuum micromechanics methods, composite materials laminate theory and finite-element methods are used in the analysis. Good agreement is found between theoretical and experimental results. PMID:22279160
Elsaadany, Mostafa; Yan, Karen Chang; Yildirim-Ayan, Eda
2017-06-01
Successful tissue engineering and regenerative therapy necessitate having extensive knowledge about mechanical milieu in engineered tissues and the resident cells. In this study, we have merged two powerful analysis tools, namely finite element analysis and stochastic analysis, to understand the mechanical strain within the tissue scaffold and residing cells and to predict the cell viability upon applying mechanical strains. A continuum-based multi-length scale finite element model (FEM) was created to simulate the physiologically relevant equiaxial strain exposure on cell-embedded tissue scaffold and to calculate strain transferred to the tissue scaffold (macro-scale) and residing cells (micro-scale) upon various equiaxial strains. The data from FEM were used to predict cell viability under various equiaxial strain magnitudes using stochastic damage criterion analysis. The model validation was conducted through mechanically straining the cardiomyocyte-encapsulated collagen constructs using a custom-built mechanical loading platform (EQUicycler). FEM quantified the strain gradients over the radial and longitudinal direction of the scaffolds and the cells residing in different areas of interest. With the use of the experimental viability data, stochastic damage criterion, and the average cellular strains obtained from multi-length scale models, cellular viability was predicted and successfully validated. This methodology can provide a great tool to characterize the mechanical stimulation of bioreactors used in tissue engineering applications in providing quantification of mechanical strain and predicting cellular viability variations due to applied mechanical strain.
Algorithm implementation on the Navier-Stokes computer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krist, S.E.; Zang, T.A.
1987-03-01
The Navier-Stokes Computer is a multi-purpose parallel-processing supercomputer which is currently under development at Princeton University. It consists of multiple local memory parallel processors, called Nodes, which are interconnected in a hypercube network. Details of the procedures involved in implementing an algorithm on the Navier-Stokes computer are presented. The particular finite difference algorithm considered in this analysis was developed for simulation of laminar-turbulent transition in wall bounded shear flows. Projected timing results for implementing this algorithm indicate that operation rates in excess of 42 GFLOPS are feasible on a 128 Node machine.
Algorithm implementation on the Navier-Stokes computer
NASA Technical Reports Server (NTRS)
Krist, Steven E.; Zang, Thomas A.
1987-01-01
The Navier-Stokes Computer is a multi-purpose parallel-processing supercomputer which is currently under development at Princeton University. It consists of multiple local memory parallel processors, called Nodes, which are interconnected in a hypercube network. Details of the procedures involved in implementing an algorithm on the Navier-Stokes computer are presented. The particular finite difference algorithm considered in this analysis was developed for simulation of laminar-turbulent transition in wall bounded shear flows. Projected timing results for implementing this algorithm indicate that operation rates in excess of 42 GFLOPS are feasible on a 128 Node machine.
Marine Controlled-Source Electromagnetic 2D Inversion for synthetic models.
NASA Astrophysics Data System (ADS)
Liu, Y.; Li, Y.
2016-12-01
We present a 2D inverse algorithm for frequency domain marine controlled-source electromagnetic (CSEM) data, which is based on the regularized Gauss-Newton approach. As a forward solver, our parallel adaptive finite element forward modeling program is employed. It is a self-adaptive, goal-oriented grid refinement algorithm in which a finite element analysis is performed on a sequence of refined meshes. The mesh refinement process is guided by a dual error estimate weighting to bias refinement towards elements that affect the solution at the EM receiver locations. With the use of the direct solver (MUMPS), we can effectively compute the electromagnetic fields for multi-sources and parametric sensitivities. We also implement the parallel data domain decomposition approach of Key and Ovall (2011), with the goal of being able to compute accurate responses in parallel for complicated models and a full suite of data parameters typical of offshore CSEM surveys. All minimizations are carried out by using the Gauss-Newton algorithm and model perturbations at each iteration step are obtained by using the Inexact Conjugate Gradient iteration method. Synthetic test inversions are presented.
DOUAR: A new three-dimensional creeping flow numerical model for the solution of geological problems
NASA Astrophysics Data System (ADS)
Braun, Jean; Thieulot, Cédric; Fullsack, Philippe; DeKool, Marthijn; Beaumont, Christopher; Huismans, Ritske
2008-12-01
We present a new finite element code for the solution of the Stokes and energy (or heat transport) equations that has been purposely designed to address crustal-scale to mantle-scale flow problems in three dimensions. Although it is based on an Eulerian description of deformation and flow, the code, which we named DOUAR ('Earth' in Breton language), has the ability to track interfaces and, in particular, the free surface, by using a dual representation based on a set of particles placed on the interface and the computation of a level set function on the nodes of the finite element grid, thus ensuring accuracy and efficiency. The code also makes use of a new method to compute the dynamic Delaunay triangulation connecting the particles based on non-Euclidian, curvilinear measure of distance, ensuring that the density of particles remains uniform and/or dynamically adapted to the curvature of the interface. The finite element discretization is based on a non-uniform, yet regular octree division of space within a unit cube that allows efficient adaptation of the finite element discretization, i.e. in regions of strong velocity gradient or high interface curvature. The finite elements are cubes (the leaves of the octree) in which a q1- p0 interpolation scheme is used. Nodal incompatibilities across faces separating elements of differing size are dealt with by introducing linear constraints among nodal degrees of freedom. Discontinuities in material properties across the interfaces are accommodated by the use of a novel method (which we called divFEM) to integrate the finite element equations in which the elemental volume is divided by a local octree to an appropriate depth (resolution). A variety of rheologies have been implemented including linear, non-linear and thermally activated creep and brittle (or plastic) frictional deformation. A simple smoothing operator has been defined to avoid checkerboard oscillations in pressure that tend to develop when using a highly irregular octree discretization and the tri-linear (or q1- p0) finite element. A three-dimensional cloud of particles is used to track material properties that depend on the integrated history of deformation (the integrated strain, for example); its density is variable and dynamically adapted to the computed flow. The large system of algebraic equations that results from the finite element discretization and linearization of the basic partial differential equations is solved using a multi-frontal massively parallel direct solver that can efficiently factorize poorly conditioned systems resulting from the highly non-linear rheology and the presence of the free surface. The code is almost entirely parallelized. We present example results including the onset of a Rayleigh-Taylor instability, the indentation of a rigid-plastic material and the formation of a fold beneath a free eroding surface, that demonstrate the accuracy, efficiency and appropriateness of the new code to solve complex geodynamical problems in three dimensions.
Developing parallel GeoFEST(P) using the PYRAMID AMR library
NASA Technical Reports Server (NTRS)
Norton, Charles D.; Lyzenga, Greg; Parker, Jay; Tisdale, Robert E.
2004-01-01
The PYRAMID parallel unstructured adaptive mesh refinement (AMR) library has been coupled with the GeoFEST geophysical finite element simulation tool to support parallel active tectonics simulations. Specifically, we have demonstrated modeling of coseismic and postseismic surface displacement due to a simulated Earthquake for the Landers system of interacting faults in Southern California. The new software demonstrated a 25-times resolution improvement and a 4-times reduction in time to solution over the sequential baseline milestone case. Simulations on workstations using a few tens of thousands of stress displacement finite elements can now be expanded to multiple millions of elements with greater than 98% scaled efficiency on various parallel platforms over many hundreds of processors. Our most recent work has demonstrated that we can dynamically adapt the computational grid as stress grows on a fault. In this paper, we will describe the major issues and challenges associated with coupling these two programs to create GeoFEST(P). Performance and visualization results will also be described.
NASA Astrophysics Data System (ADS)
Hou, Zhenlong; Huang, Danian
2017-09-01
In this paper, we make a study on the inversion of probability tomography (IPT) with gravity gradiometry data at first. The space resolution of the results is improved by multi-tensor joint inversion, depth weighting matrix and the other methods. Aiming at solving the problems brought by the big data in the exploration, we present the parallel algorithm and the performance analysis combining Compute Unified Device Architecture (CUDA) with Open Multi-Processing (OpenMP) based on Graphics Processing Unit (GPU) accelerating. In the test of the synthetic model and real data from Vinton Dome, we get the improved results. It is also proved that the improved inversion algorithm is effective and feasible. The performance of parallel algorithm we designed is better than the other ones with CUDA. The maximum speedup could be more than 200. In the performance analysis, multi-GPU speedup and multi-GPU efficiency are applied to analyze the scalability of the multi-GPU programs. The designed parallel algorithm is demonstrated to be able to process larger scale of data and the new analysis method is practical.
2016-01-01
The problem of multi-scale modelling of damage development in a SiC ceramic fibre-reinforced SiC matrix ceramic composite tube is addressed, with the objective of demonstrating the ability of the finite-element microstructure meshfree (FEMME) model to introduce important aspects of the microstructure into a larger scale model of the component. These are particularly the location, orientation and geometry of significant porosity and the load-carrying capability and quasi-brittle failure behaviour of the fibre tows. The FEMME model uses finite-element and cellular automata layers, connected by a meshfree layer, to efficiently couple the damage in the microstructure with the strain field at the component level. Comparison is made with experimental observations of damage development in an axially loaded composite tube, studied by X-ray computed tomography and digital volume correlation. Recommendations are made for further development of the model to achieve greater fidelity to the microstructure. This article is part of the themed issue ‘Multiscale modelling of the structural integrity of composite materials’. PMID:27242308
NASA Astrophysics Data System (ADS)
Nguyen, Thi-Thuy-My; Gandin, Charles-André; Combeau, Hervé; Založnik, Miha; Bellet, Michel
2018-02-01
The transport of solid crystals in the liquid pool during solidification of large ingots is known to have a significant effect on their final grain structure and macrosegregation. Numerical modeling of the associated physics is challenging since complex and strong interactions between heat and mass transfer at the microscopic and macroscopic scales must be taken into account. The paper presents a finite element multi-scale solidification model coupling nucleation, growth, and solute diffusion at the microscopic scale, represented by a single unique grain, while also including transport of the liquid and solid phases at the macroscopic scale of the ingots. The numerical resolution is based on a splitting method which sequentially describes the evolution and interaction of quantities into a transport and a growth stage. This splitting method reduces the non-linear complexity of the set of equations and is, for the first time, implemented using the finite element method. This is possible due to the introduction of an artificial diffusion in all conservation equations solved by the finite element method. Simulations with and without grain transport are compared to demonstrate the impact of solid phase transport on the solidification process as well as the formation of macrosegregation in a binary alloy (Sn-5 wt pct Pb). The model is also applied to the solidification of the binary alloy Fe-0.36 wt pct C in a domain representative of a 3.3-ton steel ingot.
Adaptive multi-resolution 3D Hartree-Fock-Bogoliubov solver for nuclear structure
NASA Astrophysics Data System (ADS)
Pei, J. C.; Fann, G. I.; Harrison, R. J.; Nazarewicz, W.; Shi, Yue; Thornton, S.
2014-08-01
Background: Complex many-body systems, such as triaxial and reflection-asymmetric nuclei, weakly bound halo states, cluster configurations, nuclear fragments produced in heavy-ion fusion reactions, cold Fermi gases, and pasta phases in neutron star crust, are all characterized by large sizes and complex topologies in which many geometrical symmetries characteristic of ground-state configurations are broken. A tool of choice to study such complex forms of matter is an adaptive multi-resolution wavelet analysis. This method has generated much excitement since it provides a common framework linking many diversified methodologies across different fields, including signal processing, data compression, harmonic analysis and operator theory, fractals, and quantum field theory. Purpose: To describe complex superfluid many-fermion systems, we introduce an adaptive pseudospectral method for solving self-consistent equations of nuclear density functional theory in three dimensions, without symmetry restrictions. Methods: The numerical method is based on the multi-resolution and computational harmonic analysis techniques with a multi-wavelet basis. The application of state-of-the-art parallel programming techniques include sophisticated object-oriented templates which parse the high-level code into distributed parallel tasks with a multi-thread task queue scheduler for each multi-core node. The internode communications are asynchronous. The algorithm is variational and is capable of solving coupled complex-geometric systems of equations adaptively, with functional and boundary constraints, in a finite spatial domain of very large size, limited by existing parallel computer memory. For smooth functions, user-defined finite precision is guaranteed. Results: The new adaptive multi-resolution Hartree-Fock-Bogoliubov (HFB) solver madness-hfb is benchmarked against a two-dimensional coordinate-space solver hfb-ax that is based on the B-spline technique and a three-dimensional solver hfodd that is based on the harmonic-oscillator basis expansion. Several examples are considered, including the self-consistent HFB problem for spin-polarized trapped cold fermions and the Skyrme-Hartree-Fock (+BCS) problem for triaxial deformed nuclei. Conclusions: The new madness-hfb framework has many attractive features when applied to nuclear and atomic problems involving many-particle superfluid systems. Of particular interest are weakly bound nuclear configurations close to particle drip lines, strongly elongated and dinuclear configurations such as those present in fission and heavy-ion fusion, and exotic pasta phases that appear in neutron star crust.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Procassini, R.J.
1997-12-31
The fine-scale, multi-space resolution that is envisioned for accurate simulations of complex weapons systems in three spatial dimensions implies flop-rate and memory-storage requirements that will only be obtained in the near future through the use of parallel computational techniques. Since the Monte Carlo transport models in these simulations usually stress both of these computational resources, they are prime candidates for parallelization. The MONACO Monte Carlo transport package, which is currently under development at LLNL, will utilize two types of parallelism within the context of a multi-physics design code: decomposition of the spatial domain across processors (spatial parallelism) and distribution ofmore » particles in a given spatial subdomain across additional processors (particle parallelism). This implementation of the package will utilize explicit data communication between domains (message passing). Such a parallel implementation of a Monte Carlo transport model will result in non-deterministic communication patterns. The communication of particles between subdomains during a Monte Carlo time step may require a significant level of effort to achieve a high parallel efficiency.« less
Modeling Impact-induced Failure of Polysilicon MEMS: A Multi-scale Approach.
Mariani, Stefano; Ghisi, Aldo; Corigliano, Alberto; Zerbini, Sarah
2009-01-01
Failure of packaged polysilicon micro-electro-mechanical systems (MEMS) subjected to impacts involves phenomena occurring at several length-scales. In this paper we present a multi-scale finite element approach to properly allow for: (i) the propagation of stress waves inside the package; (ii) the dynamics of the whole MEMS; (iii) the spreading of micro-cracking in the failing part(s) of the sensor. Through Monte Carlo simulations, some effects of polysilicon micro-structure on the failure mode are elucidated.
Lattice Boltzmann multi-phase simulations in porous media using Multiple GPUs
NASA Astrophysics Data System (ADS)
Toelke, J.; De Prisco, G.; Mu, Y.
2011-12-01
Ingrain's digital rock physics lab computes the physical properties and fluid flow characteristics of oil and gas reservoir rocks including shales, carbonates and sandstones. Ingrain uses advanced lattice Boltzmann methods (LBM) to simulate multiphase flow in the rocks (porous media). We present a very efficient implementation of these methods based on CUDA. Because LBM operates on a finite difference grid, is explicit in nature, and requires only next-neighbor interactions, it is suitable for implementation on GPUs. Since GPU hardware allows for very fine grain parallelism, every lattice site can be handled by a different core. Data has to be loaded from and stored to the device memory in such a way that dense access to the memory is ensured. This can be achieved by accessing the lattice nodes with respect to their contiguous memory locations [1,2]. The simulation engine uses a sparse data structure to represent the grid and advanced algorithms to handle the moving fluid-fluid interface. The simulations are accelerated on one GPU by one order of magnitude compared to a state of the art multicore desktop computer. The engine is parallelized using MPI and runs on multiple GPUs in the same node or across the Infiniband network. Simulations with up to 50 GPUs in parallel are presented. With this simulator using it is possible to perform pore scale multi-phase (oil-water-matrix) simulations in natural porous media in a commercial manner and to predict important rock properties like absolute permeability, relative permeabilites and capillary pressure [3,4]. Results and videos of these simulations in complex real world porous media and rocks are presented and discussed.
Parallelized Three-Dimensional Resistivity Inversion Using Finite Elements And Adjoint State Methods
NASA Astrophysics Data System (ADS)
Schaa, Ralf; Gross, Lutz; Du Plessis, Jaco
2015-04-01
The resistivity method is one of the oldest geophysical exploration methods, which employs one pair of electrodes to inject current into the ground and one or more pairs of electrodes to measure the electrical potential difference. The potential difference is a non-linear function of the subsurface resistivity distribution described by an elliptic partial differential equation (PDE) of the Poisson type. Inversion of measured potentials solves for the subsurface resistivity represented by PDE coefficients. With increasing advances in multichannel resistivity acquisition systems (systems with more than 60 channels and full waveform recording are now emerging), inversion software require efficient storage and solver algorithms. We developed the finite element solver Escript, which provides a user-friendly programming environment in Python to solve large-scale PDE-based problems (see https://launchpad.net/escript-finley). Using finite elements, highly irregular shaped geology and topography can readily be taken into account. For the 3D resistivity problem, we have implemented the secondary potential approach, where the PDE is decomposed into a primary potential caused by the source current and the secondary potential caused by changes in subsurface resistivity. The primary potential is calculated analytically, and the boundary value problem for the secondary potential is solved using nodal finite elements. This approach removes the singularity caused by the source currents and provides more accurate 3D resistivity models. To solve the inversion problem we apply a 'first optimize then discretize' approach using the quasi-Newton scheme in form of the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method (see Gross & Kemp 2013). The evaluation of the cost function requires the solution of the secondary potential PDE for each source current and the solution of the corresponding adjoint-state PDE for the cost function gradients with respect to the subsurface resistivity. The Hessian of the regularization term is used as preconditioner which requires an additional PDE solution in each iteration step. As it turns out, the relevant PDEs are naturally formulated in the finite element framework. Using the domain decomposition method provided in Escript, the inversion scheme has been parallelized for distributed memory computers with multi-core shared memory nodes. We show numerical examples from simple layered models to complex 3D models and compare with the results from other methods. The inversion scheme is furthermore tested on a field data example to characterise localised freshwater discharge in a coastal environment.. References: L. Gross and C. Kemp (2013) Large Scale Joint Inversion of Geophysical Data using the Finite Element Method in escript. ASEG Extended Abstracts 2013, http://dx.doi.org/10.1071/ASEG2013ab306
Efficient Parallelization of a Dynamic Unstructured Application on the Tera MTA
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak
1999-01-01
The success of parallel computing in solving real-life computationally-intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures. Many important applications are both unstructured and dynamic in nature, making their efficient parallel implementation a daunting task. This paper presents the parallelization of a dynamic unstructured mesh adaptation algorithm using three popular programming paradigms on three leading supercomputers. We examine an MPI message-passing implementation on the Cray T3E and the SGI Origin2OOO, a shared-memory implementation using cache coherent nonuniform memory access (CC-NUMA) of the Origin2OOO, and a multi-threaded version on the newly-released Tera Multi-threaded Architecture (MTA). We compare several critical factors of this parallel code development, including runtime, scalability, programmability, and memory overhead. Our overall results demonstrate that multi-threaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hager, Robert, E-mail: rhager@pppl.gov; Yoon, E.S., E-mail: yoone@rpi.edu; Ku, S., E-mail: sku@pppl.gov
2016-06-15
Fusion edge plasmas can be far from thermal equilibrium and require the use of a non-linear collision operator for accurate numerical simulations. In this article, the non-linear single-species Fokker–Planck–Landau collision operator developed by Yoon and Chang (2014) [9] is generalized to include multiple particle species. The finite volume discretization used in this work naturally yields exact conservation of mass, momentum, and energy. The implementation of this new non-linear Fokker–Planck–Landau operator in the gyrokinetic particle-in-cell codes XGC1 and XGCa is described and results of a verification study are discussed. Finally, the numerical techniques that make our non-linear collision operator viable onmore » high-performance computing systems are described, including specialized load balancing algorithms and nested OpenMP parallelization. The collision operator's good weak and strong scaling behavior are shown.« less
Hager, Robert; Yoon, E. S.; Ku, S.; ...
2016-04-04
Fusion edge plasmas can be far from thermal equilibrium and require the use of a non-linear collision operator for accurate numerical simulations. The non-linear single-species Fokker–Planck–Landau collision operator developed by Yoon and Chang (2014) [9] is generalized to include multiple particle species. Moreover, the finite volume discretization used in this work naturally yields exact conservation of mass, momentum, and energy. The implementation of this new non-linear Fokker–Planck–Landau operator in the gyrokinetic particle-in-cell codes XGC1 and XGCa is described and results of a verification study are discussed. Finally, the numerical techniques that make our non-linear collision operator viable on high-performance computingmore » systems are described, including specialized load balancing algorithms and nested OpenMP parallelization. As a result, the collision operator's good weak and strong scaling behavior are shown.« less
OpenSeesPy: Python library for the OpenSees finite element framework
NASA Astrophysics Data System (ADS)
Zhu, Minjie; McKenna, Frank; Scott, Michael H.
2018-01-01
OpenSees, an open source finite element software framework, has been used broadly in the earthquake engineering community for simulating the seismic response of structural and geotechnical systems. The framework allows users to perform finite element analysis with a scripting language and for developers to create both serial and parallel finite element computer applications as interpreters. For the last 15 years, Tcl has been the primary scripting language to which the model building and analysis modules of OpenSees are linked. To provide users with different scripting language options, particularly Python, the OpenSees interpreter interface was refactored to provide multi-interpreter capabilities. This refactoring, resulting in the creation of OpenSeesPy as a Python module, is accomplished through an abstract interface for interpreter calls with concrete implementations for different scripting languages. Through this approach, users are able to develop applications that utilize the unique features of several scripting languages while taking advantage of advanced finite element analysis models and algorithms.
NASA Astrophysics Data System (ADS)
Gerke, Kirill M.; Vasilyev, Roman V.; Khirevich, Siarhei; Collins, Daniel; Karsanina, Marina V.; Sizonenko, Timofey O.; Korost, Dmitry V.; Lamontagne, Sébastien; Mallants, Dirk
2018-05-01
Permeability is one of the fundamental properties of porous media and is required for large-scale Darcian fluid flow and mass transport models. Whilst permeability can be measured directly at a range of scales, there are increasing opportunities to evaluate permeability from pore-scale fluid flow simulations. We introduce the free software Finite-Difference Method Stokes Solver (FDMSS) that solves Stokes equation using a finite-difference method (FDM) directly on voxelized 3D pore geometries (i.e. without meshing). Based on explicit convergence studies, validation on sphere packings with analytically known permeabilities, and comparison against lattice-Boltzmann and other published FDM studies, we conclude that FDMSS provides a computationally efficient and accurate basis for single-phase pore-scale flow simulations. By implementing an efficient parallelization and code optimization scheme, permeability inferences can now be made from 3D images of up to 109 voxels using modern desktop computers. Case studies demonstrate the broad applicability of the FDMSS software for both natural and artificial porous media.
Wang, Yawei; Wang, Lizhen; Du, Chengfei; Mo, Zhongjun; Fan, Yubo
2016-06-01
In contrast to numerous researches on static or quasi-static stiffness of cervical spine segments, very few investigations on their dynamic stiffness were published. Currently, scale factors and estimated coefficients were usually used in multi-body models for including viscoelastic properties and damping effects, meanwhile viscoelastic properties of some tissues were unavailable for establishing finite element models. Because dynamic stiffness of cervical spine segments in these models were difficult to validate because of lacking in experimental data, we tried to gain some insights on current modeling methods through studying dynamic stiffness differences between these models. A finite element model and a multi-body model of C6-C7 segment were developed through using available material data and typical modeling technologies. These two models were validated with quasi-static response data of the C6-C7 cervical spine segment. Dynamic stiffness differences were investigated through controlling motions of C6 vertebrae at different rates and then comparing their reaction forces or moments. Validation results showed that both the finite element model and the multi-body model could generate reasonable responses under quasi-static loads, but the finite element segment model exhibited more nonlinear characters. Dynamic response investigations indicated that dynamic stiffness of this finite element model might be underestimated because of the absence of dynamic stiffen effect and damping effects of annulus fibrous, while representation of these effects also need to be improved in current multi-body model. Copyright © 2015 John Wiley & Sons, Ltd. Copyright © 2015 John Wiley & Sons, Ltd.
Relativistic thermal electron scale instabilities in sheared flow plasma
NASA Astrophysics Data System (ADS)
Miller, Evan D.; Rogers, Barrett N.
2016-04-01
> The linear dispersion relation obeyed by finite-temperature, non-magnetized, relativistic two-fluid plasmas is presented, in the special case of a discontinuous bulk velocity profile and parallel wave vectors. It is found that such flows become universally unstable at the collisionless electron skin-depth scale. Further analyses are performed in the limits of either free-streaming ions or ultra-hot plasmas. In these limits, the system is highly unstable in the parameter regimes associated with either the electron scale Kelvin-Helmholtz instability (ESKHI) or the relativistic electron scale sheared flow instability (RESI) recently highlighted by Gruzinov. Coupling between these modes provides further instability throughout the remaining parameter space, provided both shear flow and temperature are finite. An explicit parameter space bound on the highly unstable region is found.
Nonlinear viscoplasticity in ASPECT: benchmarking and applications to subduction
NASA Astrophysics Data System (ADS)
Glerum, Anne; Thieulot, Cedric; Fraters, Menno; Blom, Constantijn; Spakman, Wim
2018-03-01
ASPECT (Advanced Solver for Problems in Earth's ConvecTion) is a massively parallel finite element code originally designed for modeling thermal convection in the mantle with a Newtonian rheology. The code is characterized by modern numerical methods, high-performance parallelism and extensibility. This last characteristic is illustrated in this work: we have extended the use of ASPECT from global thermal convection modeling to upper-mantle-scale applications of subduction. Subduction modeling generally requires the tracking of multiple materials with different properties and with nonlinear viscous and viscoplastic rheologies. To this end, we implemented a frictional plasticity criterion that is combined with a viscous diffusion and dislocation creep rheology. Because ASPECT uses compositional fields to represent different materials, all material parameters are made dependent on a user-specified number of fields. The goal of this paper is primarily to describe and verify our implementations of complex, multi-material rheology by reproducing the results of four well-known two-dimensional benchmarks: the indentor benchmark, the brick experiment, the sandbox experiment and the slab detachment benchmark. Furthermore, we aim to provide hands-on examples for prospective users by demonstrating the use of multi-material viscoplasticity with three-dimensional, thermomechanical models of oceanic subduction, putting ASPECT on the map as a community code for high-resolution, nonlinear rheology subduction modeling.
Adaptive multi-GPU Exchange Monte Carlo for the 3D Random Field Ising Model
NASA Astrophysics Data System (ADS)
Navarro, Cristóbal A.; Huang, Wei; Deng, Youjin
2016-08-01
This work presents an adaptive multi-GPU Exchange Monte Carlo approach for the simulation of the 3D Random Field Ising Model (RFIM). The design is based on a two-level parallelization. The first level, spin-level parallelism, maps the parallel computation as optimal 3D thread-blocks that simulate blocks of spins in shared memory with minimal halo surface, assuming a constant block volume. The second level, replica-level parallelism, uses multi-GPU computation to handle the simulation of an ensemble of replicas. CUDA's concurrent kernel execution feature is used in order to fill the occupancy of each GPU with many replicas, providing a performance boost that is more notorious at the smallest values of L. In addition to the two-level parallel design, the work proposes an adaptive multi-GPU approach that dynamically builds a proper temperature set free of exchange bottlenecks. The strategy is based on mid-point insertions at the temperature gaps where the exchange rate is most compromised. The extra work generated by the insertions is balanced across the GPUs independently of where the mid-point insertions were performed. Performance results show that spin-level performance is approximately two orders of magnitude faster than a single-core CPU version and one order of magnitude faster than a parallel multi-core CPU version running on 16-cores. Multi-GPU performance is highly convenient under a weak scaling setting, reaching up to 99 % efficiency as long as the number of GPUs and L increase together. The combination of the adaptive approach with the parallel multi-GPU design has extended our possibilities of simulation to sizes of L = 32 , 64 for a workstation with two GPUs. Sizes beyond L = 64 can eventually be studied using larger multi-GPU systems.
Simulation of 2D Kinetic Effects in Plasmas using the Grid Based Continuum Code LOKI
NASA Astrophysics Data System (ADS)
Banks, Jeffrey; Berger, Richard; Chapman, Tom; Brunner, Stephan
2016-10-01
Kinetic simulation of multi-dimensional plasma waves through direct discretization of the Vlasov equation is a useful tool to study many physical interactions and is particularly attractive for situations where minimal fluctuation levels are desired, for instance, when measuring growth rates of plasma wave instabilities. However, direct discretization of phase space can be computationally expensive, and as a result there are few examples of published results using Vlasov codes in more than a single configuration space dimension. In an effort to fill this gap we have developed the Eulerian-based kinetic code LOKI that evolves the Vlasov-Poisson system in 2+2-dimensional phase space. The code is designed to reduce the cost of phase-space computation by using fully 4th order accurate conservative finite differencing, while retaining excellent parallel scalability that efficiently uses large scale computing resources. In this poster I will discuss the algorithms used in the code as well as some aspects of their parallel implementation using MPI. I will also overview simulation results of basic plasma wave instabilities relevant to laser plasma interaction, which have been obtained using the code.
Multi-level Hierarchical Poly Tree computer architectures
NASA Technical Reports Server (NTRS)
Padovan, Joe; Gute, Doug
1990-01-01
Based on the concept of hierarchical substructuring, this paper develops an optimal multi-level Hierarchical Poly Tree (HPT) parallel computer architecture scheme which is applicable to the solution of finite element and difference simulations. Emphasis is given to minimizing computational effort, in-core/out-of-core memory requirements, and the data transfer between processors. In addition, a simplified communications network that reduces the number of I/O channels between processors is presented. HPT configurations that yield optimal superlinearities are also demonstrated. Moreover, to generalize the scope of applicability, special attention is given to developing: (1) multi-level reduction trees which provide an orderly/optimal procedure by which model densification/simplification can be achieved, as well as (2) methodologies enabling processor grading that yields architectures with varying types of multi-level granularity.
Scalability and Portability of Two Parallel Implementations of ADI
NASA Technical Reports Server (NTRS)
Phung, Thanh; VanderWijngaart, Rob F.
1994-01-01
Two domain decompositions for the implementation of the NAS Scalar Penta-diagonal Parallel Benchmark on MIMD systems are investigated, namely transposition and multi-partitioning. Hardware platforms considered are the Intel iPSC/860 and Paragon XP/S-15, and clusters of SGI workstations on ethernet, communicating through PVM. It is found that the multi-partitioning strategy offers the kind of coarse granularity that allows scaling up to hundreds of processors on a massively parallel machine. Moreover, efficiency is retained when the code is ported verbatim (save message passing syntax) to a PVM environment on a modest size cluster of workstations.
Grid-Enabled Quantitative Analysis of Breast Cancer
2010-10-01
large-scale, multi-modality computerized image analysis . The central hypothesis of this research is that large-scale image analysis for breast cancer...research, we designed a pilot study utilizing large scale parallel Grid computing harnessing nationwide infrastructure for medical image analysis . Also
ACCURATE CHEMICAL MASTER EQUATION SOLUTION USING MULTI-FINITE BUFFERS
Cao, Youfang; Terebus, Anna; Liang, Jie
2016-01-01
The discrete chemical master equation (dCME) provides a fundamental framework for studying stochasticity in mesoscopic networks. Because of the multi-scale nature of many networks where reaction rates have large disparity, directly solving dCMEs is intractable due to the exploding size of the state space. It is important to truncate the state space effectively with quantified errors, so accurate solutions can be computed. It is also important to know if all major probabilistic peaks have been computed. Here we introduce the Accurate CME (ACME) algorithm for obtaining direct solutions to dCMEs. With multi-finite buffers for reducing the state space by O(n!), exact steady-state and time-evolving network probability landscapes can be computed. We further describe a theoretical framework of aggregating microstates into a smaller number of macrostates by decomposing a network into independent aggregated birth and death processes, and give an a priori method for rapidly determining steady-state truncation errors. The maximal sizes of the finite buffers for a given error tolerance can also be pre-computed without costly trial solutions of dCMEs. We show exactly computed probability landscapes of three multi-scale networks, namely, a 6-node toggle switch, 11-node phage-lambda epigenetic circuit, and 16-node MAPK cascade network, the latter two with no known solutions. We also show how probabilities of rare events can be computed from first-passage times, another class of unsolved problems challenging for simulation-based techniques due to large separations in time scales. Overall, the ACME method enables accurate and efficient solutions of the dCME for a large class of networks. PMID:27761104
A numerical differentiation library exploiting parallel architectures
NASA Astrophysics Data System (ADS)
Voglis, C.; Hadjidoukas, P. E.; Lagaris, I. E.; Papageorgiou, D. G.
2009-08-01
We present a software library for numerically estimating first and second order partial derivatives of a function by finite differencing. Various truncation schemes are offered resulting in corresponding formulas that are accurate to order O(h), O(h), and O(h), h being the differencing step. The derivatives are calculated via forward, backward and central differences. Care has been taken that only feasible points are used in the case where bound constraints are imposed on the variables. The Hessian may be approximated either from function or from gradient values. There are three versions of the software: a sequential version, an OpenMP version for shared memory architectures and an MPI version for distributed systems (clusters). The parallel versions exploit the multiprocessing capability offered by computer clusters, as well as modern multi-core systems and due to the independent character of the derivative computation, the speedup scales almost linearly with the number of available processors/cores. Program summaryProgram title: NDL (Numerical Differentiation Library) Catalogue identifier: AEDG_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEDG_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 73 030 No. of bytes in distributed program, including test data, etc.: 630 876 Distribution format: tar.gz Programming language: ANSI FORTRAN-77, ANSI C, MPI, OPENMP Computer: Distributed systems (clusters), shared memory systems Operating system: Linux, Solaris Has the code been vectorised or parallelized?: Yes RAM: The library uses O(N) internal storage, N being the dimension of the problem Classification: 4.9, 4.14, 6.5 Nature of problem: The numerical estimation of derivatives at several accuracy levels is a common requirement in many computational tasks, such as optimization, solution of nonlinear systems, etc. The parallel implementation that exploits systems with multiple CPUs is very important for large scale and computationally expensive problems. Solution method: Finite differencing is used with carefully chosen step that minimizes the sum of the truncation and round-off errors. The parallel versions employ both OpenMP and MPI libraries. Restrictions: The library uses only double precision arithmetic. Unusual features: The software takes into account bound constraints, in the sense that only feasible points are used to evaluate the derivatives, and given the level of the desired accuracy, the proper formula is automatically employed. Running time: Running time depends on the function's complexity. The test run took 15 ms for the serial distribution, 0.6 s for the OpenMP and 4.2 s for the MPI parallel distribution on 2 processors.
Genetic Parallel Programming: design and implementation.
Cheang, Sin Man; Leung, Kwong Sak; Lee, Kin Hong
2006-01-01
This paper presents a novel Genetic Parallel Programming (GPP) paradigm for evolving parallel programs running on a Multi-Arithmetic-Logic-Unit (Multi-ALU) Processor (MAP). The MAP is a Multiple Instruction-streams, Multiple Data-streams (MIMD), general-purpose register machine that can be implemented on modern Very Large-Scale Integrated Circuits (VLSIs) in order to evaluate genetic programs at high speed. For human programmers, writing parallel programs is more difficult than writing sequential programs. However, experimental results show that GPP evolves parallel programs with less computational effort than that of their sequential counterparts. It creates a new approach to evolving a feasible problem solution in parallel program form and then serializes it into a sequential program if required. The effectiveness and efficiency of GPP are investigated using a suite of 14 well-studied benchmark problems. Experimental results show that GPP speeds up evolution substantially.
Parallel algorithm for multiscale atomistic/continuum simulations using LAMMPS
NASA Astrophysics Data System (ADS)
Pavia, F.; Curtin, W. A.
2015-07-01
Deformation and fracture processes in engineering materials often require simultaneous descriptions over a range of length and time scales, with each scale using a different computational technique. Here we present a high-performance parallel 3D computing framework for executing large multiscale studies that couple an atomic domain, modeled using molecular dynamics and a continuum domain, modeled using explicit finite elements. We use the robust Coupled Atomistic/Discrete-Dislocation (CADD) displacement-coupling method, but without the transfer of dislocations between atoms and continuum. The main purpose of the work is to provide a multiscale implementation within an existing large-scale parallel molecular dynamics code (LAMMPS) that enables use of all the tools associated with this popular open-source code, while extending CADD-type coupling to 3D. Validation of the implementation includes the demonstration of (i) stability in finite-temperature dynamics using Langevin dynamics, (ii) elimination of wave reflections due to large dynamic events occurring in the MD region and (iii) the absence of spurious forces acting on dislocations due to the MD/FE coupling, for dislocations further than 10 Å from the coupling boundary. A first non-trivial example application of dislocation glide and bowing around obstacles is shown, for dislocation lengths of ∼50 nm using fewer than 1 000 000 atoms but reproducing results of extremely large atomistic simulations at much lower computational cost.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vay, Jean-Luc, E-mail: jlvay@lbl.gov; Haber, Irving; Godfrey, Brendan B.
Pseudo-spectral electromagnetic solvers (i.e. representing the fields in Fourier space) have extraordinary precision. In particular, Haber et al. presented in 1973 a pseudo-spectral solver that integrates analytically the solution over a finite time step, under the usual assumption that the source is constant over that time step. Yet, pseudo-spectral solvers have not been widely used, due in part to the difficulty for efficient parallelization owing to global communications associated with global FFTs on the entire computational domains. A method for the parallelization of electromagnetic pseudo-spectral solvers is proposed and tested on single electromagnetic pulses, and on Particle-In-Cell simulations of themore » wakefield formation in a laser plasma accelerator. The method takes advantage of the properties of the Discrete Fourier Transform, the linearity of Maxwell’s equations and the finite speed of light for limiting the communications of data within guard regions between neighboring computational domains. Although this requires a small approximation, test results show that no significant error is made on the test cases that have been presented. The proposed method opens the way to solvers combining the favorable parallel scaling of standard finite-difference methods with the accuracy advantages of pseudo-spectral methods.« less
A Parallel, Finite-Volume Algorithm for Large-Eddy Simulation of Turbulent Flows
NASA Technical Reports Server (NTRS)
Bui, Trong T.
1999-01-01
A parallel, finite-volume algorithm has been developed for large-eddy simulation (LES) of compressible turbulent flows. This algorithm includes piecewise linear least-square reconstruction, trilinear finite-element interpolation, Roe flux-difference splitting, and second-order MacCormack time marching. Parallel implementation is done using the message-passing programming model. In this paper, the numerical algorithm is described. To validate the numerical method for turbulence simulation, LES of fully developed turbulent flow in a square duct is performed for a Reynolds number of 320 based on the average friction velocity and the hydraulic diameter of the duct. Direct numerical simulation (DNS) results are available for this test case, and the accuracy of this algorithm for turbulence simulations can be ascertained by comparing the LES solutions with the DNS results. The effects of grid resolution, upwind numerical dissipation, and subgrid-scale dissipation on the accuracy of the LES are examined. Comparison with DNS results shows that the standard Roe flux-difference splitting dissipation adversely affects the accuracy of the turbulence simulation. For accurate turbulence simulations, only 3-5 percent of the standard Roe flux-difference splitting dissipation is needed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chin, George; Marquez, Andres; Choudhury, Sutanay
2012-09-01
Triadic analysis encompasses a useful set of graph mining methods that is centered on the concept of a triad, which is a subgraph of three nodes and the configuration of directed edges across the nodes. Such methods are often applied in the social sciences as well as many other diverse fields. Triadic methods commonly operate on a triad census that counts the number of triads of every possible edge configuration in a graph. Like other graph algorithms, triadic census algorithms do not scale well when graphs reach tens of millions to billions of nodes. To enable the triadic analysis ofmore » large-scale graphs, we developed and optimized a triad census algorithm to efficiently execute on shared memory architectures. We will retrace the development and evolution of a parallel triad census algorithm. Over the course of several versions, we continually adapted the code’s data structures and program logic to expose more opportunities to exploit parallelism on shared memory that would translate into improved computational performance. We will recall the critical steps and modifications that occurred during code development and optimization. Furthermore, we will compare the performances of triad census algorithm versions on three specific systems: Cray XMT, HP Superdome, and AMD multi-core NUMA machine. These three systems have shared memory architectures but with markedly different hardware capabilities to manage parallelism.« less
Interface-Resolving Simulation of Collision Efficiency of Cloud Droplets
NASA Astrophysics Data System (ADS)
Wang, Lian-Ping; Peng, Cheng; Rosa, Bodgan; Onishi, Ryo
2017-11-01
Small-scale air turbulence could enhance the geometric collision rate of cloud droplets while large-scale air turbulence could augment the diffusional growth of cloud droplets. Air turbulence could also enhance the collision efficiency of cloud droplets. Accurate simulation of collision efficiency, however, requires capture of the multi-scale droplet-turbulence and droplet-droplet interactions, which has only been partially achieved in the recent past using the hybrid direct numerical simulation (HDNS) approach. % where Stokes disturbance flow is assumed. The HDNS approach has two major drawbacks: (1) the short-range droplet-droplet interaction is not treated rigorously; (2) the finite-Reynolds number correction to the collision efficiency is not included. In this talk, using two independent numerical methods, we will develop an interface-resolved simulation approach in which the disturbance flows are directly resolved numerically, combined with a rigorous lubrication correction model for near-field droplet-droplet interaction. This multi-scale approach is first used to study the effect of finite flow Reynolds numbers on the droplet collision efficiency in still air. Our simulation results show a significant finite-Re effect on collision efficiency when the droplets are of similar sizes. Preliminary results on integrating this approach in a turbulent flow laden with droplets will also be presented. This work is partially supported by the National Science Foundation.
Parallel and fault-tolerant algorithms for hypercube multiprocessors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aykanat, C.
1988-01-01
Several techniques for increasing the performance of parallel algorithms on distributed-memory message-passing multi-processor systems are investigated. These techniques are effectively implemented for the parallelization of the Scaled Conjugate Gradient (SCG) algorithm on a hypercube connected message-passing multi-processor. Significant performance improvement is achieved by using these techniques. The SCG algorithm is used for the solution phase of an FE modeling system. Almost linear speed-up is achieved, and it is shown that hypercube topology is scalable for an FE class of problem. The SCG algorithm is also shown to be suitable for vectorization, and near supercomputer performance is achieved on a vectormore » hypercube multiprocessor by exploiting both parallelization and vectorization. Fault-tolerance issues for the parallel SCG algorithm and for the hypercube topology are also addressed.« less
High performance computation of radiative transfer equation using the finite element method
NASA Astrophysics Data System (ADS)
Badri, M. A.; Jolivet, P.; Rousseau, B.; Favennec, Y.
2018-05-01
This article deals with an efficient strategy for numerically simulating radiative transfer phenomena using distributed computing. The finite element method alongside the discrete ordinate method is used for spatio-angular discretization of the monochromatic steady-state radiative transfer equation in an anisotropically scattering media. Two very different methods of parallelization, angular and spatial decomposition methods, are presented. To do so, the finite element method is used in a vectorial way. A detailed comparison of scalability, performance, and efficiency on thousands of processors is established for two- and three-dimensional heterogeneous test cases. Timings show that both algorithms scale well when using proper preconditioners. It is also observed that our angular decomposition scheme outperforms our domain decomposition method. Overall, we perform numerical simulations at scales that were previously unattainable by standard radiative transfer equation solvers.
NASA Astrophysics Data System (ADS)
Pan, Yanqiao; Huang, YongAn; Guo, Lei; Ding, Yajiang; Yin, Zhouping
2015-04-01
It is critical and challenging to achieve the individual jetting ability and high consistency in multi-nozzle electrohydrodynamic jet printing (E-jet printing). We proposed multi-level voltage method (MVM) to implement the addressable E-jet printing using multiple parallel nozzles with high consistency. The fabricated multi-nozzle printhead for MVM consists of three parts: PMMA holder, stainless steel capillaries (27G, outer diameter 400 μm) and FR-4 extractor layer. The key of MVM is to control the maximum meniscus electric field on each nozzle. The individual jetting control can be implemented when the rings under the jetting nozzles are 0 kV and the other rings are 0.5 kV. The onset electric field for each nozzle is ˜3.4 kV/mm by numerical simulation. Furthermore, a series of printing experiments are performed to show the advantage of MVM in printing consistency than the "one-voltage method" and "improved E-jet method", by combination with finite element analyses. The good dimension consistency (274μm, 276μm, 280μm) and position consistency of the droplet array on the hydrophobic Si substrate verified the enhancements. It shows that MVM is an effective technique to implement the addressable E-jet printing with multiple parallel nozzles in high consistency.
SNAVA-A real-time multi-FPGA multi-model spiking neural network simulation architecture.
Sripad, Athul; Sanchez, Giovanny; Zapata, Mireya; Pirrone, Vito; Dorta, Taho; Cambria, Salvatore; Marti, Albert; Krishnamourthy, Karthikeyan; Madrenas, Jordi
2018-01-01
Spiking Neural Networks (SNN) for Versatile Applications (SNAVA) simulation platform is a scalable and programmable parallel architecture that supports real-time, large-scale, multi-model SNN computation. This parallel architecture is implemented in modern Field-Programmable Gate Arrays (FPGAs) devices to provide high performance execution and flexibility to support large-scale SNN models. Flexibility is defined in terms of programmability, which allows easy synapse and neuron implementation. This has been achieved by using a special-purpose Processing Elements (PEs) for computing SNNs, and analyzing and customizing the instruction set according to the processing needs to achieve maximum performance with minimum resources. The parallel architecture is interfaced with customized Graphical User Interfaces (GUIs) to configure the SNN's connectivity, to compile the neuron-synapse model and to monitor SNN's activity. Our contribution intends to provide a tool that allows to prototype SNNs faster than on CPU/GPU architectures but significantly cheaper than fabricating a customized neuromorphic chip. This could be potentially valuable to the computational neuroscience and neuromorphic engineering communities. Copyright © 2017 Elsevier Ltd. All rights reserved.
Hybrid Parallelism for Volume Rendering on Large-, Multi-, and Many-Core Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Howison, Mark; Bethel, E. Wes; Childs, Hank
2012-01-01
With the computing industry trending towards multi- and many-core processors, we study how a standard visualization algorithm, ray-casting volume rendering, can benefit from a hybrid parallelism approach. Hybrid parallelism provides the best of both worlds: using distributed-memory parallelism across a large numbers of nodes increases available FLOPs and memory, while exploiting shared-memory parallelism among the cores within each node ensures that each node performs its portion of the larger calculation as efficiently as possible. We demonstrate results from weak and strong scaling studies, at levels of concurrency ranging up to 216,000, and with datasets as large as 12.2 trillion cells.more » The greatest benefit from hybrid parallelism lies in the communication portion of the algorithm, the dominant cost at higher levels of concurrency. We show that reducing the number of participants with a hybrid approach significantly improves performance.« less
Parallel processing in finite element structural analysis
NASA Technical Reports Server (NTRS)
Noor, Ahmed K.
1987-01-01
A brief review is made of the fundamental concepts and basic issues of parallel processing. Discussion focuses on parallel numerical algorithms, performance evaluation of machines and algorithms, and parallelism in finite element computations. A computational strategy is proposed for maximizing the degree of parallelism at different levels of the finite element analysis process including: 1) formulation level (through the use of mixed finite element models); 2) analysis level (through additive decomposition of the different arrays in the governing equations into the contributions to a symmetrized response plus correction terms); 3) numerical algorithm level (through the use of operator splitting techniques and application of iterative processes); and 4) implementation level (through the effective combination of vectorization, multitasking and microtasking, whenever available).
NASA Astrophysics Data System (ADS)
Suryanarayana, Phanish; Pratapa, Phanisri P.; Sharma, Abhiraj; Pask, John E.
2018-03-01
We present SQDFT: a large-scale parallel implementation of the Spectral Quadrature (SQ) method for O(N) Kohn-Sham Density Functional Theory (DFT) calculations at high temperature. Specifically, we develop an efficient and scalable finite-difference implementation of the infinite-cell Clenshaw-Curtis SQ approach, in which results for the infinite crystal are obtained by expressing quantities of interest as bilinear forms or sums of bilinear forms, that are then approximated by spatially localized Clenshaw-Curtis quadrature rules. We demonstrate the accuracy of SQDFT by showing systematic convergence of energies and atomic forces with respect to SQ parameters to reference diagonalization results, and convergence with discretization to established planewave results, for both metallic and insulating systems. We further demonstrate that SQDFT achieves excellent strong and weak parallel scaling on computer systems consisting of tens of thousands of processors, with near perfect O(N) scaling with system size and wall times as low as a few seconds per self-consistent field iteration. Finally, we verify the accuracy of SQDFT in large-scale quantum molecular dynamics simulations of aluminum at high temperature.
Progress in Computational Simulation of Earthquakes
NASA Technical Reports Server (NTRS)
Donnellan, Andrea; Parker, Jay; Lyzenga, Gregory; Judd, Michele; Li, P. Peggy; Norton, Charles; Tisdale, Edwin; Granat, Robert
2006-01-01
GeoFEST(P) is a computer program written for use in the QuakeSim project, which is devoted to development and improvement of means of computational simulation of earthquakes. GeoFEST(P) models interacting earthquake fault systems from the fault-nucleation to the tectonic scale. The development of GeoFEST( P) has involved coupling of two programs: GeoFEST and the Pyramid Adaptive Mesh Refinement Library. GeoFEST is a message-passing-interface-parallel code that utilizes a finite-element technique to simulate evolution of stress, fault slip, and plastic/elastic deformation in realistic materials like those of faulted regions of the crust of the Earth. The products of such simulations are synthetic observable time-dependent surface deformations on time scales from days to decades. Pyramid Adaptive Mesh Refinement Library is a software library that facilitates the generation of computational meshes for solving physical problems. In an application of GeoFEST(P), a computational grid can be dynamically adapted as stress grows on a fault. Simulations on workstations using a few tens of thousands of stress and displacement finite elements can now be expanded to multiple millions of elements with greater than 98-percent scaled efficiency on over many hundreds of parallel processors (see figure).
Anandakrishnan, Ramu; Scogland, Tom R W; Fenley, Andrew T; Gordon, John C; Feng, Wu-chun; Onufriev, Alexey V
2010-06-01
Tools that compute and visualize biomolecular electrostatic surface potential have been used extensively for studying biomolecular function. However, determining the surface potential for large biomolecules on a typical desktop computer can take days or longer using currently available tools and methods. Two commonly used techniques to speed-up these types of electrostatic computations are approximations based on multi-scale coarse-graining and parallelization across multiple processors. This paper demonstrates that for the computation of electrostatic surface potential, these two techniques can be combined to deliver significantly greater speed-up than either one separately, something that is in general not always possible. Specifically, the electrostatic potential computation, using an analytical linearized Poisson-Boltzmann (ALPB) method, is approximated using the hierarchical charge partitioning (HCP) multi-scale method, and parallelized on an ATI Radeon 4870 graphical processing unit (GPU). The implementation delivers a combined 934-fold speed-up for a 476,040 atom viral capsid, compared to an equivalent non-parallel implementation on an Intel E6550 CPU without the approximation. This speed-up is significantly greater than the 42-fold speed-up for the HCP approximation alone or the 182-fold speed-up for the GPU alone. Copyright (c) 2010 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Mosby, Matthew; Matouš, Karel
2015-12-01
Three-dimensional simulations capable of resolving the large range of spatial scales, from the failure-zone thickness up to the size of the representative unit cell, in damage mechanics problems of particle reinforced adhesives are presented. We show that resolving this wide range of scales in complex three-dimensional heterogeneous morphologies is essential in order to apprehend fracture characteristics, such as strength, fracture toughness and shape of the softening profile. Moreover, we show that computations that resolve essential physical length scales capture the particle size-effect in fracture toughness, for example. In the vein of image-based computational materials science, we construct statistically optimal unit cells containing hundreds to thousands of particles. We show that these statistically representative unit cells are capable of capturing the first- and second-order probability functions of a given data-source with better accuracy than traditional inclusion packing techniques. In order to accomplish these large computations, we use a parallel multiscale cohesive formulation and extend it to finite strains including damage mechanics. The high-performance parallel computational framework is executed on up to 1024 processing cores. A mesh convergence and a representative unit cell study are performed. Quantifying the complex damage patterns in simulations consisting of tens of millions of computational cells and millions of highly nonlinear equations requires data-mining the parallel simulations, and we propose two damage metrics to quantify the damage patterns. A detailed study of volume fraction and filler size on the macroscopic traction-separation response of heterogeneous adhesives is presented.
NASA Astrophysics Data System (ADS)
Breuillard, H.; Matteini, L.; Argall, M. R.; Sahraoui, F.; Andriopoulou, M.; Le Contel, O.; Retinò, A.; Mirioni, L.; Huang, S. Y.; Gershman, D. J.; Ergun, R. E.; Wilder, F. D.; Goodrich, K. A.; Ahmadi, N.; Yordanova, E.; Vaivads, A.; Turner, D. L.; Khotyaintsev, Yu. V.; Graham, D. B.; Lindqvist, P.-A.; Chasapis, A.; Burch, J. L.; Torbert, R. B.; Russell, C. T.; Magnes, W.; Strangeway, R. J.; Plaschke, F.; Moore, T. E.; Giles, B. L.; Paterson, W. R.; Pollock, C. J.; Lavraud, B.; Fuselier, S. A.; Cohen, I. J.
2018-06-01
The Earth’s magnetosheath, which is characterized by highly turbulent fluctuations, is usually divided into two regions of different properties as a function of the angle between the interplanetary magnetic field and the shock normal. In this study, we make use of high-time resolution instruments on board the Magnetospheric MultiScale spacecraft to determine and compare the properties of subsolar magnetosheath turbulence in both regions, i.e., downstream of the quasi-parallel and quasi-perpendicular bow shocks. In particular, we take advantage of the unprecedented temporal resolution of the Fast Plasma Investigation instrument to show the density fluctuations down to sub-ion scales for the first time. We show that the nature of turbulence is highly compressible down to electron scales, particularly in the quasi-parallel magnetosheath. In this region, the magnetic turbulence also shows an inertial (Kolmogorov-like) range, indicating that the fluctuations are not formed locally, in contrast with the quasi-perpendicular magnetosheath. We also show that the electromagnetic turbulence is dominated by electric fluctuations at sub-ion scales (f > 1 Hz) and that magnetic and electric spectra steepen at the largest-electron scale. The latter indicates a change in the nature of turbulence at electron scales. Finally, we show that the electric fluctuations around the electron gyrofrequency are mostly parallel in the quasi-perpendicular magnetosheath, where intense whistlers are observed. This result suggests that energy dissipation, plasma heating, and acceleration might be driven by intense electrostatic parallel structures/waves, which can be linked to whistler waves.
NASA Astrophysics Data System (ADS)
Khebbab, Mohamed; Feliachi, Mouloud; El Hadi Latreche, Mohamed
2018-03-01
In this present paper, a simulation of eddy current non-destructive testing (EC NDT) on unidirectional carbon fiber reinforced polymer is performed; for this magneto-dynamic formulation in term of magnetic vector potential is solved using finite element heterogeneous multi-scale method (FE HMM). FE HMM has as goal to compute the homogenized solution without calculating the homogenized tensor explicitly, the solution is based only on the physical characteristic known in micro domain. This feature is well adapted to EC NDT to evaluate defect in carbon composite material in microscopic scale, where the defect detection is performed by coil impedance measurement; the measurement value is intimately linked to material characteristic in microscopic level. Based on this, our model can handle different defects such as: cracks, inclusion, internal electrical conductivity changes, heterogeneities, etc. The simulation results were compared with the solution obtained with homogenized material using mixture law, a good agreement was found.
NASA Astrophysics Data System (ADS)
lai, W.; Steinke, R. C.; Ogden, F. L.
2013-12-01
Physics-based watershed models are useful tools for hydrologic studies, water resources management and economic analyses in the contexts of climate, land-use, and water-use changes. This poster presents development of a physics-based, high-resolution, distributed water resources model suitable for simulating large watersheds in a massively parallel computing environment. Developing this model is one of the objectives of the NSF EPSCoR RII Track II CI-WATER project, which is joint between Wyoming and Utah. The model, which we call ADHydro, is aimed at simulating important processes in the Rocky Mountain west, includes: rainfall and infiltration, snowfall and snowmelt in complex terrain, vegetation and evapotranspiration, soil heat flux and freezing, overland flow, channel flow, groundwater flow and water management. The ADHydro model uses the explicit finite volume method to solve PDEs for 2D overland flow, 2D saturated groundwater flow coupled to 1D channel flow. The model has a quasi-3D formulation that couples 2D overland flow and 2D saturated groundwater flow using the 1D Talbot-Ogden finite water-content infiltration and redistribution model. This eliminates difficulties in solving the highly nonlinear 3D Richards equation, while the finite volume Talbot-Ogden infiltration solution is computationally efficient, guaranteed to conserve mass, and allows simulation of the effect of near-surface groundwater tables on runoff generation. The process-level components of the model are being individually tested and validated. The model as a whole will be tested on the Green River basin in Wyoming and ultimately applied to the entire Upper Colorado River basin. ADHydro development has necessitated development of tools for large-scale watershed modeling, including open-source workflow steps to extract hydromorphological information from GIS data, integrate hydrometeorological and water management forcing input, and post-processing and visualization of large output data sets. The ADHydro model will be coupled with relevant components of the NOAH-MP land surface scheme and the WRF mesoscale meteorological model. Model objectives include well documented Application Programming Interfaces (APIs) to facilitate modifications and additions by others. We will release the model as open-source in 2014 and begin establishing a users' community.
MMS Observations of Parallel Electric Fields During a Quasi-Perpendicular Bow Shock Crossing
NASA Astrophysics Data System (ADS)
Goodrich, K.; Schwartz, S. J.; Ergun, R.; Wilder, F. D.; Holmes, J.; Burch, J. L.; Gershman, D. J.; Giles, B. L.; Khotyaintsev, Y. V.; Le Contel, O.; Lindqvist, P. A.; Strangeway, R. J.; Russell, C.; Torbert, R. B.
2016-12-01
Previous observations of the terrestrial bow shock have frequently shown large-amplitude fluctuations in the parallel electric field. These parallel electric fields are seen as both nonlinear solitary structures, such as double layers and electron phase-space holes, and short-wavelength waves, which can reach amplitudes greater than 100 mV/m. The Magnetospheric Multi-Scale (MMS) Mission has crossed the Earth's bow shock more than 200 times. The parallel electric field signatures observed in these crossings are seen in very discrete packets and evolve over time scales of less than a second, indicating the presence of a wealth of kinetic-scale activity. The high time resolution of the Fast Particle Instrument (FPI) available on MMS offers greater detail of the kinetic-scale physics that occur at bow shocks than ever before, allowing greater insight into the overall effect of these observed electric fields. We present a characterization of these parallel electric fields found in a single bow shock event and how it reflects the kinetic-scale activity that can occur at the terrestrial bow shock.
MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems
NASA Technical Reports Server (NTRS)
Taft, James R.
1999-01-01
Recent developments at the NASA AMES Research Center's NAS Division have demonstrated that the new generation of NUMA based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector oriented CFD production codes at sustained rates far exceeding processing rates possible on dedicated 16 CPU Cray C90 systems. This high level of performance is achieved via shared memory based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine and coarse grained level, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.
Parallelized modelling and solution scheme for hierarchically scaled simulations
NASA Technical Reports Server (NTRS)
Padovan, Joe
1995-01-01
This two-part paper presents the results of a benchmarked analytical-numerical investigation into the operational characteristics of a unified parallel processing strategy for implicit fluid mechanics formulations. This hierarchical poly tree (HPT) strategy is based on multilevel substructural decomposition. The Tree morphology is chosen to minimize memory, communications and computational effort. The methodology is general enough to apply to existing finite difference (FD), finite element (FEM), finite volume (FV) or spectral element (SE) based computer programs without an extensive rewrite of code. In addition to finding large reductions in memory, communications, and computational effort associated with a parallel computing environment, substantial reductions are generated in the sequential mode of application. Such improvements grow with increasing problem size. Along with a theoretical development of general 2-D and 3-D HPT, several techniques for expanding the problem size that the current generation of computers are capable of solving, are presented and discussed. Among these techniques are several interpolative reduction methods. It was found that by combining several of these techniques that a relatively small interpolative reduction resulted in substantial performance gains. Several other unique features/benefits are discussed in this paper. Along with Part 1's theoretical development, Part 2 presents a numerical approach to the HPT along with four prototype CFD applications. These demonstrate the potential of the HPT strategy.
Multi-scale modeling of multi-component reactive transport in geothermal aquifers
NASA Astrophysics Data System (ADS)
Nick, Hamidreza M.; Raoof, Amir; Wolf, Karl-Heinz; Bruhn, David
2014-05-01
In deep geothermal systems heat and chemical stresses can cause physical alterations, which may have a significant effect on flow and reaction rates. As a consequence it will lead to changes in permeability and porosity of the formations due to mineral precipitation and dissolution. Large-scale modeling of reactive transport in such systems is still challenging. A large area of uncertainty is the way in which the pore-scale information controlling the flow and reaction will behave at a larger scale. A possible choice is to use constitutive relationships relating, for example the permeability and porosity evolutions to the change in the pore geometry. While determining such relationships through laboratory experiments may be limited, pore-network modeling provides an alternative solution. In this work, we introduce a new workflow in which a hybrid Finite-Element Finite-Volume method [1,2] and a pore network modeling approach [3] are employed. Using the pore-scale model, relevant constitutive relations are developed. These relations are then embedded in the continuum-scale model. This approach enables us to study non-isothermal reactive transport in porous media while accounting for micro-scale features under realistic conditions. The performance and applicability of the proposed model is explored for different flow and reaction regimes. References: 1. Matthäi, S.K., et al.: Simulation of solute transport through fractured rock: a higher-order accurate finite-element finite-volume method permitting large time steps. Transport in porous media 83.2 (2010): 289-318. 2. Nick, H.M., et al.: Reactive dispersive contaminant transport in coastal aquifers: Numerical simulation of a reactive Henry problem. Journal of contaminant hydrology 145 (2012), 90-104. 3. Raoof A., et al.: PoreFlow: A Complex pore-network model for simulation of reactive transport in variably saturated porous media, Computers & Geosciences, 61, (2013), 160-174.
NASA Astrophysics Data System (ADS)
Hussein, Rafid M.; Chandrashekhara, K.
2017-11-01
A multi-scale modeling approach is presented to simulate and validate thermo-oxidation shrinkage and cracking damage of a high temperature polymer composite. The multi-scale approach investigates coupled transient diffusion-reaction and static structural at macro- to micro-scale. The micro-scale shrinkage deformation and cracking damage are simulated and validated using 2D and 3D simulations. Localized shrinkage displacement boundary conditions for the micro-scale simulations are determined from the respective meso- and macro-scale simulations, conducted for a cross-ply laminate. The meso-scale geometrical domain and the micro-scale geometry and mesh are developed using the object oriented finite element (OOF). The macro-scale shrinkage and weight loss are measured using unidirectional coupons and used to build the macro-shrinkage model. The cross-ply coupons are used to validate the macro-shrinkage model by the shrinkage profiles acquired using scanning electron images at the cracked surface. The macro-shrinkage model deformation shows a discrepancy when the micro-scale image-based cracking is computed. The local maximum shrinkage strain is assumed to be 13 times the maximum macro-shrinkage strain of 2.5 × 10-5, upon which the discrepancy is minimized. The microcrack damage of the composite is modeled using a static elastic analysis with extended finite element and cohesive surfaces by considering the modulus spatial evolution. The 3D shrinkage displacements are fed to the model using node-wise boundary/domain conditions of the respective oxidized region. Microcrack simulation results: length, meander, and opening are closely matched to the crack in the area of interest for the scanning electron images.
NASA Astrophysics Data System (ADS)
Sourbier, F.; Operto, S.; Virieux, J.
2006-12-01
We present a distributed-memory parallel algorithm for 2D visco-acoustic full-waveform inversion of wide-angle seismic data. Our code is written in fortran90 and use MPI for parallelism. The algorithm was applied to real wide-angle data set recorded by 100 OBSs with a 1-km spacing in the eastern-Nankai trough (Japan) to image the deep structure of the subduction zone. Full-waveform inversion is applied sequentially to discrete frequencies by proceeding from the low to the high frequencies. The inverse problem is solved with a classic gradient method. Full-waveform modeling is performed with a frequency-domain finite-difference method. In the frequency-domain, solving the wave equation requires resolution of a large unsymmetric system of linear equations. We use the massively parallel direct solver MUMPS (http://www.enseeiht.fr/irit/apo/MUMPS) for distributed-memory computer to solve this system. The MUMPS solver is based on a multifrontal method for the parallel factorization. The MUMPS algorithm is subdivided in 3 main steps: a symbolic analysis step that performs re-ordering of the matrix coefficients to minimize the fill-in of the matrix during the subsequent factorization and an estimation of the assembly tree of the matrix. Second, the factorization is performed with dynamic scheduling to accomodate numerical pivoting and provides the LU factors distributed over all the processors. Third, the resolution is performed for multiple sources. To compute the gradient of the cost function, 2 simulations per shot are required (one to compute the forward wavefield and one to back-propagate residuals). The multi-source resolutions can be performed in parallel with MUMPS. In the end, each processor stores in core a sub-domain of all the solutions. These distributed solutions can be exploited to compute in parallel the gradient of the cost function. Since the gradient of the cost function is a weighted stack of the shot and residual solutions of MUMPS, each processor computes the corresponding sub-domain of the gradient. In the end, the gradient is centralized on the master processor using a collective communation. The gradient is scaled by the diagonal elements of the Hessian matrix. This scaling is computed only once per frequency before the first iteration of the inversion. Estimation of the diagonal terms of the Hessian requires performing one simulation per non redondant shot and receiver position. The same strategy that the one used for the gradient is used to compute the diagonal Hessian in parallel. This algorithm was applied to a dense wide-angle data set recorded by 100 OBSs in the eastern Nankai trough, offshore Japan. Thirteen frequencies ranging from 3 and 15 Hz were inverted. Tweny iterations per frequency were computed leading to 260 tomographic velocity models of increasing resolution. The velocity model dimensions are 105 km x 25 km corresponding to a finite-difference grid of 4201 x 1001 grid with a 25-m grid interval. The number of shot was 1005 and the number of inverted OBS gathers was 93. The inversion requires 20 days on 6 32-bits bi-processor nodes with 4 Gbytes of RAM memory per node when only the LU factorization is performed in parallel. Preliminary estimations of the time required to perform the inversion with the fully-parallelized code is 6 and 4 days using 20 and 50 processors respectively.
Multi-petascale highly efficient parallel supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaflop-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five dimensional torus network that optimally maximize the throughput of packet communications between nodes and minimize latency. The network implements collective network and a global asynchronous network that provides global barrier and notification functions. Integrated in the node design include a list-based prefetcher. The memory system implements transaction memory, thread level speculation, and multiversioning cache that improves soft error rate at the same time andmore » supports DMA functionality allowing for parallel processing message-passing.« less
ALEGRA -- A massively parallel h-adaptive code for solid dynamics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Summers, R.M.; Wong, M.K.; Boucheron, E.A.
1997-12-31
ALEGRA is a multi-material, arbitrary-Lagrangian-Eulerian (ALE) code for solid dynamics designed to run on massively parallel (MP) computers. It combines the features of modern Eulerian shock codes, such as CTH, with modern Lagrangian structural analysis codes using an unstructured grid. ALEGRA is being developed for use on the teraflop supercomputers to conduct advanced three-dimensional (3D) simulations of shock phenomena important to a variety of systems. ALEGRA was designed with the Single Program Multiple Data (SPMD) paradigm, in which the mesh is decomposed into sub-meshes so that each processor gets a single sub-mesh with approximately the same number of elements. Usingmore » this approach the authors have been able to produce a single code that can scale from one processor to thousands of processors. A current major effort is to develop efficient, high precision simulation capabilities for ALEGRA, without the computational cost of using a global highly resolved mesh, through flexible, robust h-adaptivity of finite elements. H-adaptivity is the dynamic refinement of the mesh by subdividing elements, thus changing the characteristic element size and reducing numerical error. The authors are working on several major technical challenges that must be met to make effective use of HAMMER on MP computers.« less
Large-scale tidal effect on redshift-space power spectrum in a finite-volume survey
NASA Astrophysics Data System (ADS)
Akitsu, Kazuyuki; Takada, Masahiro; Li, Yin
2017-04-01
Long-wavelength matter inhomogeneities contain cleaner information on the nature of primordial perturbations as well as the physics of the early Universe. The large-scale coherent overdensity and tidal force, not directly observable for a finite-volume galaxy survey, are both related to the Hessian of large-scale gravitational potential and therefore are of equal importance. We show that the coherent tidal force causes a homogeneous anisotropic distortion of the observed distribution of galaxies in all three directions, perpendicular and parallel to the line-of-sight direction. This effect mimics the redshift-space distortion signal of galaxy peculiar velocities, as well as a distortion by the Alcock-Paczynski effect. We quantify its impact on the redshift-space power spectrum to the leading order, and discuss its importance for ongoing and upcoming galaxy surveys.
Saito, Makina; Masuda, Ryo; Yoda, Yoshitaka; Seto, Makoto
2017-10-02
We developed a multi-line time-domain interferometry (TDI) system using 14.4 keV Mössbauer gamma rays with natural energy widths of 4.66 neV from 57 Fe nuclei excited using synchrotron radiation. Electron density fluctuations can be detected at unique lengths ranging from 0.1 nm to a few nm on time scales from several nanoseconds to the sub-microsecond order by quasi-elastic gamma-ray scattering (QGS) experiments using multi-line TDI. In this report, we generalize the established expression for a time spectrum measured using an identical single-line gamma-ray emitter pair to the case of a nonidentical pair of multi-line gamma-ray emitters by considering the finite energy width of the incident synchrotron radiation. The expression obtained illustrates the unique characteristics of multi-line TDI systems, where the finite incident energy width and use of a nonidentical emitter pair produces further information on faster sub-picosecond-scale dynamics in addition to the nanosecond dynamics; this was demonstrated experimentally. A normalized intermediate scattering function was extracted from the spectrum and its relaxation form was determined for a relaxation time of the order of 1 μs, even for relatively large momentum transfer of ~31 nm -1 . The multi-line TDI method produces a microscopic relaxation picture more rapidly and accurately than conventional single-line TDI.
NASA Astrophysics Data System (ADS)
Soons, Joris; Dirckx, Joris; Steele, Charles; Puria, Sunil
2015-12-01
A multi-scale finite element (FE) model of the mouse cochlea, based on its anatomy and material properties is presented. The important feature in the model is a lattice of 400 Y-shaped structures in the longitudinal direction, each formed by Deiters cells, phalangeal processes and outer hair cells (OHC). OHC somatic motility is modeled by an expansion force proportional to the shear on the stereocilia, which in turn is proportional to the pressure difference between the scala vestibule and scala tympani. Basilar membrane (BM) and reticular lamina (RL) velocity compare qualitatively very well with recent in vivo measurements in guinea pig [2]. Compared to the BM, the RL is shown to have higher amplification and a shift to higher frequencies. This comes naturally from the realistic Y-shaped cell organization without tectorial membrane tuning.
2004-10-01
MONITORING AGENCY NAME(S) AND ADDRESS(ES) Defense Advanced Research Projects Agency AFRL/IFTC 3701 North Fairfax Drive...Scalable Parallel Libraries for Large-Scale Concurrent Applications," Technical Report UCRL -JC-109251, Lawrence Livermore National Laboratory
Leung, Chung Ming; Wang, Ya; Chen, Wusi
2016-11-01
In this letter, the airfoil-based electromagnetic energy harvester containing parallel array motion between moving coil and trajectory matching multi-pole magnets was investigated. The magnets were aligned in an alternatively magnetized formation of 6 magnets to explore enhanced power density. In particular, the magnet array was positioned in parallel to the trajectory of the tip coil within its tip deflection span. The finite element simulations of the magnetic flux density and induced voltages at an open circuit condition were studied to find the maximum number of alternatively magnetized magnets that was required for the proposed energy harvester. Experimental results showed that the energy harvester with a pair of 6 alternatively magnetized linear magnet arrays was able to generate an induced voltage (V o ) of 20 V, with an open circuit condition, and 475 mW, under a 30 Ω optimal resistance load operating with the wind speed (U) at 7 m/s and a natural bending frequency of 3.54 Hz. Compared to the traditional electromagnetic energy harvester with a single magnet moving through a coil, the proposed energy harvester, containing multi-pole magnets and parallel array motion, enables the moving coil to accumulate a stronger magnetic flux in each period of the swinging motion. In addition to the comparison made with the airfoil-based piezoelectric energy harvester of the same size, our proposed electromagnetic energy harvester generates 11 times more power output, which is more suitable for high-power-density energy harvesting applications at regions with low environmental frequency.
NASA Astrophysics Data System (ADS)
Slaughter, A. E.; Permann, C.; Peterson, J. W.; Gaston, D.; Andrs, D.; Miller, J.
2014-12-01
The Idaho National Laboratory (INL)-developed Multiphysics Object Oriented Simulation Environment (MOOSE; www.mooseframework.org), is an open-source, parallel computational framework for enabling the solution of complex, fully implicit multiphysics systems. MOOSE provides a set of computational tools that scientists and engineers can use to create sophisticated multiphysics simulations. Applications built using MOOSE have computed solutions for chemical reaction and transport equations, computational fluid dynamics, solid mechanics, heat conduction, mesoscale materials modeling, geomechanics, and others. To facilitate the coupling of diverse and highly-coupled physical systems, MOOSE employs the Jacobian-free Newton-Krylov (JFNK) method when solving the coupled nonlinear systems of equations arising in multiphysics applications. The MOOSE framework is written in C++, and leverages other high-quality, open-source scientific software packages such as LibMesh, Hypre, and PETSc. MOOSE uses a "hybrid parallel" model which combines both shared memory (thread-based) and distributed memory (MPI-based) parallelism to ensure efficient resource utilization on a wide range of computational hardware. MOOSE-based applications are inherently modular, which allows for simulation expansion (via coupling of additional physics modules) and the creation of multi-scale simulations. Any application developed with MOOSE supports running (in parallel) any other MOOSE-based application. Each application can be developed independently, yet easily communicate with other applications (e.g., conductivity in a slope-scale model could be a constant input, or a complete phase-field micro-structure simulation) without additional code being written. This method of development has proven effective at INL and expedites the development of sophisticated, sustainable, and collaborative simulation tools.
NASA Astrophysics Data System (ADS)
Konduri, Aditya
Many natural and engineering systems are governed by nonlinear partial differential equations (PDEs) which result in a multiscale phenomena, e.g. turbulent flows. Numerical simulations of these problems are computationally very expensive and demand for extreme levels of parallelism. At realistic conditions, simulations are being carried out on massively parallel computers with hundreds of thousands of processing elements (PEs). It has been observed that communication between PEs as well as their synchronization at these extreme scales take up a significant portion of the total simulation time and result in poor scalability of codes. This issue is likely to pose a bottleneck in scalability of codes on future Exascale systems. In this work, we propose an asynchronous computing algorithm based on widely used finite difference methods to solve PDEs in which synchronization between PEs due to communication is relaxed at a mathematical level. We show that while stability is conserved when schemes are used asynchronously, accuracy is greatly degraded. Since message arrivals at PEs are random processes, so is the behavior of the error. We propose a new statistical framework in which we show that average errors drop always to first-order regardless of the original scheme. We propose new asynchrony-tolerant schemes that maintain accuracy when synchronization is relaxed. The quality of the solution is shown to depend, not only on the physical phenomena and numerical schemes, but also on the characteristics of the computing machine. A novel algorithm using remote memory access communications has been developed to demonstrate excellent scalability of the method for large-scale computing. Finally, we present a path to extend this method in solving complex multi-scale problems on Exascale machines.
Analysis of composite ablators using massively parallel computation
NASA Technical Reports Server (NTRS)
Shia, David
1995-01-01
In this work, the feasibility of using massively parallel computation to study the response of ablative materials is investigated. Explicit and implicit finite difference methods are used on a massively parallel computer, the Thinking Machines CM-5. The governing equations are a set of nonlinear partial differential equations. The governing equations are developed for three sample problems: (1) transpiration cooling, (2) ablative composite plate, and (3) restrained thermal growth testing. The transpiration cooling problem is solved using a solution scheme based solely on the explicit finite difference method. The results are compared with available analytical steady-state through-thickness temperature and pressure distributions and good agreement between the numerical and analytical solutions is found. It is also found that a solution scheme based on the explicit finite difference method has the following advantages: incorporates complex physics easily, results in a simple algorithm, and is easily parallelizable. However, a solution scheme of this kind needs very small time steps to maintain stability. A solution scheme based on the implicit finite difference method has the advantage that it does not require very small times steps to maintain stability. However, this kind of solution scheme has the disadvantages that complex physics cannot be easily incorporated into the algorithm and that the solution scheme is difficult to parallelize. A hybrid solution scheme is then developed to combine the strengths of the explicit and implicit finite difference methods and minimize their weaknesses. This is achieved by identifying the critical time scale associated with the governing equations and applying the appropriate finite difference method according to this critical time scale. The hybrid solution scheme is then applied to the ablative composite plate and restrained thermal growth problems. The gas storage term is included in the explicit pressure calculation of both problems. Results from ablative composite plate problems are compared with previous numerical results which did not include the gas storage term. It is found that the through-thickness temperature distribution is not affected much by the gas storage term. However, the through-thickness pressure and stress distributions, and the extent of chemical reactions are different from the previous numerical results. Two types of chemical reaction models are used in the restrained thermal growth testing problem: (1) pressure-independent Arrhenius type rate equations and (2) pressure-dependent Arrhenius type rate equations. The numerical results are compared to experimental results and the pressure-dependent model is able to capture the trend better than the pressure-independent one. Finally, a performance study is done on the hybrid algorithm using the ablative composite plate problem. It is found that there is a good speedup of performance on the CM-5. For 32 CPU's, the speedup of performance is 20. The efficiency of the algorithm is found to be a function of the size and execution time of a given problem and the effective parallelization of the algorithm. It also seems that there is an optimum number of CPU's to use for a given problem.
Evolutionary potential games on lattices
NASA Astrophysics Data System (ADS)
Szabó, György; Borsos, István
2016-04-01
Game theory provides a general mathematical background to study the effect of pair interactions and evolutionary rules on the macroscopic behavior of multi-player games where players with a finite number of strategies may represent a wide scale of biological objects, human individuals, or even their associations. In these systems the interactions are characterized by matrices that can be decomposed into elementary matrices (games) and classified into four types. The concept of decomposition helps the identification of potential games and also the evaluation of the potential that plays a crucial role in the determination of the preferred Nash equilibrium, and defines the Boltzmann distribution towards which these systems evolve for suitable types of dynamical rules. This survey draws parallel between the potential games and the kinetic Ising type models which are investigated for a wide scale of connectivity structures. We discuss briefly the applicability of the tools and concepts of statistical physics and thermodynamics. Additionally the general features of ordering phenomena, phase transitions and slow relaxations are outlined and applied to evolutionary games. The discussion extends to games with three or more strategies. Finally we discuss what happens when the system is weakly driven out of the "equilibrium state" by adding non-potential components representing games of cyclic dominance.
NASA Astrophysics Data System (ADS)
Qiang, Ji
2017-10-01
A three-dimensional (3D) Poisson solver with longitudinal periodic and transverse open boundary conditions can have important applications in beam physics of particle accelerators. In this paper, we present a fast efficient method to solve the Poisson equation using a spectral finite-difference method. This method uses a computational domain that contains the charged particle beam only and has a computational complexity of O(Nu(logNmode)) , where Nu is the total number of unknowns and Nmode is the maximum number of longitudinal or azimuthal modes. This saves both the computational time and the memory usage of using an artificial boundary condition in a large extended computational domain. The new 3D Poisson solver is parallelized using a message passing interface (MPI) on multi-processor computers and shows a reasonable parallel performance up to hundreds of processor cores.
A new conformal absorbing boundary condition for finite element meshes and parallelization of FEMATS
NASA Technical Reports Server (NTRS)
Chatterjee, A.; Volakis, J. L.; Nguyen, J.; Nurnberger, M.; Ross, D.
1993-01-01
Some of the progress toward the development and parallelization of an improved version of the finite element code FEMATS is described. This is a finite element code for computing the scattering by arbitrarily shaped three dimensional surfaces composite scatterers. The following tasks were worked on during the report period: (1) new absorbing boundary conditions (ABC's) for truncating the finite element mesh; (2) mixed mesh termination schemes; (3) hierarchical elements and multigridding; (4) parallelization; and (5) various modeling enhancements (antenna feeds, anisotropy, and higher order GIBC).
NASA Astrophysics Data System (ADS)
Gassmöller, Rene; Bangerth, Wolfgang
2016-04-01
Particle-in-cell methods have a long history and many applications in geodynamic modelling of mantle convection, lithospheric deformation and crustal dynamics. They are primarily used to track material information, the strain a material has undergone, the pressure-temperature history a certain material region has experienced, or the amount of volatiles or partial melt present in a region. However, their efficient parallel implementation - in particular combined with adaptive finite-element meshes - is complicated due to the complex communication patterns and frequent reassignment of particles to cells. Consequently, many current scientific software packages accomplish this efficient implementation by specifically designing particle methods for a single purpose, like the advection of scalar material properties that do not evolve over time (e.g., for chemical heterogeneities). Design choices for particle integration, data storage, and parallel communication are then optimized for this single purpose, making the code relatively rigid to changing requirements. Here, we present the implementation of a flexible, scalable and efficient particle-in-cell method for massively parallel finite-element codes with adaptively changing meshes. Using a modular plugin structure, we allow maximum flexibility of the generation of particles, the carried tracer properties, the advection and output algorithms, and the projection of properties to the finite-element mesh. We present scaling tests ranging up to tens of thousands of cores and tens of billions of particles. Additionally, we discuss efficient load-balancing strategies for particles in adaptive meshes with their strengths and weaknesses, local particle-transfer between parallel subdomains utilizing existing communication patterns from the finite element mesh, and the use of established parallel output algorithms like the HDF5 library. Finally, we show some relevant particle application cases, compare our implementation to a modern advection-field approach, and demonstrate under which conditions which method is more efficient. We implemented the presented methods in ASPECT (aspect.dealii.org), a freely available open-source community code for geodynamic simulations. The structure of the particle code is highly modular, and segregated from the PDE solver, and can thus be easily transferred to other programs, or adapted for various application cases.
NASA Astrophysics Data System (ADS)
Mapakshi, N. K.; Chang, J.; Nakshatrala, K. B.
2018-04-01
Mathematical models for flow through porous media typically enjoy the so-called maximum principles, which place bounds on the pressure field. It is highly desirable to preserve these bounds on the pressure field in predictive numerical simulations, that is, one needs to satisfy discrete maximum principles (DMP). Unfortunately, many of the existing formulations for flow through porous media models do not satisfy DMP. This paper presents a robust, scalable numerical formulation based on variational inequalities (VI), to model non-linear flows through heterogeneous, anisotropic porous media without violating DMP. VI is an optimization technique that places bounds on the numerical solutions of partial differential equations. To crystallize the ideas, a modification to Darcy equations by taking into account pressure-dependent viscosity will be discretized using the lowest-order Raviart-Thomas (RT0) and Variational Multi-scale (VMS) finite element formulations. It will be shown that these formulations violate DMP, and, in fact, these violations increase with an increase in anisotropy. It will be shown that the proposed VI-based formulation provides a viable route to enforce DMP. Moreover, it will be shown that the proposed formulation is scalable, and can work with any numerical discretization and weak form. A series of numerical benchmark problems are solved to demonstrate the effects of heterogeneity, anisotropy and non-linearity on DMP violations under the two chosen formulations (RT0 and VMS), and that of non-linearity on solver convergence for the proposed VI-based formulation. Parallel scalability on modern computational platforms will be illustrated through strong-scaling studies, which will prove the efficiency of the proposed formulation in a parallel setting. Algorithmic scalability as the problem size is scaled up will be demonstrated through novel static-scaling studies. The performed static-scaling studies can serve as a guide for users to be able to select an appropriate discretization for a given problem size.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Suryanarayana, Phanish; Pratapa, Phanisri P.; Sharma, Abhiraj
We present SQDFT: a large-scale parallel implementation of the Spectral Quadrature (SQ) method formore » $$\\mathscr{O}(N)$$ Kohn–Sham Density Functional Theory (DFT) calculations at high temperature. Specifically, we develop an efficient and scalable finite-difference implementation of the infinite-cell Clenshaw–Curtis SQ approach, in which results for the infinite crystal are obtained by expressing quantities of interest as bilinear forms or sums of bilinear forms, that are then approximated by spatially localized Clenshaw–Curtis quadrature rules. We demonstrate the accuracy of SQDFT by showing systematic convergence of energies and atomic forces with respect to SQ parameters to reference diagonalization results, and convergence with discretization to established planewave results, for both metallic and insulating systems. Here, we further demonstrate that SQDFT achieves excellent strong and weak parallel scaling on computer systems consisting of tens of thousands of processors, with near perfect $$\\mathscr{O}(N)$$ scaling with system size and wall times as low as a few seconds per self-consistent field iteration. Finally, we verify the accuracy of SQDFT in large-scale quantum molecular dynamics simulations of aluminum at high temperature.« less
Suryanarayana, Phanish; Pratapa, Phanisri P.; Sharma, Abhiraj; ...
2017-12-07
We present SQDFT: a large-scale parallel implementation of the Spectral Quadrature (SQ) method formore » $$\\mathscr{O}(N)$$ Kohn–Sham Density Functional Theory (DFT) calculations at high temperature. Specifically, we develop an efficient and scalable finite-difference implementation of the infinite-cell Clenshaw–Curtis SQ approach, in which results for the infinite crystal are obtained by expressing quantities of interest as bilinear forms or sums of bilinear forms, that are then approximated by spatially localized Clenshaw–Curtis quadrature rules. We demonstrate the accuracy of SQDFT by showing systematic convergence of energies and atomic forces with respect to SQ parameters to reference diagonalization results, and convergence with discretization to established planewave results, for both metallic and insulating systems. Here, we further demonstrate that SQDFT achieves excellent strong and weak parallel scaling on computer systems consisting of tens of thousands of processors, with near perfect $$\\mathscr{O}(N)$$ scaling with system size and wall times as low as a few seconds per self-consistent field iteration. Finally, we verify the accuracy of SQDFT in large-scale quantum molecular dynamics simulations of aluminum at high temperature.« less
Multi-scale and multi-physics simulations using the multi-fluid plasma model
2017-04-25
small The simulation uses 512 second-order elements Bz = 1.0, Te = Ti = 0.01, ui = ue = 0 ne = ni = 1.0 + e−10(x−6) 2 Baboolal, Math . and Comp. Sim. 55...DISTRIBUTION Clearance No. 17211 23 / 31 SUMMARY The blended finite element method (BFEM) is presented DG spatial discretization with explicit Runge...Kutta (i+, n) CG spatial discretization with implicit Crank-Nicolson (e−, fileds) DG captures shocks and discontinuities CG is efficient and robust for
Han, Zhenyu; Sun, Shouzheng; Fu, Hongya; Fu, Yunzhong
2017-01-01
Automated fiber placement (AFP) process includes a variety of energy forms and multi-scale effects. This contribution proposes a novel multi-scale low-entropy method aiming at optimizing processing parameters in an AFP process, where multi-scale effect, energy consumption, energy utilization efficiency and mechanical properties of micro-system could be taken into account synthetically. Taking a carbon fiber/epoxy prepreg as an example, mechanical properties of macro–meso–scale are obtained by Finite Element Method (FEM). A multi-scale energy transfer model is then established to input the macroscopic results into the microscopic system as its boundary condition, which can communicate with different scales. Furthermore, microscopic characteristics, mainly micro-scale adsorption energy, diffusion coefficient entropy–enthalpy values, are calculated under different processing parameters based on molecular dynamics method. Low-entropy region is then obtained in terms of the interrelation among entropy–enthalpy values, microscopic mechanical properties (interface adsorbability and matrix fluidity) and processing parameters to guarantee better fluidity, stronger adsorption, lower energy consumption and higher energy quality collaboratively. Finally, nine groups of experiments are carried out to verify the validity of the simulation results. The results show that the low-entropy optimization method can reduce void content effectively, and further improve the mechanical properties of laminates. PMID:28869520
Han, Zhenyu; Sun, Shouzheng; Fu, Hongya; Fu, Yunzhong
2017-09-03
Automated fiber placement (AFP) process includes a variety of energy forms and multi-scale effects. This contribution proposes a novel multi-scale low-entropy method aiming at optimizing processing parameters in an AFP process, where multi-scale effect, energy consumption, energy utilization efficiency and mechanical properties of micro-system could be taken into account synthetically. Taking a carbon fiber/epoxy prepreg as an example, mechanical properties of macro-meso-scale are obtained by Finite Element Method (FEM). A multi-scale energy transfer model is then established to input the macroscopic results into the microscopic system as its boundary condition, which can communicate with different scales. Furthermore, microscopic characteristics, mainly micro-scale adsorption energy, diffusion coefficient entropy-enthalpy values, are calculated under different processing parameters based on molecular dynamics method. Low-entropy region is then obtained in terms of the interrelation among entropy-enthalpy values, microscopic mechanical properties (interface adsorbability and matrix fluidity) and processing parameters to guarantee better fluidity, stronger adsorption, lower energy consumption and higher energy quality collaboratively. Finally, nine groups of experiments are carried out to verify the validity of the simulation results. The results show that the low-entropy optimization method can reduce void content effectively, and further improve the mechanical properties of laminates.
Regional-scale calculation of the LS factor using parallel processing
NASA Astrophysics Data System (ADS)
Liu, Kai; Tang, Guoan; Jiang, Ling; Zhu, A.-Xing; Yang, Jianyi; Song, Xiaodong
2015-05-01
With the increase of data resolution and the increasing application of USLE over large areas, the existing serial implementation of algorithms for computing the LS factor is becoming a bottleneck. In this paper, a parallel processing model based on message passing interface (MPI) is presented for the calculation of the LS factor, so that massive datasets at a regional scale can be processed efficiently. The parallel model contains algorithms for calculating flow direction, flow accumulation, drainage network, slope, slope length and the LS factor. According to the existence of data dependence, the algorithms are divided into local algorithms and global algorithms. Parallel strategy are designed according to the algorithm characters including the decomposition method for maintaining the integrity of the results, optimized workflow for reducing the time taken for exporting the unnecessary intermediate data and a buffer-communication-computation strategy for improving the communication efficiency. Experiments on a multi-node system show that the proposed parallel model allows efficient calculation of the LS factor at a regional scale with a massive dataset.
Conformal anomaly of some 2-d Z (n) models
NASA Astrophysics Data System (ADS)
William, Peter
1991-01-01
We describe a numerical calculation of the conformal anomaly in the case of some two-dimensional statistical models undergoing a second-order phase transition, utilizing a recently developed method to compute the partition function exactly. This computation is carried out on a massively parallel CM2 machine, using the finite size scaling behaviour of the free energy.
NASA Astrophysics Data System (ADS)
Sheng, Lizeng
The dissertation focuses on one of the major research needs in the area of adaptive/intelligent/smart structures, the development and application of finite element analysis and genetic algorithms for optimal design of large-scale adaptive structures. We first review some basic concepts in finite element method and genetic algorithms, along with the research on smart structures. Then we propose a solution methodology for solving a critical problem in the design of a next generation of large-scale adaptive structures---optimal placements of a large number of actuators to control thermal deformations. After briefly reviewing the three most frequently used general approaches to derive a finite element formulation, the dissertation presents techniques associated with general shell finite element analysis using flat triangular laminated composite elements. The element used here has three nodes and eighteen degrees of freedom and is obtained by combining a triangular membrane element and a triangular plate bending element. The element includes the coupling effect between membrane deformation and bending deformation. The membrane element is derived from the linear strain triangular element using Cook's transformation. The discrete Kirchhoff triangular (DKT) element is used as the plate bending element. For completeness, a complete derivation of the DKT is presented. Geometrically nonlinear finite element formulation is derived for the analysis of adaptive structures under the combined thermal and electrical loads. Next, we solve the optimization problems of placing a large number of piezoelectric actuators to control thermal distortions in a large mirror in the presence of four different thermal loads. We then extend this to a multi-objective optimization problem of determining only one set of piezoelectric actuator locations that can be used to control the deformation in the same mirror under the action of any one of the four thermal loads. A series of genetic algorithms, GA Version 1, 2 and 3, were developed to find the optimal locations of piezoelectric actuators from the order of 1021 ˜ 1056 candidate placements. Introducing a variable population approach, we improve the flexibility of selection operation in genetic algorithms. Incorporating mutation and hill climbing into micro-genetic algorithms, we are able to develop a more efficient genetic algorithm. Through extensive numerical experiments, we find that the design search space for the optimal placements of a large number of actuators is highly multi-modal and that the most distinct nature of genetic algorithms is their robustness. They give results that are random but with only a slight variability. The genetic algorithms can be used to get adequate solution using a limited number of evaluations. To get the highest quality solution, multiple runs including different random seed generators are necessary. The investigation time can be significantly reduced using a very coarse grain parallel computing. Overall, the methodology of using finite element analysis and genetic algorithm optimization provides a robust solution approach for the challenging problem of optimal placements of a large number of actuators in the design of next generation of adaptive structures.
Construction and comparison of parallel implicit kinetic solvers in three spatial dimensions
NASA Astrophysics Data System (ADS)
Titarev, Vladimir; Dumbser, Michael; Utyuzhnikov, Sergey
2014-01-01
The paper is devoted to the further development and systematic performance evaluation of a recent deterministic framework Nesvetay-3D for modelling three-dimensional rarefied gas flows. Firstly, a review of the existing discretization and parallelization strategies for solving numerically the Boltzmann kinetic equation with various model collision integrals is carried out. Secondly, a new parallelization strategy for the implicit time evolution method is implemented which improves scaling on large CPU clusters. Accuracy and scalability of the methods are demonstrated on a pressure-driven rarefied gas flow through a finite-length circular pipe as well as an external supersonic flow over a three-dimensional re-entry geometry of complicated aerodynamic shape.
dfnWorks: A discrete fracture network framework for modeling subsurface flow and transport
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hyman, Jeffrey D.; Karra, Satish; Makedonska, Nataliia
DFNWORKS is a parallelized computational suite to generate three-dimensional discrete fracture networks (DFN) and simulate flow and transport. Developed at Los Alamos National Laboratory over the past five years, it has been used to study flow and transport in fractured media at scales ranging from millimeters to kilometers. The networks are created and meshed using DFNGEN, which combines FRAM (the feature rejection algorithm for meshing) methodology to stochastically generate three-dimensional DFNs with the LaGriT meshing toolbox to create a high-quality computational mesh representation. The representation produces a conforming Delaunay triangulation suitable for high performance computing finite volume solvers in anmore » intrinsically parallel fashion. Flow through the network is simulated in dfnFlow, which utilizes the massively parallel subsurface flow and reactive transport finite volume code PFLOTRAN. A Lagrangian approach to simulating transport through the DFN is adopted within DFNTRANS to determine pathlines and solute transport through the DFN. Example applications of this suite in the areas of nuclear waste repository science, hydraulic fracturing and CO 2 sequestration are also included.« less
NASA Technical Reports Server (NTRS)
Datta, Anubhav; Johnson, Wayne R.
2009-01-01
This paper has two objectives. The first objective is to formulate a 3-dimensional Finite Element Model for the dynamic analysis of helicopter rotor blades. The second objective is to implement and analyze a dual-primal iterative substructuring based Krylov solver, that is parallel and scalable, for the solution of the 3-D FEM analysis. The numerical and parallel scalability of the solver is studied using two prototype problems - one for ideal hover (symmetric) and one for a transient forward flight (non-symmetric) - both carried out on up to 48 processors. In both hover and forward flight conditions, a perfect linear speed-up is observed, for a given problem size, up to the point of substructure optimality. Substructure optimality and the linear parallel speed-up range are both shown to depend on the problem size as well as on the selection of the coarse problem. With a larger problem size, linear speed-up is restored up to the new substructure optimality. The solver also scales with problem size - even though this conclusion is premature given the small prototype grids considered in this study.
NASA Technical Reports Server (NTRS)
Watson, Willie R.; Nark, Douglas M.; Nguyen, Duc T.; Tungkahotara, Siroj
2006-01-01
A finite element solution to the convected Helmholtz equation in a nonuniform flow is used to model the noise field within 3-D acoustically treated aero-engine nacelles. Options to select linear or cubic Hermite polynomial basis functions and isoparametric elements are included. However, the key feature of the method is a domain decomposition procedure that is based upon the inter-mixing of an iterative and a direct solve strategy for solving the discrete finite element equations. This procedure is optimized to take full advantage of sparsity and exploit the increased memory and parallel processing capability of modern computer architectures. Example computations are presented for the Langley Flow Impedance Test facility and a rectangular mapping of a full scale, generic aero-engine nacelle. The accuracy and parallel performance of this new solver are tested on both model problems using a supercomputer that contains hundreds of central processing units. Results show that the method gives extremely accurate attenuation predictions, achieves super-linear speedup over hundreds of CPUs, and solves upward of 25 million complex equations in a quarter of an hour.
dfnWorks: A discrete fracture network framework for modeling subsurface flow and transport
Hyman, Jeffrey D.; Karra, Satish; Makedonska, Nataliia; ...
2015-11-01
DFNWORKS is a parallelized computational suite to generate three-dimensional discrete fracture networks (DFN) and simulate flow and transport. Developed at Los Alamos National Laboratory over the past five years, it has been used to study flow and transport in fractured media at scales ranging from millimeters to kilometers. The networks are created and meshed using DFNGEN, which combines FRAM (the feature rejection algorithm for meshing) methodology to stochastically generate three-dimensional DFNs with the LaGriT meshing toolbox to create a high-quality computational mesh representation. The representation produces a conforming Delaunay triangulation suitable for high performance computing finite volume solvers in anmore » intrinsically parallel fashion. Flow through the network is simulated in dfnFlow, which utilizes the massively parallel subsurface flow and reactive transport finite volume code PFLOTRAN. A Lagrangian approach to simulating transport through the DFN is adopted within DFNTRANS to determine pathlines and solute transport through the DFN. Example applications of this suite in the areas of nuclear waste repository science, hydraulic fracturing and CO 2 sequestration are also included.« less
NASA Astrophysics Data System (ADS)
Hager, Robert; Yoon, E. S.; Ku, S.; D'Azevedo, E. F.; Worley, P. H.; Chang, C. S.
2015-11-01
We describe the implementation, and application of a time-dependent, fully nonlinear multi-species Fokker-Planck-Landau collision operator based on the single-species work of Yoon and Chang [Phys. Plasmas 21, 032503 (2014)] in the full-function gyrokinetic particle-in-cell codes XGC1 [Ku et al., Nucl. Fusion 49, 115021 (2009)] and XGCa. XGC simulations include the pedestal and scrape-off layer, where significant deviations of the particle distribution function from a Maxwellian can occur. Thus, in order to describe collisional effects on neoclassical and turbulence physics accurately, the use of a non-linear collision operator is a necessity. Our collision operator is based on a finite volume method using the velocity-space distribution functions sampled from the marker particles. Since the same fine configuration space mesh is used for collisions and the Poisson solver, the workload due to collisions can be comparable to or larger than the workload due to particle motion. We demonstrate that computing time spent on collisions can be kept affordable by applying advanced parallelization strategies while conserving mass, momentum, and energy to reasonable accuracy. We also show results of production scale XGCa simulations in the H-mode pedestal and compare to conventional theory. Work supported by US DOE OFES and OASCR.
Portable parallel stochastic optimization for the design of aeropropulsion components
NASA Technical Reports Server (NTRS)
Sues, Robert H.; Rhodes, G. S.
1994-01-01
This report presents the results of Phase 1 research to develop a methodology for performing large-scale Multi-disciplinary Stochastic Optimization (MSO) for the design of aerospace systems ranging from aeropropulsion components to complete aircraft configurations. The current research recognizes that such design optimization problems are computationally expensive, and require the use of either massively parallel or multiple-processor computers. The methodology also recognizes that many operational and performance parameters are uncertain, and that uncertainty must be considered explicitly to achieve optimum performance and cost. The objective of this Phase 1 research was to initialize the development of an MSO methodology that is portable to a wide variety of hardware platforms, while achieving efficient, large-scale parallelism when multiple processors are available. The first effort in the project was a literature review of available computer hardware, as well as review of portable, parallel programming environments. The first effort was to implement the MSO methodology for a problem using the portable parallel programming language, Parallel Virtual Machine (PVM). The third and final effort was to demonstrate the example on a variety of computers, including a distributed-memory multiprocessor, a distributed-memory network of workstations, and a single-processor workstation. Results indicate the MSO methodology can be well-applied towards large-scale aerospace design problems. Nearly perfect linear speedup was demonstrated for computation of optimization sensitivity coefficients on both a 128-node distributed-memory multiprocessor (the Intel iPSC/860) and a network of workstations (speedups of almost 19 times achieved for 20 workstations). Very high parallel efficiencies (75 percent for 31 processors and 60 percent for 50 processors) were also achieved for computation of aerodynamic influence coefficients on the Intel. Finally, the multi-level parallelization strategy that will be needed for large-scale MSO problems was demonstrated to be highly efficient. The same parallel code instructions were used on both platforms, demonstrating portability. There are many applications for which MSO can be applied, including NASA's High-Speed-Civil Transport, and advanced propulsion systems. The use of MSO will reduce design and development time and testing costs dramatically.
NASA Astrophysics Data System (ADS)
Zhang, Chao; Curiel-Sosa, Jose L.; Bui, Tinh Quoc
2018-04-01
In many engineering applications, 3D braided composites are designed for primary loading-bearing structures, and they are frequently subjected to multi-axial loading conditions during service. In this paper, a unit-cell based finite element model is developed for assessment of mechanical behavior of 3D braided composites under different biaxial tension loadings. To predict the damage initiation and evolution of braiding yarns and matrix in the unit-cell, we thus propose an anisotropic damage model based on Murakami damage theory in conjunction with Hashin failure criteria and maximum stress criteria. To attain exact stress ratio, force loading mode of periodic boundary conditions which never been attempted before is first executed to the unit-cell model to apply the biaxial tension loadings. The biaxial mechanical behaviors, such as the stress distribution, tensile modulus and tensile strength are analyzed and discussed. The damage development of 3D braided composites under typical biaxial tension loadings is simulated and the damage mechanisms are revealed in the simulation process. The present study generally provides a new reference to the meso-scale finite element analysis (FEA) of multi-axial mechanical behavior of other textile composites.
NASA Astrophysics Data System (ADS)
Fonseca, R. A.; Vieira, J.; Fiuza, F.; Davidson, A.; Tsung, F. S.; Mori, W. B.; Silva, L. O.
2013-12-01
A new generation of laser wakefield accelerators (LWFA), supported by the extreme accelerating fields generated in the interaction of PW-Class lasers and underdense targets, promises the production of high quality electron beams in short distances for multiple applications. Achieving this goal will rely heavily on numerical modelling to further understand the underlying physics and identify optimal regimes, but large scale modelling of these scenarios is computationally heavy and requires the efficient use of state-of-the-art petascale supercomputing systems. We discuss the main difficulties involved in running these simulations and the new developments implemented in the OSIRIS framework to address these issues, ranging from multi-dimensional dynamic load balancing and hybrid distributed/shared memory parallelism to the vectorization of the PIC algorithm. We present the results of the OASCR Joule Metric program on the issue of large scale modelling of LWFA, demonstrating speedups of over 1 order of magnitude on the same hardware. Finally, scalability to over ˜106 cores and sustained performance over ˜2 P Flops is demonstrated, opening the way for large scale modelling of LWFA scenarios.
GPU and APU computations of Finite Time Lyapunov Exponent fields
NASA Astrophysics Data System (ADS)
Conti, Christian; Rossinelli, Diego; Koumoutsakos, Petros
2012-03-01
We present GPU and APU accelerated computations of Finite-Time Lyapunov Exponent (FTLE) fields. The calculation of FTLEs is a computationally intensive process, as in order to obtain the sharp ridges associated with the Lagrangian Coherent Structures an extensive resampling of the flow field is required. The computational performance of this resampling is limited by the memory bandwidth of the underlying computer architecture. The present technique harnesses data-parallel execution of many-core architectures and relies on fast and accurate evaluations of moment conserving functions for the mesh to particle interpolations. We demonstrate how the computation of FTLEs can be efficiently performed on a GPU and on an APU through OpenCL and we report over one order of magnitude improvements over multi-threaded executions in FTLE computations of bluff body flows.
NASA Technical Reports Server (NTRS)
Le, Guan; Wang, Yongli; Slavin, James A.; Strangeway, Robert J.
2007-01-01
Space Technology 5 (ST5) is a three micro-satellite constellation deployed into a 300 x 4500 km, dawn-dusk, sun-synchronous polar orbit from March 22 to June 21, 2006, for technology validations. In this paper, we present a study of the temporal variability of field-aligned currents using multi-point magnetic field measurements from ST5. The data demonstrate that meso-scale current structures are commonly embedded within large-scale field-aligned current sheets. The meso-scale current structures are very dynamic with highly variable current density and/or polarity in time scales of - 10 min. They exhibit large temporal variations during both quiet and disturbed times in such time scales. On the other hand, the data also shown that the time scales for the currents to be relatively stable are approx. 1 min for meso-scale currents and approx. 10 min for large scale current sheets. These temporal features are obviously associated with dynamic variations of their particle carriers (mainly electrons) as they respond to the variations of the parallel electric field in auroral acceleration region. The characteristic time scales for the temporal variability of meso-scale field-aligned currents are found to be consistent with those of auroral parallel electric field.
Finite entanglement entropy and spectral dimension in quantum gravity
NASA Astrophysics Data System (ADS)
Arzano, Michele; Calcagni, Gianluca
2017-12-01
What are the conditions on a field theoretic model leading to a finite entanglement entropy density? We prove two very general results: (1) Ultraviolet finiteness of a theory does not guarantee finiteness of the entropy density; (2) If the spectral dimension of the spatial boundary across which the entropy is calculated is non-negative at all scales, then the entanglement entropy cannot be finite. These conclusions, which we verify in several examples, negatively affect all quantum-gravity models, since their spectral dimension is always positive. Possible ways out are considered, including abandoning the definition of the entanglement entropy in terms of the boundary return probability or admitting an analytic continuation (not a regularization) of the usual definition. In the second case, one can get a finite entanglement entropy density in multi-fractional theories and causal dynamical triangulations.
User's Guide for ENSAERO_FE Parallel Finite Element Solver
NASA Technical Reports Server (NTRS)
Eldred, Lloyd B.; Guruswamy, Guru P.
1999-01-01
A high fidelity parallel static structural analysis capability is created and interfaced to the multidisciplinary analysis package ENSAERO-MPI of Ames Research Center. This new module replaces ENSAERO's lower fidelity simple finite element and modal modules. Full aircraft structures may be more accurately modeled using the new finite element capability. Parallel computation is performed by breaking the full structure into multiple substructures. This approach is conceptually similar to ENSAERO's multizonal fluid analysis capability. The new substructure code is used to solve the structural finite element equations for each substructure in parallel. NASTRANKOSMIC is utilized as a front end for this code. Its full library of elements can be used to create an accurate and realistic aircraft model. It is used to create the stiffness matrices for each substructure. The new parallel code then uses an iterative preconditioned conjugate gradient method to solve the global structural equations for the substructure boundary nodes.
A blended continuous–discontinuous finite element method for solving the multi-fluid plasma model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sousa, E.M., E-mail: sousae@uw.edu; Shumlak, U., E-mail: shumlak@uw.edu
The multi-fluid plasma model represents electrons, multiple ion species, and multiple neutral species as separate fluids that interact through short-range collisions and long-range electromagnetic fields. The model spans a large range of temporal and spatial scales, which renders the model stiff and presents numerical challenges. To address the large range of timescales, a blended continuous and discontinuous Galerkin method is proposed, where the massive ion and neutral species are modeled using an explicit discontinuous Galerkin method while the electrons and electromagnetic fields are modeled using an implicit continuous Galerkin method. This approach is able to capture large-gradient ion and neutralmore » physics like shock formation, while resolving high-frequency electron dynamics in a computationally efficient manner. The details of the Blended Finite Element Method (BFEM) are presented. The numerical method is benchmarked for accuracy and tested using two-fluid one-dimensional soliton problem and electromagnetic shock problem. The results are compared to conventional finite volume and finite element methods, and demonstrate that the BFEM is particularly effective in resolving physics in stiff problems involving realistic physical parameters, including realistic electron mass and speed of light. The benefit is illustrated by computing a three-fluid plasma application that demonstrates species separation in multi-component plasmas.« less
2015-12-28
Masoud Anahid, Mahendra K. Samal , and Somnath Ghosh. Dwell fatigue crack nucleation model based on crystal plasticity finite element simulations of...induced crack nucleation in polycrystals. Model. Simul. Mater. Sci. Eng., 17, 064009. 19. Anahid, M., Samal , M. K. & Ghosh, S. (2011). Dwell fatigue...Jour. Plas., 24:428–454, 2008. 4. M. Anahid, M. K. Samal , and S. Ghosh. Dwell fatigue crack nucleation model based on crystal plasticity finite
NASA Technical Reports Server (NTRS)
Larour, Eric; Schiermeier, John E.; Seroussi, Helene; Morlinghem, Mathieu
2013-01-01
In order to have the capability to use satellite data from its own missions to inform future sea-level rise projections, JPL needed a full-fledged ice-sheet/iceshelf flow model, capable of modeling the mass balance of Antarctica and Greenland into the near future. ISSM was developed with such a goal in mind, as a massively parallelized, multi-purpose finite-element framework dedicated to ice-sheet modeling. ISSM features unstructured meshes (Tria in 2D, and Penta in 3D) along with corresponding finite elements for both types of meshes. Each finite element can carry out diagnostic, prognostic, transient, thermal 3D, surface, and bed slope simulations. Anisotropic meshing enables adaptation of meshes to a certain metric, and the 2D Shelfy-Stream, 3D Blatter/Pattyn, and 3D Full-Stokes formulations capture the bulk of the ice-flow physics. These elements can be coupled together, based on the Arlequin method, so that on a large scale model such as Antarctica, each type of finite element is used in the most efficient manner. For each finite element referenced above, ISSM implements an adjoint. This adjoint can be used to carry out model inversions of unknown model parameters, typically ice rheology and basal drag at the ice/bedrock interface, using a metric such as the observed InSAR surface velocity. This data assimilation capability is crucial to allow spinning up of ice flow models using available satellite data. ISSM relies on the PETSc library for its vectors, matrices, and solvers. This allows ISSM to run efficiently on any parallel platform, whether shared or distrib- ISSM: Ice Sheet System Model NASA's Jet Propulsion Laboratory, Pasadena, California uted. It can run on the largest clusters, and is fully scalable. This allows ISSM to tackle models the size of continents. ISSM is embedded into MATLAB and Python, both open scientific platforms. This improves its outreach within the science community. It is entirely written in C/C++, which gives it flexibility in its design, and the power/speed that C/C++ allows. ISSM is svn (subversion) hosted, on a JPL repository, to facilitate its development and maintenance. ISSM can also model propagation of rifts using contact mechanics and mesh splitting, and can interface to the Dakota software. To carry out sensitivity analysis, mesh partitioning algorithms are available, based on the Scotch, Chaco, and Metis partitioners that ensure equal area mesh partitions can be done, which are then usable for sampling and local reliability methods.
NASA Astrophysics Data System (ADS)
Zaghi, S.
2014-07-01
OFF, an open source (free software) code for performing fluid dynamics simulations, is presented. The aim of OFF is to solve, numerically, the unsteady (and steady) compressible Navier-Stokes equations of fluid dynamics by means of finite volume techniques: the research background is mainly focused on high-order (WENO) schemes for multi-fluids, multi-phase flows over complex geometries. To this purpose a highly modular, object-oriented application program interface (API) has been developed. In particular, the concepts of data encapsulation and inheritance available within Fortran language (from standard 2003) have been stressed in order to represent each fluid dynamics "entity" (e.g. the conservative variables of a finite volume, its geometry, etc…) by a single object so that a large variety of computational libraries can be easily (and efficiently) developed upon these objects. The main features of OFF can be summarized as follows: Programming LanguageOFF is written in standard (compliant) Fortran 2003; its design is highly modular in order to enhance simplicity of use and maintenance without compromising the efficiency; Parallel Frameworks Supported the development of OFF has been also targeted to maximize the computational efficiency: the code is designed to run on shared-memory multi-cores workstations and distributed-memory clusters of shared-memory nodes (supercomputers); the code's parallelization is based on Open Multiprocessing (OpenMP) and Message Passing Interface (MPI) paradigms; Usability, Maintenance and Enhancement in order to improve the usability, maintenance and enhancement of the code also the documentation has been carefully taken into account; the documentation is built upon comprehensive comments placed directly into the source files (no external documentation files needed): these comments are parsed by means of doxygen free software producing high quality html and latex documentation pages; the distributed versioning system referred as git has been adopted in order to facilitate the collaborative maintenance and improvement of the code; CopyrightsOFF is a free software that anyone can use, copy, distribute, study, change and improve under the GNU Public License version 3. The present paper is a manifesto of OFF code and presents the currently implemented features and ongoing developments. This work is focused on the computational techniques adopted and a detailed description of the main API characteristics is reported. OFF capabilities are demonstrated by means of one and two dimensional examples and a three dimensional real application.
A transient FETI methodology for large-scale parallel implicit computations in structural mechanics
NASA Technical Reports Server (NTRS)
Farhat, Charbel; Crivelli, Luis; Roux, Francois-Xavier
1992-01-01
Explicit codes are often used to simulate the nonlinear dynamics of large-scale structural systems, even for low frequency response, because the storage and CPU requirements entailed by the repeated factorizations traditionally found in implicit codes rapidly overwhelm the available computing resources. With the advent of parallel processing, this trend is accelerating because explicit schemes are also easier to parallelize than implicit ones. However, the time step restriction imposed by the Courant stability condition on all explicit schemes cannot yet -- and perhaps will never -- be offset by the speed of parallel hardware. Therefore, it is essential to develop efficient and robust alternatives to direct methods that are also amenable to massively parallel processing because implicit codes using unconditionally stable time-integration algorithms are computationally more efficient when simulating low-frequency dynamics. Here we present a domain decomposition method for implicit schemes that requires significantly less storage than factorization algorithms, that is several times faster than other popular direct and iterative methods, that can be easily implemented on both shared and local memory parallel processors, and that is both computationally and communication-wise efficient. The proposed transient domain decomposition method is an extension of the method of Finite Element Tearing and Interconnecting (FETI) developed by Farhat and Roux for the solution of static problems. Serial and parallel performance results on the CRAY Y-MP/8 and the iPSC-860/128 systems are reported and analyzed for realistic structural dynamics problems. These results establish the superiority of the FETI method over both the serial/parallel conjugate gradient algorithm with diagonal scaling and the serial/parallel direct method, and contrast the computational power of the iPSC-860/128 parallel processor with that of the CRAY Y-MP/8 system.
Space Technology 5 Multi-Point Observations of Temporal Variability of Field-Aligned Currents
NASA Technical Reports Server (NTRS)
Le, Guan; Wang, Yongli; Slavin, James A.; Strangeway, Robert J.
2008-01-01
Space Technology 5 (ST5) is a three micro-satellite constellation deployed into a 300 x 4500 km, dawn-dusk, sun-synchronous polar orbit from March 22 to June 21, 2006, for technology validations. In this paper, we present a study of the temporal variability of field-aligned currents using multi-point magnetic field measurements from ST5. The data demonstrate that meso-scale current structures are commonly embedded within large-scale field-aligned current sheets. The meso-scale current structures are very dynamic with highly variable current density and/or polarity in time scales of approximately 10 min. They exhibit large temporal variations during both quiet and disturbed times in such time scales. On the other hand, the data also shown that the time scales for the currents to be relatively stable are approximately 1 min for meso-scale currents and approximately 10 min for large scale current sheets. These temporal features are obviously associated with dynamic variations of their particle carriers (mainly electrons) as they respond to the variations of the parallel electric field in auroral acceleration region. The characteristic time scales for the temporal variability of meso-scale field-aligned currents are found to be consistent with those of auroral parallel electric field.
Numerical study of supersonic combustors by multi-block grids with mismatched interfaces
NASA Technical Reports Server (NTRS)
Moon, Young J.
1990-01-01
A three dimensional, finite rate chemistry, Navier-Stokes code was extended to a multi-block code with mismatched interface for practical calculations of supersonic combustors. To ensure global conservation, a conservative algorithm was used for the treatment of mismatched interfaces. The extended code was checked against one test case, i.e., a generic supersonic combustor with transverse fuel injection, examining solution accuracy, convergence, and local mass flux error. After testing, the code was used to simulate the chemically reacting flow fields in a scramjet combustor with parallel fuel injectors (unswept and swept ramps). Computational results were compared with experimental shadowgraph and pressure measurements. Fuel-air mixing characteristics of the unswept and swept ramps were compared and investigated.
The Blended Finite Element Method for Multi-fluid Plasma Modeling
2016-07-01
Briefing Charts 3. DATES COVERED (From - To) 07 June 2016 - 01 July 2016 4. TITLE AND SUBTITLE The Blended Finite Element Method for Multi-fluid Plasma...BLENDED FINITE ELEMENT METHOD FOR MULTI-FLUID PLASMA MODELING Éder M. Sousa1, Uri Shumlak2 1ERC INC., IN-SPACE PROPULSION BRANCH (RQRS) AIR FORCE RESEARCH...MULTI-FLUID PLASMA MODEL 2 BLENDED FINITE ELEMENT METHOD Blended Finite Element Method Nodal Continuous Galerkin Modal Discontinuous Galerkin Model
Vectorization and parallelization of the finite strip method for dynamic Mindlin plate problems
NASA Technical Reports Server (NTRS)
Chen, Hsin-Chu; He, Ai-Fang
1993-01-01
The finite strip method is a semi-analytical finite element process which allows for a discrete analysis of certain types of physical problems by discretizing the domain of the problem into finite strips. This method decomposes a single large problem into m smaller independent subproblems when m harmonic functions are employed, thus yielding natural parallelism at a very high level. In this paper we address vectorization and parallelization strategies for the dynamic analysis of simply-supported Mindlin plate bending problems and show how to prevent potential conflicts in memory access during the assemblage process. The vector and parallel implementations of this method and the performance results of a test problem under scalar, vector, and vector-concurrent execution modes on the Alliant FX/80 are also presented.
NASA Astrophysics Data System (ADS)
Arshadi, Amir
Image-based simulation of complex materials is a very important tool for understanding their mechanical behavior and an effective tool for successful design of composite materials. In this thesis an image-based multi-scale finite element approach is developed to predict the mechanical properties of asphalt mixtures. In this approach the "up-scaling" and homogenization of each scale to the next is critically designed to improve accuracy. In addition to this multi-scale efficiency, this study introduces an approach for consideration of particle contacts at each of the scales in which mineral particles exist. One of the most important pavement distresses which seriously affects the pavement performance is fatigue cracking. As this cracking generally takes place in the binder phase of the asphalt mixture, the binder fatigue behavior is assumed to be one of the main factors influencing the overall pavement fatigue performance. It is also known that aggregate gradation, mixture volumetric properties, and filler type and concentration can affect damage initiation and progression in the asphalt mixtures. This study was conducted to develop a tool to characterize the damage properties of the asphalt mixtures at all scales. In the present study the Viscoelastic continuum damage model is implemented into the well-known finite element software ABAQUS via the user material subroutine (UMAT) in order to simulate the state of damage in the binder phase under the repeated uniaxial sinusoidal loading. The inputs are based on the experimentally derived measurements for the binder properties. For the scales of mastic and mortar, the artificially 2-Dimensional images of mastic and mortar scales were generated and used to characterize the properties of those scales. Finally, the 2D scanned images of asphalt mixtures are used to study the asphalt mixture fatigue behavior under loading. In order to validate the proposed model, the experimental test results and the simulation results were compared. Indirect tensile fatigue tests were conducted on asphalt mixture samples. A comparison between experimental results and the results from simulation shows that the model developed in this study is capable of predicting the effect of asphalt binder properties and aggregate micro-structure on mechanical behavior of asphalt concrete under loading.
Xiao, Li; Cai, Qin; Li, Zhilin; Zhao, Hongkai; Luo, Ray
2014-11-25
A multi-scale framework is proposed for more realistic molecular dynamics simulations in continuum solvent models by coupling a molecular mechanics treatment of solute with a fluid mechanics treatment of solvent. This article reports our initial efforts to formulate the physical concepts necessary for coupling the two mechanics and develop a 3D numerical algorithm to simulate the solvent fluid via the Navier-Stokes equation. The numerical algorithm was validated with multiple test cases. The validation shows that the algorithm is effective and stable, with observed accuracy consistent with our design.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pask, J E; Sukumar, N; Guney, M
2011-02-28
Over the course of the past two decades, quantum mechanical calculations have emerged as a key component of modern materials research. However, the solution of the required quantum mechanical equations is a formidable task and this has severely limited the range of materials systems which can be investigated by such accurate, quantum mechanical means. The current state of the art for large-scale quantum simulations is the planewave (PW) method, as implemented in now ubiquitous VASP, ABINIT, and QBox codes, among many others. However, since the PW method uses a global Fourier basis, with strictly uniform resolution at all points inmore » space, and in which every basis function overlaps every other at every point, it suffers from substantial inefficiencies in calculations involving atoms with localized states, such as first-row and transition-metal atoms, and requires substantial nonlocal communications in parallel implementations, placing critical limits on scalability. In recent years, real-space methods such as finite-differences (FD) and finite-elements (FE) have been developed to address these deficiencies by reformulating the required quantum mechanical equations in a strictly local representation. However, while addressing both resolution and parallel-communications problems, such local real-space approaches have been plagued by one key disadvantage relative to planewaves: excessive degrees of freedom (grid points, basis functions) needed to achieve the required accuracies. And so, despite critical limitations, the PW method remains the standard today. In this work, we show for the first time that this key remaining disadvantage of real-space methods can in fact be overcome: by building known atomic physics into the solution process using modern partition-of-unity (PU) techniques in finite element analysis. Indeed, our results show order-of-magnitude reductions in basis size relative to state-of-the-art planewave based methods. The method developed here is completely general, applicable to any crystal symmetry and to both metals and insulators alike. We have developed and implemented a full self-consistent Kohn-Sham method, including both total energies and forces for molecular dynamics, and developed a full MPI parallel implementation for large-scale calculations. We have applied the method to the gamut of physical systems, from simple insulating systems with light atoms to complex d- and f-electron systems, requiring large numbers of atomic-orbital enrichments. In every case, the new PU FE method attained the required accuracies with substantially fewer degrees of freedom, typically by an order of magnitude or more, than the current state-of-the-art PW method. Finally, our initial MPI implementation has shown excellent parallel scaling of the most time-critical parts of the code up to 1728 processors, with clear indications of what will be required to achieve comparable scaling for the rest. Having shown that the key remaining disadvantage of real-space methods can in fact be overcome, the work has attracted significant attention: with sixteen invited talks, both domestic and international, so far; two papers published and another in preparation; and three new university and/or national laboratory collaborations, securing external funding to pursue a number of related research directions. Having demonstrated the proof of principle, work now centers on the necessary extensions and optimizations required to bring the prototype method and code delivered here to production applications.« less
Self-organized criticality in asymmetric exclusion model with noise for freeway traffic
NASA Astrophysics Data System (ADS)
Nagatani, Takashi
1995-02-01
The one-dimensional asymmetric simple-exclusion model with open boundaries for parallel update is extended to take into account temporary stopping of particles. The model presents the traffic flow on a highway with temporary deceleration of cars. Introducing temporary stopping into the asymmetric simple-exclusion model drives the system asymptotically into a steady state exhibiting a self-organized criticality. In the self-organized critical state, start-stop waves (or traffic jams) appear with various sizes (or lifetimes). The typical interval < s>between consecutive jams scales as < s> ≃ Lv with v = 0.51 ± 0.05 where L is the system size. It is shown that the cumulative jam-interval distribution Ns( L) satisfies the finite-size scaling form ( Ns( L) ≃ L- vf( s/ Lv). Also, the typical lifetime
Numerical Modelling of Foundation Slabs with use of Schur Complement Method
NASA Astrophysics Data System (ADS)
Koktan, Jiří; Brožovský, Jiří
2017-10-01
The paper discusses numerical modelling of foundation slabs with use of advanced numerical approaches, which are suitable for parallel processing. The solution is based on the Finite Element Method with the slab-type elements. The subsoil is modelled with use of Winklertype contact model (as an alternative a multi-parameter model can be used). The proposed modelling approach uses the Schur Complement method to speed-up the computations of the problem. The method is based on a special division of the analyzed model to several substructures. It adds some complexity to the numerical procedures, especially when subsoil models are used inside the finite element method solution. In other hand, this method makes possible a fast solution of large models but it introduces further problems to the process. Thus, the main aim of this paper is to verify that such method can be successfully used for this type of problem. The most suitable finite elements will be discussed, there will be also discussion related to finite element mesh and limitations of its construction for such problem. The core approaches of the implementation of the Schur Complement Method for this type of the problem will be also presented. The proposed approach was implemented in the form of a computer program, which will be also briefly introduced. There will be also presented results of example computations, which prove the speed-up of the solution - there will be shown important speed-up of solution even in the case of on-parallel processing and the ability of bypass size limitations of numerical models with use of the discussed approach.
Multi-Probe SPM using Interference Patterns for a Parallel Nano Imaging
NASA Astrophysics Data System (ADS)
Koyama, Hirotaka; Oohira, Fumikazu; Hosogi, Maho; Hashiguchi, Gen
This paper proposes a new composition of the multi-probe using optical interference patterns for a parallel nano imaging in a large area scanning. We achieved large-scale integration with 50,000 probes fabricated with MEMS technology, and measured the optical interference patterns with CCD, which was difficult in a conventional single scanning probe. In this research, the multi-probes are made of Si3N4 by MEMS process, and, the multi-probes are joined with a Pyrex glass by an anodic bonding. We designed, fabricated, and evaluated the characteristics of the probe. In addition, we changed the probe shape to decrease the warpage of the Si3N4 probe. We used the supercritical drying to avoid stiction of the Si3N4 probe with the glass surface and fabricated 4 types of the probe shapes without stiction. We took some interference patterns by CCD and measured the position of them. We calculate the probe height using the interference displacement and compared the result with the theoretical deflection curve. As a result, these interference patterns matched the theoretical deflection curve. We found that this multi-probe chip using interference patterns is effective in measurement for a parallel nano imaging.
Practical aspects of prestack depth migration with finite differences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ober, C.C.; Oldfield, R.A.; Womble, D.E.
1997-07-01
Finite-difference, prestack, depth migrations offers significant improvements over Kirchhoff methods in imaging near or under salt structures. The authors have implemented a finite-difference prestack depth migration algorithm for use on massively parallel computers which is discussed. The image quality of the finite-difference scheme has been investigated and suggested improvements are discussed. In this presentation, the authors discuss an implicit finite difference migration code, called Salvo, that has been developed through an ACTI (Advanced Computational Technology Initiative) joint project. This code is designed to be efficient on a variety of massively parallel computers. It takes advantage of both frequency and spatialmore » parallelism as well as the use of nodes dedicated to data input/output (I/O). Besides giving an overview of the finite-difference algorithm and some of the parallelism techniques used, migration results using both Kirchhoff and finite-difference migration will be presented and compared. The authors start out with a very simple Cartoon model where one can intuitively see the multiple travel paths and some of the potential problems that will be encountered with Kirchhoff migration. More complex synthetic models as well as results from actual seismic data from the Gulf of Mexico will be shown.« less
Katouda, Michio; Naruse, Akira; Hirano, Yukihiko; Nakajima, Takahito
2016-11-15
A new parallel algorithm and its implementation for the RI-MP2 energy calculation utilizing peta-flop-class many-core supercomputers are presented. Some improvements from the previous algorithm (J. Chem. Theory Comput. 2013, 9, 5373) have been performed: (1) a dual-level hierarchical parallelization scheme that enables the use of more than 10,000 Message Passing Interface (MPI) processes and (2) a new data communication scheme that reduces network communication overhead. A multi-node and multi-GPU implementation of the present algorithm is presented for calculations on a central processing unit (CPU)/graphics processing unit (GPU) hybrid supercomputer. Benchmark results of the new algorithm and its implementation using the K computer (CPU clustering system) and TSUBAME 2.5 (CPU/GPU hybrid system) demonstrate high efficiency. The peak performance of 3.1 PFLOPS is attained using 80,199 nodes of the K computer. The peak performance of the multi-node and multi-GPU implementation is 514 TFLOPS using 1349 nodes and 4047 GPUs of TSUBAME 2.5. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Discontinuous Galerkin Finite Element Method for Parabolic Problems
NASA Technical Reports Server (NTRS)
Kaneko, Hideaki; Bey, Kim S.; Hou, Gene J. W.
2004-01-01
In this paper, we develop a time and its corresponding spatial discretization scheme, based upon the assumption of a certain weak singularity of parallel ut(t) parallel Lz(omega) = parallel ut parallel2, for the discontinuous Galerkin finite element method for one-dimensional parabolic problems. Optimal convergence rates in both time and spatial variables are obtained. A discussion of automatic time-step control method is also included.
Wakefield Simulation of CLIC PETS Structure Using Parallel 3D Finite Element Time-Domain Solver T3P
DOE Office of Scientific and Technical Information (OSTI.GOV)
Candel, A.; Kabel, A.; Lee, L.
In recent years, SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic time-domain code T3P. Higher-order Finite Element methods on conformal unstructured meshes and massively parallel processing allow unprecedented simulation accuracy for wakefield computations and simulations of transient effects in realistic accelerator structures. Applications include simulation of wakefield damping in the Compact Linear Collider (CLIC) power extraction and transfer structure (PETS).
Adaptive implicit-explicit and parallel element-by-element iteration schemes
NASA Technical Reports Server (NTRS)
Tezduyar, T. E.; Liou, J.; Nguyen, T.; Poole, S.
1989-01-01
Adaptive implicit-explicit (AIE) and grouped element-by-element (GEBE) iteration schemes are presented for the finite element solution of large-scale problems in computational mechanics and physics. The AIE approach is based on the dynamic arrangement of the elements into differently treated groups. The GEBE procedure, which is a way of rewriting the EBE formulation to make its parallel processing potential and implementation more clear, is based on the static arrangement of the elements into groups with no inter-element coupling within each group. Various numerical tests performed demonstrate the savings in the CPU time and memory.
Grid-Enabled Quantitative Analysis of Breast Cancer
2009-10-01
large-scale, multi-modality computerized image analysis . The central hypothesis of this research is that large-scale image analysis for breast cancer...pilot study to utilize large scale parallel Grid computing to harness the nationwide cluster infrastructure for optimization of medical image ... analysis parameters. Additionally, we investigated the use of cutting edge dataanalysis/ mining techniques as applied to Ultrasound, FFDM, and DCE-MRI Breast
Iterative algorithms for large sparse linear systems on parallel computers
NASA Technical Reports Server (NTRS)
Adams, L. M.
1982-01-01
Algorithms for assembling in parallel the sparse system of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering are developed. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed and results of this model for the algorithms are given.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sreepathi, Sarat; Sripathi, Vamsi; Mills, Richard T
2013-01-01
Inefficient parallel I/O is known to be a major bottleneck among scientific applications employed on supercomputers as the number of processor cores grows into the thousands. Our prior experience indicated that parallel I/O libraries such as HDF5 that rely on MPI-IO do not scale well beyond 10K processor cores, especially on parallel file systems (like Lustre) with single point of resource contention. Our previous optimization efforts for a massively parallel multi-phase and multi-component subsurface simulator (PFLOTRAN) led to a two-phase I/O approach at the application level where a set of designated processes participate in the I/O process by splitting themore » I/O operation into a communication phase and a disk I/O phase. The designated I/O processes are created by splitting the MPI global communicator into multiple sub-communicators. The root process in each sub-communicator is responsible for performing the I/O operations for the entire group and then distributing the data to rest of the group. This approach resulted in over 25X speedup in HDF I/O read performance and 3X speedup in write performance for PFLOTRAN at over 100K processor cores on the ORNL Jaguar supercomputer. This research describes the design and development of a general purpose parallel I/O library, SCORPIO (SCalable block-ORiented Parallel I/O) that incorporates our optimized two-phase I/O approach. The library provides a simplified higher level abstraction to the user, sitting atop existing parallel I/O libraries (such as HDF5) and implements optimized I/O access patterns that can scale on larger number of processors. Performance results with standard benchmark problems and PFLOTRAN indicate that our library is able to maintain the same speedups as before with the added flexibility of being applicable to a wider range of I/O intensive applications.« less
NASA Astrophysics Data System (ADS)
Chang, Y.; Hung, S.; Kuo, B.; Kuochen, H.
2012-12-01
Taiwan is one of the archetypical places for studying the active orogenic process in the world, where the Luzon arc has obliquely collided into the southwest China continental margin since 5 Ma ago. Because of the lack of convincing evidence for the structure in the lithospheric mantle and at even greater depths, several competing models have been proposed for the Taiwan mountain-building process. With the deployment of ocean-bottom seismometers (OBSs) on the seafloor around Taiwan from the TAIGER (TAiwan Integrated GEodynamic Research) and IES seismic experiments, the aperture of the seismic network is greatly extended to improve the depth resolution of tomographic imaging, which is critical to illuminate the nature of the arc-continent collision and accretion in Taiwan. In this study, we use relative travel-time residuals between a collection of teleseismic body wave arrivals to tomographically image the velocity structure beneath Taiwan. In addition to those from common distant earthquakes observed across an array of stations, we take advantage of dense seismicity in the vicinity of Taiwan and the source and receiver reciprocity to augment the data coverage from clustered earthquakes recorded by global stations. As waveforms are dependent of source mechanisms, we carry out the cluster analysis to group the phase arrivals with similar waveforms into clusters and simultaneously determine relative travel-time anomalies in the same cluster accurately by a cross correlation method. The combination of these two datasets would particularly enhance the resolvability of the tomographic models offshore of eastern Taiwan, where the two subduction systems of opposite polarity are taking place and have primarily shaped the present tectonic framework of Taiwan. On the other hand, our inversion adopts an innovation that invokes wavelet-based, multi-scale parameterization and finite-frequency theory. Not only does this approach make full use of frequency-dependent travel-time data providing different, but complementary sensitivity to velocity heterogeneity, but it also objectively addresses the intrinsically multi-scale characters of unevenly distributed data which yields the model with spatially-varying, data-adaptive resolution. Besides, we employ a parallelized singular value decomposition algorithm to directly solve for the resolution matrix and point spread functions (PSF). While the spatial distribution of a PSF is considered as the probability density function of multivariate normal distribution, we employ the principal component analysis (PCA) to estimate the lengths and directions of the principal axes of the PSF distribution, used for quantitative assessment of the resolvable scale-length and degree of smearing of the model and guidance to interpret the robust and trustworthy features in the resolved models.
Peridynamic Multiscale Finite Element Methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Costa, Timothy; Bond, Stephen D.; Littlewood, David John
The problem of computing quantum-accurate design-scale solutions to mechanics problems is rich with applications and serves as the background to modern multiscale science research. The prob- lem can be broken into component problems comprised of communicating across adjacent scales, which when strung together create a pipeline for information to travel from quantum scales to design scales. Traditionally, this involves connections between a) quantum electronic structure calculations and molecular dynamics and between b) molecular dynamics and local partial differ- ential equation models at the design scale. The second step, b), is particularly challenging since the appropriate scales of molecular dynamic andmore » local partial differential equation models do not overlap. The peridynamic model for continuum mechanics provides an advantage in this endeavor, as the basic equations of peridynamics are valid at a wide range of scales limiting from the classical partial differential equation models valid at the design scale to the scale of molecular dynamics. In this work we focus on the development of multiscale finite element methods for the peridynamic model, in an effort to create a mathematically consistent channel for microscale information to travel from the upper limits of the molecular dynamics scale to the design scale. In particular, we first develop a Nonlocal Multiscale Finite Element Method which solves the peridynamic model at multiple scales to include microscale information at the coarse-scale. We then consider a method that solves a fine-scale peridynamic model to build element-support basis functions for a coarse- scale local partial differential equation model, called the Mixed Locality Multiscale Finite Element Method. Given decades of research and development into finite element codes for the local partial differential equation models of continuum mechanics there is a strong desire to couple local and nonlocal models to leverage the speed and state of the art of local models with the flexibility and accuracy of the nonlocal peridynamic model. In the mixed locality method this coupling occurs across scales, so that the nonlocal model can be used to communicate material heterogeneity at scales inappropriate to local partial differential equation models. Additionally, the computational burden of the weak form of the peridynamic model is reduced dramatically by only requiring that the model be solved on local patches of the simulation domain which may be computed in parallel, taking advantage of the heterogeneous nature of next generation computing platforms. Addition- ally, we present a novel Galerkin framework, the 'Ambulant Galerkin Method', which represents a first step towards a unified mathematical analysis of local and nonlocal multiscale finite element methods, and whose future extension will allow the analysis of multiscale finite element methods that mix models across scales under certain assumptions of the consistency of those models.« less
ParaView visualization of Abaqus output on the mechanical deformation of complex microstructures
NASA Astrophysics Data System (ADS)
Liu, Qingbin; Li, Jiang; Liu, Jie
2017-02-01
Abaqus® is a popular software suite for finite element analysis. It delivers linear and nonlinear analyses of mechanical and fluid dynamics, includes multi-body system and multi-physics coupling. However, the visualization capability of Abaqus using its CAE module is limited. Models from microtomography have extremely complicated structures, and datasets of Abaqus output are huge, requiring a visualization tool more powerful than Abaqus/CAE. We convert Abaqus output into the XML-based VTK format by developing a Python script and then using ParaView to visualize the results. Such capabilities as volume rendering, tensor glyphs, superior animation and other filters allow ParaView to offer excellent visualizing manifestations. ParaView's parallel visualization makes it possible to visualize very big data. To support full parallel visualization, the Python script achieves data partitioning by reorganizing all nodes, elements and the corresponding results on those nodes and elements. The data partition scheme minimizes data redundancy and works efficiently. Given its good readability and extendibility, the script can be extended to the processing of more different problems in Abaqus. We share the script with Abaqus users on GitHub.
NASA Astrophysics Data System (ADS)
Lin, Shian-Jiann; Harris, Lucas; Chen, Jan-Huey; Zhao, Ming
2014-05-01
A multi-scale High-Resolution Atmosphere Model (HiRAM) is being developed at NOAA/Geophysical Fluid Dynamics Laboratory. The model's dynamical framework is the non-hydrostatic extension of the vertically Lagrangian finite-volume dynamical core (Lin 2004, Monthly Wea. Rev.) constructed on a stretchable (via Schmidt transformation) cubed-sphere grid. Physical parametrizations originally designed for IPCC-type climate predictions are in the process of being modified and made more "scale-aware", in an effort to make the model suitable for multi-scale weather-climate applications, with horizontal resolution ranging from 1 km (near the target high-resolution region) to as low as 400 km (near the antipodal point). One of the main goals of this development is to enable simulation of high impact weather phenomena (such as tornadoes, thunderstorms, category-5 hurricanes) within an IPCC-class climate modeling system previously thought impossible. We will present preliminary results, covering a very wide spectrum of temporal-spatial scales, ranging from simulation of tornado genesis (hours), Madden-Julian Oscillations (intra-seasonal), topical cyclones (seasonal), to Quasi Biennial Oscillations (intra-decadal), using the same global multi-scale modeling system.
High Performance Computing of Meshless Time Domain Method on Multi-GPU Cluster
NASA Astrophysics Data System (ADS)
Ikuno, Soichiro; Nakata, Susumu; Hirokawa, Yuta; Itoh, Taku
2015-01-01
High performance computing of Meshless Time Domain Method (MTDM) on multi-GPU using the supercomputer HA-PACS (Highly Accelerated Parallel Advanced system for Computational Sciences) at University of Tsukuba is investigated. Generally, the finite difference time domain (FDTD) method is adopted for the numerical simulation of the electromagnetic wave propagation phenomena. However, the numerical domain must be divided into rectangle meshes, and it is difficult to adopt the problem in a complexed domain to the method. On the other hand, MTDM can be easily adept to the problem because MTDM does not requires meshes. In the present study, we implement MTDM on multi-GPU cluster to speedup the method, and numerically investigate the performance of the method on multi-GPU cluster. To reduce the computation time, the communication time between the decomposed domain is hided below the perfect matched layer (PML) calculation procedure. The results of computation show that speedup of MTDM on 128 GPUs is 173 times faster than that of single CPU calculation.
García-Grajales, Julián A.; Rucabado, Gabriel; García-Dopico, Antonio; Peña, José-María; Jérusalem, Antoine
2015-01-01
With the growing body of research on traumatic brain injury and spinal cord injury, computational neuroscience has recently focused its modeling efforts on neuronal functional deficits following mechanical loading. However, in most of these efforts, cell damage is generally only characterized by purely mechanistic criteria, functions of quantities such as stress, strain or their corresponding rates. The modeling of functional deficits in neurites as a consequence of macroscopic mechanical insults has been rarely explored. In particular, a quantitative mechanically based model of electrophysiological impairment in neuronal cells, Neurite, has only very recently been proposed. In this paper, we present the implementation details of this model: a finite difference parallel program for simulating electrical signal propagation along neurites under mechanical loading. Following the application of a macroscopic strain at a given strain rate produced by a mechanical insult, Neurite is able to simulate the resulting neuronal electrical signal propagation, and thus the corresponding functional deficits. The simulation of the coupled mechanical and electrophysiological behaviors requires computational expensive calculations that increase in complexity as the network of the simulated cells grows. The solvers implemented in Neurite—explicit and implicit—were therefore parallelized using graphics processing units in order to reduce the burden of the simulation costs of large scale scenarios. Cable Theory and Hodgkin-Huxley models were implemented to account for the electrophysiological passive and active regions of a neurite, respectively, whereas a coupled mechanical model accounting for the neurite mechanical behavior within its surrounding medium was adopted as a link between electrophysiology and mechanics. This paper provides the details of the parallel implementation of Neurite, along with three different application examples: a long myelinated axon, a segmented dendritic tree, and a damaged axon. The capabilities of the program to deal with large scale scenarios, segmented neuronal structures, and functional deficits under mechanical loading are specifically highlighted. PMID:25680098
MAGNETIC BRAIDING AND PARALLEL ELECTRIC FIELDS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wilmot-Smith, A. L.; Hornig, G.; Pontin, D. I.
2009-05-10
The braiding of the solar coronal magnetic field via photospheric motions-with subsequent relaxation and magnetic reconnection-is one of the most widely debated ideas of solar physics. We readdress the theory in light of developments in three-dimensional magnetic reconnection theory. It is known that the integrated parallel electric field along field lines is the key quantity determining the rate of reconnection, in contrast with the two-dimensional case where the electric field itself is the important quantity. We demonstrate that this difference becomes crucial for sufficiently complex magnetic field structures. A numerical method is used to relax a braided magnetic field towardmore » an ideal force-free equilibrium; the field is found to remain smooth throughout the relaxation, with only large-scale current structures. However, a highly filamentary integrated parallel current structure with extremely short length-scales is found in the field, with the associated gradients intensifying during the relaxation process. An analytical model is developed to show that, in a coronal situation, the length scales associated with the integrated parallel current structures will rapidly decrease with increasing complexity, or degree of braiding, of the magnetic field. Analysis shows the decrease in these length scales will, for any finite resistivity, eventually become inconsistent with the stability of the coronal field. Thus the inevitable consequence of the magnetic braiding process is a loss of equilibrium of the magnetic field, probably via magnetic reconnection events.« less
NASA Astrophysics Data System (ADS)
Zapata, M. A. Uh; Van Bang, D. Pham; Nguyen, K. D.
2016-05-01
This paper presents a parallel algorithm for the finite-volume discretisation of the Poisson equation on three-dimensional arbitrary geometries. The proposed method is formulated by using a 2D horizontal block domain decomposition and interprocessor data communication techniques with message passing interface. The horizontal unstructured-grid cells are reordered according to the neighbouring relations and decomposed into blocks using a load-balanced distribution to give all processors an equal amount of elements. In this algorithm, two parallel successive over-relaxation methods are presented: a multi-colour ordering technique for unstructured grids based on distributed memory and a block method using reordering index following similar ideas of the partitioning for structured grids. In all cases, the parallel algorithms are implemented with a combination of an acceleration iterative solver. This solver is based on a parabolic-diffusion equation introduced to obtain faster solutions of the linear systems arising from the discretisation. Numerical results are given to evaluate the performances of the methods showing speedups better than linear.
Electromagnetic Design of a Magnetically-Coupled Spatial Power Combiner
NASA Technical Reports Server (NTRS)
Bulcha, B.; Cataldo, G.; Stevenson, T. R.; U-Yen, K.; Moseley, S. H.; Wollack, E. J.
2017-01-01
The design of a two-dimensional beam-combining network employing a parallel-plate superconducting waveguide with a mono-crystalline silicon dielectric is presented. This novel beam-combining network structure employs an array of magnetically coupled antenna elements to achieve high coupling efficiency and full sampling of the intensity distribution while avoiding diffractive losses in the multi-mode region defined by the parallel-plate waveguide. These attributes enable the structures use in realizing compact far-infrared spectrometers for astrophysical and instrumentation applications. When configured with a suitable corporate-feed power-combiner, this fully sampled array can be used to realize a low-sidelobe apodized response without incurring a reduction in coupling efficiency. To control undesired reflections over a wide range of angles in the finite-sized parallel-plate waveguide region, a wideband meta-material electromagnetic absorber structure is implemented. This adiabatic structure absorbs greater than 99 of the power over the 1.7:1 operational band at angles ranging from normal (0 degree) to near parallel (180 degree) incidence. Design, simulations, and application of the device will be presented.
OpenMP parallelization of a gridded SWAT (SWATG)
NASA Astrophysics Data System (ADS)
Zhang, Ying; Hou, Jinliang; Cao, Yongpan; Gu, Juan; Huang, Chunlin
2017-12-01
Large-scale, long-term and high spatial resolution simulation is a common issue in environmental modeling. A Gridded Hydrologic Response Unit (HRU)-based Soil and Water Assessment Tool (SWATG) that integrates grid modeling scheme with different spatial representations also presents such problems. The time-consuming problem affects applications of very high resolution large-scale watershed modeling. The OpenMP (Open Multi-Processing) parallel application interface is integrated with SWATG (called SWATGP) to accelerate grid modeling based on the HRU level. Such parallel implementation takes better advantage of the computational power of a shared memory computer system. We conducted two experiments at multiple temporal and spatial scales of hydrological modeling using SWATG and SWATGP on a high-end server. At 500-m resolution, SWATGP was found to be up to nine times faster than SWATG in modeling over a roughly 2000 km2 watershed with 1 CPU and a 15 thread configuration. The study results demonstrate that parallel models save considerable time relative to traditional sequential simulation runs. Parallel computations of environmental models are beneficial for model applications, especially at large spatial and temporal scales and at high resolutions. The proposed SWATGP model is thus a promising tool for large-scale and high-resolution water resources research and management in addition to offering data fusion and model coupling ability.
Hallock, Michael J.; Stone, John E.; Roberts, Elijah; Fry, Corey; Luthey-Schulten, Zaida
2014-01-01
Simulation of in vivo cellular processes with the reaction-diffusion master equation (RDME) is a computationally expensive task. Our previous software enabled simulation of inhomogeneous biochemical systems for small bacteria over long time scales using the MPD-RDME method on a single GPU. Simulations of larger eukaryotic systems exceed the on-board memory capacity of individual GPUs, and long time simulations of modest-sized cells such as yeast are impractical on a single GPU. We present a new multi-GPU parallel implementation of the MPD-RDME method based on a spatial decomposition approach that supports dynamic load balancing for workstations containing GPUs of varying performance and memory capacity. We take advantage of high-performance features of CUDA for peer-to-peer GPU memory transfers and evaluate the performance of our algorithms on state-of-the-art GPU devices. We present parallel e ciency and performance results for simulations using multiple GPUs as system size, particle counts, and number of reactions grow. We also demonstrate multi-GPU performance in simulations of the Min protein system in E. coli. Moreover, our multi-GPU decomposition and load balancing approach can be generalized to other lattice-based problems. PMID:24882911
Hallock, Michael J; Stone, John E; Roberts, Elijah; Fry, Corey; Luthey-Schulten, Zaida
2014-05-01
Simulation of in vivo cellular processes with the reaction-diffusion master equation (RDME) is a computationally expensive task. Our previous software enabled simulation of inhomogeneous biochemical systems for small bacteria over long time scales using the MPD-RDME method on a single GPU. Simulations of larger eukaryotic systems exceed the on-board memory capacity of individual GPUs, and long time simulations of modest-sized cells such as yeast are impractical on a single GPU. We present a new multi-GPU parallel implementation of the MPD-RDME method based on a spatial decomposition approach that supports dynamic load balancing for workstations containing GPUs of varying performance and memory capacity. We take advantage of high-performance features of CUDA for peer-to-peer GPU memory transfers and evaluate the performance of our algorithms on state-of-the-art GPU devices. We present parallel e ciency and performance results for simulations using multiple GPUs as system size, particle counts, and number of reactions grow. We also demonstrate multi-GPU performance in simulations of the Min protein system in E. coli . Moreover, our multi-GPU decomposition and load balancing approach can be generalized to other lattice-based problems.
A multi-scale Q1/P0 approach to langrangian shock hydrodynamics.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shashkov, Mikhail; Love, Edward; Scovazzi, Guglielmo
A new multi-scale, stabilized method for Q1/P0 finite element computations of Lagrangian shock hydrodynamics is presented. Instabilities (of hourglass type) are controlled by a stabilizing operator derived using the variational multi-scale analysis paradigm. The resulting stabilizing term takes the form of a pressure correction. With respect to currently implemented hourglass control approaches, the novelty of the method resides in its residual-based character. The stabilizing residual has a definite physical meaning, since it embeds a discrete form of the Clausius-Duhem inequality. Effectively, the proposed stabilization samples and acts to counter the production of entropy due to numerical instabilities. The proposed techniquemore » is applicable to materials with no shear strength, for which there exists a caloric equation of state. The stabilization operator is incorporated into a mid-point, predictor/multi-corrector time integration algorithm, which conserves mass, momentum and total energy. Encouraging numerical results in the context of compressible gas dynamics confirm the potential of the method.« less
Numerical evaluation of multi-loop integrals for arbitrary kinematics with SecDec 2.0
NASA Astrophysics Data System (ADS)
Borowka, Sophia; Carter, Jonathon; Heinrich, Gudrun
2013-02-01
We present the program SecDec 2.0, which contains various new features. First, it allows the numerical evaluation of multi-loop integrals with no restriction on the kinematics. Dimensionally regulated ultraviolet and infrared singularities are isolated via sector decomposition, while threshold singularities are handled by a deformation of the integration contour in the complex plane. As an application, we present numerical results for various massive two-loop four-point diagrams. SecDec 2.0 also contains new useful features for the calculation of more general parameter integrals, related for example to phase space integrals. Program summaryProgram title: SecDec 2.0 Catalogue identifier: AEIR_v2_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEIR_v2_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 156829 No. of bytes in distributed program, including test data, etc.: 2137907 Distribution format: tar.gz Programming language: Wolfram Mathematica, Perl, Fortran/C++. Computer: From a single PC to a cluster, depending on the problem. Operating system: Unix, Linux. RAM: Depending on the complexity of the problem Classification: 4.4, 5, 11.1. Catalogue identifier of previous version: AEIR_v1_0 Journal reference of previous version: Comput. Phys. Comm. 182(2011)1566 Does the new version supersede the previous version?: Yes Nature of problem: Extraction of ultraviolet and infrared singularities from parametric integrals appearing in higher order perturbative calculations in gauge theories. Numerical integration in the presence of integrable singularities (e.g., kinematic thresholds). Solution method: Algebraic extraction of singularities in dimensional regularization using iterated sector decomposition. This leads to a Laurent series in the dimensional regularization parameter ɛ, where the coefficients are finite integrals over the unit hypercube. Those integrals are evaluated numerically by Monte Carlo integration. The integrable singularities are handled by choosing a suitable integration contour in the complex plane, in an automated way. Reasons for new version: In the previous version the calculation of multi-scale integrals was restricted to the Euclidean region. Now multi-loop integrals with arbitrary physical kinematics can be evaluated. Another major improvement is the possibility of full parallelization. Summary of revisions: No restriction on the kinematics for multi-loop integrals. The integrand can be constructed from the topological cuts of the diagram. Possibility of full parallelization. Numerical integration of multi-loop integrals written in C++ rather than Fortran. Possibility to loop over ranges of parameters. Restrictions: Depending on the complexity of the problem, limited by memory and CPU time. The restriction that multi-scale integrals could only be evaluated at Euclidean points is superseded in version 2.0. Running time: Between a few minutes and several days, depending on the complexity of the problem. Test runs provided take only seconds.
Lattice dynamics calculations based on density-functional perturbation theory in real space
NASA Astrophysics Data System (ADS)
Shang, Honghui; Carbogno, Christian; Rinke, Patrick; Scheffler, Matthias
2017-06-01
A real-space formalism for density-functional perturbation theory (DFPT) is derived and applied for the computation of harmonic vibrational properties in molecules and solids. The practical implementation using numeric atom-centered orbitals as basis functions is demonstrated exemplarily for the all-electron Fritz Haber Institute ab initio molecular simulations (FHI-aims) package. The convergence of the calculations with respect to numerical parameters is carefully investigated and a systematic comparison with finite-difference approaches is performed both for finite (molecules) and extended (periodic) systems. Finally, the scaling tests and scalability tests on massively parallel computer systems demonstrate the computational efficiency.
Large-Scale, Parallel, Multi-Sensor Atmospheric Data Fusion Using Cloud Computing
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Manipon, G.; Hua, H.; Fetzer, E.
2013-05-01
NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over decades. Moving to multi-sensor, long-duration analyses of important climate variables presents serious challenges for large-scale data mining and fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over 10 years of data. To efficiently assemble such datasets, we are utilizing Elastic Computing in the Cloud and parallel map/reduce-based algorithms. However, these problems are Data Intensive computing so the data transfer times and storage costs (for caching) are key issues. SciReduce is a Hadoop-like parallel analysis system, programmed in parallel python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in the Cloud. Unlike Hadoop, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Figure 1 shows the architecture of the full computational system, with SciReduce at the core. Multi-year datasets are automatically "sharded" by time and space across a cluster of nodes so that years of data (millions of files) can be processed in a massively parallel way. Input variables (arrays) are pulled on-demand into the Cloud using OPeNDAP URLs or other subsetting services, thereby minimizing the size of the cached input and intermediate datasets. We are using SciReduce to automate the production of multiple versions of a ten-year A-Train water vapor climatology under a NASA MEASURES grant. We will present the architecture of SciReduce, describe the achieved "clock time" speedups in fusing datasets on our own nodes and in the Cloud, and discuss the Cloud cost tradeoffs for storage, compute, and data transfer. We will also present a concept/prototype for staging NASA's A-Train Atmospheric datasets (Levels 2 & 3) in the Amazon Cloud so that any number of compute jobs can be executed "near" the multi-sensor data. Given such a system, multi-sensor climate studies over 10-20 years of data could be performed in an efficient way, with the researcher paying only his own Cloud compute bill.; Figure 1 -- Architecture.
Large-Scale, Parallel, Multi-Sensor Atmospheric Data Fusion Using Cloud Computing
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Manipon, G.; Hua, H.; Fetzer, E. J.
2013-12-01
NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the 'A-Train' platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over decades. Moving to multi-sensor, long-duration analyses of important climate variables presents serious challenges for large-scale data mining and fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another (MODIS), and to a model (MERRA), stratify the comparisons using a classification of the 'cloud scenes' from CloudSat, and repeat the entire analysis over 10 years of data. To efficiently assemble such datasets, we are utilizing Elastic Computing in the Cloud and parallel map/reduce-based algorithms. However, these problems are Data Intensive computing so the data transfer times and storage costs (for caching) are key issues. SciReduce is a Hadoop-like parallel analysis system, programmed in parallel python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in the Cloud. Unlike Hadoop, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Figure 1 shows the architecture of the full computational system, with SciReduce at the core. Multi-year datasets are automatically 'sharded' by time and space across a cluster of nodes so that years of data (millions of files) can be processed in a massively parallel way. Input variables (arrays) are pulled on-demand into the Cloud using OPeNDAP URLs or other subsetting services, thereby minimizing the size of the cached input and intermediate datasets. We are using SciReduce to automate the production of multiple versions of a ten-year A-Train water vapor climatology under a NASA MEASURES grant. We will present the architecture of SciReduce, describe the achieved 'clock time' speedups in fusing datasets on our own compute nodes and in the public Cloud, and discuss the Cloud cost tradeoffs for storage, compute, and data transfer. We will also present a concept/prototype for staging NASA's A-Train Atmospheric datasets (Levels 2 & 3) in the Amazon Cloud so that any number of compute jobs can be executed 'near' the multi-sensor data. Given such a system, multi-sensor climate studies over 10-20 years of data could be performed in an efficient way, with the researcher paying only his own Cloud compute bill. SciReduce Architecture
Large-scale 3D geoelectromagnetic modeling using parallel adaptive high-order finite element method
Grayver, Alexander V.; Kolev, Tzanio V.
2015-11-01
Here, we have investigated the use of the adaptive high-order finite-element method (FEM) for geoelectromagnetic modeling. Because high-order FEM is challenging from the numerical and computational points of view, most published finite-element studies in geoelectromagnetics use the lowest order formulation. Solution of the resulting large system of linear equations poses the main practical challenge. We have developed a fully parallel and distributed robust and scalable linear solver based on the optimal block-diagonal and auxiliary space preconditioners. The solver was found to be efficient for high finite element orders, unstructured and nonconforming locally refined meshes, a wide range of frequencies, largemore » conductivity contrasts, and number of degrees of freedom (DoFs). Furthermore, the presented linear solver is in essence algebraic; i.e., it acts on the matrix-vector level and thus requires no information about the discretization, boundary conditions, or physical source used, making it readily efficient for a wide range of electromagnetic modeling problems. To get accurate solutions at reduced computational cost, we have also implemented goal-oriented adaptive mesh refinement. The numerical tests indicated that if highly accurate modeling results were required, the high-order FEM in combination with the goal-oriented local mesh refinement required less computational time and DoFs than the lowest order adaptive FEM.« less
Large-scale 3D geoelectromagnetic modeling using parallel adaptive high-order finite element method
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grayver, Alexander V.; Kolev, Tzanio V.
Here, we have investigated the use of the adaptive high-order finite-element method (FEM) for geoelectromagnetic modeling. Because high-order FEM is challenging from the numerical and computational points of view, most published finite-element studies in geoelectromagnetics use the lowest order formulation. Solution of the resulting large system of linear equations poses the main practical challenge. We have developed a fully parallel and distributed robust and scalable linear solver based on the optimal block-diagonal and auxiliary space preconditioners. The solver was found to be efficient for high finite element orders, unstructured and nonconforming locally refined meshes, a wide range of frequencies, largemore » conductivity contrasts, and number of degrees of freedom (DoFs). Furthermore, the presented linear solver is in essence algebraic; i.e., it acts on the matrix-vector level and thus requires no information about the discretization, boundary conditions, or physical source used, making it readily efficient for a wide range of electromagnetic modeling problems. To get accurate solutions at reduced computational cost, we have also implemented goal-oriented adaptive mesh refinement. The numerical tests indicated that if highly accurate modeling results were required, the high-order FEM in combination with the goal-oriented local mesh refinement required less computational time and DoFs than the lowest order adaptive FEM.« less
Constitutive Model Calibration via Autonomous Multiaxial Experimentation (Postprint)
2016-09-17
test machine. Experimental data is reduced and finite element simulations are conducted in parallel with the test based on experimental strain...data is reduced and finite element simulations are conducted in parallel with the test based on experimental strain conditions. Optimization methods...be used directly in finite element simulations of more complex geometries. Keywords Axial/torsional experimentation • Plasticity • Constitutive model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schunert, Sebastian; Schwen, Daniel; Ghassemi, Pedram
This work presents a multi-physics, multi-scale approach to modeling the Transient Test Reactor (TREAT) currently prepared for restart at the Idaho National Laboratory. TREAT fuel is made up of microscopic fuel grains (r ˜ 20µm) dispersed in a graphite matrix. The novelty of this work is in coupling a binary collision Monte-Carlo (BCMC) model to the Finite Element based code Moose for solving a microsopic heat-conduction problem whose driving source is provided by the BCMC model tracking fission fragment energy deposition. This microscopic model is driven by a transient, engineering scale neutronics model coupled to an adiabatic heating model. Themore » macroscopic model provides local power densities and neutron energy spectra to the microscpic model. Currently, no feedback from the microscopic to the macroscopic model is considered. TREAT transient 15 is used to exemplify the capabilities of the multi-physics, multi-scale model, and it is found that the average fuel grain temperature differs from the average graphite temperature by 80 K despite the low-power transient. The large temperature difference has strong implications on the Doppler feedback a potential LEU TREAT core would see, and it underpins the need for multi-physics, multi-scale modeling of a TREAT LEU core.« less
Xiao, Li; Cai, Qin; Li, Zhilin; Zhao, Hongkai; Luo, Ray
2014-01-01
A multi-scale framework is proposed for more realistic molecular dynamics simulations in continuum solvent models by coupling a molecular mechanics treatment of solute with a fluid mechanics treatment of solvent. This article reports our initial efforts to formulate the physical concepts necessary for coupling the two mechanics and develop a 3D numerical algorithm to simulate the solvent fluid via the Navier-Stokes equation. The numerical algorithm was validated with multiple test cases. The validation shows that the algorithm is effective and stable, with observed accuracy consistent with our design. PMID:25404761
NASA Astrophysics Data System (ADS)
Zhao, Jifeng; Kontsevoi, Oleg Y.; Xiong, Wei; Smith, Jacob
2017-05-01
In this work, a multi-scale computational framework has been established in order to investigate, refine and validate constitutive behaviors in the context of the Gurson-Tvergaard-Needleman (GTN) void mechanics model. The eXtended Finite Element Method (XFEM) has been implemented in order to (1) develop statistical volume elements (SVE) of a matrix material with subscale inclusions and (2) to simulate the multi-void nucleation process due to interface debonding between the matrix and particle phases. Our analyses strongly suggest that under low stress triaxiality the nucleation rate of the voids f˙ can be well described by a normal distribution function with respect to the matrix equivalent stress (σe), as opposed to that proposed (σbar + 1 / 3σkk) in the original form of the single void GTN model. The modified form of the multi-void nucleation model has been validated based on a series of numerical experiments with different loading conditions, material properties, particle shape/size and spatial distributions. The utilization of XFEM allows for an invariant finite element mesh to represent varying microstructures, which implies suitability for drastically reducing complexity in generating the finite element discretizations for large stochastic arrays of microstructure configurations. The modified form of the multi-void nucleation model is further applied to study high strength steels by incorporating first principles calculations. The necessity of using a phenomenological interface separation law has been fully eliminated and replaced by the physics-based cohesive relationship obtained from Density Functional Theory (DFT) calculations in order to provide an accurate macroscopic material response.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kasemer, Matthew; Quey, Romain; Dawson, Paul
Discussed is a computational study of the influence of the microstructure’s geometric morphology on the yield strength and ductility of Ti-6Al-4V. Uniaxial tension tests were conducted on physical specimens to determine the macroscopic yield strength and ductility of two microstructural variations (mill annealed and β annealed) to establish comparisons of macroscopic properties. A multi-experimental approach was utilized to gather two dimensional and three dimensional data, which were used to inform the construction of representative β annealed polycrystals. A highly parallelized crystal plasticity finite element framework was employed to model the deformation response of the generated polycrystals subjected to uniaxial tension.more » To gauge the macroscopic response’s sensitivity to the morphology of the geometry, the key geometrical features - namely the number of high temperature β phase grains, α phase colonies, and size of remnant secondary β phase lamellae - were altered systematically in a suite of simulations. Both single phase and dual phase aggregates were studied. Presented are the calculated yield strengths and ductilities, and the resulting trends as functions of geometric parameters are examined in light of the heterogeneity in deformation at the crystal scale.« less
Capabilities and performance of the new generation ice-sheet model Elmer/Ice
NASA Astrophysics Data System (ADS)
Gagliardini, O.; Zwinger, T.; Durand, G.; Favier, L.; de Fleurian, B.; Gillet-chaulet, F.; Seddik, H.; Greve, R.; Mallinen, M.; Martin, C.; Raback, P.; Ruokolainen, J.; Schäfer, M.; Thies, J.
2012-12-01
Since the Fourth IPCC Assessment Report, and its conclusion about the inability of ice-sheet flow models to forecast the current increase of polar ice sheet discharge and associated contribution to sea-level rise, a huge development effort has been undertaken by the glaciological community. All around the world, models have been improved and, interestingly, a significant number of new ice-sheet models have emerged. Among them, the parallel finite-element model Elmer/Ice (based on the open-source multi-physics code Elmer) was one of the first full-Stokes models used to make projections of the future of the whole Greenland ice sheet for the coming two centuries. Originally developed to solve dedicated local ice flow problems of high mechanical and physical complexity, Elmer/Ice has today reached the maturity to solve larger scale problems, earning the status of an ice-sheet model. In this presentation, we summarise the almost 10 years of development performed by different groups. We present the components already included in Elmer/Ice, its numerical performance, selected applications, as well as developments planed for the future.
Capabilities and performance of Elmer/Ice, a new generation ice-sheet model
NASA Astrophysics Data System (ADS)
Gagliardini, O.; Zwinger, T.; Gillet-Chaulet, F.; Durand, G.; Favier, L.; de Fleurian, B.; Greve, R.; Malinen, M.; Martín, C.; Råback, P.; Ruokolainen, J.; Sacchettini, M.; Schäfer, M.; Seddik, H.; Thies, J.
2013-03-01
The Fourth IPCC Assessment Report concluded that ice-sheet flow models are unable to forecast the current increase of polar ice sheet discharge and the associated contribution to sea-level rise. Since then, the glaciological community has undertaken a huge effort to develop and improve a new generation of ice-flow models, and as a result, a significant number of new ice-sheet models have emerged. Among them is the parallel finite-element model Elmer/Ice, based on the open-source multi-physics code Elmer. It was one of the first full-Stokes models used to make projections for the evolution of the whole Greenland ice sheet for the coming two centuries. Originally developed to solve local ice flow problems of high mechanical and physical complexity, Elmer/Ice has today reached the maturity to solve larger scale problems, earning the status of an ice-sheet model. Here, we summarise almost 10 yr of development performed by different groups. We present the components already included in Elmer/Ice, its numerical performance, selected applications, as well as developments planned for the future.
Fiber-optically sensorized composite wing
NASA Astrophysics Data System (ADS)
Costa, Joannes M.; Black, Richard J.; Moslehi, Behzad; Oblea, Levy; Patel, Rona; Sotoudeh, Vahid; Abouzeida, Essam; Quinones, Vladimir; Gowayed, Yasser; Soobramaney, Paul; Flowers, George
2014-04-01
Electromagnetic interference (EMI) immune and light-weight, fiber-optic sensor based Structural Health Monitoring (SHM) will find increasing application in aerospace structures ranging from aircraft wings to jet engine vanes. Intelligent Fiber Optic Systems Corporation (IFOS) has been developing multi-functional fiber Bragg grating (FBG) sensor systems including parallel processing FBG interrogators combined with advanced signal processing for SHM, structural state sensing and load monitoring applications. This paper reports work with Auburn University on embedding and testing FBG sensor arrays in a quarter scale model of a T38 composite wing. The wing was designed and manufactured using fabric reinforced polymer matrix composites. FBG sensors were embedded under the top layer of the composite. Their positions were chosen based on strain maps determined by finite element analysis. Static and dynamic testing confirmed expected response from the FBGs. The demonstrated technology has the potential to be further developed into an autonomous onboard system to perform load monitoring, SHM and Non-Destructive Evaluation (NDE) of composite aerospace structures (wings and rotorcraft blades). This platform technology could also be applied to flight testing of morphing and aero-elastic control surfaces.
Parallel Simulation of Three-Dimensional Free Surface Fluid Flow Problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
BAER,THOMAS A.; SACKINGER,PHILIP A.; SUBIA,SAMUEL R.
1999-10-14
Simulation of viscous three-dimensional fluid flow typically involves a large number of unknowns. When free surfaces are included, the number of unknowns increases dramatically. Consequently, this class of problem is an obvious application of parallel high performance computing. We describe parallel computation of viscous, incompressible, free surface, Newtonian fluid flow problems that include dynamic contact fines. The Galerkin finite element method was used to discretize the fully-coupled governing conservation equations and a ''pseudo-solid'' mesh mapping approach was used to determine the shape of the free surface. In this approach, the finite element mesh is allowed to deform to satisfy quasi-staticmore » solid mechanics equations subject to geometric or kinematic constraints on the boundaries. As a result, nodal displacements must be included in the set of unknowns. Other issues discussed are the proper constraints appearing along the dynamic contact line in three dimensions. Issues affecting efficient parallel simulations include problem decomposition to equally distribute computational work among a SPMD computer and determination of robust, scalable preconditioners for the distributed matrix systems that must be solved. Solution continuation strategies important for serial simulations have an enhanced relevance in a parallel coquting environment due to the difficulty of solving large scale systems. Parallel computations will be demonstrated on an example taken from the coating flow industry: flow in the vicinity of a slot coater edge. This is a three dimensional free surface problem possessing a contact line that advances at the web speed in one region but transitions to static behavior in another region. As such, a significant fraction of the computational time is devoted to processing boundary data. Discussion focuses on parallel speed ups for fixed problem size, a class of problems of immediate practical importance.« less
Hypercube matrix computation task
NASA Technical Reports Server (NTRS)
Calalo, Ruel H.; Imbriale, William A.; Jacobi, Nathan; Liewer, Paulett C.; Lockhart, Thomas G.; Lyzenga, Gregory A.; Lyons, James R.; Manshadi, Farzin; Patterson, Jean E.
1988-01-01
A major objective of the Hypercube Matrix Computation effort at the Jet Propulsion Laboratory (JPL) is to investigate the applicability of a parallel computing architecture to the solution of large-scale electromagnetic scattering problems. Three scattering analysis codes are being implemented and assessed on a JPL/California Institute of Technology (Caltech) Mark 3 Hypercube. The codes, which utilize different underlying algorithms, give a means of evaluating the general applicability of this parallel architecture. The three analysis codes being implemented are a frequency domain method of moments code, a time domain finite difference code, and a frequency domain finite elements code. These analysis capabilities are being integrated into an electromagnetics interactive analysis workstation which can serve as a design tool for the construction of antennas and other radiating or scattering structures. The first two years of work on the Hypercube Matrix Computation effort is summarized. It includes both new developments and results as well as work previously reported in the Hypercube Matrix Computation Task: Final Report for 1986 to 1987 (JPL Publication 87-18).
NASA Astrophysics Data System (ADS)
Ravi, Sathish Kumar; Gawad, Jerzy; Seefeldt, Marc; Van Bael, Albert; Roose, Dirk
2017-10-01
A numerical multi-scale model is being developed to predict the anisotropic macroscopic material response of multi-phase steel. The embedded microstructure is given by a meso-scale Representative Volume Element (RVE), which holds the most relevant features like phase distribution, grain orientation, morphology etc., in sufficient detail to describe the multi-phase behavior of the material. A Finite Element (FE) mesh of the RVE is constructed using statistical information from individual phases such as grain size distribution and ODF. The material response of the RVE is obtained for selected loading/deformation modes through numerical FE simulations in Abaqus. For the elasto-plastic response of the individual grains, single crystal plasticity based plastic potential functions are proposed as Abaqus material definitions. The plastic potential functions are derived using the Facet method for individual phases in the microstructure at the level of single grains. The proposed method is a new modeling framework and the results presented in terms of macroscopic flow curves are based on the building blocks of the approach, while the model would eventually facilitate the construction of an anisotropic yield locus of the underlying multi-phase microstructure derived from a crystal plasticity based framework.
3D CSEM inversion based on goal-oriented adaptive finite element method
NASA Astrophysics Data System (ADS)
Zhang, Y.; Key, K.
2016-12-01
We present a parallel 3D frequency domain controlled-source electromagnetic inversion code name MARE3DEM. Non-linear inversion of observed data is performed with the Occam variant of regularized Gauss-Newton optimization. The forward operator is based on the goal-oriented finite element method that efficiently calculates the responses and sensitivity kernels in parallel using a data decomposition scheme where independent modeling tasks contain different frequencies and subsets of the transmitters and receivers. To accommodate complex 3D conductivity variation with high flexibility and precision, we adopt the dual-grid approach where the forward mesh conforms to the inversion parameter grid and is adaptively refined until the forward solution converges to the desired accuracy. This dual-grid approach is memory efficient, since the inverse parameter grid remains independent from fine meshing generated around the transmitter and receivers by the adaptive finite element method. Besides, the unstructured inverse mesh efficiently handles multiple scale structures and allows for fine-scale model parameters within the region of interest. Our mesh generation engine keeps track of the refinement hierarchy so that the map of conductivity and sensitivity kernel between the forward and inverse mesh is retained. We employ the adjoint-reciprocity method to calculate the sensitivity kernels which establish a linear relationship between changes in the conductivity model and changes in the modeled responses. Our code uses a direcy solver for the linear systems, so the adjoint problem is efficiently computed by re-using the factorization from the primary problem. Further computational efficiency and scalability is obtained in the regularized Gauss-Newton portion of the inversion using parallel dense matrix-matrix multiplication and matrix factorization routines implemented with the ScaLAPACK library. We show the scalability, reliability and the potential of the algorithm to deal with complex geological scenarios by applying it to the inversion of synthetic marine controlled source EM data generated for a complex 3D offshore model with significant seafloor topography.
Large-Scale, Multi-Sensor Atmospheric Data Fusion Using Hybrid Cloud Computing
NASA Astrophysics Data System (ADS)
Wilson, Brian; Manipon, Gerald; Hua, Hook; Fetzer, Eric
2014-05-01
NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over decades. Moving to multi-sensor, long-duration analyses of important climate variables presents serious challenges for large-scale data mining and fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over 10 years of data. To efficiently assemble such datasets, we are utilizing Elastic Computing in the Cloud and parallel map-reduce-based algorithms. However, these problems are Data Intensive computing so the data transfer times and storage costs (for caching) are key issues. SciReduce is a Hadoop-like parallel analysis system, programmed in parallel python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in a hybrid Cloud (private eucalyptus & public Amazon). Unlike Hadoop, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Multi-year datasets are automatically "sharded" by time and space across a cluster of nodes so that years of data (millions of files) can be processed in a massively parallel way. Input variables (arrays) are pulled on-demand into the Cloud using OPeNDAP URLs or other subsetting services, thereby minimizing the size of the cached input and intermediate datasets. We are using SciReduce to automate the production of multiple versions of a ten-year A-Train water vapor climatology under a NASA MEASURES grant. We will present the architecture of SciReduce, describe the achieved "clock time" speedups in fusing datasets on our own nodes and in the Cloud, and discuss the Cloud cost tradeoffs for storage, compute, and data transfer. We will also present a concept and prototype for staging NASA's A-Train Atmospheric datasets (Levels 2 & 3) in the Amazon Cloud so that any number of compute jobs can be executed "near" the multi-sensor data. Given such a system, multi-sensor climate studies over 10-20 years of data could be perform
NASA Astrophysics Data System (ADS)
Mirza, Arshad M.; Masood, W.
2011-12-01
Nonlinear equations governing the dynamics of finite amplitude drift-ion acoustic-waves are derived by taking into account sheared ion flows parallel and perpendicular to the ambient magnetic field in a quantum magnetoplasma comprised of electrons and ions. It is shown that stationary solution of the nonlinear equations can be represented in the form of a tripolar vortex for specific profiles of the equilibrium sheared flows. The tripolar vortices are, however, observed to form on very short scales in dense quantum plasmas. The relevance of the present investigation with regard to dense astrophysical environments is also pointed out.
NASA Astrophysics Data System (ADS)
Barajas-Solano, D. A.; Tartakovsky, A. M.
2017-12-01
We present a multiresolution method for the numerical simulation of flow and reactive transport in porous, heterogeneous media, based on the hybrid Multiscale Finite Volume (h-MsFV) algorithm. The h-MsFV algorithm allows us to couple high-resolution (fine scale) flow and transport models with lower resolution (coarse) models to locally refine both spatial resolution and transport models. The fine scale problem is decomposed into various "local'' problems solved independently in parallel and coordinated via a "global'' problem. This global problem is then coupled with the coarse model to strictly ensure domain-wide coarse-scale mass conservation. The proposed method provides an alternative to adaptive mesh refinement (AMR), due to its capacity to rapidly refine spatial resolution beyond what's possible with state-of-the-art AMR techniques, and the capability to locally swap transport models. We illustrate our method by applying it to groundwater flow and reactive transport of multiple species.
NASA Astrophysics Data System (ADS)
Kees, C. E.; Miller, C. T.; Dimakopoulos, A.; Farthing, M.
2016-12-01
The last decade has seen an expansion in the development and application of 3D free surface flow models in the context of environmental simulation. These models are based primarily on the combination of effective algorithms, namely level set and volume-of-fluid methods, with high-performance, parallel computing. These models are still computationally expensive and suitable primarily when high-fidelity modeling near structures is required. While most research on algorithms and implementations has been conducted in the context of finite volume methods, recent work has extended a class of level set schemes to finite element methods on unstructured methods. This work considers models of three-phase flow in domains containing air, water, and granular phases. These multi-phase continuum mechanical formulations show great promise for applications such as analysis of coastal and riverine structures. This work will consider formulations proposed in the literature over the last decade as well as new formulations derived using the thermodynamically constrained averaging theory, an approach to deriving and closing macroscale continuum models for multi-phase and multi-component processes. The target applications require the ability to simulate wave breaking and structure over-topping, particularly fully three-dimensional, non-hydrostatic flows that drive these phenomena. A conservative level set scheme suitable for higher-order finite element methods is used to describe the air/water phase interaction. The interaction of these air/water flows with granular materials, such as sand and rubble, must also be modeled. The range of granular media dynamics targeted including flow and wave transmision through the solid media as well as erosion and deposition of granular media and moving bed dynamics. For the granular phase we consider volume- and time-averaged continuum mechanical formulations that are discretized with the finite element method and coupled to the underlying air/water flow via operator splitting (fractional step) schemes. Particular attention will be given to verification and validation of the numerical model and important qualitative features of the numerical methods including phase conservation, wave energy dissipation, and computational efficiency in regimes of interest.
An Overview of Mesoscale Modeling Software for Energetic Materials Research
2010-03-01
12 2.9 Large-scale Atomic/Molecular Massively Parallel Simulator ( LAMMPS ...13 Table 10. LAMMPS summary...extensive reviews, lectures and workshops are available on multiscale modeling of materials applications (76-78). • Multi-phase mixtures of
Optimizing the Performance of Reactive Molecular Dynamics Simulations for Multi-core Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aktulga, Hasan Metin; Coffman, Paul; Shan, Tzu-Ray
2015-12-01
Hybrid parallelism allows high performance computing applications to better leverage the increasing on-node parallelism of modern supercomputers. In this paper, we present a hybrid parallel implementation of the widely used LAMMPS/ReaxC package, where the construction of bonded and nonbonded lists and evaluation of complex ReaxFF interactions are implemented efficiently using OpenMP parallelism. Additionally, the performance of the QEq charge equilibration scheme is examined and a dual-solver is implemented. We present the performance of the resulting ReaxC-OMP package on a state-of-the-art multi-core architecture Mira, an IBM BlueGene/Q supercomputer. For system sizes ranging from 32 thousand to 16.6 million particles, speedups inmore » the range of 1.5-4.5x are observed using the new ReaxC-OMP software. Sustained performance improvements have been observed for up to 262,144 cores (1,048,576 processes) of Mira with a weak scaling efficiency of 91.5% in larger simulations containing 16.6 million particles.« less
Multi-resonant electromagnetic shunt in base isolation for vibration damping and energy harvesting
NASA Astrophysics Data System (ADS)
Pei, Yalu; Liu, Yilun; Zuo, Lei
2018-06-01
This paper investigates multi-resonant electromagnetic shunts applied to base isolation for dual-function vibration damping and energy harvesting. Two multi-mode shunt circuit configurations, namely parallel and series, are proposed and optimized based on the H2 criteria. The root-mean-square (RMS) value of the relative displacement between the base and the primary structure is minimized. Practically, this will improve the safety of base-isolated buildings subjected the broad bandwidth ground acceleration. Case studies of a base-isolated building are conducted in both the frequency and time domains to investigate the effectiveness of multi-resonant electromagnetic shunts under recorded earthquake signals. It shows that both multi-mode shunt circuits outperform traditional single mode shunt circuits by suppressing the first and the second vibration modes simultaneously. Moreover, for the same stiffness ratio, the parallel shunt circuit is more effective at harvesting energy and suppressing vibration, and can more robustly handle parameter mistuning than the series shunt circuit. Furthermore, this paper discusses experimental validation of the effectiveness of multi-resonant electromagnetic shunts for vibration damping and energy harvesting on a scaled-down base isolation system.
Finite Gyroradius Effects in the Electron Outflow of Asymmetric Magnetic Reconnection
NASA Technical Reports Server (NTRS)
Norgren, C.; Graham, D. B.; Khotyaintsev, Yu. V.; Andre, M.; Vaivads, A.; Chen, Li-Jen; Lindqvist, P.-A.; Marklund, G. T.; Ergun, R. E.; Magnes, W.;
2016-01-01
We present observations of asymmetric magnetic reconnection showing evidence of electron demagnetization in the electron outflow. The observations were made at the magnetopause by the four Magnetospheric Multiscale (MMS) spacecraft, separated by approximately 15 km. The reconnecting current sheet has negligible guide field, and all four spacecraft likely pass close to the electron diffusion region just south of the X line. In the electron outflow near the X line, all four spacecraft observe highly structured electron distributions in a region comparable to a few electron gyroradii. The distributions consist of a core with T(sub parallel) greater than T(sub perpendicular) and a nongyrotropic crescent perpendicular to the magnetic field. The crescents are associated with finite gyroradius effects of partly demagnetized electrons. These observations clearly demonstrate the manifestation of finite gyroradius effects in an electron-scale reconnection current sheet.
Parallel, adaptive finite element methods for conservation laws
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Devine, Karen D.; Flaherty, Joseph E.
1994-01-01
We construct parallel finite element methods for the solution of hyperbolic conservation laws in one and two dimensions. Spatial discretization is performed by a discontinuous Galerkin finite element method using a basis of piecewise Legendre polynomials. Temporal discretization utilizes a Runge-Kutta method. Dissipative fluxes and projection limiting prevent oscillations near solution discontinuities. A posteriori estimates of spatial errors are obtained by a p-refinement technique using superconvergence at Radau points. The resulting method is of high order and may be parallelized efficiently on MIMD computers. We compare results using different limiting schemes and demonstrate parallel efficiency through computations on an NCUBE/2 hypercube. We also present results using adaptive h- and p-refinement to reduce the computational cost of the method.
NASA Technical Reports Server (NTRS)
Frank, Andreas O.; Twombly, I. Alexander; Barth, Timothy J.; Smith, Jeffrey D.; Dalton, Bonnie P. (Technical Monitor)
2001-01-01
We have applied the linear elastic finite element method to compute haptic force feedback and domain deformations of soft tissue models for use in virtual reality simulators. Our results show that, for virtual object models of high-resolution 3D data (>10,000 nodes), haptic real time computations (>500 Hz) are not currently possible using traditional methods. Current research efforts are focused in the following areas: 1) efficient implementation of fully adaptive multi-resolution methods and 2) multi-resolution methods with specialized basis functions to capture the singularity at the haptic interface (point loading). To achieve real time computations, we propose parallel processing of a Jacobi preconditioned conjugate gradient method applied to a reduced system of equations resulting from surface domain decomposition. This can effectively be achieved using reconfigurable computing systems such as field programmable gate arrays (FPGA), thereby providing a flexible solution that allows for new FPGA implementations as improved algorithms become available. The resulting soft tissue simulation system would meet NASA Virtual Glovebox requirements and, at the same time, provide a generalized simulation engine for any immersive environment application, such as biomedical/surgical procedures or interactive scientific applications.
A general CFD framework for fault-resilient simulations based on multi-resolution information fusion
NASA Astrophysics Data System (ADS)
Lee, Seungjoon; Kevrekidis, Ioannis G.; Karniadakis, George Em
2017-10-01
We develop a general CFD framework for multi-resolution simulations to target multiscale problems but also resilience in exascale simulations, where faulty processors may lead to gappy, in space-time, simulated fields. We combine approximation theory and domain decomposition together with statistical learning techniques, e.g. coKriging, to estimate boundary conditions and minimize communications by performing independent parallel runs. To demonstrate this new simulation approach, we consider two benchmark problems. First, we solve the heat equation (a) on a small number of spatial "patches" distributed across the domain, simulated by finite differences at fine resolution and (b) on the entire domain simulated at very low resolution, thus fusing multi-resolution models to obtain the final answer. Second, we simulate the flow in a lid-driven cavity in an analogous fashion, by fusing finite difference solutions obtained with fine and low resolution assuming gappy data sets. We investigate the influence of various parameters for this framework, including the correlation kernel, the size of a buffer employed in estimating boundary conditions, the coarseness of the resolution of auxiliary data, and the communication frequency across different patches in fusing the information at different resolution levels. In addition to its robustness and resilience, the new framework can be employed to generalize previous multiscale approaches involving heterogeneous discretizations or even fundamentally different flow descriptions, e.g. in continuum-atomistic simulations.
Xu, Jingxiang; Higuchi, Yuji; Ozawa, Nobuki; Sato, Kazuhisa; Hashida, Toshiyuki; Kubo, Momoji
2017-09-20
Ni sintering in the Ni/YSZ porous anode of a solid oxide fuel cell changes the porous structure, leading to degradation. Preventing sintering and degradation during operation is a great challenge. Usually, a sintering molecular dynamics (MD) simulation model consisting of two particles on a substrate is used; however, the model cannot reflect the porous structure effect on sintering. In our previous study, a multi-nanoparticle sintering modeling method with tens of thousands of atoms revealed the effect of the particle framework and porosity on sintering. However, the method cannot reveal the effect of the particle size on sintering and the effect of sintering on the change in the porous structure. In the present study, we report a strategy to reveal them in the porous structure by using our multi-nanoparticle modeling method and a parallel large-scale multimillion-atom MD simulator. We used this method to investigate the effect of YSZ particle size and tortuosity on sintering and degradation in the Ni/YSZ anodes. Our parallel large-scale MD simulation showed that the sintering degree decreased as the YSZ particle size decreased. The gas fuel diffusion path, which reflects the overpotential, was blocked by pore coalescence during sintering. The degradation of gas diffusion performance increased as the YSZ particle size increased. Furthermore, the gas diffusion performance was quantified by a tortuosity parameter and an optimal YSZ particle size, which is equal to that of Ni, was found for good diffusion after sintering. These findings cannot be obtained by previous MD sintering studies with tens of thousands of atoms. The present parallel large-scale multimillion-atom MD simulation makes it possible to clarify the effects of the particle size and tortuosity on sintering and degradation.
NASA Astrophysics Data System (ADS)
Olson, Richard F.
2013-05-01
Rendering of point scatterer based radar scenes for millimeter wave (mmW) seeker tests in real-time hardware-in-the-loop (HWIL) scene generation requires efficient algorithms and vector-friendly computer architectures for complex signal synthesis. New processor technology from Intel implements an extended 256-bit vector SIMD instruction set (AVX, AVX2) in a multi-core CPU design providing peak execution rates of hundreds of GigaFLOPS (GFLOPS) on one chip. Real world mmW scene generation code can approach peak SIMD execution rates only after careful algorithm and source code design. An effective software design will maintain high computing intensity emphasizing register-to-register SIMD arithmetic operations over data movement between CPU caches or off-chip memories. Engineers at the U.S. Army Aviation and Missile Research, Development and Engineering Center (AMRDEC) applied two basic parallel coding methods to assess new 256-bit SIMD multi-core architectures for mmW scene generation in HWIL. These include use of POSIX threads built on vector library functions and more portable, highlevel parallel code based on compiler technology (e.g. OpenMP pragmas and SIMD autovectorization). Since CPU technology is rapidly advancing toward high processor core counts and TeraFLOPS peak SIMD execution rates, it is imperative that coding methods be identified which produce efficient and maintainable parallel code. This paper describes the algorithms used in point scatterer target model rendering, the parallelization of those algorithms, and the execution performance achieved on an AVX multi-core machine using the two basic parallel coding methods. The paper concludes with estimates for scale-up performance on upcoming multi-core technology.
NASA Astrophysics Data System (ADS)
Averkin, Sergey N.; Gatsonis, Nikolaos A.
2018-06-01
An unstructured electrostatic Particle-In-Cell (EUPIC) method is developed on arbitrary tetrahedral grids for simulation of plasmas bounded by arbitrary geometries. The electric potential in EUPIC is obtained on cell vertices from a finite volume Multi-Point Flux Approximation of Gauss' law using the indirect dual cell with Dirichlet, Neumann and external circuit boundary conditions. The resulting matrix equation for the nodal potential is solved with a restarted generalized minimal residual method (GMRES) and an ILU(0) preconditioner algorithm, parallelized using a combination of node coloring and level scheduling approaches. The electric field on vertices is obtained using the gradient theorem applied to the indirect dual cell. The algorithms for injection, particle loading, particle motion, and particle tracking are parallelized for unstructured tetrahedral grids. The algorithms for the potential solver, electric field evaluation, loading, scatter-gather algorithms are verified using analytic solutions for test cases subject to Laplace and Poisson equations. Grid sensitivity analysis examines the L2 and L∞ norms of the relative error in potential, field, and charge density as a function of edge-averaged and volume-averaged cell size. Analysis shows second order of convergence for the potential and first order of convergence for the electric field and charge density. Temporal sensitivity analysis is performed and the momentum and energy conservation properties of the particle integrators in EUPIC are examined. The effects of cell size and timestep on heating, slowing-down and the deflection times are quantified. The heating, slowing-down and the deflection times are found to be almost linearly dependent on number of particles per cell. EUPIC simulations of current collection by cylindrical Langmuir probes in collisionless plasmas show good comparison with previous experimentally validated numerical results. These simulations were also used in a parallelization efficiency investigation. Results show that the EUPIC has efficiency of more than 80% when the simulation is performed on a single CPU from a non-uniform memory access node and the efficiency is decreasing as the number of threads further increases. The EUPIC is applied to the simulation of the multi-species plasma flow over a geometrically complex CubeSat in Low Earth Orbit. The EUPIC potential and flowfield distribution around the CubeSat exhibit features that are consistent with previous simulations over simpler geometrical bodies.
NASA Astrophysics Data System (ADS)
Zhao, Feng; Frietman, Edward E. E.; Han, Zhong; Chen, Ray T.
1999-04-01
A characteristic feature of a conventional von Neumann computer is that computing power is delivered by a single processing unit. Although increasing the clock frequency improves the performance of the computer, the switching speed of the semiconductor devices and the finite speed at which electrical signals propagate along the bus set the boundaries. Architectures containing large numbers of nodes can solve this performance dilemma, with the comment that main obstacles in designing such systems are caused by difficulties to come up with solutions that guarantee efficient communications among the nodes. Exchanging data becomes really a bottleneck should al nodes be connected by a shared resource. Only optics, due to its inherent parallelism, could solve that bottleneck. Here, we explore a multi-faceted free space image distributor to be used in optical interconnects in massively parallel processing. In this paper, physical and optical models of the image distributor are focused on from diffraction theory of light wave to optical simulations. the general features and the performance of the image distributor are also described. The new structure of an image distributor and the simulations for it are discussed. From the digital simulation and experiment, it is found that the multi-faceted free space image distributing technique is quite suitable for free space optical interconnection in massively parallel processing and new structure of the multifaceted free space image distributor would perform better.
Finite element based contact analysis of radio frequency MEMs switch membrane surfaces
NASA Astrophysics Data System (ADS)
Liu, Jin-Ya; Chalivendra, Vijaya; Huang, Wenzhen
2017-10-01
Finite element simulations were performed to determine the contact behavior of radio frequency (RF) micro-electro-mechanical (MEM) switch contact surfaces under monotonic and cyclic loading conditions. Atomic force microscopy (AFM) was used to capture the topography of RF-MEM switch membranes and later they were analyzed for multi-scale regular as well as fractal structures. Frictionless, non-adhesive contact 3D finite element analysis was carried out at different length scales to investigate the contact behavior of the regular-fractal surface using an elasto-plastic material model. Dominant micro-scale regular patterns were found to significantly change the contact behavior. Contact areas mainly cluster around the regular pattern. The contribution from the fractal structure is not significant. Under cyclic loading conditions, plastic deformation in the 1st loading/unloading cycle smooth the surface. The subsequent repetitive loading/unloading cycles undergo elastic contact without changing the morphology of the contacting surfaces. The work is expected to shed light on the quality of the switch surface contact as well as the optimum design of RF MEM switch surfaces.
Nagaoka, Tomoaki; Watanabe, Soichi
2012-01-01
Electromagnetic simulation with anatomically realistic computational human model using the finite-difference time domain (FDTD) method has recently been performed in a number of fields in biomedical engineering. To improve the method's calculation speed and realize large-scale computing with the computational human model, we adapt three-dimensional FDTD code to a multi-GPU cluster environment with Compute Unified Device Architecture and Message Passing Interface. Our multi-GPU cluster system consists of three nodes. The seven GPU boards (NVIDIA Tesla C2070) are mounted on each node. We examined the performance of the FDTD calculation on multi-GPU cluster environment. We confirmed that the FDTD calculation on the multi-GPU clusters is faster than that on a multi-GPU (a single workstation), and we also found that the GPU cluster system calculate faster than a vector supercomputer. In addition, our GPU cluster system allowed us to perform the large-scale FDTD calculation because were able to use GPU memory of over 100 GB.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Earl, Christopher; Might, Matthew; Bagusetty, Abhishek
This study presents Nebo, a declarative domain-specific language embedded in C++ for discretizing partial differential equations for transport phenomena on multiple architectures. Application programmers use Nebo to write code that appears sequential but can be run in parallel, without editing the code. Currently Nebo supports single-thread execution, multi-thread execution, and many-core (GPU-based) execution. With single-thread execution, Nebo performs on par with code written by domain experts. With multi-thread execution, Nebo can linearly scale (with roughly 90% efficiency) up to 12 cores, compared to its single-thread execution. Moreover, Nebo’s many-core execution can be over 140x faster than its single-thread execution.
Earl, Christopher; Might, Matthew; Bagusetty, Abhishek; ...
2016-01-26
This study presents Nebo, a declarative domain-specific language embedded in C++ for discretizing partial differential equations for transport phenomena on multiple architectures. Application programmers use Nebo to write code that appears sequential but can be run in parallel, without editing the code. Currently Nebo supports single-thread execution, multi-thread execution, and many-core (GPU-based) execution. With single-thread execution, Nebo performs on par with code written by domain experts. With multi-thread execution, Nebo can linearly scale (with roughly 90% efficiency) up to 12 cores, compared to its single-thread execution. Moreover, Nebo’s many-core execution can be over 140x faster than its single-thread execution.
Particle-in-cell simulations of the critical ionization velocity effect in finite size clouds
NASA Technical Reports Server (NTRS)
Moghaddam-Taaheri, E.; Lu, G.; Goertz, C. K.; Nishikawa, K. - I.
1994-01-01
The critical ionization velocity (CIV) mechanism in a finite size cloud is studied with a series of electrostatic particle-in-cell simulations. It is observed that an initial seed ionization, produced by non-CIV mechanisms, generates a cross-field ion beam which excites a modified beam-plasma instability (MBPI) with frequency in the range of the lower hybrid frequency. The excited waves accelerate electrons along the magnetic field up to the ion drift energy that exceeds the ionization energy of the neutral atoms. The heated electrons in turn enhance the ion beam by electron-neutral impact ionization, which establishes a positive feedback loop in maintaining the CIV process. It is also found that the efficiency of the CIV mechanism depends on the finite size of the gas cloud in the following ways: (1) Along the ambient magnetic field the finite size of the cloud, L (sub parallel), restricts the growth of the fastest growing mode, with a wavelength lambda (sub m parallel), of the MBPI. The parallel electron heating at wave saturation scales approximately as (L (sub parallel)/lambda (sub m parallel)) (exp 1/2); (2) Momentum coupling between the cloud and the ambient plasma via the Alfven waves occurs as a result of the finite size of the cloud in the direction perpendicular to both the ambient magnetic field and the neutral drift. This reduces exponentially with time the relative drift between the ambient plasma and the neutrals. The timescale is inversely proportional to the Alfven velocity. (3) The transvers e charge separation field across the cloud was found to result in the modulation of the beam velocity which reduces the parallel heating of electrons and increases the transverse acceleration of electrons. (4) Some energetic electrons are lost from the cloud along the magnetic field at a rate characterized by the acoustic velocity, instead of the electron thermal velocity. The loss of energetic electrons from the cloud seems to be larger in the direction of plasma drift relative to the neutrals, where the loss rate is characterized by the neutral drift velocity. It is also shown that a factor of 4 increase in the ambient plasma density, increases the CIV ionization yield by almost 2 orders of magnitude at the end of a typical run. It is concluded that a larger ambient plasma density can result in a larger CIV yield because of (1) larger seed ion production by non-CIV mechanisms, (2) smaller Alfven velocity and hence weak momentum coupling, and (3) smaller ratio of the ion beam density to the ambient ion density, and therefore a weaker modulation of the beam velocity. The simulation results are used to interpret various chemical release experiments in space.
NASA Astrophysics Data System (ADS)
Takasaki, Koichi
This paper presents a program for the multidisciplinary optimization and identification problem of the nonlinear model of large aerospace vehicle structures. The program constructs the global matrix of the dynamic system in the time direction by the p-version finite element method (pFEM), and the basic matrix for each pFEM node in the time direction is described by a sparse matrix similarly to the static finite element problem. The algorithm used by the program does not require the Hessian matrix of the objective function and so has low memory requirements. It also has a relatively low computational cost, and is suited to parallel computation. The program was integrated as a solver module of the multidisciplinary analysis system CUMuLOUS (Computational Utility for Multidisciplinary Large scale Optimization of Undense System) which is under development by the Aerospace Research and Development Directorate (ARD) of the Japan Aerospace Exploration Agency (JAXA).
Schunck, N.; Dobaczewski, J.; Satuła, W.; ...
2017-03-27
Here, we describe the new version (v2.73y) of the code hfodd which solves the nuclear Skyrme Hartree–Fock or Skyrme Hartree–Fock–Bogolyubov problem by using the Cartesian deformed harmonic-oscillator basis. In the new version, we have implemented the following new features: (i) full proton–neutron mixing in the particle–hole channel for Skyrme functionals, (ii) the Gogny force in both particle–hole and particle–particle channels, (iii) linear multi-constraint method at finite temperature, (iv) fission toolkit including the constraint on the number of particles in the neck between two fragments, calculation of the interaction energy between fragments, and calculation of the nuclear and Coulomb energy ofmore » each fragment, (v) the new version 200d of the code hfbtho, together with an enhanced interface between HFBTHO and HFODD, (vi) parallel capabilities, significantly extended by adding several restart options for large-scale jobs, (vii) the Lipkin translational energy correction method with pairing, (viii) higher-order Lipkin particle-number corrections, (ix) interface to a program plotting single-particle energies or Routhians, (x) strong-force isospin-symmetry-breaking terms, and (xi) the Augmented Lagrangian Method for calculations with 3D constraints on angular momentum and isospin. Finally, an important bug related to the calculation of the entropy at finite temperature and several other little significant errors of the previous published version were corrected.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schunck, N.; Dobaczewski, J.; Satuła, W.
Here, we describe the new version (v2.73y) of the code hfodd which solves the nuclear Skyrme Hartree–Fock or Skyrme Hartree–Fock–Bogolyubov problem by using the Cartesian deformed harmonic-oscillator basis. In the new version, we have implemented the following new features: (i) full proton–neutron mixing in the particle–hole channel for Skyrme functionals, (ii) the Gogny force in both particle–hole and particle–particle channels, (iii) linear multi-constraint method at finite temperature, (iv) fission toolkit including the constraint on the number of particles in the neck between two fragments, calculation of the interaction energy between fragments, and calculation of the nuclear and Coulomb energy ofmore » each fragment, (v) the new version 200d of the code hfbtho, together with an enhanced interface between HFBTHO and HFODD, (vi) parallel capabilities, significantly extended by adding several restart options for large-scale jobs, (vii) the Lipkin translational energy correction method with pairing, (viii) higher-order Lipkin particle-number corrections, (ix) interface to a program plotting single-particle energies or Routhians, (x) strong-force isospin-symmetry-breaking terms, and (xi) the Augmented Lagrangian Method for calculations with 3D constraints on angular momentum and isospin. Finally, an important bug related to the calculation of the entropy at finite temperature and several other little significant errors of the previous published version were corrected.« less
A real-time multi-scale 2D Gaussian filter based on FPGA
NASA Astrophysics Data System (ADS)
Luo, Haibo; Gai, Xingqin; Chang, Zheng; Hui, Bin
2014-11-01
Multi-scale 2-D Gaussian filter has been widely used in feature extraction (e.g. SIFT, edge etc.), image segmentation, image enhancement, image noise removing, multi-scale shape description etc. However, their computational complexity remains an issue for real-time image processing systems. Aimed at this problem, we propose a framework of multi-scale 2-D Gaussian filter based on FPGA in this paper. Firstly, a full-hardware architecture based on parallel pipeline was designed to achieve high throughput rate. Secondly, in order to save some multiplier, the 2-D convolution is separated into two 1-D convolutions. Thirdly, a dedicate first in first out memory named as CAFIFO (Column Addressing FIFO) was designed to avoid the error propagating induced by spark on clock. Finally, a shared memory framework was designed to reduce memory costs. As a demonstration, we realized a 3 scales 2-D Gaussian filter on a single ALTERA Cyclone III FPGA chip. Experimental results show that, the proposed framework can computing a Multi-scales 2-D Gaussian filtering within one pixel clock period, is further suitable for real-time image processing. Moreover, the main principle can be popularized to the other operators based on convolution, such as Gabor filter, Sobel operator and so on.
NASA Technical Reports Server (NTRS)
Liu, Nan-Suey
2001-01-01
A multi-disciplinary design/analysis tool for combustion systems is critical for optimizing the low-emission, high-performance combustor design process. Based on discussions between then NASA Lewis Research Center and the jet engine companies, an industry-government team was formed in early 1995 to develop the National Combustion Code (NCC), which is an integrated system of computer codes for the design and analysis of combustion systems. NCC has advanced features that address the need to meet designer's requirements such as "assured accuracy", "fast turnaround", and "acceptable cost". The NCC development team is comprised of Allison Engine Company (Allison), CFD Research Corporation (CFDRC), GE Aircraft Engines (GEAE), NASA Glenn Research Center (LeRC), and Pratt & Whitney (P&W). The "unstructured mesh" capability and "parallel computing" are fundamental features of NCC from its inception. The NCC system is composed of a set of "elements" which includes grid generator, main flow solver, turbulence module, turbulence and chemistry interaction module, chemistry module, spray module, radiation heat transfer module, data visualization module, and a post-processor for evaluating engine performance parameters. Each element may have contributions from several team members. Such a multi-source multi-element system needs to be integrated in a way that facilitates inter-module data communication, flexibility in module selection, and ease of integration. The development of the NCC beta version was essentially completed in June 1998. Technical details of the NCC elements are given in the Reference List. Elements such as the baseline flow solver, turbulence module, and the chemistry module, have been extensively validated; and their parallel performance on large-scale parallel systems has been evaluated and optimized. However the scalar PDF module and the Spray module, as well as their coupling with the baseline flow solver, were developed in a small-scale distributed computing environment. As a result, the validation of the NCC beta version as a whole was quite limited. Current effort has been focused on the validation of the integrated code and the evaluation/optimization of its overall performance on large-scale parallel systems.
The effect of anisotropic heat transport on magnetic islands in 3-D configurations
NASA Astrophysics Data System (ADS)
Schlutt, M. G.; Hegna, C. C.
2012-08-01
An analytic theory of nonlinear pressure-induced magnetic island formation using a boundary layer analysis is presented. This theory extends previous work by including the effects of finite parallel heat transport and is applicable to general three dimensional magnetic configurations. In this work, particular attention is paid to the role of finite parallel heat conduction in the context of pressure-induced island physics. It is found that localized currents that require self-consistent deformation of the pressure profile, such as resistive interchange and bootstrap currents, are attenuated by finite parallel heat conduction when the magnetic islands are sufficiently small. However, these anisotropic effects do not change saturated island widths caused by Pfirsch-Schlüter current effects. Implications for finite pressure-induced island healing are discussed.
Individual-specific multi-scale finite element simulation of cortical bone of human proximal femur
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ascenzi, Maria-Grazia, E-mail: mgascenzi@mednet.ucla.edu; Kawas, Neal P., E-mail: nealkawas@ucla.edu; Lutz, Andre, E-mail: andre.lutz@hotmail.de
2013-07-01
We present an innovative method to perform multi-scale finite element analyses of the cortical component of the femur using the individual’s (1) computed tomography scan; and (2) a bone specimen obtained in conjunction with orthopedic surgery. The method enables study of micro-structural characteristics regulating strains and stresses under physiological loading conditions. The analysis of the micro-structural scenarios that cause variation of strain and stress is the first step in understanding the elevated strains and stresses in bone tissue, which are indicative of higher likelihood of micro-crack formation in bone, implicated in consequent remodeling or macroscopic bone fracture. Evidence that micro-structuremore » varies with clinical history and contributes in significant, but poorly understood, ways to bone function, motivates the method’s development, as does need for software tools to investigate relationships between macroscopic loading and micro-structure. Three applications – varying region of interest, bone mineral density, and orientation of collagen type I, illustrate the method. We show, in comparison between physiological loading and simple compression of a patient’s femur, that strains computed at the multi-scale model’s micro-level: (i) differ; and (ii) depend on local collagen-apatite orientation and degree of calcification. Our findings confirm the strain concentration role of osteocyte lacunae, important for mechano-transduction. We hypothesize occurrence of micro-crack formation, leading either to remodeling or macroscopic fracture, when the computed strains exceed the elastic range observed in micro-structural testing.« less
Multi-scale Finite Element Modeling of Eustachian Tube Function: Influence of Mucosal Adhesion
Malik, J.E.; Swarts, J.D.; Ghadiali, S. N.
2017-01-01
The inability to open the collapsible Eustachian tube (ET) leads to the development of chronic Otitis Media (OM). Although mucosal inflammation during OM leads to increased mucin gene expression and elevated adhesion forces within the ET lumen, it is not known how changes in mucosal adhesion alter the biomechanical mechanisms of ET function. In this study, we developed a novel multi-scale finite element model of ET function in adults that utilizes adhesion spring elements to simulate changes in mucosal adhesion. Models were created for six adult subjects and dynamic patterns in muscle contraction were used to simulate the wave-like opening of the ET that occurs during swallowing. Results indicate that ET opening is highly sensitive to the level of mucosal adhesion and that exceeding a critical value of adhesion leads to rapid ET dysfunction. Parameter variation studies and sensitivity analysis indicate that increased mucosal adhesion alters the relative importance of several tissue biomechanical properties. For example, increases in mucosal adhesion reduced the sensitivity of ET function to tensor veli palatini muscle forces but did not alter the insensitivity of ET function to levator veli palatini muscle forces. Interestingly, although changes in cartilage stiffness did not significantly influence ET opening under low adhesion conditions, ET opening was highly sensitive to changes in cartilage stiffness under high adhesion conditions. Therefore, our multi-scale computational models indicate that changes in mucosal adhesion as would occur during inflammatory OM alter the biomechanical mechanisms of ET function. PMID:26891171
Solution of a tridiagonal system of equations on the finite element machine
NASA Technical Reports Server (NTRS)
Bostic, S. W.
1984-01-01
Two parallel algorithms for the solution of tridiagonal systems of equations were implemented on the Finite Element Machine. The Accelerated Parallel Gauss method, an iterative method, and the Buneman algorithm, a direct method, are discussed and execution statistics are presented.
Higher-order finite-difference formulation of periodic Orbital-free Density Functional Theory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ghosh, Swarnava; Suryanarayana, Phanish, E-mail: phanish.suryanarayana@ce.gatech.edu
2016-02-15
We present a real-space formulation and higher-order finite-difference implementation of periodic Orbital-free Density Functional Theory (OF-DFT). Specifically, utilizing a local reformulation of the electrostatic and kernel terms, we develop a generalized framework for performing OF-DFT simulations with different variants of the electronic kinetic energy. In particular, we propose a self-consistent field (SCF) type fixed-point method for calculations involving linear-response kinetic energy functionals. In this framework, evaluation of both the electronic ground-state and forces on the nuclei are amenable to computations that scale linearly with the number of atoms. We develop a parallel implementation of this formulation using the finite-difference discretization.more » We demonstrate that higher-order finite-differences can achieve relatively large convergence rates with respect to mesh-size in both the energies and forces. Additionally, we establish that the fixed-point iteration converges rapidly, and that it can be further accelerated using extrapolation techniques like Anderson's mixing. We validate the accuracy of the results by comparing the energies and forces with plane-wave methods for selected examples, including the vacancy formation energy in Aluminum. Overall, the suitability of the proposed formulation for scalable high performance computing makes it an attractive choice for large-scale OF-DFT calculations consisting of thousands of atoms.« less
FOLDER: A numerical tool to simulate the development of structures in layered media
NASA Astrophysics Data System (ADS)
Adamuszek, Marta; Dabrowski, Marcin; Schmid, Daniel W.
2015-04-01
FOLDER is a numerical toolbox for modelling deformation in layered media during layer parallel shortening or extension in two dimensions. FOLDER builds on MILAMIN [1], a finite element method based mechanical solver, with a range of utilities included from the MUTILS package [2]. Numerical mesh is generated using the Triangle software [3]. The toolbox includes features that allow for: 1) designing complex structures such as multi-layer stacks, 2) accurately simulating large-strain deformation of linear and non-linear viscous materials, 3) post-processing of various physical fields such as velocity (total and perturbing), rate of deformation, finite strain, stress, deviatoric stress, pressure, apparent viscosity. FOLDER is designed to ensure maximum flexibility to configure model geometry, define material parameters, specify range of numerical parameters in simulations and choose the plotting options. FOLDER is an open source MATLAB application and comes with a user friendly graphical interface. The toolbox additionally comprises an educational application that illustrates various analytical solutions of growth rates calculated for the cases of folding and necking of a single layer with interfaces perturbed with a single sinusoidal waveform. We further derive two novel analytical expressions for the growth rate in the cases of folding and necking of a linear viscous layer embedded in a linear viscous medium of a finite thickness. We use FOLDER to test the accuracy of single-layer folding simulations using various 1) spatial and temporal resolutions, 2) time integration schemes, and 3) iterative algorithms for non-linear materials. The accuracy of the numerical results is quantified by: 1) comparing them to analytical solution, if available, or 2) running convergence tests. As a result, we provide a map of the most optimal choice of grid size, time step, and number of iterations to keep the results of the numerical simulations below a given error for a given time integration scheme. We also demonstrate that Euler and Leapfrog time integration schemes are not recommended for any practical use. Finally, the capabilities of the toolbox are illustrated based on two examples: 1) shortening of a synthetic multi-layer sequence and 2) extension of a folded quartz vein embedded in phyllite from Sprague Upper Reservoir (example discussed by Sherwin and Chapple [4]). The latter example demonstrates that FOLDER can be successfully used for reverse modelling and mechanical restoration. [1] Dabrowski, M., Krotkiewski, M., and Schmid, D. W., 2008, MILAMIN: MATLAB-based finite element method solver for large problems. Geochemistry Geophysics Geosystems, vol. 9. [2] Krotkiewski, M. and Dabrowski M., 2010 Parallel symmetric sparse matrix-vector product on scalar multi-core cpus. Parallel Computing, 36(4):181-198 [3] Shewchuk, J. R., 1996, Triangle: Engineering a 2D Quality Mesh Generator and Delaunay Triangulator, In: Applied Computational Geometry: Towards Geometric Engineering'' (Ming C. Lin and Dinesh Manocha, editors), Vol. 1148 of Lecture Notes in Computer Science, pp. 203-222, Springer-Verlag, Berlin [4] Sherwin, J.A., Chapple, W.M., 1968. Wavelengths of single layer folds - a Comparison between theory and Observation. American Journal of Science 266 (3), p. 167-179
Parallel eigenanalysis of finite element models in a completely connected architecture
NASA Technical Reports Server (NTRS)
Akl, F. A.; Morel, M. R.
1989-01-01
A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi) = (M)(phi)(omega), where (K) and (M) are of order N, and (omega) is order of q. The concurrent solution of the eigenproblem is based on the multifrontal/modified subspace method and is achieved in a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm was successfully implemented on a tightly coupled multiple-instruction multiple-data parallel processing machine, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macrotasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. A parallel finite element dynamic analysis program, p-feda, is documented and the performance of its subroutines in parallel environment is analyzed.
Misra, Sanchit; Pamnany, Kiran; Aluru, Srinivas
2015-01-01
Construction of whole-genome networks from large-scale gene expression data is an important problem in systems biology. While several techniques have been developed, most cannot handle network reconstruction at the whole-genome scale, and the few that can, require large clusters. In this paper, we present a solution on the Intel Xeon Phi coprocessor, taking advantage of its multi-level parallelism including many x86-based cores, multiple threads per core, and vector processing units. We also present a solution on the Intel® Xeon® processor. Our solution is based on TINGe, a fast parallel network reconstruction technique that uses mutual information and permutation testing for assessing statistical significance. We demonstrate the first ever inference of a plant whole genome regulatory network on a single chip by constructing a 15,575 gene network of the plant Arabidopsis thaliana from 3,137 microarray experiments in only 22 minutes. In addition, our optimization for parallelizing mutual information computation on the Intel Xeon Phi coprocessor holds out lessons that are applicable to other domains.
NASA Astrophysics Data System (ADS)
Önal, Orkun; Ozmenci, Cemre; Canadinc, Demircan
2014-09-01
A multi-scale modeling approach was applied to predict the impact response of a strain rate sensitive high-manganese austenitic steel. The roles of texture, geometry and strain rate sensitivity were successfully taken into account all at once by coupling crystal plasticity and finite element (FE) analysis. Specifically, crystal plasticity was utilized to obtain the multi-axial flow rule at different strain rates based on the experimental deformation response under uniaxial tensile loading. The equivalent stress - equivalent strain response was then incorporated into the FE model for the sake of a more representative hardening rule under impact loading. The current results demonstrate that reliable predictions can be obtained by proper coupling of crystal plasticity and FE analysis even if the experimental flow rule of the material is acquired under uniaxial loading and at moderate strain rates that are significantly slower than those attained during impact loading. Furthermore, the current findings also demonstrate the need for an experiment-based multi-scale modeling approach for the sake of reliable predictions of the impact response.
Casimir force in O(n) systems with a diffuse interface.
Dantchev, Daniel; Grüneberg, Daniel
2009-04-01
We study the behavior of the Casimir force in O(n) systems with a diffuse interface and slab geometry infinity;{d-1}xL , where 2
NASA Astrophysics Data System (ADS)
Balaji, V.; Benson, Rusty; Wyman, Bruce; Held, Isaac
2016-10-01
Climate models represent a large variety of processes on a variety of timescales and space scales, a canonical example of multi-physics multi-scale modeling. Current hardware trends, such as Graphical Processing Units (GPUs) and Many Integrated Core (MIC) chips, are based on, at best, marginal increases in clock speed, coupled with vast increases in concurrency, particularly at the fine grain. Multi-physics codes face particular challenges in achieving fine-grained concurrency, as different physics and dynamics components have different computational profiles, and universal solutions are hard to come by. We propose here one approach for multi-physics codes. These codes are typically structured as components interacting via software frameworks. The component structure of a typical Earth system model consists of a hierarchical and recursive tree of components, each representing a different climate process or dynamical system. This recursive structure generally encompasses a modest level of concurrency at the highest level (e.g., atmosphere and ocean on different processor sets) with serial organization underneath. We propose to extend concurrency much further by running more and more lower- and higher-level components in parallel with each other. Each component can further be parallelized on the fine grain, potentially offering a major increase in the scalability of Earth system models. We present here first results from this approach, called coarse-grained component concurrency, or CCC. Within the Geophysical Fluid Dynamics Laboratory (GFDL) Flexible Modeling System (FMS), the atmospheric radiative transfer component has been configured to run in parallel with a composite component consisting of every other atmospheric component, including the atmospheric dynamics and all other atmospheric physics components. We will explore the algorithmic challenges involved in such an approach, and present results from such simulations. Plans to achieve even greater levels of coarse-grained concurrency by extending this approach within other components, such as the ocean, will be discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Hyun Jung; McDonnell, Kevin T.; Zelenyuk, Alla
2014-03-01
Although the Euclidean distance does well in measuring data distances within high-dimensional clusters, it does poorly when it comes to gauging inter-cluster distances. This significantly impacts the quality of global, low-dimensional space embedding procedures such as the popular multi-dimensional scaling (MDS) where one can often observe non-intuitive layouts. We were inspired by the perceptual processes evoked in the method of parallel coordinates which enables users to visually aggregate the data by the patterns the polylines exhibit across the dimension axes. We call the path of such a polyline its structure and suggest a metric that captures this structure directly inmore » high-dimensional space. This allows us to better gauge the distances of spatially distant data constellations and so achieve data aggregations in MDS plots that are more cognizant of existing high-dimensional structure similarities. Our MDS plots also exhibit similar visual relationships as the method of parallel coordinates which is often used alongside to visualize the high-dimensional data in raw form. We then cast our metric into a bi-scale framework which distinguishes far-distances from near-distances. The coarser scale uses the structural similarity metric to separate data aggregates obtained by prior classification or clustering, while the finer scale employs the appropriate Euclidean distance.« less
Problems Related to Parallelization of CFD Algorithms on GPU, Multi-GPU and Hybrid Architectures
NASA Astrophysics Data System (ADS)
Biazewicz, Marek; Kurowski, Krzysztof; Ludwiczak, Bogdan; Napieraia, Krystyna
2010-09-01
Computational Fluid Dynamics (CFD) is one of the branches of fluid mechanics, which uses numerical methods and algorithms to solve and analyze fluid flows. CFD is used in various domains, such as oil and gas reservoir uncertainty analysis, aerodynamic body shapes optimization (e.g. planes, cars, ships, sport helmets, skis), natural phenomena analysis, numerical simulation for weather forecasting or realistic visualizations. CFD problem is very complex and needs a lot of computational power to obtain the results in a reasonable time. We have implemented a parallel application for two-dimensional CFD simulation with a free surface approximation (MAC method) using new hardware architectures, in particular multi-GPU and hybrid computing environments. For this purpose we decided to use NVIDIA graphic cards with CUDA environment due to its simplicity of programming and good computations performance. We used finite difference discretization of Navier-Stokes equations, where fluid is propagated over an Eulerian Grid. In this model, the behavior of the fluid inside the cell depends only on the properties of local, surrounding cells, therefore it is well suited for the GPU-based architecture. In this paper we demonstrate how to use efficiently the computing power of GPUs for CFD. Additionally, we present some best practices to help users analyze and improve the performance of CFD applications executed on GPU. Finally, we discuss various challenges around the multi-GPU implementation on the example of matrix multiplication.
Element-topology-independent preconditioners for parallel finite element computations
NASA Technical Reports Server (NTRS)
Park, K. C.; Alexander, Scott
1992-01-01
A family of preconditioners for the solution of finite element equations are presented, which are element-topology independent and thus can be applicable to element order-free parallel computations. A key feature of the present preconditioners is the repeated use of element connectivity matrices and their left and right inverses. The properties and performance of the present preconditioners are demonstrated via beam and two-dimensional finite element matrices for implicit time integration computations.
NASA Technical Reports Server (NTRS)
Farhat, Charbel; Lesoinne, Michel
1993-01-01
Most of the recently proposed computational methods for solving partial differential equations on multiprocessor architectures stem from the 'divide and conquer' paradigm and involve some form of domain decomposition. For those methods which also require grids of points or patches of elements, it is often necessary to explicitly partition the underlying mesh, especially when working with local memory parallel processors. In this paper, a family of cost-effective algorithms for the automatic partitioning of arbitrary two- and three-dimensional finite element and finite difference meshes is presented and discussed in view of a domain decomposed solution procedure and parallel processing. The influence of the algorithmic aspects of a solution method (implicit/explicit computations), and the architectural specifics of a multiprocessor (SIMD/MIMD, startup/transmission time), on the design of a mesh partitioning algorithm are discussed. The impact of the partitioning strategy on load balancing, operation count, operator conditioning, rate of convergence and processor mapping is also addressed. Finally, the proposed mesh decomposition algorithms are demonstrated with realistic examples of finite element, finite volume, and finite difference meshes associated with the parallel solution of solid and fluid mechanics problems on the iPSC/2 and iPSC/860 multiprocessors.
Data-driven train set crash dynamics simulation
NASA Astrophysics Data System (ADS)
Tang, Zhao; Zhu, Yunrui; Nie, Yinyu; Guo, Shihui; Liu, Fengjia; Chang, Jian; Zhang, Jianjun
2017-02-01
Traditional finite element (FE) methods are arguably expensive in computation/simulation of the train crash. High computational cost limits their direct applications in investigating dynamic behaviours of an entire train set for crashworthiness design and structural optimisation. On the contrary, multi-body modelling is widely used because of its low computational cost with the trade-off in accuracy. In this study, a data-driven train crash modelling method is proposed to improve the performance of a multi-body dynamics simulation of train set crash without increasing the computational burden. This is achieved by the parallel random forest algorithm, which is a machine learning approach that extracts useful patterns of force-displacement curves and predicts a force-displacement relation in a given collision condition from a collection of offline FE simulation data on various collision conditions, namely different crash velocities in our analysis. Using the FE simulation results as a benchmark, we compared our method with traditional multi-body modelling methods and the result shows that our data-driven method improves the accuracy over traditional multi-body models in train crash simulation and runs at the same level of efficiency.
NASA Astrophysics Data System (ADS)
You, Jeong-Ha
2005-01-01
Fibrous metal matrix composites possess advanced mechanical properties compared to conventional alloys. It is expected that the application of these composites to a divertor component will enhance the structural reliability. A possible design concept would be a system consisting of tungsten armour, copper composite interlayer and copper heat sink where the composite interlayer is locally inserted into the highly stressed domain near the bond interface. For assessment of the design feasibility of the composite divertor concept, a non-linear multi-scale finite element analysis was performed. To this end, a micro-mechanics algorithm was implemented into a finite element code. A reactor-relevant heat flux load was assumed. Focus was placed on the evolution of stress state, plastic deformation and ductile damage on both macro- and microscopic scales. The structural response of the component and the micro-scale stress evolution of the composite laminate were investigated.
Tan, J L Y; Deshpande, V S; Fleck, N A
2016-07-13
A damage-based finite-element model is used to predict the fracture behaviour of centre-notched quasi-isotropic carbon-fibre-reinforced-polymer laminates under multi-axial loading. Damage within each ply is associated with fibre tension, fibre compression, matrix tension and matrix compression. Inter-ply delamination is modelled by cohesive interfaces using a traction-separation law. Failure envelopes for a notch and a circular hole are predicted for in-plane multi-axial loading and are in good agreement with the observed failure envelopes from a parallel experimental study. The ply-by-ply (and inter-ply) damage evolution and the critical mechanisms of ultimate failure also agree with the observed damage evolution. It is demonstrated that accurate predictions of notched compressive strength are obtained upon employing the band broadening stress for microbuckling, highlighting the importance of this damage mode in compression. This article is part of the themed issue 'Multiscale modelling of the structural integrity of composite materials'. © 2016 The Author(s).
Large-Scale, Parallel, Multi-Sensor Data Fusion in the Cloud
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Manipon, G.; Hua, H.
2012-12-01
NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time "matchups" between instruments swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, assemble merged datasets, and compute fused products for further scientific and statistical analysis. To efficiently assemble such decade-scale datasets in a timely manner, we are utilizing Elastic Computing in the Cloud and parallel map/reduce-based algorithms. "SciReduce" is a Hadoop-like parallel analysis system, programmed in parallel python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in the Cloud. Unlike Hadoop, in which simple tuples (keys & values) are passed between the map and reduce functions, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Thus, SciReduce uses the native datatypes (geolocated grids, swaths, and points) that geo-scientists are familiar with. We are deploying within SciReduce a versatile set of python operators for data lookup, access, subsetting, co-registration, mining, fusion, and statistical analysis. All operators take in sets of geo-located arrays and generate more arrays. Large, multi-year satellite and model datasets are automatically "sharded" by time and space across a cluster of nodes so that years of data (millions of granules) can be compared or fused in a massively parallel way. Input variables (arrays) are pulled on-demand into the Cloud using OPeNDAP or webification URLs, thereby minimizing the size of the stored input and intermediate datasets. A typical map function might assemble and quality control AIRS Level-2 water vapor profiles for a year of data in parallel, then a reduce function would average the profiles in lat/lon bins (again, in parallel), and a final reduce would aggregate the climatology and write it to output files. We are using SciReduce to automate the production of multiple versions of a multi-year water vapor climatology (AIRS & MODIS), stratified by Cloudsat cloud classification, and compare it to models (ECMWF & MERRA reanalysis). We will present the architecture of SciReduce, describe the achieved "clock time" speedups in fusing huge datasets on our own nodes and in the Amazon Cloud, and discuss the Cloud cost tradeoffs for storage, compute, and data transfer.
Large-Scale, Parallel, Multi-Sensor Data Fusion in the Cloud
NASA Astrophysics Data System (ADS)
Wilson, B.; Manipon, G.; Hua, H.
2012-04-01
NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time "matchups" between instruments swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, assemble merged datasets, and compute fused products for further scientific and statistical analysis. To efficiently assemble such decade-scale datasets in a timely manner, we are utilizing Elastic Computing in the Cloud and parallel map/reduce-based algorithms. "SciReduce" is a Hadoop-like parallel analysis system, programmed in parallel python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in the Cloud. Unlike Hadoop, in which simple tuples (keys & values) are passed between the map and reduce functions, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Thus, SciReduce uses the native datatypes (geolocated grids, swaths, and points) that geo-scientists are familiar with. We are deploying within SciReduce a versatile set of python operators for data lookup, access, subsetting, co-registration, mining, fusion, and statistical analysis. All operators take in sets of geo-arrays and generate more arrays. Large, multi-year satellite and model datasets are automatically "sharded" by time and space across a cluster of nodes so that years of data (millions of granules) can be compared or fused in a massively parallel way. Input variables (arrays) are pulled on-demand into the Cloud using OPeNDAP or webification URLs, thereby minimizing the size of the stored input and intermediate datasets. A typical map function might assemble and quality control AIRS Level-2 water vapor profiles for a year of data in parallel, then a reduce function would average the profiles in bins (again, in parallel), and a final reduce would aggregate the climatology and write it to output files. We are using SciReduce to automate the production of multiple versions of a multi-year water vapor climatology (AIRS & MODIS), stratified by Cloudsat cloud classification, and compare it to models (ECMWF & MERRA reanalysis). We will present the architecture of SciReduce, describe the achieved "clock time" speedups in fusing huge datasets on our own nodes and in the Amazon Cloud, and discuss the Cloud cost tradeoffs for storage, compute, and data transfer.
NASA Astrophysics Data System (ADS)
Shcherbakov, Alexandre S.; Chavez Dagostino, Miguel; Arellanes, Adan O.; Aguirre Lopez, Arturo
2016-09-01
We develop a multi-band spectrometer with a few spatially parallel optical arms for the combined processing of their data flow. Such multi-band capability has various applications in astrophysical scenarios at different scales: from objects in the distant universe to planetary atmospheres in the Solar system. Each optical arm exhibits original performances to provide parallel multi-band observations with different scales simultaneously. Similar possibility is based on designing each optical arm individually via exploiting different materials for acousto-optical cells operating within various regimes, frequency ranges and light wavelengths from independent light sources. Individual beam shapers provide both the needed incident light polarization and the required apodization to increase the dynamic range of a system. After parallel acousto-optical processing, data flows are united by the joint CCD matrix on the stage of the combined electronic data processing. At the moment, the prototype combines still three bands, i.e. includes three spatial optical arms. The first low-frequency arm operates at the central frequencies 60-80 MHz with frequency bandwidth 40 MHz. The second arm is oriented to middle-frequencies 350-500 MHz with frequency bandwidth 200-300 MHz. The third arm is intended for ultra-high-frequency radio-wave signals about 1.0-1.5 GHz with frequency bandwidth <300 MHz. To-day, this spectrometer has the following preliminary performances. The first arm exhibits frequency resolution 20 KHz; while the second and third arms give the resolution 150-200 KHz. The numbers of resolvable spots are 1500- 2000 depending on the regime of operation. The fourth optical arm at the frequency range 3.5 GHz is currently under construction.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Paul T.; Shadid, John N.; Sala, Marzio
In this study results are presented for the large-scale parallel performance of an algebraic multilevel preconditioner for solution of the drift-diffusion model for semiconductor devices. The preconditioner is the key numerical procedure determining the robustness, efficiency and scalability of the fully-coupled Newton-Krylov based, nonlinear solution method that is employed for this system of equations. The coupled system is comprised of a source term dominated Poisson equation for the electric potential, and two convection-diffusion-reaction type equations for the electron and hole concentration. The governing PDEs are discretized in space by a stabilized finite element method. Solution of the discrete system ismore » obtained through a fully-implicit time integrator, a fully-coupled Newton-based nonlinear solver, and a restarted GMRES Krylov linear system solver. The algebraic multilevel preconditioner is based on an aggressive coarsening graph partitioning of the nonzero block structure of the Jacobian matrix. Representative performance results are presented for various choices of multigrid V-cycles and W-cycles and parameter variations for smoothers based on incomplete factorizations. Parallel scalability results are presented for solution of up to 10{sup 8} unknowns on 4096 processors of a Cray XT3/4 and an IBM POWER eServer system.« less
Newmark local time stepping on high-performance computing architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rietmann, Max, E-mail: max.rietmann@erdw.ethz.ch; Institute of Geophysics, ETH Zurich; Grote, Marcus, E-mail: marcus.grote@unibas.ch
In multi-scale complex media, finite element meshes often require areas of local refinement, creating small elements that can dramatically reduce the global time-step for wave-propagation problems due to the CFL condition. Local time stepping (LTS) algorithms allow an explicit time-stepping scheme to adapt the time-step to the element size, allowing near-optimal time-steps everywhere in the mesh. We develop an efficient multilevel LTS-Newmark scheme and implement it in a widely used continuous finite element seismic wave-propagation package. In particular, we extend the standard LTS formulation with adaptations to continuous finite element methods that can be implemented very efficiently with very strongmore » element-size contrasts (more than 100x). Capable of running on large CPU and GPU clusters, we present both synthetic validation examples and large scale, realistic application examples to demonstrate the performance and applicability of the method and implementation on thousands of CPU cores and hundreds of GPUs.« less
A cloud-based framework for large-scale traditional Chinese medical record retrieval.
Liu, Lijun; Liu, Li; Fu, Xiaodong; Huang, Qingsong; Zhang, Xianwen; Zhang, Yin
2018-01-01
Electronic medical records are increasingly common in medical practice. The secondary use of medical records has become increasingly important. It relies on the ability to retrieve the complete information about desired patient populations. How to effectively and accurately retrieve relevant medical records from large- scale medical big data is becoming a big challenge. Therefore, we propose an efficient and robust framework based on cloud for large-scale Traditional Chinese Medical Records (TCMRs) retrieval. We propose a parallel index building method and build a distributed search cluster, the former is used to improve the performance of index building, and the latter is used to provide high concurrent online TCMRs retrieval. Then, a real-time multi-indexing model is proposed to ensure the latest relevant TCMRs are indexed and retrieved in real-time, and a semantics-based query expansion method and a multi- factor ranking model are proposed to improve retrieval quality. Third, we implement a template-based visualization method for displaying medical reports. The proposed parallel indexing method and distributed search cluster can improve the performance of index building and provide high concurrent online TCMRs retrieval. The multi-indexing model can ensure the latest relevant TCMRs are indexed and retrieved in real-time. The semantics expansion method and the multi-factor ranking model can enhance retrieval quality. The template-based visualization method can enhance the availability and universality, where the medical reports are displayed via friendly web interface. In conclusion, compared with the current medical record retrieval systems, our system provides some advantages that are useful in improving the secondary use of large-scale traditional Chinese medical records in cloud environment. The proposed system is more easily integrated with existing clinical systems and be used in various scenarios. Copyright © 2017. Published by Elsevier Inc.
Large Scale Document Inversion using a Multi-threaded Computing System
Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won
2018-01-01
Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general purpose computing. We can utilize the GPU in computation as a massive parallel coprocessor because the GPU consists of multiple cores. The GPU is also an affordable, attractive, and user-programmable commodity. Nowadays a lot of information has been flooded into the digital domain around the world. Huge volume of data, such as digital libraries, social networking services, e-commerce product data, and reviews, etc., is produced or collected every moment with dramatic growth in size. Although the inverted index is a useful data structure that can be used for full text searches or document retrieval, a large number of documents will require a tremendous amount of time to create the index. The performance of document inversion can be improved by multi-thread or multi-core GPU. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD), document inversion algorithm on the NVIDIA GPU/CUDA programming platform utilizing the huge computational power of the GPU, to develop high performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets from PubMed abstract and e-commerce product reviews. CCS Concepts •Information systems➝Information retrieval • Computing methodologies➝Massively parallel and high-performance simulations. PMID:29861701
Large Scale Document Inversion using a Multi-threaded Computing System.
Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won
2017-06-01
Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general purpose computing. We can utilize the GPU in computation as a massive parallel coprocessor because the GPU consists of multiple cores. The GPU is also an affordable, attractive, and user-programmable commodity. Nowadays a lot of information has been flooded into the digital domain around the world. Huge volume of data, such as digital libraries, social networking services, e-commerce product data, and reviews, etc., is produced or collected every moment with dramatic growth in size. Although the inverted index is a useful data structure that can be used for full text searches or document retrieval, a large number of documents will require a tremendous amount of time to create the index. The performance of document inversion can be improved by multi-thread or multi-core GPU. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD), document inversion algorithm on the NVIDIA GPU/CUDA programming platform utilizing the huge computational power of the GPU, to develop high performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets from PubMed abstract and e-commerce product reviews. •Information systems➝Information retrieval • Computing methodologies➝Massively parallel and high-performance simulations.
Parallel Simulation of Unsteady Turbulent Flames
NASA Technical Reports Server (NTRS)
Menon, Suresh
1996-01-01
Time-accurate simulation of turbulent flames in high Reynolds number flows is a challenging task since both fluid dynamics and combustion must be modeled accurately. To numerically simulate this phenomenon, very large computer resources (both time and memory) are required. Although current vector supercomputers are capable of providing adequate resources for simulations of this nature, the high cost and their limited availability, makes practical use of such machines less than satisfactory. At the same time, the explicit time integration algorithms used in unsteady flow simulations often possess a very high degree of parallelism, making them very amenable to efficient implementation on large-scale parallel computers. Under these circumstances, distributed memory parallel computers offer an excellent near-term solution for greatly increased computational speed and memory, at a cost that may render the unsteady simulations of the type discussed above more feasible and affordable.This paper discusses the study of unsteady turbulent flames using a simulation algorithm that is capable of retaining high parallel efficiency on distributed memory parallel architectures. Numerical studies are carried out using large-eddy simulation (LES). In LES, the scales larger than the grid are computed using a time- and space-accurate scheme, while the unresolved small scales are modeled using eddy viscosity based subgrid models. This is acceptable for the moment/energy closure since the small scales primarily provide a dissipative mechanism for the energy transferred from the large scales. However, for combustion to occur, the species must first undergo mixing at the small scales and then come into molecular contact. Therefore, global models cannot be used. Recently, a new model for turbulent combustion was developed, in which the combustion is modeled, within the subgrid (small-scales) using a methodology that simulates the mixing and the molecular transport and the chemical kinetics within each LES grid cell. Finite-rate kinetics can be included without any closure and this approach actually provides a means to predict the turbulent rates and the turbulent flame speed. The subgrid combustion model requires resolution of the local time scales associated with small-scale mixing, molecular diffusion and chemical kinetics and, therefore, within each grid cell, a significant amount of computations must be carried out before the large-scale (LES resolved) effects are incorporated. Therefore, this approach is uniquely suited for parallel processing and has been implemented on various systems such as: Intel Paragon, IBM SP-2, Cray T3D and SGI Power Challenge (PC) using the system independent Message Passing Interface (MPI) compiler. In this paper, timing data on these machines is reported along with some characteristic results.
Simulating neural systems with Xyce.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schiek, Richard Louis; Thornquist, Heidi K.; Mei, Ting
2012-12-01
Sandias parallel circuit simulator, Xyce, can address large scale neuron simulations in a new way extending the range within which one can perform high-fidelity, multi-compartment neuron simulations. This report documents the implementation of neuron devices in Xyce, their use in simulation and analysis of neuron systems.
Load Balancing Strategies for Multi-Block Overset Grid Applications
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Biswas, Rupak; Lopez-Benitez, Noe; Biegel, Bryan (Technical Monitor)
2002-01-01
The multi-block overset grid method is a powerful technique for high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process uses a grid system that discretizes the problem domain by using separately generated but overlapping structured grids that periodically update and exchange boundary information through interpolation. For efficient high performance computations of large-scale realistic applications using this methodology, the individual grids must be properly partitioned among the parallel processors. Overall performance, therefore, largely depends on the quality of load balancing. In this paper, we present three different load balancing strategies far overset grids and analyze their effects on the parallel efficiency of a Navier-Stokes CFD application running on an SGI Origin2000 machine.
NASA Astrophysics Data System (ADS)
Wu, J.; Yang, Y.; Luo, Q.; Wu, J.
2012-12-01
This study presents a new hybrid multi-objective evolutionary algorithm, the niched Pareto tabu search combined with a genetic algorithm (NPTSGA), whereby the global search ability of niched Pareto tabu search (NPTS) is improved by the diversification of candidate solutions arose from the evolving nondominated sorting genetic algorithm II (NSGA-II) population. Also, the NPTSGA coupled with the commonly used groundwater flow and transport codes, MODFLOW and MT3DMS, is developed for multi-objective optimal design of groundwater remediation systems. The proposed methodology is then applied to a large-scale field groundwater remediation system for cleanup of large trichloroethylene (TCE) plume at the Massachusetts Military Reservation (MMR) in Cape Cod, Massachusetts. Furthermore, a master-slave (MS) parallelization scheme based on the Message Passing Interface (MPI) is incorporated into the NPTSGA to implement objective function evaluations in distributed processor environment, which can greatly improve the efficiency of the NPTSGA in finding Pareto-optimal solutions to the real-world application. This study shows that the MS parallel NPTSGA in comparison with the original NPTS and NSGA-II can balance the tradeoff between diversity and optimality of solutions during the search process and is an efficient and effective tool for optimizing the multi-objective design of groundwater remediation systems under complicated hydrogeologic conditions.
NASA Astrophysics Data System (ADS)
Grujicic, M.; Galgalikar, R.; Snipes, J. S.; Ramaswami, S.
2016-01-01
In our recent work, a multi-length-scale room-temperature material model for SiC/SiC ceramic-matrix composites (CMCs) was derived and parameterized. The model was subsequently linked with a finite-element solver so that it could be used in a general room-temperature, structural/damage analysis of gas-turbine engine CMC components. Due to its multi-length-scale character, the material model enabled inclusion of the effects of fiber/tow (e.g., the volume fraction, size, and properties of the fibers; fiber-coating material/thickness; decohesion properties of the coating/matrix interfaces; etc.) and ply/lamina (e.g., the 0°/90° cross-ply versus plain-weave architectures, the extent of tow crimping in the case of the plain-weave plies, cohesive properties of the inter-ply boundaries, etc.) length-scale microstructural/architectural parameters on the mechanical response of the CMCs. One of the major limitations of the model is that it applies to the CMCs in their as-fabricated conditions (i.e., the effect of prolonged in-service environmental exposure and the associated material aging-degradation is not accounted for). In the present work, the model is upgraded to include such in-service environmental-exposure effects. To demonstrate the utility of the upgraded material model, it is used within a finite-element structural/failure analysis involving impact of a toboggan-shaped turbine shroud segment by a foreign object. The results obtained clearly revealed the effects that different aspects of the in-service environmental exposure have on the material degradation and the extent of damage suffered by the impacted CMC toboggan-shaped shroud segment.
An object-oriented, coprocessor-accelerated model for ice sheet simulations
NASA Astrophysics Data System (ADS)
Seddik, H.; Greve, R.
2013-12-01
Recently, numerous models capable of modeling the thermo-dynamics of ice sheets have been developed within the ice sheet modeling community. Their capabilities have been characterized by a wide range of features with different numerical methods (finite difference or finite element), different implementations of the ice flow mechanics (shallow-ice, higher-order, full Stokes) and different treatments for the basal and coastal areas (basal hydrology, basal sliding, ice shelves). Shallow-ice models (SICOPOLIS, IcIES, PISM, etc) have been widely used for modeling whole ice sheets (Greenland and Antarctica) due to the relatively low computational cost of the shallow-ice approximation but higher order (ISSM, AIF) and full Stokes (Elmer/Ice) models have been recently used to model the Greenland ice sheet. The advance in processor speed and the decrease in cost for accessing large amount of memory and storage have undoubtedly been the driving force in the commoditization of models with higher capabilities, and the popularity of Elmer/Ice (http://elmerice.elmerfem.com) with an active user base is a notable representation of this trend. Elmer/Ice is a full Stokes model built on top of the multi-physics package Elmer (http://www.csc.fi/english/pages/elmer) which provides the full machinery for the complex finite element procedure and is fully parallel (mesh partitioning with OpenMPI communication). Elmer is mainly written in Fortran 90 and targets essentially traditional processors as the code base was not initially written to run on modern coprocessors (yet adding support for the recently introduced x86 based coprocessors is possible). Furthermore, a truly modular and object-oriented implementation is required for quick adaptation to fast evolving capabilities in hardware (Fortran 2003 provides an object-oriented programming model while not being clean and requiring a tricky refactoring of Elmer code). In this work, the object-oriented, coprocessor-accelerated finite element code Sainou is introduced. Sainou is an Elmer fork which is reimplemented in Objective C and used for experimenting with ice sheet models running on coprocessors, essentially GPU devices. GPUs are highly parallel processors that provide opportunities for fine-grained parallelization of the full Stokes problem using the standard OpenCL language (http://www.khronos.org/opencl/) to access the device. Sainou is built upon a collection of Objective C base classes that service a modular kernel (itself a base class) which provides the core methods to solve the finite element problem. An early implementation of Sainou will be presented with emphasis on the object architecture and the strategies of parallelizations. The computation of a simple heat conduction problem is used to test the implementation which also provides experimental support for running the global matrix assembly on GPU.
2012-08-25
Accel- erated Crystal Plasticity FEM Simulations (submitted). 5. M. Anahid, M. Samal and S. Ghosh, Dwell fatigue crack nucleation model based on using...4] M. Anahid, M. K. Samal , and S. Ghosh. Dwell fatigue crack nucleation model based on crystal plasticity finite element simulations of
Multigrid Equation Solvers for Large Scale Nonlinear Finite Element Simulations
1999-01-01
purpose of the second partitioning phase , on each SMP, is to minimize the communication within the SMP; even if a multi - threaded matrix vector product...8.7 Comparison of model with experimental data for send phase of matrix vector product on ne grid...140 8.4 Matrix vector product phase times : : : : : : : : : : : : : : : : : : : : : : : 145 9.1 Flat and
Self-Organized Criticality and Scaling in Lifetime of Traffic Jams
NASA Astrophysics Data System (ADS)
Nagatani, Takashi
1995-01-01
The deterministic cellular automaton 184 (the one-dimensional asymmetric simple-exclusion model with parallel dynamics) is extended to take into account injection or extraction of particles. The model presents the traffic flow on a highway with inflow or outflow of cars.Introducing injection or extraction of particles into the asymmetric simple-exclusion model drives the system asymptotically into a steady state exhibiting a self-organized criticality. The typical lifetime
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chang, Justin; Karra, Satish; Nakshatrala, Kalyana B.
It is well-known that the standard Galerkin formulation, which is often the formulation of choice under the finite element method for solving self-adjoint diffusion equations, does not meet maximum principles and the non-negative constraint for anisotropic diffusion equations. Recently, optimization-based methodologies that satisfy maximum principles and the non-negative constraint for steady-state and transient diffusion-type equations have been proposed. To date, these methodologies have been tested only on small-scale academic problems. The purpose of this paper is to systematically study the performance of the non-negative methodology in the context of high performance computing (HPC). PETSc and TAO libraries are, respectively, usedmore » for the parallel environment and optimization solvers. For large-scale problems, it is important for computational scientists to understand the computational performance of current algorithms available in these scientific libraries. The numerical experiments are conducted on the state-of-the-art HPC systems, and a single-core performance model is used to better characterize the efficiency of the solvers. Furthermore, our studies indicate that the proposed non-negative computational framework for diffusion-type equations exhibits excellent strong scaling for real-world large-scale problems.« less
Chang, Justin; Karra, Satish; Nakshatrala, Kalyana B.
2016-07-26
It is well-known that the standard Galerkin formulation, which is often the formulation of choice under the finite element method for solving self-adjoint diffusion equations, does not meet maximum principles and the non-negative constraint for anisotropic diffusion equations. Recently, optimization-based methodologies that satisfy maximum principles and the non-negative constraint for steady-state and transient diffusion-type equations have been proposed. To date, these methodologies have been tested only on small-scale academic problems. The purpose of this paper is to systematically study the performance of the non-negative methodology in the context of high performance computing (HPC). PETSc and TAO libraries are, respectively, usedmore » for the parallel environment and optimization solvers. For large-scale problems, it is important for computational scientists to understand the computational performance of current algorithms available in these scientific libraries. The numerical experiments are conducted on the state-of-the-art HPC systems, and a single-core performance model is used to better characterize the efficiency of the solvers. Furthermore, our studies indicate that the proposed non-negative computational framework for diffusion-type equations exhibits excellent strong scaling for real-world large-scale problems.« less
A Robust and Scalable Software Library for Parallel Adaptive Refinement on Unstructured Meshes
NASA Technical Reports Server (NTRS)
Lou, John Z.; Norton, Charles D.; Cwik, Thomas A.
1999-01-01
The design and implementation of Pyramid, a software library for performing parallel adaptive mesh refinement (PAMR) on unstructured meshes, is described. This software library can be easily used in a variety of unstructured parallel computational applications, including parallel finite element, parallel finite volume, and parallel visualization applications using triangular or tetrahedral meshes. The library contains a suite of well-designed and efficiently implemented modules that perform operations in a typical PAMR process. Among these are mesh quality control during successive parallel adaptive refinement (typically guided by a local-error estimator), parallel load-balancing, and parallel mesh partitioning using the ParMeTiS partitioner. The Pyramid library is implemented in Fortran 90 with an interface to the Message-Passing Interface (MPI) library, supporting code efficiency, modularity, and portability. An EM waveguide filter application, adaptively refined using the Pyramid library, is illustrated.
Topical perspective on massive threading and parallelism.
Farber, Robert M
2011-09-01
Unquestionably computer architectures have undergone a recent and noteworthy paradigm shift that now delivers multi- and many-core systems with tens to many thousands of concurrent hardware processing elements per workstation or supercomputer node. GPGPU (General Purpose Graphics Processor Unit) technology in particular has attracted significant attention as new software development capabilities, namely CUDA (Compute Unified Device Architecture) and OpenCL™, have made it possible for students as well as small and large research organizations to achieve excellent speedup for many applications over more conventional computing architectures. The current scientific literature reflects this shift with numerous examples of GPGPU applications that have achieved one, two, and in some special cases, three-orders of magnitude increased computational performance through the use of massive threading to exploit parallelism. Multi-core architectures are also evolving quickly to exploit both massive-threading and massive-parallelism such as the 1.3 million threads Blue Waters supercomputer. The challenge confronting scientists in planning future experimental and theoretical research efforts--be they individual efforts with one computer or collaborative efforts proposing to use the largest supercomputers in the world is how to capitalize on these new massively threaded computational architectures--especially as not all computational problems will scale to massive parallelism. In particular, the costs associated with restructuring software (and potentially redesigning algorithms) to exploit the parallelism of these multi- and many-threaded machines must be considered along with application scalability and lifespan. This perspective is an overview of the current state of threading and parallelize with some insight into the future. Published by Elsevier Inc.
Constructing Neuronal Network Models in Massively Parallel Environments.
Ippen, Tammo; Eppler, Jochen M; Plesser, Hans E; Diesmann, Markus
2017-01-01
Recent advances in the development of data structures to represent spiking neuron network models enable us to exploit the complete memory of petascale computers for a single brain-scale network simulation. In this work, we investigate how well we can exploit the computing power of such supercomputers for the creation of neuronal networks. Using an established benchmark, we divide the runtime of simulation code into the phase of network construction and the phase during which the dynamical state is advanced in time. We find that on multi-core compute nodes network creation scales well with process-parallel code but exhibits a prohibitively large memory consumption. Thread-parallel network creation, in contrast, exhibits speedup only up to a small number of threads but has little overhead in terms of memory. We further observe that the algorithms creating instances of model neurons and their connections scale well for networks of ten thousand neurons, but do not show the same speedup for networks of millions of neurons. Our work uncovers that the lack of scaling of thread-parallel network creation is due to inadequate memory allocation strategies and demonstrates that thread-optimized memory allocators recover excellent scaling. An analysis of the loop order used for network construction reveals that more complex tests on the locality of operations significantly improve scaling and reduce runtime by allowing construction algorithms to step through large networks more efficiently than in existing code. The combination of these techniques increases performance by an order of magnitude and harnesses the increasingly parallel compute power of the compute nodes in high-performance clusters and supercomputers.
Constructing Neuronal Network Models in Massively Parallel Environments
Ippen, Tammo; Eppler, Jochen M.; Plesser, Hans E.; Diesmann, Markus
2017-01-01
Recent advances in the development of data structures to represent spiking neuron network models enable us to exploit the complete memory of petascale computers for a single brain-scale network simulation. In this work, we investigate how well we can exploit the computing power of such supercomputers for the creation of neuronal networks. Using an established benchmark, we divide the runtime of simulation code into the phase of network construction and the phase during which the dynamical state is advanced in time. We find that on multi-core compute nodes network creation scales well with process-parallel code but exhibits a prohibitively large memory consumption. Thread-parallel network creation, in contrast, exhibits speedup only up to a small number of threads but has little overhead in terms of memory. We further observe that the algorithms creating instances of model neurons and their connections scale well for networks of ten thousand neurons, but do not show the same speedup for networks of millions of neurons. Our work uncovers that the lack of scaling of thread-parallel network creation is due to inadequate memory allocation strategies and demonstrates that thread-optimized memory allocators recover excellent scaling. An analysis of the loop order used for network construction reveals that more complex tests on the locality of operations significantly improve scaling and reduce runtime by allowing construction algorithms to step through large networks more efficiently than in existing code. The combination of these techniques increases performance by an order of magnitude and harnesses the increasingly parallel compute power of the compute nodes in high-performance clusters and supercomputers. PMID:28559808
Performance of low-rank QR approximation of the finite element Biot-Savart law
DOE Office of Scientific and Technical Information (OSTI.GOV)
White, D; Fasenfest, B
2006-10-16
In this paper we present a low-rank QR method for evaluating the discrete Biot-Savart law. Our goal is to develop an algorithm that is easily implemented on parallel computers. It is assumed that the known current density and the unknown magnetic field are both expressed in a finite element expansion, and we wish to compute the degrees-of-freedom (DOF) in the basis function expansion of the magnetic field. The matrix that maps the current DOF to the field DOF is full, but if the spatial domain is properly partitioned the matrix can be written as a block matrix, with blocks representingmore » distant interactions being low rank and having a compressed QR representation. While an octree partitioning of the matrix may be ideal, for ease of parallel implementation we employ a partitioning based on number of processors. The rank of each block (i.e. the compression) is determined by the specific geometry and is computed dynamically. In this paper we provide the algorithmic details and present computational results for large-scale computations.« less
The transmission of finite amplitude sound beam in multi-layered biological media
NASA Astrophysics Data System (ADS)
Liu, Xiaozhou; Li, Junlun; Yin, Chang; Gong, Xiufen; Zhang, Dong; Xue, Honghui
2007-02-01
Based on the Khokhlov Zabolotskaya Kuznetsov (KZK) equation, a model in the frequency domain is given to describe the transmission of finite amplitude sound beam in multi-layered biological media. Favorable agreement between the theoretical analyses and the measured results shows this approach could effectively describe the transmission of finite amplitude sound wave in multi-layered biological media.
Parallel Discrete Molecular Dynamics Simulation With Speculation and In-Order Commitment*†
Khan, Md. Ashfaquzzaman; Herbordt, Martin C.
2011-01-01
Discrete molecular dynamics simulation (DMD) uses simplified and discretized models enabling simulations to advance by event rather than by timestep. DMD is an instance of discrete event simulation and so is difficult to scale: even in this multi-core era, all reported DMD codes are serial. In this paper we discuss the inherent difficulties of scaling DMD and present our method of parallelizing DMD through event-based decomposition. Our method is microarchitecture inspired: speculative processing of events exposes parallelism, while in-order commitment ensures correctness. We analyze the potential of this parallelization method for shared-memory multiprocessors. Achieving scalability required extensive experimentation with scheduling and synchronization methods to mitigate serialization. The speed-up achieved for a variety of system sizes and complexities is nearly 6× on an 8-core and over 9× on a 12-core processor. We present and verify analytical models that account for the achieved performance as a function of available concurrency and architectural limitations. PMID:21822327
Parallel Discrete Molecular Dynamics Simulation With Speculation and In-Order Commitment.
Khan, Md Ashfaquzzaman; Herbordt, Martin C
2011-07-20
Discrete molecular dynamics simulation (DMD) uses simplified and discretized models enabling simulations to advance by event rather than by timestep. DMD is an instance of discrete event simulation and so is difficult to scale: even in this multi-core era, all reported DMD codes are serial. In this paper we discuss the inherent difficulties of scaling DMD and present our method of parallelizing DMD through event-based decomposition. Our method is microarchitecture inspired: speculative processing of events exposes parallelism, while in-order commitment ensures correctness. We analyze the potential of this parallelization method for shared-memory multiprocessors. Achieving scalability required extensive experimentation with scheduling and synchronization methods to mitigate serialization. The speed-up achieved for a variety of system sizes and complexities is nearly 6× on an 8-core and over 9× on a 12-core processor. We present and verify analytical models that account for the achieved performance as a function of available concurrency and architectural limitations.
Assignment Of Finite Elements To Parallel Processors
NASA Technical Reports Server (NTRS)
Salama, Moktar A.; Flower, Jon W.; Otto, Steve W.
1990-01-01
Elements assigned approximately optimally to subdomains. Mapping algorithm based on simulated-annealing concept used to minimize approximate time required to perform finite-element computation on hypercube computer or other network of parallel data processors. Mapping algorithm needed when shape of domain complicated or otherwise not obvious what allocation of elements to subdomains minimizes cost of computation.
Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji
2015-07-01
GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310-323. doi: 10.1002/wcms.1220.
NASA Technical Reports Server (NTRS)
Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Jost, Gabriele
2004-01-01
In this paper we describe the parallelization of the multi-zone code versions of the NAS Parallel Benchmarks employing multi-level OpenMP parallelism. For our study we use the NanosCompiler, which supports nesting of OpenMP directives and provides clauses to control the grouping of threads, load balancing, and synchronization. We report the benchmark results, compare the timings with those of different hybrid parallelization paradigms and discuss OpenMP implementation issues which effect the performance of multi-level parallel applications.
Nemesis I: Parallel Enhancements to ExodusII
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hennigan, Gary L.; John, Matthew S.; Shadid, John N.
2006-03-28
NEMESIS I is an enhancement to the EXODUS II finite element database model used to store and retrieve data for unstructured parallel finite element analyses. NEMESIS I adds data structures which facilitate the partitioning of a scalar (standard serial) EXODUS II file onto parallel disk systems found on many parallel computers. Since the NEMESIS I application programming interface (APl)can be used to append information to an existing EXODUS II files can be used on files which contain NEMESIS I information. The NEMESIS I information is written and read via C or C++ callable functions which compromise the NEMESIS I API.
NASA Astrophysics Data System (ADS)
Cai, Yong; Cui, Xiangyang; Li, Guangyao; Liu, Wenyang
2018-04-01
The edge-smooth finite element method (ES-FEM) can improve the computational accuracy of triangular shell elements and the mesh partition efficiency of complex models. In this paper, an approach is developed to perform explicit finite element simulations of contact-impact problems with a graphical processing unit (GPU) using a special edge-smooth triangular shell element based on ES-FEM. Of critical importance for this problem is achieving finer-grained parallelism to enable efficient data loading and to minimize communication between the device and host. Four kinds of parallel strategies are then developed to efficiently solve these ES-FEM based shell element formulas, and various optimization methods are adopted to ensure aligned memory access. Special focus is dedicated to developing an approach for the parallel construction of edge systems. A parallel hierarchy-territory contact-searching algorithm (HITA) and a parallel penalty function calculation method are embedded in this parallel explicit algorithm. Finally, the program flow is well designed, and a GPU-based simulation system is developed, using Nvidia's CUDA. Several numerical examples are presented to illustrate the high quality of the results obtained with the proposed methods. In addition, the GPU-based parallel computation is shown to significantly reduce the computing time.
Small-scale multi-axial hybrid simulation of a shear-critical reinforced concrete frame
NASA Astrophysics Data System (ADS)
Sadeghian, Vahid; Kwon, Oh-Sung; Vecchio, Frank
2017-10-01
This study presents a numerical multi-scale simulation framework which is extended to accommodate hybrid simulation (numerical-experimental integration). The framework is enhanced with a standardized data exchange format and connected to a generalized controller interface program which facilitates communication with various types of laboratory equipment and testing configurations. A small-scale experimental program was conducted using a six degree-of-freedom hydraulic testing equipment to verify the proposed framework and provide additional data for small-scale testing of shearcritical reinforced concrete structures. The specimens were tested in a multi-axial hybrid simulation manner under a reversed cyclic loading condition simulating earthquake forces. The physical models were 1/3.23-scale representations of a beam and two columns. A mixed-type modelling technique was employed to analyze the remainder of the structures. The hybrid simulation results were compared against those obtained from a large-scale test and finite element analyses. The study found that if precautions are taken in preparing model materials and if the shear-related mechanisms are accurately considered in the numerical model, small-scale hybrid simulations can adequately simulate the behaviour of shear-critical structures. Although the findings of the study are promising, to draw general conclusions additional test data are required.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grierson, B. A.; Staebler, G. M.; Solomon, W. M.
Multi-scale fluctuations measured by turbulence diagnostics spanning long and short wavelength spatial scales impact energy confinement and the scale-lengths of plasma kinetic profiles in the DIII-D ITER baseline scenario with direct electron heating. Contrasting discharge phases with ECH + neutral beam injection (NBI) and NBI only at similar rotation reveal higher energy confinement and lower fluctuations when only NBI heating is used. Modeling of the core transport with TGYRO using the TGLF turbulent transport model and NEO neoclassical transport reproduces the experimental profile changes upon application of direct electron heating and indicates that multi-scale transport mechanisms are responsible for changesmore » in the temperature and density profiles. Intermediate and high-k fluctuations appear responsible for the enhanced electron thermal flux, and intermediate-k electron modes produce an inward particle pinch that increases the inverse density scale length. Projection to ITER is performed with TGLF and indicates a density profile that has a finite scale length due to intermediate-k electron modes at low collisionality and increases the fusion gain. Finally, for a range of E×B shear, the dominant mechanism that increases fusion performance is suppression of outward low-k particle flux and increased density peaking.« less
Grierson, B. A.; Staebler, G. M.; Solomon, W. M.; ...
2018-02-01
Multi-scale fluctuations measured by turbulence diagnostics spanning long and short wavelength spatial scales impact energy confinement and the scale-lengths of plasma kinetic profiles in the DIII-D ITER baseline scenario with direct electron heating. Contrasting discharge phases with ECH + neutral beam injection (NBI) and NBI only at similar rotation reveal higher energy confinement and lower fluctuations when only NBI heating is used. Modeling of the core transport with TGYRO using the TGLF turbulent transport model and NEO neoclassical transport reproduces the experimental profile changes upon application of direct electron heating and indicates that multi-scale transport mechanisms are responsible for changesmore » in the temperature and density profiles. Intermediate and high-k fluctuations appear responsible for the enhanced electron thermal flux, and intermediate-k electron modes produce an inward particle pinch that increases the inverse density scale length. Projection to ITER is performed with TGLF and indicates a density profile that has a finite scale length due to intermediate-k electron modes at low collisionality and increases the fusion gain. Finally, for a range of E×B shear, the dominant mechanism that increases fusion performance is suppression of outward low-k particle flux and increased density peaking.« less
NASA Astrophysics Data System (ADS)
Grierson, B. A.; Staebler, G. M.; Solomon, W. M.; McKee, G. R.; Holland, C.; Austin, M.; Marinoni, A.; Schmitz, L.; Pinsker, R. I.; DIII-D Team
2018-02-01
Multi-scale fluctuations measured by turbulence diagnostics spanning long and short wavelength spatial scales impact energy confinement and the scale-lengths of plasma kinetic profiles in the DIII-D ITER baseline scenario with direct electron heating. Contrasting discharge phases with ECH + neutral beam injection (NBI) and NBI only at similar rotation reveal higher energy confinement and lower fluctuations when only NBI heating is used. Modeling of the core transport with TGYRO using the TGLF turbulent transport model and NEO neoclassical transport reproduces the experimental profile changes upon application of direct electron heating and indicates that multi-scale transport mechanisms are responsible for changes in the temperature and density profiles. Intermediate and high-k fluctuations appear responsible for the enhanced electron thermal flux, and intermediate-k electron modes produce an inward particle pinch that increases the inverse density scale length. Projection to ITER is performed with TGLF and indicates a density profile that has a finite scale length due to intermediate-k electron modes at low collisionality and increases the fusion gain. For a range of E × B shear, the dominant mechanism that increases fusion performance is suppression of outward low-k particle flux and increased density peaking.
Multi-scale finite element modeling of Eustachian tube function: influence of mucosal adhesion.
Malik, J E; Swarts, J D; Ghadiali, S N
2016-12-01
The inability to open the collapsible Eustachian tube (ET) leads to the development of chronic Otitis Media (OM). Although mucosal inflammation during OM leads to increased mucin gene expression and elevated adhesion forces within the ET lumen, it is not known how changes in mucosal adhesion alter the biomechanical mechanisms of ET function. In this study, we developed a novel multi-scale finite element model of ET function in adults that utilizes adhesion spring elements to simulate changes in mucosal adhesion. Models were created for six adult subjects, and dynamic patterns in muscle contraction were used to simulate the wave-like opening of the ET that occurs during swallowing. Results indicate that ET opening is highly sensitive to the level of mucosal adhesion and that exceeding a critical value of adhesion leads to rapid ET dysfunction. Parameter variation studies and sensitivity analysis indicate that increased mucosal adhesion alters the relative importance of several tissue biomechanical properties. For example, increases in mucosal adhesion reduced the sensitivity of ET function to tensor veli palatini muscle forces but did not alter the insensitivity of ET function to levator veli palatini muscle forces. Interestingly, although changes in cartilage stiffness did not significantly influence ET opening under low adhesion conditions, ET opening was highly sensitive to changes in cartilage stiffness under high adhesion conditions. Therefore, our multi-scale computational models indicate that changes in mucosal adhesion as would occur during inflammatory OM alter the biomechanical mechanisms of ET function. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
From particle condensation to polymer aggregation
NASA Astrophysics Data System (ADS)
Janke, Wolfhard; Zierenberg, Johannes
2018-01-01
We draw an analogy between droplet formation in dilute particle and polymer systems. Our arguments are based on finite-size scaling results from studies of a two-dimensional lattice gas to three-dimensional bead-spring polymers. To set the results in perspective, we compare with in part rigorous theoretical scaling laws for canonical condensation in a supersaturated gas at fixed temperature, and derive corresponding scaling predictions for an undercooled gas at fixed density. The latter allows one to efficiently employ parallel multicanonical simulations and to reach previously not accessible scaling regimes. While the asymptotic scaling can not be observed for the comparably small polymer system sizes, they demonstrate an intermediate scaling regime also observable for particle condensation. Altogether, our extensive results from computer simulations provide clear evidence for the close analogy between particle condensation and polymer aggregation in dilute systems.
Li, Ying-Jun; Yang, Cong; Wang, Gui-Cong; Zhang, Hui; Cui, Huan-Yong; Zhang, Yong-Liang
2017-09-01
This paper presents a novel integrated piezoelectric six-dimensional force sensor which can realize dynamic measurement of multi-dimensional space load. Firstly, the composition of the sensor, the spatial layout of force-sensitive components, and measurement principle are analyzed and designed. There is no interference of piezoelectric six-dimensional force sensor in theoretical analysis. Based on the principle of actual work and deformation compatibility coherence, this paper deduces the parallel load sharing principle of the piezoelectric six-dimensional force sensor. The main effect factors which affect the load sharing ratio are obtained. The finite element model of the piezoelectric six-dimensional force sensor is established. In order to verify the load sharing principle of the sensor, a load sharing test device of piezoelectric force sensor is designed and fabricated. The load sharing experimental platform is set up. The experimental results are in accordance with the theoretical analysis and simulation results. The experiments show that the multi-dimensional and heavy force measurement can be realized by the parallel arrangement of the load sharing ring and the force sensitive element in the novel integrated piezoelectric six-dimensional force sensor. The ideal load sharing effect of the sensor can be achieved by appropriate size parameters. This paper has an important guide for the design of the force measuring device according to the load sharing mode. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
Phase-space finite elements in a least-squares solution of the transport equation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Drumm, C.; Fan, W.; Pautz, S.
2013-07-01
The linear Boltzmann transport equation is solved using a least-squares finite element approximation in the space, angular and energy phase-space variables. The method is applied to both neutral particle transport and also to charged particle transport in the presence of an electric field, where the angular and energy derivative terms are handled with the energy/angular finite elements approximation, in a manner analogous to the way the spatial streaming term is handled. For multi-dimensional problems, a novel approach is used for the angular finite elements: mapping the surface of a unit sphere to a two-dimensional planar region and using a meshingmore » tool to generate a mesh. In this manner, much of the spatial finite-elements machinery can be easily adapted to handle the angular variable. The energy variable and the angular variable for one-dimensional problems make use of edge/beam elements, also building upon the spatial finite elements capabilities. The methods described here can make use of either continuous or discontinuous finite elements in space, angle and/or energy, with the use of continuous finite elements resulting in a smaller problem size and the use of discontinuous finite elements resulting in more accurate solutions for certain types of problems. The work described in this paper makes use of continuous finite elements, so that the resulting linear system is symmetric positive definite and can be solved with a highly efficient parallel preconditioned conjugate gradients algorithm. The phase-space finite elements capability has been built into the Sceptre code and applied to several test problems, including a simple one-dimensional problem with an analytic solution available, a two-dimensional problem with an isolated source term, showing how the method essentially eliminates ray effects encountered with discrete ordinates, and a simple one-dimensional charged-particle transport problem in the presence of an electric field. (authors)« less
Eigensolution of finite element problems in a completely connected parallel architecture
NASA Technical Reports Server (NTRS)
Akl, F.; Morel, M.
1989-01-01
A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis. The algorithm is based on a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm is successfully implemented on a tightly coupled MIMD parallel processor. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor or to a logical processor (task) if the number of domains exceeds the number of physical processors. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts, and the dimension of the subspace on the performance of the algorithm is investigated. For a 64-element rectangular plate, speed-ups of 1.86, 3.13, 3.18, and 3.61 are achieved on two, four, six, and eight processors, respectively.
P-Hint-Hunt: a deep parallelized whole genome DNA methylation detection tool.
Peng, Shaoliang; Yang, Shunyun; Gao, Ming; Liao, Xiangke; Liu, Jie; Yang, Canqun; Wu, Chengkun; Yu, Wenqiang
2017-03-14
The increasing studies have been conducted using whole genome DNA methylation detection as one of the most important part of epigenetics research to find the significant relationships among DNA methylation and several typical diseases, such as cancers and diabetes. In many of those studies, mapping the bisulfite treated sequence to the whole genome has been the main method to study DNA cytosine methylation. However, today's relative tools almost suffer from inaccuracies and time-consuming problems. In our study, we designed a new DNA methylation prediction tool ("Hint-Hunt") to solve the problem. By having an optimal complex alignment computation and Smith-Waterman matrix dynamic programming, Hint-Hunt could analyze and predict the DNA methylation status. But when Hint-Hunt tried to predict DNA methylation status with large-scale dataset, there are still slow speed and low temporal-spatial efficiency problems. In order to solve the problems of Smith-Waterman dynamic programming and low temporal-spatial efficiency, we further design a deep parallelized whole genome DNA methylation detection tool ("P-Hint-Hunt") on Tianhe-2 (TH-2) supercomputer. To the best of our knowledge, P-Hint-Hunt is the first parallel DNA methylation detection tool with a high speed-up to process large-scale dataset, and could run both on CPU and Intel Xeon Phi coprocessors. Moreover, we deploy and evaluate Hint-Hunt and P-Hint-Hunt on TH-2 supercomputer in different scales. The experimental results illuminate our tools eliminate the deviation caused by bisulfite treatment in mapping procedure and the multi-level parallel program yields a 48 times speed-up with 64 threads. P-Hint-Hunt gain a deep acceleration on CPU and Intel Xeon Phi heterogeneous platform, which gives full play of the advantages of multi-cores (CPU) and many-cores (Phi).
NASA Astrophysics Data System (ADS)
Zheng, Yan
2015-03-01
Internet of things (IoT), focusing on providing users with information exchange and intelligent control, attracts a lot of attention of researchers from all over the world since the beginning of this century. IoT is consisted of large scale of sensor nodes and data processing units, and the most important features of IoT can be illustrated as energy confinement, efficient communication and high redundancy. With the sensor nodes increment, the communication efficiency and the available communication band width become bottle necks. Many research work is based on the instance which the number of joins is less. However, it is not proper to the increasing multi-join query in whole internet of things. To improve the communication efficiency between parallel units in the distributed sensor network, this paper proposed parallel query optimization algorithm based on distribution attributes cost graph. The storage information relations and the network communication cost are considered in this algorithm, and an optimized information changing rule is established. The experimental result shows that the algorithm has good performance, and it would effectively use the resource of each node in the distributed sensor network. Therefore, executive efficiency of multi-join query between different nodes could be improved.
Development of an Efficient Meso- scale Multi-phase Flow Solver in Nuclear Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Taehun
2015-10-20
The proposed research aims at formulating a predictive high-order Lattice Boltzmann Equation for multi-phase flows relevant to nuclear energy related application - namely, saturated and sub-cooled boiling in reactors, and liquid- liquid mixing and extraction for fuel cycle separation. An efficient flow solver will be developed based on the Finite Element based Lattice Boltzmann Method (FE- LBM), accounting for phase-change heat transfer and capable of treating multiple phases over length scales from the submicron to the meter. A thermal LBM will be developed in order to handle adjustable Prandtl number, arbitrary specific heat ratio, a wide range of temperature variations,more » better numerical stability during liquid-vapor phase change, and full thermo-hydrodynamic consistency. Two-phase FE-LBM will be extended to liquid–liquid–gas multi-phase flows for application to high-fidelity simulations building up from the meso-scale up to the equipment sub-component scale. While several relevant applications exist, the initial applications for demonstration of the efficient methods to be developed as part of this project include numerical investigations of Critical Heat Flux (CHF) phenomena in nuclear reactor fuel bundles, and liquid-liquid mixing and interfacial area generation for liquid-liquid separations. In addition, targeted experiments will be conducted for validation of this advanced multi-phase model.« less
Parallel 3D Finite Element Numerical Modelling of DC Electron Guns
DOE Office of Scientific and Technical Information (OSTI.GOV)
Prudencio, E.; Candel, A.; Ge, L.
2008-02-04
In this paper we present Gun3P, a parallel 3D finite element application that the Advanced Computations Department at the Stanford Linear Accelerator Center is developing for the analysis of beam formation in DC guns and beam transport in klystrons. Gun3P is targeted specially to complex geometries that cannot be described by 2D models and cannot be easily handled by finite difference discretizations. Its parallel capability allows simulations with more accuracy and less processing time than packages currently available. We present simulation results for the L-band Sheet Beam Klystron DC gun, in which case Gun3P is able to reduce simulation timemore » from days to some hours.« less
Accelerating Large Scale Image Analyses on Parallel, CPU-GPU Equipped Systems
Teodoro, George; Kurc, Tahsin M.; Pan, Tony; Cooper, Lee A.D.; Kong, Jun; Widener, Patrick; Saltz, Joel H.
2014-01-01
The past decade has witnessed a major paradigm shift in high performance computing with the introduction of accelerators as general purpose processors. These computing devices make available very high parallel computing power at low cost and power consumption, transforming current high performance platforms into heterogeneous CPU-GPU equipped systems. Although the theoretical performance achieved by these hybrid systems is impressive, taking practical advantage of this computing power remains a very challenging problem. Most applications are still deployed to either GPU or CPU, leaving the other resource under- or un-utilized. In this paper, we propose, implement, and evaluate a performance aware scheduling technique along with optimizations to make efficient collaborative use of CPUs and GPUs on a parallel system. In the context of feature computations in large scale image analysis applications, our evaluations show that intelligently co-scheduling CPUs and GPUs can significantly improve performance over GPU-only or multi-core CPU-only approaches. PMID:25419545
A study of the parallel algorithm for large-scale DC simulation of nonlinear systems
NASA Astrophysics Data System (ADS)
Cortés Udave, Diego Ernesto; Ogrodzki, Jan; Gutiérrez de Anda, Miguel Angel
Newton-Raphson DC analysis of large-scale nonlinear circuits may be an extremely time consuming process even if sparse matrix techniques and bypassing of nonlinear models calculation are used. A slight decrease in the time required for this task may be enabled on multi-core, multithread computers if the calculation of the mathematical models for the nonlinear elements as well as the stamp management of the sparse matrix entries are managed through concurrent processes. This numerical complexity can be further reduced via the circuit decomposition and parallel solution of blocks taking as a departure point the BBD matrix structure. This block-parallel approach may give a considerable profit though it is strongly dependent on the system topology and, of course, on the processor type. This contribution presents the easy-parallelizable decomposition-based algorithm for DC simulation and provides a detailed study of its effectiveness.
Parallel Semi-Implicit Spectral Element Atmospheric Model
NASA Astrophysics Data System (ADS)
Fournier, A.; Thomas, S.; Loft, R.
2001-05-01
The shallow-water equations (SWE) have long been used to test atmospheric-modeling numerical methods. The SWE contain essential wave-propagation and nonlinear effects of more complete models. We present a semi-implicit (SI) improvement of the Spectral Element Atmospheric Model to solve the SWE (SEAM, Taylor et al. 1997, Fournier et al. 2000, Thomas & Loft 2000). SE methods are h-p finite element methods combining the geometric flexibility of size-h finite elements with the accuracy of degree-p spectral methods. Our work suggests that exceptional parallel-computation performance is achievable by a General-Circulation-Model (GCM) dynamical core, even at modest climate-simulation resolutions (>1o). The code derivation involves weak variational formulation of the SWE, Gauss(-Lobatto) quadrature over the collocation points, and Legendre cardinal interpolators. Appropriate weak variation yields a symmetric positive-definite Helmholtz operator. To meet the Ladyzhenskaya-Babuska-Brezzi inf-sup condition and avoid spurious modes, we use a staggered grid. The SI scheme combines leapfrog and Crank-Nicholson schemes for the nonlinear and linear terms respectively. The localization of operations to elements ideally fits the method to cache-based microprocessor computer architectures --derivatives are computed as collections of small (8x8), naturally cache-blocked matrix-vector products. SEAM also has desirable boundary-exchange communication, like finite-difference models. Timings on on the IBM SP and Compaq ES40 supercomputers indicate that the SI code (20-min timestep) requires 1/3 the CPU time of the explicit code (2-min timestep) for T42 resolutions. Both codes scale nearly linearly out to 400 processors. We achieved single-processor performance up to 30% of peak for both codes on the 375-MHz IBM Power-3 processors. Fast computation and linear scaling lead to a useful climate-simulation dycore only if enough model time is computed per unit wall-clock time. An efficient SI solver is essential to substantially increase this rate. Parallel preconditioning for an iterative conjugate-gradient elliptic solver is described. We are building a GCM dycore capable of 200 GF% lOPS sustained performance on clustered RISC/cache architectures using hybrid MPI/OpenMP programming.
The LSST Data Mining Research Agenda
NASA Astrophysics Data System (ADS)
Borne, K.; Becla, J.; Davidson, I.; Szalay, A.; Tyson, J. A.
2008-12-01
We describe features of the LSST science database that are amenable to scientific data mining, object classification, outlier identification, anomaly detection, image quality assurance, and survey science validation. The data mining research agenda includes: scalability (at petabytes scales) of existing machine learning and data mining algorithms; development of grid-enabled parallel data mining algorithms; designing a robust system for brokering classifications from the LSST event pipeline (which may produce 10,000 or more event alerts per night) multi-resolution methods for exploration of petascale databases; indexing of multi-attribute multi-dimensional astronomical databases (beyond spatial indexing) for rapid querying of petabyte databases; and more.
Upadhyay, Manas V.; Patra, Anirban; Wen, Wei; ...
2018-05-08
In this paper, we propose a multi-scale modeling approach that can simulate the microstructural and mechanical behavior of metal or alloy parts with complex geometries subjected to multi-axial load path changes. The model is used to understand the biaxial load path change behavior of 316L stainless steel cruciform samples. At the macroscale, a finite element approach is used to simulate the cruciform geometry and numerically predict the gauge stresses, which are difficult to obtain analytically. At each material point in the finite element mesh, the anisotropic viscoplastic self-consistent model is used to simulate the role of texture evolution on themore » mechanical response. At the single crystal level, a dislocation density based hardening law that appropriately captures the role of multi-axial load path changes on slip activity is used. The combined approach is experimentally validated using cruciform samples subjected to uniaxial load and unload followed by different biaxial reloads in the angular range [27º, 90º]. Polycrystalline yield surfaces before and after load path changes are generated using the full-field elasto-viscoplastic fast Fourier transform model to study the influence of the deformation history and reloading direction on the mechanical response, including the Bauschinger effect, of these cruciform samples. Results reveal that the Bauschinger effect is strongly dependent on the first loading direction and strain, intergranular and macroscopic residual stresses after first load, and the reloading angle. The microstructural origins of the mechanical response are discussed.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Upadhyay, Manas V.; Patra, Anirban; Wen, Wei
In this paper, we propose a multi-scale modeling approach that can simulate the microstructural and mechanical behavior of metal or alloy parts with complex geometries subjected to multi-axial load path changes. The model is used to understand the biaxial load path change behavior of 316L stainless steel cruciform samples. At the macroscale, a finite element approach is used to simulate the cruciform geometry and numerically predict the gauge stresses, which are difficult to obtain analytically. At each material point in the finite element mesh, the anisotropic viscoplastic self-consistent model is used to simulate the role of texture evolution on themore » mechanical response. At the single crystal level, a dislocation density based hardening law that appropriately captures the role of multi-axial load path changes on slip activity is used. The combined approach is experimentally validated using cruciform samples subjected to uniaxial load and unload followed by different biaxial reloads in the angular range [27º, 90º]. Polycrystalline yield surfaces before and after load path changes are generated using the full-field elasto-viscoplastic fast Fourier transform model to study the influence of the deformation history and reloading direction on the mechanical response, including the Bauschinger effect, of these cruciform samples. Results reveal that the Bauschinger effect is strongly dependent on the first loading direction and strain, intergranular and macroscopic residual stresses after first load, and the reloading angle. The microstructural origins of the mechanical response are discussed.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
De Corato, M., E-mail: marco.decorato@unina.it; Slot, J.J.M., E-mail: j.j.m.slot@tue.nl; Hütter, M., E-mail: m.huetter@tue.nl
In this paper, we present a finite element implementation of fluctuating hydrodynamics with a moving boundary fitted mesh for treating the suspended particles. The thermal fluctuations are incorporated into the continuum equations using the Landau and Lifshitz approach [1]. The proposed implementation fulfills the fluctuation–dissipation theorem exactly at the discrete level. Since we restrict the equations to the creeping flow case, this takes the form of a relation between the diffusion coefficient matrix and friction matrix both at the particle and nodal level of the finite elements. Brownian motion of arbitrarily shaped particles in complex confinements can be considered withinmore » the present formulation. A multi-step time integration scheme is developed to correctly capture the drift term required in the stochastic differential equation (SDE) describing the evolution of the positions of the particles. The proposed approach is validated by simulating the Brownian motion of a sphere between two parallel plates and the motion of a spherical particle in a cylindrical cavity. The time integration algorithm and the fluctuating hydrodynamics implementation are then applied to study the diffusion and the equilibrium probability distribution of a confined circle under an external harmonic potential.« less
Design and Verification of Remote Sensing Image Data Center Storage Architecture Based on Hadoop
NASA Astrophysics Data System (ADS)
Tang, D.; Zhou, X.; Jing, Y.; Cong, W.; Li, C.
2018-04-01
The data center is a new concept of data processing and application proposed in recent years. It is a new method of processing technologies based on data, parallel computing, and compatibility with different hardware clusters. While optimizing the data storage management structure, it fully utilizes cluster resource computing nodes and improves the efficiency of data parallel application. This paper used mature Hadoop technology to build a large-scale distributed image management architecture for remote sensing imagery. Using MapReduce parallel processing technology, it called many computing nodes to process image storage blocks and pyramids in the background to improve the efficiency of image reading and application and sovled the need for concurrent multi-user high-speed access to remotely sensed data. It verified the rationality, reliability and superiority of the system design by testing the storage efficiency of different image data and multi-users and analyzing the distributed storage architecture to improve the application efficiency of remote sensing images through building an actual Hadoop service system.
NASA Astrophysics Data System (ADS)
Matsuoka, Seikichi; Idomura, Yasuhiro; Satake, Shinsuke
2017-10-01
The neoclassical toroidal viscosity (NTV) caused by a non-axisymmetric magnetic field perturbation is numerically studied using two global kinetic simulations with different numerical approaches. Both simulations reproduce similar collisionality ( νb*) dependencies over wide νb * ranges. It is demonstrated that resonant structures in the velocity space predicted by the conventional superbanana-plateau theory exist in the small banana width limit, while the resonances diminish when the banana width becomes large. It is also found that fine scale structures are generated in the velocity space as νb* decreases in the large banana width simulations, leading to the νb* -dependency of the NTV. From the analyses of the particle orbit, it is found that the finite k∥ mode structure along the bounce motion appears owing to the finite orbit width, and it suffers from bounce phase mixing, suggesting the generation of the fine scale structures by the similar mechanism as the parallel phase mixing of passing particles.
Magnetic Shear Damped Polar Convective Fluid Instabilities
NASA Astrophysics Data System (ADS)
Atul, Jyoti K.; Singh, Rameswar; Sarkar, Sanjib; Kravchenko, Oleg V.; Singh, Sushil K.; Chattopadhyaya, Prabal K.; Kaw, Predhiman K.
2018-01-01
The influence of the magnetic field shear is studied on the E × B (and/or gravitational) and the Current Convective Instabilities (CCI) occurring in the high-latitude F layer ionosphere. It is shown that magnetic shear reduces the growth rate of these instabilities. The magnetic shear-induced stabilization is more effective at the larger-scale sizes (≥ tens of kilometers) while at the scintillation causing intermediate scale sizes (˜ a few kilometers), the growth rate remains largely unaffected. The eigenmode structure gets localized about a rational surface due to finite magnetic shear and has broken reflectional symmetry due to centroid shift of the mode by equilibrium parallel flow or current.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tang, Guoping; D'Azevedo, Ed F; Zhang, Fan
2010-01-01
Calibration of groundwater models involves hundreds to thousands of forward solutions, each of which may solve many transient coupled nonlinear partial differential equations, resulting in a computationally intensive problem. We describe a hybrid MPI/OpenMP approach to exploit two levels of parallelisms in software and hardware to reduce calibration time on multi-core computers. HydroGeoChem 5.0 (HGC5) is parallelized using OpenMP for direct solutions for a reactive transport model application, and a field-scale coupled flow and transport model application. In the reactive transport model, a single parallelizable loop is identified to account for over 97% of the total computational time using GPROF.more » Addition of a few lines of OpenMP compiler directives to the loop yields a speedup of about 10 on a 16-core compute node. For the field-scale model, parallelizable loops in 14 of 174 HGC5 subroutines that require 99% of the execution time are identified. As these loops are parallelized incrementally, the scalability is found to be limited by a loop where Cray PAT detects over 90% cache missing rates. With this loop rewritten, similar speedup as the first application is achieved. The OpenMP-parallelized code can be run efficiently on multiple workstations in a network or multiple compute nodes on a cluster as slaves using parallel PEST to speedup model calibration. To run calibration on clusters as a single task, the Levenberg Marquardt algorithm is added to HGC5 with the Jacobian calculation and lambda search parallelized using MPI. With this hybrid approach, 100 200 compute cores are used to reduce the calibration time from weeks to a few hours for these two applications. This approach is applicable to most of the existing groundwater model codes for many applications.« less
Multi-threading: A new dimension to massively parallel scientific computation
NASA Astrophysics Data System (ADS)
Nielsen, Ida M. B.; Janssen, Curtis L.
2000-06-01
Multi-threading is becoming widely available for Unix-like operating systems, and the application of multi-threading opens new ways for performing parallel computations with greater efficiency. We here briefly discuss the principles of multi-threading and illustrate the application of multi-threading for a massively parallel direct four-index transformation of electron repulsion integrals. Finally, other potential applications of multi-threading in scientific computing are outlined.
NASA Astrophysics Data System (ADS)
Xu, Jincheng; Liu, Wei; Wang, Jin; Liu, Linong; Zhang, Jianfeng
2018-02-01
De-absorption pre-stack time migration (QPSTM) compensates for the absorption and dispersion of seismic waves by introducing an effective Q parameter, thereby making it an effective tool for 3D, high-resolution imaging of seismic data. Although the optimal aperture obtained via stationary-phase migration reduces the computational cost of 3D QPSTM and yields 3D stationary-phase QPSTM, the associated computational efficiency is still the main problem in the processing of 3D, high-resolution images for real large-scale seismic data. In the current paper, we proposed a division method for large-scale, 3D seismic data to optimize the performance of stationary-phase QPSTM on clusters of graphics processing units (GPU). Then, we designed an imaging point parallel strategy to achieve an optimal parallel computing performance. Afterward, we adopted an asynchronous double buffering scheme for multi-stream to perform the GPU/CPU parallel computing. Moreover, several key optimization strategies of computation and storage based on the compute unified device architecture (CUDA) were adopted to accelerate the 3D stationary-phase QPSTM algorithm. Compared with the initial GPU code, the implementation of the key optimization steps, including thread optimization, shared memory optimization, register optimization and special function units (SFU), greatly improved the efficiency. A numerical example employing real large-scale, 3D seismic data showed that our scheme is nearly 80 times faster than the CPU-QPSTM algorithm. Our GPU/CPU heterogeneous parallel computing framework significant reduces the computational cost and facilitates 3D high-resolution imaging for large-scale seismic data.
Parallel Lattice Basis Reduction Using a Multi-threaded Schnorr-Euchner LLL Algorithm
NASA Astrophysics Data System (ADS)
Backes, Werner; Wetzel, Susanne
In this paper, we introduce a new parallel variant of the LLL lattice basis reduction algorithm. Our new, multi-threaded algorithm is the first to provide an efficient, parallel implementation of the Schorr-Euchner algorithm for today’s multi-processor, multi-core computer architectures. Experiments with sparse and dense lattice bases show a speed-up factor of about 1.8 for the 2-thread and about factor 3.2 for the 4-thread version of our new parallel lattice basis reduction algorithm in comparison to the traditional non-parallel algorithm.
Casimir force in the Gödel space-time and its possible induced cosmological inhomogeneity
NASA Astrophysics Data System (ADS)
Khodabakhshi, Sh.; Shojai, A.
2017-07-01
The Casimir force between two parallel plates in the Gödel universe is computed for a scalar field at finite temperature. It is observed that when the plates' separation is comparable with the scale given by the rotation of the space-time, the force becomes repulsive and then approaches zero. Since it has been shown previously that the universe may experience a Gödel phase for a small period of time, the induced inhomogeneities from the Casimir force are also studied.
Variable-Complexity Multidisciplinary Optimization on Parallel Computers
NASA Technical Reports Server (NTRS)
Grossman, Bernard; Mason, William H.; Watson, Layne T.; Haftka, Raphael T.
1998-01-01
This report covers work conducted under grant NAG1-1562 for the NASA High Performance Computing and Communications Program (HPCCP) from December 7, 1993, to December 31, 1997. The objective of the research was to develop new multidisciplinary design optimization (MDO) techniques which exploit parallel computing to reduce the computational burden of aircraft MDO. The design of the High-Speed Civil Transport (HSCT) air-craft was selected as a test case to demonstrate the utility of our MDO methods. The three major tasks of this research grant included: development of parallel multipoint approximation methods for the aerodynamic design of the HSCT, use of parallel multipoint approximation methods for structural optimization of the HSCT, mathematical and algorithmic development including support in the integration of parallel computation for items (1) and (2). These tasks have been accomplished with the development of a response surface methodology that incorporates multi-fidelity models. For the aerodynamic design we were able to optimize with up to 20 design variables using hundreds of expensive Euler analyses together with thousands of inexpensive linear theory simulations. We have thereby demonstrated the application of CFD to a large aerodynamic design problem. For the predicting structural weight we were able to combine hundreds of structural optimizations of refined finite element models with thousands of optimizations based on coarse models. Computations have been carried out on the Intel Paragon with up to 128 nodes. The parallel computation allowed us to perform combined aerodynamic-structural optimization using state of the art models of a complex aircraft configurations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shadid, John Nicolas; Elman, Howard; Shuttleworth, Robert R.
2007-04-01
In recent years, considerable effort has been placed on developing efficient and robust solution algorithms for the incompressible Navier-Stokes equations based on preconditioned Krylov methods. These include physics-based methods, such as SIMPLE, and purely algebraic preconditioners based on the approximation of the Schur complement. All these techniques can be represented as approximate block factorization (ABF) type preconditioners. The goal is to decompose the application of the preconditioner into simplified sub-systems in which scalable multi-level type solvers can be applied. In this paper we develop a taxonomy of these ideas based on an adaptation of a generalized approximate factorization of themore » Navier-Stokes system first presented in [25]. This taxonomy illuminates the similarities and differences among these preconditioners and the central role played by efficient approximation of certain Schur complement operators. We then present a parallel computational study that examines the performance of these methods and compares them to an additive Schwarz domain decomposition (DD) algorithm. Results are presented for two and three-dimensional steady state problems for enclosed domains and inflow/outflow systems on both structured and unstructured meshes. The numerical experiments are performed using MPSalsa, a stabilized finite element code.« less
Dynamic load balancing of applications
Wheat, Stephen R.
1997-01-01
An application-level method for dynamically maintaining global load balance on a parallel computer, particularly on massively parallel MIMD computers. Global load balancing is achieved by overlapping neighborhoods of processors, where each neighborhood performs local load balancing. The method supports a large class of finite element and finite difference based applications and provides an automatic element management system to which applications are easily integrated.
Pothineni, Sudhir Babu; Venugopalan, Nagarajan; Ogata, Craig M.; Hilgart, Mark C.; Stepanov, Sergey; Sanishvili, Ruslan; Becker, Michael; Winter, Graeme; Sauter, Nicholas K.; Smith, Janet L.; Fischetti, Robert F.
2014-01-01
The calculation of single- and multi-crystal data collection strategies and a data processing pipeline have been tightly integrated into the macromolecular crystallographic data acquisition and beamline control software JBluIce. Both tasks employ wrapper scripts around existing crystallographic software. JBluIce executes scripts through a distributed resource management system to make efficient use of all available computing resources through parallel processing. The JBluIce single-crystal data collection strategy feature uses a choice of strategy programs to help users rank sample crystals and collect data. The strategy results can be conveniently exported to a data collection run. The JBluIce multi-crystal strategy feature calculates a collection strategy to optimize coverage of reciprocal space in cases where incomplete data are available from previous samples. The JBluIce data processing runs simultaneously with data collection using a choice of data reduction wrappers for integration and scaling of newly collected data, with an option for merging with pre-existing data. Data are processed separately if collected from multiple sites on a crystal or from multiple crystals, then scaled and merged. Results from all strategy and processing calculations are displayed in relevant tabs of JBluIce. PMID:25484844
Pothineni, Sudhir Babu; Venugopalan, Nagarajan; Ogata, Craig M.; ...
2014-11-18
The calculation of single- and multi-crystal data collection strategies and a data processing pipeline have been tightly integrated into the macromolecular crystallographic data acquisition and beamline control software JBluIce. Both tasks employ wrapper scripts around existing crystallographic software. JBluIce executes scripts through a distributed resource management system to make efficient use of all available computing resources through parallel processing. The JBluIce single-crystal data collection strategy feature uses a choice of strategy programs to help users rank sample crystals and collect data. The strategy results can be conveniently exported to a data collection run. The JBluIce multi-crystal strategy feature calculates amore » collection strategy to optimize coverage of reciprocal space in cases where incomplete data are available from previous samples. The JBluIce data processing runs simultaneously with data collection using a choice of data reduction wrappers for integration and scaling of newly collected data, with an option for merging with pre-existing data. Data are processed separately if collected from multiple sites on a crystal or from multiple crystals, then scaled and merged. Results from all strategy and processing calculations are displayed in relevant tabs of JBluIce.« less
Finite Volume Numerical Methods for Aeroheating Rate Calculations from Infrared Thermographic Data
NASA Technical Reports Server (NTRS)
Daryabeigi, Kamran; Berry, Scott A.; Horvath, Thomas J.; Nowak, Robert J.
2006-01-01
The use of multi-dimensional finite volume heat conduction techniques for calculating aeroheating rates from measured global surface temperatures on hypersonic wind tunnel models was investigated. Both direct and inverse finite volume techniques were investigated and compared with the standard one-dimensional semi-infinite technique. Global transient surface temperatures were measured using an infrared thermographic technique on a 0.333-scale model of the Hyper-X forebody in the NASA Langley Research Center 20-Inch Mach 6 Air tunnel. In these tests the effectiveness of vortices generated via gas injection for initiating hypersonic transition on the Hyper-X forebody was investigated. An array of streamwise-orientated heating striations was generated and visualized downstream of the gas injection sites. In regions without significant spatial temperature gradients, one-dimensional techniques provided accurate aeroheating rates. In regions with sharp temperature gradients caused by striation patterns multi-dimensional heat transfer techniques were necessary to obtain more accurate heating rates. The use of the one-dimensional technique resulted in differences of 20% in the calculated heating rates compared to 2-D analysis because it did not account for lateral heat conduction in the model.
Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji
2015-01-01
GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310–323. doi: 10.1002/wcms.1220 PMID:26753008
Parallel aeroelastic computations for wing and wing-body configurations
NASA Technical Reports Server (NTRS)
Byun, Chansup
1994-01-01
The objective of this research is to develop computationally efficient methods for solving fluid-structural interaction problems by directly coupling finite difference Euler/Navier-Stokes equations for fluids and finite element dynamics equations for structures on parallel computers. This capability will significantly impact many aerospace projects of national importance such as Advanced Subsonic Civil Transport (ASCT), where the structural stability margin becomes very critical at the transonic region. This research effort will have direct impact on the High Performance Computing and Communication (HPCC) Program of NASA in the area of parallel computing.
Multiscale functions, scale dynamics, and applications to partial differential equations
NASA Astrophysics Data System (ADS)
Cresson, Jacky; Pierret, Frédéric
2016-05-01
Modeling phenomena from experimental data always begins with a choice of hypothesis on the observed dynamics such as determinism, randomness, and differentiability. Depending on these choices, different behaviors can be observed. The natural question associated to the modeling problem is the following: "With a finite set of data concerning a phenomenon, can we recover its underlying nature? From this problem, we introduce in this paper the definition of multi-scale functions, scale calculus, and scale dynamics based on the time scale calculus [see Bohner, M. and Peterson, A., Dynamic Equations on Time Scales: An Introduction with Applications (Springer Science & Business Media, 2001)] which is used to introduce the notion of scale equations. These definitions will be illustrated on the multi-scale Okamoto's functions. Scale equations are analysed using scale regimes and the notion of asymptotic model for a scale equation under a particular scale regime. The introduced formalism explains why a single scale equation can produce distinct continuous models even if the equation is scale invariant. Typical examples of such equations are given by the scale Euler-Lagrange equation. We illustrate our results using the scale Newton's equation which gives rise to a non-linear diffusion equation or a non-linear Schrödinger equation as asymptotic continuous models depending on the particular fractional scale regime which is considered.
Continuous Mass Measurement on Conveyor Belt
NASA Astrophysics Data System (ADS)
Tomobe, Yuki; Tasaki, Ryosuke; Yamazaki, Takanori; Ohnishi, Hideo; Kobayashi, Masaaki; Kurosu, Shigeru
The continuous mass measurement of packages on a conveyor belt will become greatly important. In the mass measurement, the sequence of products is generally random. An interesting possibility of raising throughput of the conveyor line without increasing the conveyor belt speed is offered by the use of two or three conveyor belt scales (called a multi-stage conveyor belt scale). The multi-stage conveyor belt scale can be created which will adjust the conveyor belt length to the product length. The conveyor belt scale usually has maximum capacities of less than 80kg and 140cm, and achieves measuring rates of more than 150 packages per minute and more. The output signals from the conveyor belt scale are always contaminated with noises due to vibrations of the conveyor and the product to be measured in motion. In this paper an employed digital filter is of Finite Impulse Response (FIR) type designed under the consideration on the dynamics of the conveyor system. The experimental results on the conveyor belt scale suggest that the filtering algorithms are effective enough to practical applications to some extent.
Proceedings of the 14th International Conference on the Numerical Simulation of Plasmas
NASA Astrophysics Data System (ADS)
Partial Contents are as follows: Numerical Simulations of the Vlasov-Maxwell Equations by Coupled Particle-Finite Element Methods on Unstructured Meshes; Electromagnetic PIC Simulations Using Finite Elements on Unstructured Grids; Modelling Travelling Wave Output Structures with the Particle-in-Cell Code CONDOR; SST--A Single-Slice Particle Simulation Code; Graphical Display and Animation of Data Produced by Electromagnetic, Particle-in-Cell Codes; A Post-Processor for the PEST Code; Gray Scale Rendering of Beam Profile Data; A 2D Electromagnetic PIC Code for Distributed Memory Parallel Computers; 3-D Electromagnetic PIC Simulation on the NRL Connection Machine; Plasma PIC Simulations on MIMD Computers; Vlasov-Maxwell Algorithm for Electromagnetic Plasma Simulation on Distributed Architectures; MHD Boundary Layer Calculation Using the Vortex Method; and Eulerian Codes for Plasma Simulations.
NASA Astrophysics Data System (ADS)
Grasso, J. R.; Bachèlery, P.
Self-organized systems are often used to describe natural phenomena where power laws and scale invariant geometry are observed. The Piton de la Fournaise volcano shows power-law behavior in many aspects. These include the temporal distribution of eruptions, the frequency-size distributions of induced earthquakes, dikes, fissures, lava flows and interflow periods, all evidence of self-similarity over a finite scale range. We show that the bounds to scale-invariance can be used to derive geomechanical constraints on both the volcano structure and the volcano mechanics. We ascertain that the present magma bodies are multi-lens reservoirs in a quasi-eruptive condition, i.e. a marginally critical state. The scaling organization of dynamic fluid-induced observables on the volcano, such as fluid induced earthquakes, dikes and surface fissures, appears to be controlled by underlying static hierarchical structure (geology) similar to that proposed for fluid circulations in human physiology. The emergence of saturation lengths for the scalable volcanic observable argues for the finite scalability of complex naturally self-organized critical systems, including volcano dynamics.
Multi-Objective Parallel Test-Sheet Composition Using Enhanced Particle Swarm Optimization
ERIC Educational Resources Information Center
Ho, Tsu-Feng; Yin, Peng-Yeng; Hwang, Gwo-Jen; Shyu, Shyong Jian; Yean, Ya-Nan
2009-01-01
For large-scale tests, such as certification tests or entrance examinations, the composed test sheets must meet multiple assessment criteria. Furthermore, to fairly compare the knowledge levels of the persons who receive tests at different times owing to the insufficiency of available examination halls or the occurrence of certain unexpected…
Multi-jagged: A scalable parallel spatial partitioning algorithm
Deveci, Mehmet; Rajamanickam, Sivasankaran; Devine, Karen D.; ...
2015-03-18
Geometric partitioning is fast and effective for load-balancing dynamic applications, particularly those requiring geometric locality of data (particle methods, crash simulations). We present, to our knowledge, the first parallel implementation of a multidimensional-jagged geometric partitioner. In contrast to the traditional recursive coordinate bisection algorithm (RCB), which recursively bisects subdomains perpendicular to their longest dimension until the desired number of parts is obtained, our algorithm does recursive multi-section with a given number of parts in each dimension. By computing multiple cut lines concurrently and intelligently deciding when to migrate data while computing the partition, we minimize data movement compared to efficientmore » implementations of recursive bisection. We demonstrate the algorithm's scalability and quality relative to the RCB implementation in Zoltan on both real and synthetic datasets. Our experiments show that the proposed algorithm performs and scales better than RCB in terms of run-time without degrading the load balance. Lastly, our implementation partitions 24 billion points into 65,536 parts within a few seconds and exhibits near perfect weak scaling up to 6K cores.« less
Computational Modeling System for Deformation and Failure in Polycrystalline Metals
2009-03-29
FIB/EHSD 3.3 The Voronoi Cell FEM for Micromechanical Modeling 3.4 VCFEM for Microstructural Damage Modeling 3.5 Adaptive Multiscale Simulations...accurate and efficient image-based micromechanical finite element model, for crystal plasticity and damage , incorporating real morphological and...topology with evolving strain localization and damage . (v) Development of multi-scaling algorithms in the time domain for compression and localization in
NASA Astrophysics Data System (ADS)
Lawry, B. J.; Encarnacao, A.; Hipp, J. R.; Chang, M.; Young, C. J.
2011-12-01
With the rapid growth of multi-core computing hardware, it is now possible for scientific researchers to run complex, computationally intensive software on affordable, in-house commodity hardware. Multi-core CPUs (Central Processing Unit) and GPUs (Graphics Processing Unit) are now commonplace in desktops and servers. Developers today have access to extremely powerful hardware that enables the execution of software that could previously only be run on expensive, massively-parallel systems. It is no longer cost-prohibitive for an institution to build a parallel computing cluster consisting of commodity multi-core servers. In recent years, our research team has developed a distributed, multi-core computing system and used it to construct global 3D earth models using seismic tomography. Traditionally, computational limitations forced certain assumptions and shortcuts in the calculation of tomographic models; however, with the recent rapid growth in computational hardware including faster CPU's, increased RAM, and the development of multi-core computers, we are now able to perform seismic tomography, 3D ray tracing and seismic event location using distributed parallel algorithms running on commodity hardware, thereby eliminating the need for many of these shortcuts. We describe Node Resource Manager (NRM), a system we developed that leverages the capabilities of a parallel computing cluster. NRM is a software-based parallel computing management framework that works in tandem with the Java Parallel Processing Framework (JPPF, http://www.jppf.org/), a third party library that provides a flexible and innovative way to take advantage of modern multi-core hardware. NRM enables multiple applications to use and share a common set of networked computers, regardless of their hardware platform or operating system. Using NRM, algorithms can be parallelized to run on multiple processing cores of a distributed computing cluster of servers and desktops, which results in a dramatic speedup in execution time. NRM is sufficiently generic to support applications in any domain, as long as the application is parallelizable (i.e., can be subdivided into multiple individual processing tasks). At present, NRM has been effective in decreasing the overall runtime of several algorithms: 1) the generation of a global 3D model of the compressional velocity distribution in the Earth using tomographic inversion, 2) the calculation of the model resolution matrix, model covariance matrix, and travel time uncertainty for the aforementioned velocity model, and 3) the correlation of waveforms with archival data on a massive scale for seismic event detection. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
Parallelized implicit propagators for the finite-difference Schrödinger equation
NASA Astrophysics Data System (ADS)
Parker, Jonathan; Taylor, K. T.
1995-08-01
We describe the application of block Gauss-Seidel and block Jacobi iterative methods to the design of implicit propagators for finite-difference models of the time-dependent Schrödinger equation. The block-wise iterative methods discussed here are mixed direct-iterative methods for solving simultaneous equations, in the sense that direct methods (e.g. LU decomposition) are used to invert certain block sub-matrices, and iterative methods are used to complete the solution. We describe parallel variants of the basic algorithm that are well suited to the medium- to coarse-grained parallelism of work-station clusters, and MIMD supercomputers, and we show that under a wide range of conditions, fine-grained parallelism of the computation can be achieved. Numerical tests are conducted on a typical one-electron atom Hamiltonian. The methods converge robustly to machine precision (15 significant figures), in some cases in as few as 6 or 7 iterations. The rate of convergence is nearly independent of the finite-difference grid-point separations.
A Physiologically Based, Multi-Scale Model of Skeletal Muscle Structure and Function
Röhrle, O.; Davidson, J. B.; Pullan, A. J.
2012-01-01
Models of skeletal muscle can be classified as phenomenological or biophysical. Phenomenological models predict the muscle’s response to a specified input based on experimental measurements. Prominent phenomenological models are the Hill-type muscle models, which have been incorporated into rigid-body modeling frameworks, and three-dimensional continuum-mechanical models. Biophysically based models attempt to predict the muscle’s response as emerging from the underlying physiology of the system. In this contribution, the conventional biophysically based modeling methodology is extended to include several structural and functional characteristics of skeletal muscle. The result is a physiologically based, multi-scale skeletal muscle finite element model that is capable of representing detailed, geometrical descriptions of skeletal muscle fibers and their grouping. Together with a well-established model of motor-unit recruitment, the electro-physiological behavior of single muscle fibers within motor units is computed and linked to a continuum-mechanical constitutive law. The bridging between the cellular level and the organ level has been achieved via a multi-scale constitutive law and homogenization. The effect of homogenization has been investigated by varying the number of embedded skeletal muscle fibers and/or motor units and computing the resulting exerted muscle forces while applying the same excitatory input. All simulations were conducted using an anatomically realistic finite element model of the tibialis anterior muscle. Given the fact that the underlying electro-physiological cellular muscle model is capable of modeling metabolic fatigue effects such as potassium accumulation in the T-tubular space and inorganic phosphate build-up, the proposed framework provides a novel simulation-based way to investigate muscle behavior ranging from motor-unit recruitment to force generation and fatigue. PMID:22993509
Dynamic load balancing of applications
Wheat, S.R.
1997-05-13
An application-level method for dynamically maintaining global load balance on a parallel computer, particularly on massively parallel MIMD computers is disclosed. Global load balancing is achieved by overlapping neighborhoods of processors, where each neighborhood performs local load balancing. The method supports a large class of finite element and finite difference based applications and provides an automatic element management system to which applications are easily integrated. 13 figs.
NASA Astrophysics Data System (ADS)
Shih, D.; Yeh, G.
2009-12-01
This paper applies two numerical approximations, the particle tracking technique and Galerkin finite element method, to solve the diffusive wave equation in both one-dimensional and two-dimensional flow simulations. The finite element method is one of most commonly approaches in numerical problems. It can obtain accurate solutions, but calculation times may be rather extensive. The particle tracking technique, using either single-velocity or average-velocity tracks to efficiently perform advective transport, could use larger time-step sizes than the finite element method to significantly save computational time. Comparisons of the alternative approximations are examined in this poster. We adapt the model WASH123D to examine the work. WASH123D is an integrated multimedia, multi-processes, physics-based computational model suitable for various spatial-temporal scales, was first developed by Yeh et al., at 1998. The model has evolved in design capability and flexibility, and has been used for model calibrations and validations over the course of many years. In order to deliver a locally hydrological model in Taiwan, the Taiwan Typhoon and Flood Research Institute (TTFRI) is working with Prof. Yeh to develop next version of WASH123D. So, the work of our preliminary cooperationx is also sketched in this poster.
NASA Astrophysics Data System (ADS)
Timchenko, Leonid; Yarovyi, Andrii; Kokriatskaya, Nataliya; Nakonechna, Svitlana; Abramenko, Ludmila; Ławicki, Tomasz; Popiel, Piotr; Yesmakhanova, Laura
2016-09-01
The paper presents a method of parallel-hierarchical transformations for rapid recognition of dynamic images using GPU technology. Direct parallel-hierarchical transformations based on cluster CPU-and GPU-oriented hardware platform. Mathematic models of training of the parallel hierarchical (PH) network for the transformation are developed, as well as a training method of the PH network for recognition of dynamic images. This research is most topical for problems on organizing high-performance computations of super large arrays of information designed to implement multi-stage sensing and processing as well as compaction and recognition of data in the informational structures and computer devices. This method has such advantages as high performance through the use of recent advances in parallelization, possibility to work with images of ultra dimension, ease of scaling in case of changing the number of nodes in the cluster, auto scan of local network to detect compute nodes.
Portable multi-node LQCD Monte Carlo simulations using OpenACC
NASA Astrophysics Data System (ADS)
Bonati, Claudio; Calore, Enrico; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Sanfilippo, Francesco; Schifano, Sebastiano Fabio; Silvi, Giorgio; Tripiccione, Raffaele
This paper describes a state-of-the-art parallel Lattice QCD Monte Carlo code for staggered fermions, purposely designed to be portable across different computer architectures, including GPUs and commodity CPUs. Portability is achieved using the OpenACC parallel programming model, used to develop a code that can be compiled for several processor architectures. The paper focuses on parallelization on multiple computing nodes using OpenACC to manage parallelism within the node, and OpenMPI to manage parallelism among the nodes. We first discuss the available strategies to be adopted to maximize performances, we then describe selected relevant details of the code, and finally measure the level of performance and scaling-performance that we are able to achieve. The work focuses mainly on GPUs, which offer a significantly high level of performances for this application, but also compares with results measured on other processors.
Extended MHD modeling of tearing-driven magnetic relaxation
NASA Astrophysics Data System (ADS)
Sauppe, J. P.; Sovinec, C. R.
2017-05-01
Discrete relaxation events in reversed-field pinch relevant configurations are investigated numerically with nonlinear extended magnetohydrodynamic (MHD) modeling, including the Hall term in Ohm's law and first-order ion finite Larmor radius effects. Results show variability among relaxation events, where the Hall dynamo effect may help or impede the MHD dynamo effect in relaxing the parallel current density profile. The competitive behavior arises from multi-helicity conditions where the dominant magnetic fluctuation is relatively small. The resulting changes in parallel current density and parallel flow are aligned in the core, consistent with experimental observations. The analysis of simulation results also confirms that the force density from fluctuation-induced Reynolds stress arises subsequent to the drive from the fluctuation-induced Lorentz force density. Transport of the momentum density is found to be dominated by the fluctuation-induced Maxwell stress over most of the cross section with viscous and gyroviscous contributions being large in the edge region. The findings resolve a discrepancy with respect to the relative orientation of current density and flow relaxation, which had not been realized or investigated in King et al. [Phys. Plasmas 19, 055905 (2012)], where only the magnitude of flow relaxation is actually consistent with experimental results.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Candel, A.; Kabel, A.; Lee, L.
Over the past years, SLAC's Advanced Computations Department (ACD), under SciDAC sponsorship, has developed a suite of 3D (2D) parallel higher-order finite element (FE) codes, T3P (T2P) and Pic3P (Pic2P), aimed at accurate, large-scale simulation of wakefields and particle-field interactions in radio-frequency (RF) cavities of complex shape. The codes are built on the FE infrastructure that supports SLAC's frequency domain codes, Omega3P and S3P, to utilize conformal tetrahedral (triangular)meshes, higher-order basis functions and quadratic geometry approximation. For time integration, they adopt an unconditionally stable implicit scheme. Pic3P (Pic2P) extends T3P (T2P) to treat charged-particle dynamics self-consistently using the PIC (particle-in-cell)more » approach, the first such implementation on a conformal, unstructured grid using Whitney basis functions. Examples from applications to the International Linear Collider (ILC), Positron Electron Project-II (PEP-II), Linac Coherent Light Source (LCLS) and other accelerators will be presented to compare the accuracy and computational efficiency of these codes versus their counterparts using structured grids.« less
New Parallel Algorithms for Landscape Evolution Model
NASA Astrophysics Data System (ADS)
Jin, Y.; Zhang, H.; Shi, Y.
2017-12-01
Most landscape evolution models (LEM) developed in the last two decades solve the diffusion equation to simulate the transportation of surface sediments. This numerical approach is difficult to parallelize due to the computation of drainage area for each node, which needs huge amount of communication if run in parallel. In order to overcome this difficulty, we developed two parallel algorithms for LEM with a stream net. One algorithm handles the partition of grid with traditional methods and applies an efficient global reduction algorithm to do the computation of drainage areas and transport rates for the stream net; the other algorithm is based on a new partition algorithm, which partitions the nodes in catchments between processes first, and then partitions the cells according to the partition of nodes. Both methods focus on decreasing communication between processes and take the advantage of massive computing techniques, and numerical experiments show that they are both adequate to handle large scale problems with millions of cells. We implemented the two algorithms in our program based on the widely used finite element library deal.II, so that it can be easily coupled with ASPECT.
OpenGeoSys-GEMS: Hybrid parallelization of a reactive transport code with MPI and threads
NASA Astrophysics Data System (ADS)
Kosakowski, G.; Kulik, D. A.; Shao, H.
2012-04-01
OpenGeoSys-GEMS is a generic purpose reactive transport code based on the operator splitting approach. The code couples the Finite-Element groundwater flow and multi-species transport modules of the OpenGeoSys (OGS) project (http://www.ufz.de/index.php?en=18345) with the GEM-Selektor research package to model thermodynamic equilibrium of aquatic (geo)chemical systems utilizing the Gibbs Energy Minimization approach (http://gems.web.psi.ch/). The combination of OGS and the GEM-Selektor kernel (GEMS3K) is highly flexible due to the object-oriented modular code structures and the well defined (memory based) data exchange modules. Like other reactive transport codes, the practical applicability of OGS-GEMS is often hampered by the long calculation time and large memory requirements. • For realistic geochemical systems which might include dozens of mineral phases and several (non-ideal) solid solutions the time needed to solve the chemical system with GEMS3K may increase exceptionally. • The codes are coupled in a sequential non-iterative loop. In order to keep the accuracy, the time step size is restricted. In combination with a fine spatial discretization the time step size may become very small which increases calculation times drastically even for small 1D problems. • The current version of OGS is not optimized for memory use and the MPI version of OGS does not distribute data between nodes. Even for moderately small 2D problems the number of MPI processes that fit into memory of up-to-date workstations or HPC hardware is limited. One strategy to overcome the above mentioned restrictions of OGS-GEMS is to parallelize the coupled code. For OGS a parallelized version already exists. It is based on a domain decomposition method implemented with MPI and provides a parallel solver for fluid and mass transport processes. In the coupled code, after solving fluid flow and solute transport, geochemical calculations are done in form of a central loop over all finite element nodes with calls to GEMS3K and consecutive calculations of changed material parameters. In a first step the existing MPI implementation was utilized to parallelize this loop. Calculations were split between the MPI processes and afterwards data was synchronized by using MPI communication routines. Furthermore, multi-threaded calculation of the loop was implemented with help of the boost thread library (http://www.boost.org). This implementation provides a flexible environment to distribute calculations between several threads. For each MPI process at least one and up to several dozens of worker threads are spawned. These threads do not replicate the complete OGS-GEM data structure and use only a limited amount of memory. Calculation of the central geochemical loop is shared between all threads. Synchronization between the threads is done by barrier commands. The overall number of local threads times MPI processes should match the number of available computing nodes. The combination of multi-threading and MPI provides an effective and flexible environment to speed up OGS-GEMS calculations while limiting the required memory use. Test calculations on different hardware show that for certain types of applications tremendous speedups are possible.
NASA Astrophysics Data System (ADS)
Sizov, Gennadi Y.
In this dissertation, a model-based multi-objective optimal design of permanent magnet ac machines, supplied by sine-wave current regulated drives, is developed and implemented. The design procedure uses an efficient electromagnetic finite element-based solver to accurately model nonlinear material properties and complex geometric shapes associated with magnetic circuit design. Application of an electromagnetic finite element-based solver allows for accurate computation of intricate performance parameters and characteristics. The first contribution of this dissertation is the development of a rapid computational method that allows accurate and efficient exploration of large multi-dimensional design spaces in search of optimum design(s). The computationally efficient finite element-based approach developed in this work provides a framework of tools that allow rapid analysis of synchronous electric machines operating under steady-state conditions. In the developed modeling approach, major steady-state performance parameters such as, winding flux linkages and voltages, average, cogging and ripple torques, stator core flux densities, core losses, efficiencies and saturated machine winding inductances, are calculated with minimum computational effort. In addition, the method includes means for rapid estimation of distributed stator forces and three-dimensional effects of stator and/or rotor skew on the performance of the machine. The second contribution of this dissertation is the development of the design synthesis and optimization method based on a differential evolution algorithm. The approach relies on the developed finite element-based modeling method for electromagnetic analysis and is able to tackle large-scale multi-objective design problems using modest computational resources. Overall, computational time savings of up to two orders of magnitude are achievable, when compared to current and prevalent state-of-the-art methods. These computational savings allow one to expand the optimization problem to achieve more complex and comprehensive design objectives. The method is used in the design process of several interior permanent magnet industrial motors. The presented case studies demonstrate that the developed finite element-based approach practically eliminates the need for using less accurate analytical and lumped parameter equivalent circuit models for electric machine design optimization. The design process and experimental validation of the case-study machines are detailed in the dissertation.
NASA Technical Reports Server (NTRS)
Wang, P.; Li, P.
1998-01-01
A high-resolution numerical study on parallel systems is reported on three-dimensional, time-dependent, thermal convective flows. A parallel implentation on the finite volume method with a multigrid scheme is discussed, and a parallel visualization systemm is developed on distributed systems for visualizing the flow.
Accurate traveltime computation in complex anisotropic media with discontinuous Galerkin method
NASA Astrophysics Data System (ADS)
Le Bouteiller, P.; Benjemaa, M.; Métivier, L.; Virieux, J.
2017-12-01
Travel time computation is of major interest for a large range of geophysical applications, among which source localization and characterization, phase identification, data windowing and tomography, from decametric scale up to global Earth scale.Ray-tracing tools, being essentially 1D Lagrangian integration along a path, have been used for their efficiency but present some drawbacks, such as a rather difficult control of the medium sampling. Moreover, they do not provide answers in shadow zones. Eikonal solvers, based on an Eulerian approach, have attracted attention in seismology with the pioneering work of Vidale (1988), while such approach has been proposed earlier by Riznichenko (1946). They have been used now for first-arrival travel-time tomography at various scales (Podvin & Lecomte (1991). The framework for solving this non-linear partial differential equation is now well understood and various finite-difference approaches have been proposed, essentially for smooth media. We propose a novel finite element approach which builds a precise solution for strongly heterogeneous anisotropic medium (still in the limit of Eikonal validity). The discontinuous Galerkin method we have developed allows local refinement of the mesh and local high orders of interpolation inside elements. High precision of the travel times and its spatial derivatives is obtained through this formulation. This finite element method also honors boundary conditions, such as complex topographies and absorbing boundaries for mimicking an infinite medium. Applications from travel-time tomography, slope tomography are expected, but also for migration and take-off angles estimation, thanks to the accuracy obtained when computing first-arrival times.References:Podvin, P. and Lecomte, I., 1991. Finite difference computation of traveltimes in very contrasted velocity model: a massively parallel approach and its associated tools, Geophys. J. Int., 105, 271-284.Riznichenko, Y., 1946. Geometrical seismics of layered media, Trudy Inst. Theor. Geophysics, Vol II, Moscow (in Russian).Vidale, J., 1988. Finite-difference calculation of travel times, Bull. seism. Soc. Am., 78, 2062-2076.
NASA Astrophysics Data System (ADS)
Chen, Lin; Zhang, Zhongjie; Song, Haibin
2013-12-01
The South China Sea is widely believed to have been opened by seafloor spreading during the Cenozoic. The details of its lithospheric extension are still being debated, and it is unknown whether pure, simple, or conjunct shears are responsible for the opening of the South China Sea. The depth-dependent and along-strike extension derived from the single-stage finite stretching model or instantaneous stretching model is inconsistent with the observation that the South China Sea proto-margins have experienced multi-episodic extension since the Late Cretaceous. Based on the multi-episodic finite stretching model, we present the amount of lithosphere stretching at the northern continental margin of the South China Sea for different depth scales (upper crust, whole crust and lithosphere) and along several transects. The stretching factors are estimated by integrating seven deep-penetration seismic profiles, the Moho distribution derived from gravity modeling, and the tectonic subsidence data for 41 wells. The results demonstrate that the amount of stretching increases rapidly from 1.1 at the continent shelf to over 3.5 at the lower slope, but the stretching factors at the crust and lithosphere scales are consistent within error (from the uncertainty in paleobathymetry and sea-level change). Furthermore, the along-strike variation in stretching factor is within the range of 1.11-1.9 in west-east direction, accompanied by significant west-east differences in the thickness of high-velocity layers (HVLs) within the lowermost crust. This weak along-strike variation of the stretching factor is most likely produced by the preexisting contrasts in the composition and thermal structure of the lithosphere. The above observations suggest that the continental extension in the opening of the South China Sea mainly takes the form of a uniform pure shear rather than depth-dependent stretching.
Transient Finite Element Computations on a Variable Transputer System
NASA Technical Reports Server (NTRS)
Smolinski, Patrick J.; Lapczyk, Ireneusz
1993-01-01
A parallel program to analyze transient finite element problems was written and implemented on a system of transputer processors. The program uses the explicit time integration algorithm which eliminates the need for equation solving, making it more suitable for parallel computations. An interprocessor communication scheme was developed for arbitrary two dimensional grid processor configurations. Several 3-D problems were analyzed on a system with a small number of processors.
NASA Technical Reports Server (NTRS)
Nguyen, D. T.; Al-Nasra, M.; Zhang, Y.; Baddourah, M. A.; Agarwal, T. K.; Storaasli, O. O.; Carmona, E. A.
1991-01-01
Several parallel-vector computational improvements to the unconstrained optimization procedure are described which speed up the structural analysis-synthesis process. A fast parallel-vector Choleski-based equation solver, pvsolve, is incorporated into the well-known SAP-4 general-purpose finite-element code. The new code, denoted PV-SAP, is tested for static structural analysis. Initial results on a four processor CRAY 2 show that using pvsolve reduces the equation solution time by a factor of 14-16 over the original SAP-4 code. In addition, parallel-vector procedures for the Golden Block Search technique and the BFGS method are developed and tested for nonlinear unconstrained optimization. A parallel version of an iterative solver and the pvsolve direct solver are incorporated into the BFGS method. Preliminary results on nonlinear unconstrained optimization test problems, using pvsolve in the analysis, show excellent parallel-vector performance indicating that these parallel-vector algorithms can be used in a new generation of finite-element based structural design/analysis-synthesis codes.
Partition of unity finite element method for quantum mechanical materials calculations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pask, J. E.; Sukumar, N.
The current state of the art for large-scale quantum-mechanical simulations is the planewave (PW) pseudopotential method, as implemented in codes such as VASP, ABINIT, and many others. However, since the PW method uses a global Fourier basis, with strictly uniform resolution at all points in space, it suffers from substantial inefficiencies in calculations involving atoms with localized states, such as first-row and transition-metal atoms, and requires significant nonlocal communications, which limit parallel efficiency. Real-space methods such as finite-differences (FD) and finite-elements (FE) have partially addressed both resolution and parallel-communications issues but have been plagued by one key disadvantage relative tomore » PW: excessive number of degrees of freedom (basis functions) needed to achieve the required accuracies. In this paper, we present a real-space partition of unity finite element (PUFE) method to solve the Kohn–Sham equations of density functional theory. In the PUFE method, we build the known atomic physics into the solution process using partition-of-unity enrichment techniques in finite element analysis. The method developed herein is completely general, applicable to metals and insulators alike, and particularly efficient for deep, localized potentials, as occur in calculations at extreme conditions of pressure and temperature. Full self-consistent Kohn–Sham calculations are presented for LiH, involving light atoms, and CeAl, involving heavy atoms with large numbers of atomic-orbital enrichments. We find that the new PUFE approach attains the required accuracies with substantially fewer degrees of freedom, typically by an order of magnitude or more, than the PW method. As a result, we compute the equation of state of LiH and show that the computed lattice constant and bulk modulus are in excellent agreement with reference PW results, while requiring an order of magnitude fewer degrees of freedom to obtain.« less
Partition of unity finite element method for quantum mechanical materials calculations
Pask, J. E.; Sukumar, N.
2016-11-09
The current state of the art for large-scale quantum-mechanical simulations is the planewave (PW) pseudopotential method, as implemented in codes such as VASP, ABINIT, and many others. However, since the PW method uses a global Fourier basis, with strictly uniform resolution at all points in space, it suffers from substantial inefficiencies in calculations involving atoms with localized states, such as first-row and transition-metal atoms, and requires significant nonlocal communications, which limit parallel efficiency. Real-space methods such as finite-differences (FD) and finite-elements (FE) have partially addressed both resolution and parallel-communications issues but have been plagued by one key disadvantage relative tomore » PW: excessive number of degrees of freedom (basis functions) needed to achieve the required accuracies. In this paper, we present a real-space partition of unity finite element (PUFE) method to solve the Kohn–Sham equations of density functional theory. In the PUFE method, we build the known atomic physics into the solution process using partition-of-unity enrichment techniques in finite element analysis. The method developed herein is completely general, applicable to metals and insulators alike, and particularly efficient for deep, localized potentials, as occur in calculations at extreme conditions of pressure and temperature. Full self-consistent Kohn–Sham calculations are presented for LiH, involving light atoms, and CeAl, involving heavy atoms with large numbers of atomic-orbital enrichments. We find that the new PUFE approach attains the required accuracies with substantially fewer degrees of freedom, typically by an order of magnitude or more, than the PW method. As a result, we compute the equation of state of LiH and show that the computed lattice constant and bulk modulus are in excellent agreement with reference PW results, while requiring an order of magnitude fewer degrees of freedom to obtain.« less
Finite-time consensus for controlled dynamical systems in network
NASA Astrophysics Data System (ADS)
Zoghlami, Naim; Mlayeh, Rhouma; Beji, Lotfi; Abichou, Azgal
2018-04-01
The key challenges in networked dynamical systems are the component heterogeneities, nonlinearities, and the high dimension of the formulated vector of state variables. In this paper, the emphasise is put on two classes of systems in network include most controlled driftless systems as well as systems with drift. For each model structure that defines homogeneous and heterogeneous multi-system behaviour, we derive protocols leading to finite-time consensus. For each model evolving in networks forming a homogeneous or heterogeneous multi-system, protocols integrating sufficient conditions are derived leading to finite-time consensus. Likewise, for the networking topology, we make use of fixed directed and undirected graphs. To prove our approaches, finite-time stability theory and Lyapunov methods are considered. As illustrative examples, the homogeneous multi-unicycle kinematics and the homogeneous/heterogeneous multi-second order dynamics in networks are studied.
ERIC Educational Resources Information Center
Martin, Andrew J.
2009-01-01
This investigation conducts measurement and evaluation of a multidimensional model of workplace motivation and engagement from a construct validation perspective. Two studies were conducted, one using the multi-item multidimensional Motivation and Engagement Scale-Work (N = 637 school personnel) and one using a parallel short form (N = 574 school…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Abbas, M., E-mail: micheline.abbas@ensiacet.fr; CNRS, Fédération de recherche FERMaT, CNRS, 31400, Toulouse; Magaud, P.
2014-12-15
The migration of neutrally buoyant finite sized particles in a Newtonian square channel flow is investigated in the limit of very low solid volumetric concentration, within a wide range of channel Reynolds numbers Re = [0.07-120]. In situ microscope measurements of particle distributions, taken far from the channel inlet (at a distance several thousand times the channel height), revealed that particles are preferentially located near the channel walls at Re > 10 and near the channel center at Re < 1. Whereas the cross-streamline particle motion is governed by inertia-induced lift forces at high inertia, it seems to be controlledmore » by shear-induced particle interactions at low (but finite) Reynolds numbers, despite the low solid volume fraction (<1%). The transition between both regimes is observed in the range Re = [1-10]. In order to exclude the effect of multi-body interactions, the trajectories of single freely moving particles are calculated thanks to numerical simulations based on the force coupling method. With the deployed numerical tool, the complete particle trajectories are accessible within a reasonable computational time only in the inertial regime (Re > 10). In this regime, we show that (i) the particle undergoes cross-streamline migration followed by a cross-lateral migration (parallel to the wall) in agreement with previous observations, and (ii) the stable equilibrium positions are located at the midline of the channel faces while the diagonal equilibrium positions are unstable. At low flow inertia, the first instants of the numerical simulations (carried at Re = O(1)) reveal that the cross-streamline migration of a single particle is oriented towards the channel wall, suggesting that the particle preferential positions around the channel center, observed in the experiments, are rather due to multi-body interactions.« less
Kerckhoffs, Roy C. P.; Neal, Maxwell L.; Gu, Quan; Bassingthwaighte, James B.; Omens, Jeff H.; McCulloch, Andrew D.
2010-01-01
In this study we present a novel, robust method to couple finite element (FE) models of cardiac mechanics to systems models of the circulation (CIRC), independent of cardiac phase. For each time step through a cardiac cycle, left and right ventricular pressures were calculated using ventricular compliances from the FE and CIRC models. These pressures served as boundary conditions in the FE and CIRC models. In succeeding steps, pressures were updated to minimize cavity volume error (FE minus CIRC volume) using Newton iterations. Coupling was achieved when a predefined criterion for the volume error was satisfied. Initial conditions for the multi-scale model were obtained by replacing the FE model with a varying elastance model, which takes into account direct ventricular interactions. Applying the coupling, a novel multi-scale model of the canine cardiovascular system was developed. Global hemodynamics and regional mechanics were calculated for multiple beats in two separate simulations with a left ventricular ischemic region and pulmonary artery constriction, respectively. After the interventions, global hemodynamics changed due to direct and indirect ventricular interactions, in agreement with previously published experimental results. The coupling method allows for simulations of multiple cardiac cycles for normal and pathophysiology, encompassing levels from cell to system. PMID:17111210
Simulation of finite-strain inelastic phenomena governed by creep and plasticity
NASA Astrophysics Data System (ADS)
Li, Zhen; Bloomfield, Max O.; Oberai, Assad A.
2017-11-01
Inelastic mechanical behavior plays an important role in many applications in science and engineering. Phenomenologically, this behavior is often modeled as plasticity or creep. Plasticity is used to represent the rate-independent component of inelastic deformation and creep is used to represent the rate-dependent component. In several applications, especially those at elevated temperatures and stresses, these processes occur simultaneously. In order to model these process, we develop a rate-objective, finite-deformation constitutive model for plasticity and creep. The plastic component of this model is based on rate-independent J_2 plasticity, and the creep component is based on a thermally activated Norton model. We describe the implementation of this model within a finite element formulation, and present a radial return mapping algorithm for it. This approach reduces the additional complexity of modeling plasticity and creep, over thermoelasticity, to just solving one nonlinear scalar equation at each quadrature point. We implement this algorithm within a multiphysics finite element code and evaluate the consistent tangent through automatic differentiation. We verify and validate the implementation, apply it to modeling the evolution of stresses in the flip chip manufacturing process, and test its parallel strong-scaling performance.
Comparing multi-module connections in membrane chromatography scale-up.
Yu, Zhou; Karkaria, Tishtar; Espina, Marianela; Hunjun, Manjeet; Surendran, Abera; Luu, Tina; Telychko, Julia; Yang, Yan-Ping
2015-07-20
Membrane chromatography is increasingly used for protein purification in the biopharmaceutical industry. Membrane adsorbers are often pre-assembled by manufacturers as ready-to-use modules. In large-scale protein manufacturing settings, the use of multiple membrane modules for a single batch is often required due to the large quantity of feed material. The question as to how multiple modules can be connected to achieve optimum separation and productivity has been previously approached using model proteins and mass transport theories. In this study, we compare the performance of multiple membrane modules in series and in parallel in the production of a protein antigen. Series connection was shown to provide superior separation compared to parallel connection in the context of competitive adsorption. Copyright © 2015 Elsevier B.V. All rights reserved.
Distributed robust finite-time nonlinear consensus protocols for multi-agent systems
NASA Astrophysics Data System (ADS)
Zuo, Zongyu; Tie, Lin
2016-04-01
This paper investigates the robust finite-time consensus problem of multi-agent systems in networks with undirected topology. Global nonlinear consensus protocols augmented with a variable structure are constructed with the aid of Lyapunov functions for each single-integrator agent dynamics in the presence of external disturbances. In particular, it is shown that the finite settling time of the proposed general framework for robust consensus design is upper bounded for any initial condition. This makes it possible for network consensus problems to design and estimate the convergence time offline for a multi-agent team with a given undirected information flow. Finally, simulation results are presented to demonstrate the performance and effectiveness of our finite-time protocols.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kucharik, M.; Scovazzi, Guglielmo; Shashkov, Mikhail Jurievich
Hourglassing is a well-known pathological numerical artifact affecting the robustness and accuracy of Lagrangian methods. There exist a large number of hourglass control/suppression strategies. In the community of the staggered compatible Lagrangian methods, the approach of sub-zonal pressure forces is among the most widely used. However, this approach is known to add numerical strength to the solution, which can cause potential problems in certain types of simulations, for instance in simulations of various instabilities. To avoid this complication, we have adapted the multi-scale residual-based stabilization typically used in the finite element approach for staggered compatible framework. In this study, wemore » describe two discretizations of the new approach and demonstrate their properties and compare with the method of sub-zonal pressure forces on selected numerical problems.« less
Kucharik, M.; Scovazzi, Guglielmo; Shashkov, Mikhail Jurievich; ...
2017-10-28
Hourglassing is a well-known pathological numerical artifact affecting the robustness and accuracy of Lagrangian methods. There exist a large number of hourglass control/suppression strategies. In the community of the staggered compatible Lagrangian methods, the approach of sub-zonal pressure forces is among the most widely used. However, this approach is known to add numerical strength to the solution, which can cause potential problems in certain types of simulations, for instance in simulations of various instabilities. To avoid this complication, we have adapted the multi-scale residual-based stabilization typically used in the finite element approach for staggered compatible framework. In this study, wemore » describe two discretizations of the new approach and demonstrate their properties and compare with the method of sub-zonal pressure forces on selected numerical problems.« less
NASA Astrophysics Data System (ADS)
Asgari, Somayyeh; Ghattan Kashani, Zahra; Granpayeh, Nosrat
2018-04-01
The performances of three optical devices including a refractive index sensor, a power splitter, and a 4-channel multi/demultiplexer based on graphene cylindrical resonators are proposed, analyzed, and simulated numerically by using the finite-difference time-domain method. The proposed sensor operates on the principle of the shift in resonance wavelength with a change in the refractive index of dielectric materials. The sensor sensitivity has been numerically derived. In addition, the performances of the power splitter and the multi/demultiplexer based on the variation of the resonance wavelengths of cylindrical resonator have been thoroughly investigated. The simulation results are in good agreement with the theoretical ones. Our studies demonstrate that the graphene based ultra-compact, nano-scale devices can be improved to be used as photonic integrated devices, optical switching, and logic gates.
Philip, Bobby; Berrill, Mark A.; Allu, Srikanth; ...
2015-01-26
We describe an efficient and nonlinearly consistent parallel solution methodology for solving coupled nonlinear thermal transport problems that occur in nuclear reactor applications over hundreds of individual 3D physical subdomains. Efficiency is obtained by leveraging knowledge of the physical domains, the physics on individual domains, and the couplings between them for preconditioning within a Jacobian Free Newton Krylov method. Details of the computational infrastructure that enabled this work, namely the open source Advanced Multi-Physics (AMP) package developed by the authors are described. The details of verification and validation experiments, and parallel performance analysis in weak and strong scaling studies demonstratingmore » the achieved efficiency of the algorithm are presented. Moreover, numerical experiments demonstrate that the preconditioner developed is independent of the number of fuel subdomains in a fuel rod, which is particularly important when simulating different types of fuel rods. Finally, we demonstrate the power of the coupling methodology by considering problems with couplings between surface and volume physics and coupling of nonlinear thermal transport in fuel rods to an external radiation transport code.« less
A new technique for simulating composite material
NASA Technical Reports Server (NTRS)
Volakis, John L.
1991-01-01
This project dealt with the development on new methodologies and algorithms for the multi-spectrum electromagnetic characterization of large scale nonmetallic airborne vehicles and structures. A robust, low memory, and accurate methodology was developed which is particularly suited for modern machine architectures. This is a hybrid finite element method that combines two well known numerical solution approaches. That of the finite element method for modeling volumes and the boundary integral method which yields exact boundary conditions for terminating the finite element mesh. In addition, a variety of high frequency results were generated (such as diffraction coefficients for impedance surfaces and material layers) and a class of boundary conditions were developed which hold promise for more efficient simulations. During the course of this project, nearly 25 detailed research reports were generated along with an equal number of journal papers. The reports, papers, and journal articles are listed in the appendices along with their abstracts.
Summary Report of Working Group 2: Computation
NASA Astrophysics Data System (ADS)
Stoltz, P. H.; Tsung, R. S.
2009-01-01
The working group on computation addressed three physics areas: (i) plasma-based accelerators (laser-driven and beam-driven), (ii) high gradient structure-based accelerators, and (iii) electron beam sources and transport [1]. Highlights of the talks in these areas included new models of breakdown on the microscopic scale, new three-dimensional multipacting calculations with both finite difference and finite element codes, and detailed comparisons of new electron gun models with standard models such as PARMELA. The group also addressed two areas of advances in computation: (i) new algorithms, including simulation in a Lorentz-boosted frame that can reduce computation time orders of magnitude, and (ii) new hardware architectures, like graphics processing units and Cell processors that promise dramatic increases in computing power. Highlights of the talks in these areas included results from the first large-scale parallel finite element particle-in-cell code (PIC), many order-of-magnitude speedup of, and details of porting the VPIC code to the Roadrunner supercomputer. The working group featured two plenary talks, one by Brian Albright of Los Alamos National Laboratory on the performance of the VPIC code on the Roadrunner supercomputer, and one by David Bruhwiler of Tech-X Corporation on recent advances in computation for advanced accelerators. Highlights of the talk by Albright included the first one trillion particle simulations, a sustained performance of 0.3 petaflops, and an eight times speedup of science calculations, including back-scatter in laser-plasma interaction. Highlights of the talk by Bruhwiler included simulations of 10 GeV accelerator laser wakefield stages including external injection, new developments in electromagnetic simulations of electron guns using finite difference and finite element approaches.
Summary Report of Working Group 2: Computation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stoltz, P. H.; Tsung, R. S.
2009-01-22
The working group on computation addressed three physics areas: (i) plasma-based accelerators (laser-driven and beam-driven), (ii) high gradient structure-based accelerators, and (iii) electron beam sources and transport [1]. Highlights of the talks in these areas included new models of breakdown on the microscopic scale, new three-dimensional multipacting calculations with both finite difference and finite element codes, and detailed comparisons of new electron gun models with standard models such as PARMELA. The group also addressed two areas of advances in computation: (i) new algorithms, including simulation in a Lorentz-boosted frame that can reduce computation time orders of magnitude, and (ii) newmore » hardware architectures, like graphics processing units and Cell processors that promise dramatic increases in computing power. Highlights of the talks in these areas included results from the first large-scale parallel finite element particle-in-cell code (PIC), many order-of-magnitude speedup of, and details of porting the VPIC code to the Roadrunner supercomputer. The working group featured two plenary talks, one by Brian Albright of Los Alamos National Laboratory on the performance of the VPIC code on the Roadrunner supercomputer, and one by David Bruhwiler of Tech-X Corporation on recent advances in computation for advanced accelerators. Highlights of the talk by Albright included the first one trillion particle simulations, a sustained performance of 0.3 petaflops, and an eight times speedup of science calculations, including back-scatter in laser-plasma interaction. Highlights of the talk by Bruhwiler included simulations of 10 GeV accelerator laser wakefield stages including external injection, new developments in electromagnetic simulations of electron guns using finite difference and finite element approaches.« less
NASA Astrophysics Data System (ADS)
Farengo, R.; Guzdar, P. N.; Lee, Y. C.
1989-08-01
The effect of finite parallel wavenumber and electron temperature gradients on the lower hybrid drift instability is studied in the parameter regime corresponding to the TRX-2 device [Fusion Technol. 9, 48 (1986)]. Perturbations in the electrostatic potential and all three components of the vector potential are considered and finite beta electron orbit modifications are included. The electron temperature gradient decreases the growth rate of the instability but, for kz=0, unstable modes exist for ηe(=T'en0/Ten0)>6. Since finite kz effects completely stabilize the mode at small values of kz/ky(≂5×10-3), magnetic shear could be responsible for stabilizing the lower hybrid drift instability in field-reversed configurations.
Optimal mapping of irregular finite element domains to parallel processors
NASA Technical Reports Server (NTRS)
Flower, J.; Otto, S.; Salama, M.
1987-01-01
Mapping the solution domain of n-finite elements into N-subdomains that may be processed in parallel by N-processors is an optimal one if the subdomain decomposition results in a well-balanced workload distribution among the processors. The problem is discussed in the context of irregular finite element domains as an important aspect of the efficient utilization of the capabilities of emerging multiprocessor computers. Finding the optimal mapping is an intractable combinatorial optimization problem, for which a satisfactory approximate solution is obtained here by analogy to a method used in statistical mechanics for simulating the annealing process in solids. The simulated annealing analogy and algorithm are described, and numerical results are given for mapping an irregular two-dimensional finite element domain containing a singularity onto the Hypercube computer.
NASA Astrophysics Data System (ADS)
Maneva, Yana; Poedts, Stefaan
2017-04-01
The electromagnetic fluctuations in the solar wind represent a zoo of plasma waves with different properties, whose wavelengths range from largest fluid scales to the smallest dissipation scales. By nature the power spectrum of the magnetic fluctuations is anisotropic with different spectral slopes in parallel and perpendicular directions with respect to the background magnetic field. Furthermore, the magnetic field power spectra steepen as one moves from the inertial to the dissipation range and we observe multiple spectral breaks with different slopes in parallel and perpendicular direction at the ion scales and beyond. The turbulent dissipation of magnetic field fluctuations at the sub-ion scales is believed to go into local ion heating and acceleration, so that the spectral breaks are typically associated with particle energization. The gained energy can be in the form of anisotropic heating, formation of non-thermal features in the particle velocity distributions functions, and redistribution of the differential acceleration between the different ion populations. To study the relation between the evolution of the anisotropic turbulent spectra and the particle heating at the ion and sub-ion scales we perform a series of 2.5D hybrid simulations in a collisionless drifting proton-alpha plasma. We neglect the fast electron dynamics and treat the electrons as an isothermal fluid electrons, whereas the protons and a minor population of alpha particles are evolved in a fully kinetic manner. We start with a given wave spectrum and study the evolution of the magnetic field spectral slopes as a function of the parallel and perpendicular wave¬numbers. Simultaneously, we track the particle response and the energy exchange between the parallel and perpendicular scales. We observe anisotropic behavior of the turbulent power spectra with steeper slopes along the dominant energy-containing direction. This means that for parallel and quasi-parallel waves we have steeper spectral slope in parallel direction, whereas for highly oblique waves the dissipation occurs predominantly in perpendicular direction and the spectral slopes are steeper across the background magnetic field. The value of the spectral slopes depends on the angle of propagation, the spectral range, as well as the plasma properties. In general the dissipation is stronger at small scales and the corresponding spectral slopes there are steeper. For parallel and quasi-parallel propagation the prevailing energy cascade remains along the magnetic field, whereas for initially isotropic oblique turbulence the cascade develops mainly in perpendicular direction.
Finite-Difference Algorithm for Simulating 3D Electromagnetic Wavefields in Conductive Media
NASA Astrophysics Data System (ADS)
Aldridge, D. F.; Bartel, L. C.; Knox, H. A.
2013-12-01
Electromagnetic (EM) wavefields are routinely used in geophysical exploration for detection and characterization of subsurface geological formations of economic interest. Recorded EM signals depend strongly on the current conductivity of geologic media. Hence, they are particularly useful for inferring fluid content of saturated porous bodies. In order to enhance understanding of field-recorded data, we are developing a numerical algorithm for simulating three-dimensional (3D) EM wave propagation and diffusion in heterogeneous conductive materials. Maxwell's equations are combined with isotropic constitutive relations to obtain a set of six, coupled, first-order partial differential equations governing the electric and magnetic vectors. An advantage of this system is that it does not contain spatial derivatives of the three medium parameters electric permittivity, magnetic permeability, and current conductivity. Numerical solution methodology consists of explicit, time-domain finite-differencing on a 3D staggered rectangular grid. Temporal and spatial FD operators have order 2 and N, where N is user-selectable. We use an artificially-large electric permittivity to maximize the FD timestep, and thus reduce execution time. For the low frequencies typically used in geophysical exploration, accuracy is not unduly compromised. Grid boundary reflections are mitigated via convolutional perfectly matched layers (C-PMLs) imposed at the six grid flanks. A shared-memory-parallel code implementation via OpenMP directives enables rapid algorithm execution on a multi-thread computational platform. Good agreement is obtained in comparisons of numerically-generated data with reference solutions. EM wavefields are sourced via point current density and magnetic dipole vectors. Spatially-extended inductive sources (current carrying wire loops) are under development. We are particularly interested in accurate representation of high-conductivity sub-grid-scale features that are common in industrial environments (borehole casing, pipes, railroad tracks). Present efforts are oriented toward calculating the EM responses of these objects via a First Born Approximation approach. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the US Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
NASA Technical Reports Server (NTRS)
Nguyen, Duc T.
1990-01-01
Practical engineering application can often be formulated in the form of a constrained optimization problem. There are several solution algorithms for solving a constrained optimization problem. One approach is to convert a constrained problem into a series of unconstrained problems. Furthermore, unconstrained solution algorithms can be used as part of the constrained solution algorithms. Structural optimization is an iterative process where one starts with an initial design, a finite element structure analysis is then performed to calculate the response of the system (such as displacements, stresses, eigenvalues, etc.). Based upon the sensitivity information on the objective and constraint functions, an optimizer such as ADS or IDESIGN, can be used to find the new, improved design. For the structural analysis phase, the equation solver for the system of simultaneous, linear equations plays a key role since it is needed for either static, or eigenvalue, or dynamic analysis. For practical, large-scale structural analysis-synthesis applications, computational time can be excessively large. Thus, it is necessary to have a new structural analysis-synthesis code which employs new solution algorithms to exploit both parallel and vector capabilities offered by modern, high performance computers such as the Convex, Cray-2 and Cray-YMP computers. The objective of this research project is, therefore, to incorporate the latest development in the parallel-vector equation solver, PVSOLVE into the widely popular finite-element production code, such as the SAP-4. Furthermore, several nonlinear unconstrained optimization subroutines have also been developed and tested under a parallel computer environment. The unconstrained optimization subroutines are not only useful in their own right, but they can also be incorporated into a more popular constrained optimization code, such as ADS.
Scalable Visual Analytics of Massive Textual Datasets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krishnan, Manoj Kumar; Bohn, Shawn J.; Cowley, Wendy E.
2007-04-01
This paper describes the first scalable implementation of text processing engine used in Visual Analytics tools. These tools aid information analysts in interacting with and understanding large textual information content through visual interfaces. By developing parallel implementation of the text processing engine, we enabled visual analytics tools to exploit cluster architectures and handle massive dataset. The paper describes key elements of our parallelization approach and demonstrates virtually linear scaling when processing multi-gigabyte data sets such as Pubmed. This approach enables interactive analysis of large datasets beyond capabilities of existing state-of-the art visual analytics tools.
MARE2DEM: a 2-D inversion code for controlled-source electromagnetic and magnetotelluric data
NASA Astrophysics Data System (ADS)
Key, Kerry
2016-10-01
This work presents MARE2DEM, a freely available code for 2-D anisotropic inversion of magnetotelluric (MT) data and frequency-domain controlled-source electromagnetic (CSEM) data from onshore and offshore surveys. MARE2DEM parametrizes the inverse model using a grid of arbitrarily shaped polygons, where unstructured triangular or quadrilateral grids are typically used due to their ease of construction. Unstructured grids provide significantly more geometric flexibility and parameter efficiency than the structured rectangular grids commonly used by most other inversion codes. Transmitter and receiver components located on topographic slopes can be tilted parallel to the boundary so that the simulated electromagnetic fields accurately reproduce the real survey geometry. The forward solution is implemented with a goal-oriented adaptive finite-element method that automatically generates and refines unstructured triangular element grids that conform to the inversion parameter grid, ensuring accurate responses as the model conductivity changes. This dual-grid approach is significantly more efficient than the conventional use of a single grid for both the forward and inverse meshes since the more detailed finite-element meshes required for accurate responses do not increase the memory requirements of the inverse problem. Forward solutions are computed in parallel with a highly efficient scaling by partitioning the data into smaller independent modeling tasks consisting of subsets of the input frequencies, transmitters and receivers. Non-linear inversion is carried out with a new Occam inversion approach that requires fewer forward calls. Dense matrix operations are optimized for memory and parallel scalability using the ScaLAPACK parallel library. Free parameters can be bounded using a new non-linear transformation that leaves the transformed parameters nearly the same as the original parameters within the bounds, thereby reducing non-linear smoothing effects. Data balancing normalization weights for the joint inversion of two or more data sets encourages the inversion to fit each data type equally well. A synthetic joint inversion of marine CSEM and MT data illustrates the algorithm's performance and parallel scaling on up to 480 processing cores. CSEM inversion of data from the Middle America Trench offshore Nicaragua demonstrates a real world application. The source code and MATLAB interface tools are freely available at http://mare2dem.ucsd.edu.
NASA Astrophysics Data System (ADS)
Hill, Peter; Shanahan, Brendan; Dudson, Ben
2017-04-01
We present a technique for handling Dirichlet boundary conditions with the Flux Coordinate Independent (FCI) parallel derivative operator with arbitrary-shaped material geometry in general 3D magnetic fields. The FCI method constructs a finite difference scheme for ∇∥ by following field lines between poloidal planes and interpolating within planes. Doing so removes the need for field-aligned coordinate systems that suffer from singularities in the metric tensor at null points in the magnetic field (or equivalently, when q → ∞). One cost of this method is that as the field lines are not on the mesh, they may leave the domain at any point between neighbouring planes, complicating the application of boundary conditions. The Leg Value Fill (LVF) boundary condition scheme presented here involves an extrapolation/interpolation of the boundary value onto the field line end point. The usual finite difference scheme can then be used unmodified. We implement the LVF scheme in BOUT++ and use the Method of Manufactured Solutions to verify the implementation in a rectangular domain, and show that it does not modify the error scaling of the finite difference scheme. The use of LVF for arbitrary wall geometry is outlined. We also demonstrate the feasibility of using the FCI approach in no n-axisymmetric configurations for a simple diffusion model in a "straight stellarator" magnetic field. A Gaussian blob diffuses along the field lines, tracing out flux surfaces. Dirichlet boundary conditions impose a last closed flux surface (LCFS) that confines the density. Including a poloidal limiter moves the LCFS to a smaller radius. The expected scaling of the numerical perpendicular diffusion, which is a consequence of the FCI method, in stellarator-like geometry is recovered. A novel technique for increasing the parallel resolution during post-processing, in order to reduce artefacts in visualisations, is described.
IGA-ADS: Isogeometric analysis FEM using ADS solver
NASA Astrophysics Data System (ADS)
Łoś, Marcin M.; Woźniak, Maciej; Paszyński, Maciej; Lenharth, Andrew; Hassaan, Muhamm Amber; Pingali, Keshav
2017-08-01
In this paper we present a fast explicit solver for solution of non-stationary problems using L2 projections with isogeometric finite element method. The solver has been implemented within GALOIS framework. It enables parallel multi-core simulations of different time-dependent problems, in 1D, 2D, or 3D. We have prepared the solver framework in a way that enables direct implementation of the selected PDE and corresponding boundary conditions. In this paper we describe the installation, implementation of exemplary three PDEs, and execution of the simulations on multi-core Linux cluster nodes. We consider three case studies, including heat transfer, linear elasticity, as well as non-linear flow in heterogeneous media. The presented package generates output suitable for interfacing with Gnuplot and ParaView visualization software. The exemplary simulations show near perfect scalability on Gilbert shared-memory node with four Intel® Xeon® CPU E7-4860 processors, each possessing 10 physical cores (for a total of 40 cores).
NASA Technical Reports Server (NTRS)
Shu, Chi-Wang
1992-01-01
The nonlinear stability of compact schemes for shock calculations is investigated. In recent years compact schemes were used in various numerical simulations including direct numerical simulation of turbulence. However to apply them to problems containing shocks, one has to resolve the problem of spurious numerical oscillation and nonlinear instability. A framework to apply nonlinear limiting to a local mean is introduced. The resulting scheme can be proven total variation (1D) or maximum norm (multi D) stable and produces nice numerical results in the test cases. The result is summarized in the preprint entitled 'Nonlinearly Stable Compact Schemes for Shock Calculations', which was submitted to SIAM Journal on Numerical Analysis. Research was continued on issues related to two and three dimensional essentially non-oscillatory (ENO) schemes. The main research topics include: parallel implementation of ENO schemes on Connection Machines; boundary conditions; shock interaction with hydrogen bubbles, a preparation for the full combustion simulation; and direct numerical simulation of compressible sheared turbulence.
A method of boundary equations for unsteady hyperbolic problems in 3D
NASA Astrophysics Data System (ADS)
Petropavlovsky, S.; Tsynkov, S.; Turkel, E.
2018-07-01
We consider interior and exterior initial boundary value problems for the three-dimensional wave (d'Alembert) equation. First, we reduce a given problem to an equivalent operator equation with respect to unknown sources defined only at the boundary of the original domain. In doing so, the Huygens' principle enables us to obtain the operator equation in a form that involves only finite and non-increasing pre-history of the solution in time. Next, we discretize the resulting boundary equation and solve it efficiently by the method of difference potentials (MDP). The overall numerical algorithm handles boundaries of general shape using regular structured grids with no deterioration of accuracy. For long simulation times it offers sub-linear complexity with respect to the grid dimension, i.e., is asymptotically cheaper than the cost of a typical explicit scheme. In addition, our algorithm allows one to share the computational cost between multiple similar problems. On multi-processor (multi-core) platforms, it benefits from what can be considered an effective parallelization in time.
Transient Solid Dynamics Simulations on the Sandia/Intel Teraflop Computer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Attaway, S.; Brown, K.; Gardner, D.
1997-12-31
Transient solid dynamics simulations are among the most widely used engineering calculations. Industrial applications include vehicle crashworthiness studies, metal forging, and powder compaction prior to sintering. These calculations are also critical to defense applications including safety studies and weapons simulations. The practical importance of these calculations and their computational intensiveness make them natural candidates for parallelization. This has proved to be difficult, and existing implementations fail to scale to more than a few dozen processors. In this paper we describe our parallelization of PRONTO, Sandia`s transient solid dynamics code, via a novel algorithmic approach that utilizes multiple decompositions for differentmore » key segments of the computations, including the material contact calculation. This latter calculation is notoriously difficult to perform well in parallel, because it involves dynamically changing geometry, global searches for elements in contact, and unstructured communications among the compute nodes. Our approach scales to at least 3600 compute nodes of the Sandia/Intel Teraflop computer (the largest set of nodes to which we have had access to date) on problems involving millions of finite elements. On this machine we can simulate models using more than ten- million elements in a few tenths of a second per timestep, and solve problems more than 3000 times faster than a single processor Cray Jedi.« less
NASA Technical Reports Server (NTRS)
Liu, Kuojuey Ray
1990-01-01
Least-squares (LS) estimations and spectral decomposition algorithms constitute the heart of modern signal processing and communication problems. Implementations of recursive LS and spectral decomposition algorithms onto parallel processing architectures such as systolic arrays with efficient fault-tolerant schemes are the major concerns of this dissertation. There are four major results in this dissertation. First, we propose the systolic block Householder transformation with application to the recursive least-squares minimization. It is successfully implemented on a systolic array with a two-level pipelined implementation at the vector level as well as at the word level. Second, a real-time algorithm-based concurrent error detection scheme based on the residual method is proposed for the QRD RLS systolic array. The fault diagnosis, order degraded reconfiguration, and performance analysis are also considered. Third, the dynamic range, stability, error detection capability under finite-precision implementation, order degraded performance, and residual estimation under faulty situations for the QRD RLS systolic array are studied in details. Finally, we propose the use of multi-phase systolic algorithms for spectral decomposition based on the QR algorithm. Two systolic architectures, one based on triangular array and another based on rectangular array, are presented for the multiphase operations with fault-tolerant considerations. Eigenvectors and singular vectors can be easily obtained by using the multi-pase operations. Performance issues are also considered.
NASA Astrophysics Data System (ADS)
Nouri-Borujerdi, Ali; Moazezi, Arash
2018-01-01
The current study investigates the conjugate heat transfer characteristics for laminar flow in backward facing step channel. All of the channel walls are insulated except the lower thick wall under a constant temperature. The upper wall includes a insulated obstacle perpendicular to flow direction. The effect of obstacle height and location on the fluid flow and heat transfer are numerically explored for the Reynolds number in the range of 10 ≤ Re ≤ 300. Incompressible Navier-Stokes and thermal energy equations are solved simultaneously in fluid region by the upwind compact finite difference scheme based on flux-difference splitting in conjunction with artificial compressibility method. In the thick wall, the energy equation is obtained by Laplace equation. A multi-block approach is used to perform parallel computing to reduce the CPU time. Each block is modeled separately by sharing boundary conditions with neighbors. The developed program for modeling was written in FORTRAN language with OpenMP API. The obtained results showed that using of the multi-block parallel computing method is a simple robust scheme with high performance and high-order accurate. Moreover, the obtained results demonstrated that the increment of Reynolds number and obstacle height as well as decrement of horizontal distance between the obstacle and the step improve the heat transfer.
NASA Astrophysics Data System (ADS)
Sinha, Neeraj; Zambon, Andrea; Ott, James; Demagistris, Michael
2015-06-01
Driven by the continuing rapid advances in high-performance computing, multi-dimensional high-fidelity modeling is an increasingly reliable predictive tool capable of providing valuable physical insight into complex post-detonation reacting flow fields. Utilizing a series of test cases featuring blast waves interacting with combustible dispersed clouds in a small-scale test setup under well-controlled conditions, the predictive capabilities of a state-of-the-art code are demonstrated and validated. Leveraging physics-based, first principle models and solving large system of equations on highly-resolved grids, the combined effects of finite-rate/multi-phase chemical processes (including thermal ignition), turbulent mixing and shock interactions are captured across the spectrum of relevant time-scales and length scales. Since many scales of motion are generated in a post-detonation environment, even if the initial ambient conditions are quiescent, turbulent mixing plays a major role in the fireball afterburning as well as in dispersion, mixing, ignition and burn-out of combustible clouds in its vicinity. Validating these capabilities at the small scale is critical to establish a reliable predictive tool applicable to more complex and large-scale geometries of practical interest.
NASA Astrophysics Data System (ADS)
Iturrieta, Pablo Cristián; Hurtado, Daniel E.; Cembrano, José; Stanton-Yonge, Ashley
2017-09-01
Orogenic belts at oblique convergent subduction margins accommodate deformation in several trench-parallel domains, one of which is the magmatic arc, commonly regarded as taking up the margin-parallel, strike-slip component. However, the stress state and kinematics of volcanic arcs is more complex than usually recognized, involving first- and second-order faults with distinctive slip senses and mutual interaction. These are usually organized into regional scale strike-slip duplexes, associated with both long-term and short-term heterogeneous deformation and magmatic activity. This is the case of the 1100 km-long Liquiñe-Ofqui Fault System in the Southern Andes, made up of two overlapping margin-parallel master faults joined by several NE-striking second-order faults. We present a finite element model addressing the nature and spatial distribution of stress across and along the volcanic arc in the Southern Andes to understand slip partitioning and the connection between tectonics and magmatism, particularly during the interseismic phase of the subduction earthquake cycle. We correlate the dynamics of the strike-slip duplex with geological, seismic and magma transport evidence documented by previous work, showing consistency between the model and the inferred fault system behavior. Our results show that maximum principal stress orientations are heterogeneously distributed within the continental margin, ranging from 15° to 25° counter-clockwise (with respect to the convergence vector) in the master faults and 10-19° clockwise in the forearc and backarc domains. We calculate the stress tensor ellipticity, indicating simple shearing in the eastern master fault and transpressional stress in the western master fault. Subsidiary faults undergo transtensional-to-extensional stress states. The eastern master fault displays slip rates of 5 to 10 mm/yr, whereas the western and subsidiary faults show slips rates of 1 to 5 mm/yr. Our results endorse that favorably oriented subsidiary faults serve as magma pathways, particularly where they are close to the intersection with a master fault. Also, the slip of a fault segment is enhanced when an adjacent fault kinematics is superimposed on the regional tectonic loading. Hence, finite element models help to understand coupled tectonics and volcanic processes, demonstrating that geological and geophysical observations can be accounted for by a small number of key first order boundary conditions.
NASA Astrophysics Data System (ADS)
Montero, Marc Villa; Barjasteh, Ehsan; Baid, Harsh K.; Godines, Cody; Abdi, Frank; Nikbin, Kamran
A multi-scale micromechanics approach along with finite element (FE) model predictive tool is developed to analyze low-energy-impact damage footprint and compression-after-impact (CAI) of composite laminates which is also tested and verified with experimental data. Effective fiber and matrix properties were reverse-engineered from lamina properties using an optimization algorithm and used to assess damage at the micro-level during impact and post-impact FE simulations. Progressive failure dynamic analysis (PFDA) was performed for a two step-process simulation. Damage mechanisms at the micro-level were continuously evaluated during the analyses. Contribution of each failure mode was tracked during the simulations and damage and delamination footprint size and shape were predicted to understand when, where and why failure occurred during both impact and CAI events. The composite laminate was manufactured by the vacuum infusion of the aero-grade toughened Benzoxazine system into the fabric preform. Delamination footprint was measured using C-scan data from the impacted panels and compared with the predicated values obtained from proposed multi-scale micromechanics coupled with FE analysis. Furthermore, the residual strength was predicted from the load-displacement curve and compared with the experimental values as well.
NASA Astrophysics Data System (ADS)
Han, Zhenyu; Sun, Shouzheng; Fu, Yunzhong; Fu, Hongya
2017-10-01
Viscidity is an important physical indicator for assessing fluidity of resin that is beneficial to contact resin with the fibers effectively and reduce manufacturing defects during automated fiber placement (AFP) process. However, the effect of processing parameters on viscidity evolution is rarely studied during AFP process. In this paper, viscidities under different scales are analyzed based on multi-scale analysis method. Firstly, viscous dissipation energy (VDE) within meso-unit under different processing parameters is assessed by using finite element method (FEM). According to multi-scale energy transfer model, meso-unit energy is used as the boundary condition for microscopic analysis. Furthermore, molecular structure of micro-system is built by molecular dynamics (MD) method. And viscosity curves are then obtained by integrating stress autocorrelation function (SACF) with time. Finally, the correlation characteristics of processing parameters to viscosity are revealed by using gray relational analysis method (GRAM). A group of processing parameters is found out to achieve the stability of viscosity and better fluidity of resin.
Li, J; Guo, L-X; Zeng, H; Han, X-B
2009-06-01
A message-passing-interface (MPI)-based parallel finite-difference time-domain (FDTD) algorithm for the electromagnetic scattering from a 1-D randomly rough sea surface is presented. The uniaxial perfectly matched layer (UPML) medium is adopted for truncation of FDTD lattices, in which the finite-difference equations can be used for the total computation domain by properly choosing the uniaxial parameters. This makes the parallel FDTD algorithm easier to implement. The parallel performance with different processors is illustrated for one sea surface realization, and the computation time of the parallel FDTD algorithm is dramatically reduced compared to a single-process implementation. Finally, some numerical results are shown, including the backscattering characteristics of sea surface for different polarization and the bistatic scattering from a sea surface with large incident angle and large wind speed.
Toward a Model Framework of Generalized Parallel Componential Processing of Multi-Symbol Numbers
ERIC Educational Resources Information Center
Huber, Stefan; Cornelsen, Sonja; Moeller, Korbinian; Nuerk, Hans-Christoph
2015-01-01
In this article, we propose and evaluate a new model framework of parallel componential multi-symbol number processing, generalizing the idea of parallel componential processing of multi-digit numbers to the case of negative numbers by considering the polarity signs similar to single digits. In a first step, we evaluated this account by defining…
Scalable software architecture for on-line multi-camera video processing
NASA Astrophysics Data System (ADS)
Camplani, Massimo; Salgado, Luis
2011-03-01
In this paper we present a scalable software architecture for on-line multi-camera video processing, that guarantees a good trade off between computational power, scalability and flexibility. The software system is modular and its main blocks are the Processing Units (PUs), and the Central Unit. The Central Unit works as a supervisor of the running PUs and each PU manages the acquisition phase and the processing phase. Furthermore, an approach to easily parallelize the desired processing application has been presented. In this paper, as case study, we apply the proposed software architecture to a multi-camera system in order to efficiently manage multiple 2D object detection modules in a real-time scenario. System performance has been evaluated under different load conditions such as number of cameras and image sizes. The results show that the software architecture scales well with the number of camera and can easily works with different image formats respecting the real time constraints. Moreover, the parallelization approach can be used in order to speed up the processing tasks with a low level of overhead.
Continuous Easy-Plane Deconfined Phase Transition on the Kagome Lattice
NASA Astrophysics Data System (ADS)
Zhang, Xue-Feng; He, Yin-Chen; Eggert, Sebastian; Moessner, Roderich; Pollmann, Frank
2018-03-01
We use large scale quantum Monte Carlo simulations to study an extended Hubbard model of hard core bosons on the kagome lattice. In the limit of strong nearest-neighbor interactions at 1 /3 filling, the interplay between frustration and quantum fluctuations leads to a valence bond solid ground state. The system undergoes a quantum phase transition to a superfluid phase as the interaction strength is decreased. It is still under debate whether the transition is weakly first order or represents an unconventional continuous phase transition. We present a theory in terms of an easy plane noncompact C P1 gauge theory describing the phase transition at 1 /3 filling. Utilizing large scale quantum Monte Carlo simulations with parallel tempering in the canonical ensemble up to 15552 spins, we provide evidence that the phase transition is continuous at exactly 1 /3 filling. A careful finite size scaling analysis reveals an unconventional scaling behavior hinting at deconfined quantum criticality.
Reference Models for Multi-Layer Tissue Structures
2016-09-01
simulation, finite element analysis 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF ABSTRACT 18. NUMBER OF PAGES 19a. NAME OF RESPONSIBLE PERSON USAMRMC...Physiologically realistic, fully specimen-specific, nonlinear reference models. Tasks. Finite element analysis of non-linear mechanics of cadaver...models. Tasks. Finite element analysis of non-linear mechanics of multi-layer tissue regions of human subjects. Deliverables. Partially subject- and
Seismic Modeling Of Reservoir Heterogeneity Scales: An Application To Gas Hydrate Reservoirs
NASA Astrophysics Data System (ADS)
Huang, J.; Bellefleur, G.; Milkereit, B.
2008-12-01
Natural gas hydrates, a type of inclusion compound or clathrate, are composed of gas molecules trapped within a cage of water molecules. The occurrence of gas hydrates in permafrost regions has been confirmed by core samples recovered from the Mallik gas hydrate research wells located within Mackenzie Delta in Northwest Territories of Canada. Strong vertical variations of compressional and shear sonic velocities and weak surface seismic expressions of gas hydrates indicate that lithological heterogeneities control the distribution of hydrates. Seismic scattering studies predict that typical scales and strong physical contrasts due to gas hydrate concentration will generate strong forward scattering, leaving only weak energy captured by surface receivers. In order to understand the distribution of hydrates and the seismic scattering effects, an algorithm was developed to construct heterogeneous petrophysical reservoir models. The algorithm was based on well logs showing power law features and Gaussian or Non-Gaussian probability density distribution, and was designed to honor the whole statistical features of well logs such as the characteristic scales and the correlation among rock parameters. Multi-dimensional and multi-variable heterogeneous models representing the same statistical properties were constructed and applied to the heterogeneity analysis of gas hydrate reservoirs. The petrophysical models provide the platform to estimate rock physics properties as well as to study the impact of seismic scattering, wave mode conversion, and their integration on wave behavior in heterogeneous reservoirs. Using the Biot-Gassmann theory, the statistical parameters obtained from Mallik 5L-38, and the correlation length estimated from acoustic impedance inversion, gas hydrate volume fraction in Mallik area was estimated to be 1.8%, approximately 2x108 m3 natural gas stored in a hydrate bearing interval within 0.25 km2 lateral extension and between 889 m and 1115 m depth. With parallel 3-D viscoelastic Finite Difference (FD) software, we conducted a 3D numerical experiment of near offset Vertical Seismic Profile. The synthetic results implied that the strong attenuation observed in the field data might be caused by the scattering.
Efficient parallel resolution of the simplified transport equations in mixed-dual formulation
NASA Astrophysics Data System (ADS)
Barrault, M.; Lathuilière, B.; Ramet, P.; Roman, J.
2011-03-01
A reactivity computation consists of computing the highest eigenvalue of a generalized eigenvalue problem, for which an inverse power algorithm is commonly used. Very fine modelizations are difficult to treat for our sequential solver, based on the simplified transport equations, in terms of memory consumption and computational time. A first implementation of a Lagrangian based domain decomposition method brings to a poor parallel efficiency because of an increase in the power iterations [1]. In order to obtain a high parallel efficiency, we improve the parallelization scheme by changing the location of the loop over the subdomains in the overall algorithm and by benefiting from the characteristics of the Raviart-Thomas finite element. The new parallel algorithm still allows us to locally adapt the numerical scheme (mesh, finite element order). However, it can be significantly optimized for the matching grid case. The good behavior of the new parallelization scheme is demonstrated for the matching grid case on several hundreds of nodes for computations based on a pin-by-pin discretization.
Parallel DSMC Solution of Three-Dimensional Flow Over a Finite Flat Plate
NASA Technical Reports Server (NTRS)
Nance, Robert P.; Wilmoth, Richard G.; Moon, Bongki; Hassan, H. A.; Saltz, Joel
1994-01-01
This paper describes a parallel implementation of the direct simulation Monte Carlo (DSMC) method. Runtime library support is used for scheduling and execution of communication between nodes, and domain decomposition is performed dynamically to maintain a good load balance. Performance tests are conducted using the code to evaluate various remapping and remapping-interval policies, and it is shown that a one-dimensional chain-partitioning method works best for the problems considered. The parallel code is then used to simulate the Mach 20 nitrogen flow over a finite-thickness flat plate. It is shown that the parallel algorithm produces results which compare well with experimental data. Moreover, it yields significantly faster execution times than the scalar code, as well as very good load-balance characteristics.
Computer Laboratory for Multi-scale Simulations of Novel Nanomaterials
2014-09-15
schemes for multiscale modeling of polymers. Permselective ion-exchange membranes for protective clothing, fuel cells , and batteries are of special...polyelectrolyte membranes ( PEM ) with chemical warfare agents (CWA) and their simulants and (2) development of new simulation methods and computational...chemical potential using gauge cell method and calculation of density profiles. However, the code does not run in parallel environments. For mesoscale
NASA Astrophysics Data System (ADS)
Luo, Aiwen; An, Fengwei; Zhang, Xiangyu; Chen, Lei; Huang, Zunkai; Jürgen Mattausch, Hans
2018-04-01
Feature extraction techniques are a cornerstone of object detection in computer-vision-based applications. The detection performance of vison-based detection systems is often degraded by, e.g., changes in the illumination intensity of the light source, foreground-background contrast variations or automatic gain control from the camera. In order to avoid such degradation effects, we present a block-based L1-norm-circuit architecture which is configurable for different image-cell sizes, cell-based feature descriptors and image resolutions according to customization parameters from the circuit input. The incorporated flexibility in both the image resolution and the cell size for multi-scale image pyramids leads to lower computational complexity and power consumption. Additionally, an object-detection prototype for performance evaluation in 65 nm CMOS implements the proposed L1-norm circuit together with a histogram of oriented gradients (HOG) descriptor and a support vector machine (SVM) classifier. The proposed parallel architecture with high hardware efficiency enables real-time processing, high detection robustness, small chip-core area as well as low power consumption for multi-scale object detection.
Sibole, Scott C.; Erdemir, Ahmet
2012-01-01
Cells of the musculoskeletal system are known to respond to mechanical loading and chondrocytes within the cartilage are not an exception. However, understanding how joint level loads relate to cell level deformations, e.g. in the cartilage, is not a straightforward task. In this study, a multi-scale analysis pipeline was implemented to post-process the results of a macro-scale finite element (FE) tibiofemoral joint model to provide joint mechanics based displacement boundary conditions to micro-scale cellular FE models of the cartilage, for the purpose of characterizing chondrocyte deformations in relation to tibiofemoral joint loading. It was possible to identify the load distribution within the knee among its tissue structures and ultimately within the cartilage among its extracellular matrix, pericellular environment and resident chondrocytes. Various cellular deformation metrics (aspect ratio change, volumetric strain, cellular effective strain and maximum shear strain) were calculated. To illustrate further utility of this multi-scale modeling pipeline, two micro-scale cartilage constructs were considered: an idealized single cell at the centroid of a 100×100×100 μm block commonly used in past research studies, and an anatomically based (11 cell model of the same volume) representation of the middle zone of tibiofemoral cartilage. In both cases, chondrocytes experienced amplified deformations compared to those at the macro-scale, predicted by simulating one body weight compressive loading on the tibiofemoral joint. In the 11 cell case, all cells experienced less deformation than the single cell case, and also exhibited a larger variance in deformation compared to other cells residing in the same block. The coupling method proved to be highly scalable due to micro-scale model independence that allowed for exploitation of distributed memory computing architecture. The method’s generalized nature also allows for substitution of any macro-scale and/or micro-scale model providing application for other multi-scale continuum mechanics problems. PMID:22649535
Eigensolution of finite element problems in a completely connected parallel architecture
NASA Technical Reports Server (NTRS)
Akl, Fred A.; Morel, Michael R.
1989-01-01
A parallel algorithm for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi)=(M)(phi)(omega), where (K) and (M) are of order N, and (omega) is of order q is presented. The parallel algorithm is based on a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm has been successfully implemented on a tightly coupled multiple-instruction-multiple-data (MIMD) parallel processing computer, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor, or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macro-tasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. For a 64-element rectangular plate, speed-ups of 1.86, 3.13, 3.18 and 3.61 are achieved on two, four, six and eight processors, respectively.
Wave Turning and Flow Angle in the E-Region Ionosphere
NASA Astrophysics Data System (ADS)
Young, M.; Oppenheim, M. M.; Dimant, Y. S.
2016-12-01
This work presents results of particle-in-cell (PIC) simulations of Farley-Buneman (FB) turbulence at various altitudes in the high-latitude E-region ionosphere. In that region, the FB instability regularly produces meter-scale plasma irregularities. VHF radars observe coherent echoes via Bragg scatter from wave fronts parallel or anti-parallel to the radar line of sight (LoS) but do not necessarily measure the mean direction of wave propagation. Haldoupis (1984) conducted a study of diffuse radar aurora and found that the spectral width of back-scattered power depends critically on the angle between the radar LoS and the true flow direction, called the flow angle. Knowledge of the flow angle will allow researchers to better interpret observations of coherent back-scatter. Experiments designed to observe meter-scale irregularities in the E-region ionosphere created by the FB instability typically assume that the predominant flow direction is the E×B direction. However, linear theory of Dimant and Oppenheim (2004) showed that FB waves should turn away from E×B and particle-in-cell simulations by Oppenheim and Dimant (2013) support the theory. The present study comprises a quantitative analysis of the dependence of back-scattered power, flow velocity, and spectral width as functions of the flow angle. It also demonstrates that the mean direction of meter-scale wave propagation may differ from the E×B direction by tens of degrees. The analysis includes 2-D and 3-D simulations at a range of altitudes in the auroral ionosphere. Comparison between 2-D and 3-D simulations illustrates the relative importance to the irregularity spectrum of a small but finite component in the direction parallel to B. Previous work has shown this small parallel component to be important to turbulent electron heating and nonlinear transport.
Mapping behavioral landscapes for animal movement: a finite mixture modeling approach
Tracey, Jeff A.; Zhu, Jun; Boydston, Erin E.; Lyren, Lisa M.; Fisher, Robert N.; Crooks, Kevin R.
2013-01-01
Because of its role in many ecological processes, movement of animals in response to landscape features is an important subject in ecology and conservation biology. In this paper, we develop models of animal movement in relation to objects or fields in a landscape. We take a finite mixture modeling approach in which the component densities are conceptually related to different choices for movement in response to a landscape feature, and the mixing proportions are related to the probability of selecting each response as a function of one or more covariates. We combine particle swarm optimization and an Expectation-Maximization (EM) algorithm to obtain maximum likelihood estimates of the model parameters. We use this approach to analyze data for movement of three bobcats in relation to urban areas in southern California, USA. A behavioral interpretation of the models revealed similarities and differences in bobcat movement response to urbanization. All three bobcats avoided urbanization by moving either parallel to urban boundaries or toward less urban areas as the proportion of urban land cover in the surrounding area increased. However, one bobcat, a male with a dispersal-like large-scale movement pattern, avoided urbanization at lower densities and responded strictly by moving parallel to the urban edge. The other two bobcats, which were both residents and occupied similar geographic areas, avoided urban areas using a combination of movements parallel to the urban edge and movement toward areas of less urbanization. However, the resident female appeared to exhibit greater repulsion at lower levels of urbanization than the resident male, consistent with empirical observations of bobcats in southern California. Using the parameterized finite mixture models, we mapped behavioral states to geographic space, creating a representation of a behavioral landscape. This approach can provide guidance for conservation planning based on analysis of animal movement data using statistical models, thereby linking connectivity evaluations to empirical data.
NASA Astrophysics Data System (ADS)
Quan, Zhe; Wu, Lei
2017-09-01
This article investigates the use of parallel computing for solving the disjunctively constrained knapsack problem. The proposed parallel computing model can be viewed as a cooperative algorithm based on a multi-neighbourhood search. The cooperation system is composed of a team manager and a crowd of team members. The team members aim at applying their own search strategies to explore the solution space. The team manager collects the solutions from the members and shares the best one with them. The performance of the proposed method is evaluated on a group of benchmark data sets. The results obtained are compared to those reached by the best methods from the literature. The results show that the proposed method is able to provide the best solutions in most cases. In order to highlight the robustness of the proposed parallel computing model, a new set of large-scale instances is introduced. Encouraging results have been obtained.
NASA Astrophysics Data System (ADS)
Shcherbakov, Alexandre S.; Chavez Dagostino, Miguel; Arellanes, Adan Omar; Tepichin Rodriguez, Eduardo
2017-08-01
We describe a potential prototype of modern spectrometer based on acousto-optical technique with three parallel optical arms for analysis of radio-wave signals specific to astronomical observations. Each optical arm exhibits original performances to provide parallel multi-band observations with different scales simultaneously. Similar multi-band instrument is able to realize measurements within various scenarios from planetary atmospheres to attractive objects in the distant Universe. The arrangement under development has two novelties. First, each optical arm represents an individual spectrum analyzer with its individual performances. Such an approach is conditioned by exploiting various materials for acousto-optical cells operating within various regimes, frequency ranges, and light wavelengths from independent light sources. Individually produced beam shapers give both the needed incident light polarization and the required apodization for light beam to increase the dynamic range of the system as a whole. After parallel acousto-optical processing, a few data flows from these optical arms are united by the joint CCD matrix on the stage of the combined extremely high-bit rate electronic data processing that provides the system performances as well. The other novelty consists in the usage of various materials for designing wide-aperture acousto-optical cells exhibiting the best performances within each of optical arms. Here, one can mention specifically selected cuts of tellurium dioxide, bastron, and lithium niobate, which overlap selected areas within the frequency range from 40 MHz to 2.0 GHz. Thus one yields the united versatile instrument for comprehensive studies of astronomical objects simultaneously with precise synchronization in various frequency ranges.
student, he developed a parallel spectral finite element method for treating the interaction of large mechanics of fluids, structures, and their interaction|Spectral finite-element methods for time-dependent
Multi-scale Material Parameter Identification Using LS-DYNA® and LS-OPT®
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stander, Nielen; Basudhar, Anirban; Basu, Ushnish
2015-06-15
Ever-tightening regulations on fuel economy and carbon emissions demand continual innovation in finding ways for reducing vehicle mass. Classical methods for computational mass reduction include sizing, shape and topology optimization. One of the few remaining options for weight reduction can be found in materials engineering and material design optimization. Apart from considering different types of materials by adding material diversity, an appealing option in automotive design is to engineer steel alloys for the purpose of reducing thickness while retaining sufficient strength and ductility required for durability and safety. Such a project was proposed and is currently being executed under themore » auspices of the United States Automotive Materials Partnership (USAMP) funded by the Department of Energy. Under this program, new steel alloys (Third Generation Advanced High Strength Steel or 3GAHSS) are being designed, tested and integrated with the remaining design variables of a benchmark vehicle Finite Element model. In this project the principal phases identified are (i) material identification, (ii) formability optimization and (iii) multi-disciplinary vehicle optimization. This paper serves as an introduction to the LS-OPT methodology and therefore mainly focuses on the first phase, namely an approach to integrate material identification using material models of different length scales. For this purpose, a multi-scale material identification strategy, consisting of a Crystal Plasticity (CP) material model and a Homogenized State Variable (SV) model, is discussed and demonstrated. The paper concludes with proposals for integrating the multi-scale methodology into the overall vehicle design.« less
Airbreathing Propulsion System Analysis Using Multithreaded Parallel Processing
NASA Technical Reports Server (NTRS)
Schunk, Richard Gregory; Chung, T. J.; Rodriguez, Pete (Technical Monitor)
2000-01-01
In this paper, parallel processing is used to analyze the mixing, and combustion behavior of hypersonic flow. Preliminary work for a sonic transverse hydrogen jet injected from a slot into a Mach 4 airstream in a two-dimensional duct combustor has been completed [Moon and Chung, 1996]. Our aim is to extend this work to three-dimensional domain using multithreaded domain decomposition parallel processing based on the flowfield-dependent variation theory. Numerical simulations of chemically reacting flows are difficult because of the strong interactions between the turbulent hydrodynamic and chemical processes. The algorithm must provide an accurate representation of the flowfield, since unphysical flowfield calculations will lead to the faulty loss or creation of species mass fraction, or even premature ignition, which in turn alters the flowfield information. Another difficulty arises from the disparity in time scales between the flowfield and chemical reactions, which may require the use of finite rate chemistry. The situations are more complex when there is a disparity in length scales involved in turbulence. In order to cope with these complicated physical phenomena, it is our plan to utilize the flowfield-dependent variation theory mentioned above, facilitated by large eddy simulation. Undoubtedly, the proposed computation requires the most sophisticated computational strategies. The multithreaded domain decomposition parallel processing will be necessary in order to reduce both computational time and storage. Without special treatments involved in computer engineering, our attempt to analyze the airbreathing combustion appears to be difficult, if not impossible.
Understanding hydraulic fracturing: a multi-scale problem.
Hyman, J D; Jiménez-Martínez, J; Viswanathan, H S; Carey, J W; Porter, M L; Rougier, E; Karra, S; Kang, Q; Frash, L; Chen, L; Lei, Z; O'Malley, D; Makedonska, N
2016-10-13
Despite the impact that hydraulic fracturing has had on the energy sector, the physical mechanisms that control its efficiency and environmental impacts remain poorly understood in part because the length scales involved range from nanometres to kilometres. We characterize flow and transport in shale formations across and between these scales using integrated computational, theoretical and experimental efforts/methods. At the field scale, we use discrete fracture network modelling to simulate production of a hydraulically fractured well from a fracture network that is based on the site characterization of a shale gas reservoir. At the core scale, we use triaxial fracture experiments and a finite-discrete element model to study dynamic fracture/crack propagation in low permeability shale. We use lattice Boltzmann pore-scale simulations and microfluidic experiments in both synthetic and shale rock micromodels to study pore-scale flow and transport phenomena, including multi-phase flow and fluids mixing. A mechanistic description and integration of these multiple scales is required for accurate predictions of production and the eventual optimization of hydrocarbon extraction from unconventional reservoirs. Finally, we discuss the potential of CO2 as an alternative working fluid, both in fracturing and re-stimulating activities, beyond its environmental advantages.This article is part of the themed issue 'Energy and the subsurface'. © 2016 The Author(s).
Understanding hydraulic fracturing: a multi-scale problem
Hyman, J. D.; Jiménez-Martínez, J.; Viswanathan, H. S.; Carey, J. W.; Porter, M. L.; Rougier, E.; Karra, S.; Kang, Q.; Frash, L.; Chen, L.; Lei, Z.; O’Malley, D.; Makedonska, N.
2016-01-01
Despite the impact that hydraulic fracturing has had on the energy sector, the physical mechanisms that control its efficiency and environmental impacts remain poorly understood in part because the length scales involved range from nanometres to kilometres. We characterize flow and transport in shale formations across and between these scales using integrated computational, theoretical and experimental efforts/methods. At the field scale, we use discrete fracture network modelling to simulate production of a hydraulically fractured well from a fracture network that is based on the site characterization of a shale gas reservoir. At the core scale, we use triaxial fracture experiments and a finite-discrete element model to study dynamic fracture/crack propagation in low permeability shale. We use lattice Boltzmann pore-scale simulations and microfluidic experiments in both synthetic and shale rock micromodels to study pore-scale flow and transport phenomena, including multi-phase flow and fluids mixing. A mechanistic description and integration of these multiple scales is required for accurate predictions of production and the eventual optimization of hydrocarbon extraction from unconventional reservoirs. Finally, we discuss the potential of CO2 as an alternative working fluid, both in fracturing and re-stimulating activities, beyond its environmental advantages. This article is part of the themed issue ‘Energy and the subsurface’. PMID:27597789
Preparing CAM-SE for Multi-Tracer Applications: CAM-SE-Cslam
NASA Astrophysics Data System (ADS)
Lauritzen, P. H.; Taylor, M.; Goldhaber, S.
2014-12-01
The NCAR-DOE spectral element (SE) dynamical core comes from the HOMME (High-Order Modeling Environment; Dennis et al., 2012) and it is available in CAM. The CAM-SE dynamical core is designed with intrinsic mimetic properties guaranteeing total energy conservation (to time-truncation errors) and mass-conservation, and has demonstrated excellent scalability on massively parallel compute platforms (Taylor, 2011). For applications involving many tracers such as chemistry and biochemistry modeling, CAM-SE has been found to be significantly more computationally costly than the current "workhorse" model CAM-FV (Finite-Volume; Lin 2004). Hence a multi-tracer efficient scheme, called the CSLAM (Conservative Semi-Lagrangian Multi-tracer; Lauritzen et al., 2011) scheme, has been implemented in the HOMME (Erath et al., 2012). The CSLAM scheme has recently been cast in flux-form in HOMME so that it can be coupled to the SE dynamical core through conventional flux-coupling methods where the SE dynamical core provides background air mass fluxes to CSLAM. Since the CSLAM scheme makes use of a finite-volume gnomonic cubed-sphere grid and hence does not operate on the SE quadrature grid, the capability of running tracer advection, the physical parameterization suite and dynamics on separate grids has been implemented in CAM-SE. The default CAM-SE-CSLAM setup is to run physics on the quasi-equal area CSLAM grid. The capability of running physics on a different grid than the SE dynamical core may provide a more consistent coupling since the physics grid option operates with quasi-equal-area cell average values rather than non-equi-distant grid-point (SE quadrature point) values. Preliminary results on the performance of CAM-SE-CSLAM will be presented.
Nonlinear Conservation Laws and Finite Volume Methods
NASA Astrophysics Data System (ADS)
Leveque, Randall J.
Introduction Software Notation Classification of Differential Equations Derivation of Conservation Laws The Euler Equations of Gas Dynamics Dissipative Fluxes Source Terms Radiative Transfer and Isothermal Equations Multi-dimensional Conservation Laws The Shock Tube Problem Mathematical Theory of Hyperbolic Systems Scalar Equations Linear Hyperbolic Systems Nonlinear Systems The Riemann Problem for the Euler Equations Numerical Methods in One Dimension Finite Difference Theory Finite Volume Methods Importance of Conservation Form - Incorrect Shock Speeds Numerical Flux Functions Godunov's Method Approximate Riemann Solvers High-Resolution Methods Other Approaches Boundary Conditions Source Terms and Fractional Steps Unsplit Methods Fractional Step Methods General Formulation of Fractional Step Methods Stiff Source Terms Quasi-stationary Flow and Gravity Multi-dimensional Problems Dimensional Splitting Multi-dimensional Finite Volume Methods Grids and Adaptive Refinement Computational Difficulties Low-Density Flows Discrete Shocks and Viscous Profiles Start-Up Errors Wall Heating Slow-Moving Shocks Grid Orientation Effects Grid-Aligned Shocks Magnetohydrodynamics The MHD Equations One-Dimensional MHD Solving the Riemann Problem Nonstrict Hyperbolicity Stiffness The Divergence of B Riemann Problems in Multi-dimensional MHD Staggered Grids The 8-Wave Riemann Solver Relativistic Hydrodynamics Conservation Laws in Spacetime The Continuity Equation The 4-Momentum of a Particle The Stress-Energy Tensor Finite Volume Methods Multi-dimensional Relativistic Flow Gravitation and General Relativity References
Research on Multi - Person Parallel Modeling Method Based on Integrated Model Persistent Storage
NASA Astrophysics Data System (ADS)
Qu, MingCheng; Wu, XiangHu; Tao, YongChao; Liu, Ying
2018-03-01
This paper mainly studies the multi-person parallel modeling method based on the integrated model persistence storage. The integrated model refers to a set of MDDT modeling graphics system, which can carry out multi-angle, multi-level and multi-stage description of aerospace general embedded software. Persistent storage refers to converting the data model in memory into a storage model and converting the storage model into a data model in memory, where the data model refers to the object model and the storage model is a binary stream. And multi-person parallel modeling refers to the need for multi-person collaboration, the role of separation, and even real-time remote synchronization modeling.
NASA Astrophysics Data System (ADS)
Kiyani, Khurom; Chapman, Sandra; Osman, Kareem; Sahraoui, Fouad; Hnat, Bogdan
2014-05-01
The anisotropic nature of the scaling properties of solar wind magnetic turbulence fluctuations is investigated scale by scale using high cadence in situ magnetic field measurements from the Cluster, ACE and STEREO spacecraft missions in both fast and slow quiet solar wind conditions. The data span five decades in scales from the inertial range to the electron Larmor radius. We find a clear transition in scaling behaviour between the inertial and kinetic range of scales, which provides a direct, quantitative constraint on the physical processes that mediate the cascade of energy through these scales. In the inertial (magnetohydrodynamic) range the statistical nature of turbulent fluctuations are known to be anisotropic, both in the vector components of the magnetic field fluctuations (variance anisotropy) and in the spatial scales of these fluctuations (wavevector or k-anisotropy). We show for the first time that, when measuring parallel to the local magnetic field direction, the full statistical signature of the magnetic and Elsasser field fluctuations is that of a non-Gaussian globally scale-invariant process. This is distinct from the classic multi-exponent statistics observed when the local magnetic field is perpendicular to the flow direction. These observations suggest the weakness, or absence, of a parallel magnetofluid turbulence energy cascade. In contrast to the inertial range, there is a successive increase toward isotropy between parallel and transverse power at scales below the ion Larmor radius, with isotropy being achieved at the electron Larmor radius. Computing higher-order statistics, we show that the full statistical signature of both parallel, and perpendicular fluctuations at scales below the ion Larmor radius are that of an isotropic globally scale-invariant non-Gaussian process. Lastly, we perform a survey of multiple intervals of quiet solar wind sampled under different plasma conditions (fast, slow wind; plasma beta etc.) and find that the above results on the scaling transition between inertial and kinetic range scales are qualitatively robust, and that quantitatively, there is a spread in the values of the scaling exponents.
Analysis of turbulent free jet hydrogen-air diffusion flames with finite chemical reaction rates
NASA Technical Reports Server (NTRS)
Sislian, J. P.
1978-01-01
The nonequilibrium flow field resulting from the turbulent mixing and combustion of a supersonic axisymmetric hydrogen jet in a supersonic parallel coflowing air stream is analyzed. Effective turbulent transport properties are determined using the (K-epsilon) model. The finite-rate chemistry model considers eight reactions between six chemical species, H, O, H2O, OH, O2, and H2. The governing set of nonlinear partial differential equations is solved by an implicit finite-difference procedure. Radial distributions are obtained at two downstream locations of variables such as turbulent kinetic energy, turbulent dissipation rate, turbulent scale length, and viscosity. The results show that these variables attain peak values at the axis of symmetry. Computed distributions of velocity, temperature, and mass fraction are also given. A direct analytical approach to account for the effect of species concentration fluctuations on the mean production rate of species (the phenomenon of unmixedness) is also presented. However, the use of the method does not seem justified in view of the excessive computer time required to solve the resulting system of equations.
NASA Astrophysics Data System (ADS)
Shen, Wei; Li, Dongsheng; Zhang, Shuaifang; Ou, Jinping
2017-07-01
This paper presents a hybrid method that combines the B-spline wavelet on the interval (BSWI) finite element method and spectral analysis based on fast Fourier transform (FFT) to study wave propagation in One-Dimensional (1D) structures. BSWI scaling functions are utilized to approximate the theoretical wave solution in the spatial domain and construct a high-accuracy dynamic stiffness matrix. Dynamic reduction on element level is applied to eliminate the interior degrees of freedom of BSWI elements and substantially reduce the size of the system matrix. The dynamic equations of the system are then transformed and solved in the frequency domain through FFT-based spectral analysis which is especially suitable for parallel computation. A comparative analysis of four different finite element methods is conducted to demonstrate the validity and efficiency of the proposed method when utilized in high-frequency wave problems. Other numerical examples are utilized to simulate the influence of crack and delamination on wave propagation in 1D rods and beams. Finally, the errors caused by FFT and their corresponding solutions are presented.
Performance Characteristics of the Multi-Zone NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Jin, Haoqiang; VanderWijngaart, Rob F.
2003-01-01
We describe a new suite of computational benchmarks that models applications featuring multiple levels of parallelism. Such parallelism is often available in realistic flow computations on systems of grids, but had not previously been captured in bench-marks. The new suite, named NPB Multi-Zone, is extended from the NAS Parallel Benchmarks suite, and involves solving the application benchmarks LU, BT and SP on collections of loosely coupled discretization meshes. The solutions on the meshes are updated independently, but after each time step they exchange boundary value information. This strategy provides relatively easily exploitable coarse-grain parallelism between meshes. Three reference implementations are available: one serial, one hybrid using the Message Passing Interface (MPI) and OpenMP, and another hybrid using a shared memory multi-level programming model (SMP+OpenMP). We examine the effectiveness of hybrid parallelization paradigms in these implementations on three different parallel computers. We also use an empirical formula to investigate the performance characteristics of the multi-zone benchmarks.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mellor-Crummey, John
The PIPER project set out to develop methodologies and software for measurement, analysis, attribution, and presentation of performance data for extreme-scale systems. Goals of the project were to support analysis of massive multi-scale parallelism, heterogeneous architectures, multi-faceted performance concerns, and to support both post-mortem performance analysis to identify program features that contribute to problematic performance and on-line performance analysis to drive adaptation. This final report summarizes the research and development activity at Rice University as part of the PIPER project. Producing a complete suite of performance tools for exascale platforms during the course of this project was impossible since bothmore » hardware and software for exascale systems is still a moving target. For that reason, the project focused broadly on the development of new techniques for measurement and analysis of performance on modern parallel architectures, enhancements to HPCToolkit’s software infrastructure to support our research goals or use on sophisticated applications, engaging developers of multithreaded runtimes to explore how support for tools should be integrated into their designs, engaging operating system developers with feature requests for enhanced monitoring support, engaging vendors with requests that they add hardware measure- ment capabilities and software interfaces needed by tools as they design new components of HPC platforms including processors, accelerators and networks, and finally collaborations with partners interested in using HPCToolkit to analyze and tune scalable parallel applications.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Paul T.; Heroux, Michael A.; Barrett, Richard F.
The performance of a large-scale, production-quality science and engineering application (‘app’) is often dominated by a small subset of the code. Even within that subset, computational and data access patterns are often repeated, so that an even smaller portion can represent the performance-impacting features. If application developers, parallel computing experts, and computer architects can together identify this representative subset and then develop a small mini-application (‘miniapp’) that can capture these primary performance characteristics, then this miniapp can be used to both improve the performance of the app as well as provide a tool for co-design for the high-performance computing community.more » However, a critical question is whether a miniapp can effectively capture key performance behavior of an app. This study provides a comparison of an implicit finite element semiconductor device modeling app on unstructured meshes with an implicit finite element miniapp on unstructured meshes. The goal is to assess whether the miniapp is predictive of the performance of the app. Finally, single compute node performance will be compared, as well as scaling up to 16,000 cores. Results indicate that the miniapp can be reasonably predictive of the performance characteristics of the app for a single iteration of the solver on a single compute node.« less
Lin, Paul T.; Heroux, Michael A.; Barrett, Richard F.; ...
2015-07-30
The performance of a large-scale, production-quality science and engineering application (‘app’) is often dominated by a small subset of the code. Even within that subset, computational and data access patterns are often repeated, so that an even smaller portion can represent the performance-impacting features. If application developers, parallel computing experts, and computer architects can together identify this representative subset and then develop a small mini-application (‘miniapp’) that can capture these primary performance characteristics, then this miniapp can be used to both improve the performance of the app as well as provide a tool for co-design for the high-performance computing community.more » However, a critical question is whether a miniapp can effectively capture key performance behavior of an app. This study provides a comparison of an implicit finite element semiconductor device modeling app on unstructured meshes with an implicit finite element miniapp on unstructured meshes. The goal is to assess whether the miniapp is predictive of the performance of the app. Finally, single compute node performance will be compared, as well as scaling up to 16,000 cores. Results indicate that the miniapp can be reasonably predictive of the performance characteristics of the app for a single iteration of the solver on a single compute node.« less
Towards a large-scale scalable adaptive heart model using shallow tree meshes
NASA Astrophysics Data System (ADS)
Krause, Dorian; Dickopf, Thomas; Potse, Mark; Krause, Rolf
2015-10-01
Electrophysiological heart models are sophisticated computational tools that place high demands on the computing hardware due to the high spatial resolution required to capture the steep depolarization front. To address this challenge, we present a novel adaptive scheme for resolving the deporalization front accurately using adaptivity in space. Our adaptive scheme is based on locally structured meshes. These tensor meshes in space are organized in a parallel forest of trees, which allows us to resolve complicated geometries and to realize high variations in the local mesh sizes with a minimal memory footprint in the adaptive scheme. We discuss both a non-conforming mortar element approximation and a conforming finite element space and present an efficient technique for the assembly of the respective stiffness matrices using matrix representations of the inclusion operators into the product space on the so-called shallow tree meshes. We analyzed the parallel performance and scalability for a two-dimensional ventricle slice as well as for a full large-scale heart model. Our results demonstrate that the method has good performance and high accuracy.
NASA Technical Reports Server (NTRS)
Deardorff, Glenn; Djomehri, M. Jahed; Freeman, Ken; Gambrel, Dave; Green, Bryan; Henze, Chris; Hinke, Thomas; Hood, Robert; Kiris, Cetin; Moran, Patrick;
2001-01-01
A series of NASA presentations for the Supercomputing 2001 conference are summarized. The topics include: (1) Mars Surveyor Landing Sites "Collaboratory"; (2) Parallel and Distributed CFD for Unsteady Flows with Moving Overset Grids; (3) IP Multicast for Seamless Support of Remote Science; (4) Consolidated Supercomputing Management Office; (5) Growler: A Component-Based Framework for Distributed/Collaborative Scientific Visualization and Computational Steering; (6) Data Mining on the Information Power Grid (IPG); (7) Debugging on the IPG; (8) Debakey Heart Assist Device: (9) Unsteady Turbopump for Reusable Launch Vehicle; (10) Exploratory Computing Environments Component Framework; (11) OVERSET Computational Fluid Dynamics Tools; (12) Control and Observation in Distributed Environments; (13) Multi-Level Parallelism Scaling on NASA's Origin 1024 CPU System; (14) Computing, Information, & Communications Technology; (15) NAS Grid Benchmarks; (16) IPG: A Large-Scale Distributed Computing and Data Management System; and (17) ILab: Parameter Study Creation and Submission on the IPG.
Spectral enstrophy budget in a shear-less flow with turbulent/non-turbulent interface
NASA Astrophysics Data System (ADS)
Cimarelli, Andrea; Cocconi, Giacomo; Frohnapfel, Bettina; De Angelis, Elisabetta
2015-12-01
A numerical analysis of the interaction between decaying shear free turbulence and quiescent fluid is performed by means of global statistical budgets of enstrophy, both, at the single-point and two point levels. The single-point enstrophy budget allows us to recognize three physically relevant layers: a bulk turbulent region, an inhomogeneous turbulent layer, and an interfacial layer. Within these layers, enstrophy is produced, transferred, and finally destroyed while leading to a propagation of the turbulent front. These processes do not only depend on the position in the flow field but are also strongly scale dependent. In order to tackle this multi-dimensional behaviour of enstrophy in the space of scales and in physical space, we analyse the spectral enstrophy budget equation. The picture consists of an inviscid spatial cascade of enstrophy from large to small scales parallel to the interface moving towards the interface. At the interface, this phenomenon breaks, leaving place to an anisotropic cascade where large scale structures exhibit only a cascade process normal to the interface thus reducing their thickness while retaining their lengths parallel to the interface. The observed behaviour could be relevant for both the theoretical and the modelling approaches to flow with interacting turbulent/nonturbulent regions. The scale properties of the turbulent propagation mechanisms highlight that the inviscid turbulent transport is a large-scale phenomenon. On the contrary, the viscous diffusion, commonly associated with small scale mechanisms, highlights a much richer physics involving small lengths, normal to the interface, but at the same time large scales, parallel to the interface.
NASA Astrophysics Data System (ADS)
Lu, Hua; Yue, Zengqi; Zhao, Jianlin
2018-05-01
We propose and investigate a new kind of bandpass filters based on the plasmonically induced transparency (PIT) effect in a special metal-insulator-metal (MIM) waveguide system. The finite element method (FEM) simulations illustrate that the obvious PIT response can be generated in the metallic nanostructure with the stub and coupled cavities. The lineshape and position of the PIT peak are particularly dependent on the lengths of the stub and coupled cavities, the waveguide width, as well as the coupling distance between the stub and coupled cavities. The numerical simulations are in accordance with the results obtained by the temporal coupled-mode theory. The multi-peak PIT effect can be achieved by integrating multiple coupled cavities into the plasmonic waveguide. This PIT response contributes to the flexible realization of chip-scale multi-channel bandpass filters, which could find crucial applications in highly integrated optical circuits for signal processing.
Scaling of membrane-type locally resonant acoustic metamaterial arrays.
Naify, Christina J; Chang, Chia-Ming; McKnight, Geoffrey; Nutt, Steven R
2012-10-01
Metamaterials have emerged as promising solutions for manipulation of sound waves in a variety of applications. Locally resonant acoustic materials (LRAM) decrease sound transmission by 500% over acoustic mass law predictions at peak transmission loss (TL) frequencies with minimal added mass, making them appealing for weight-critical applications such as aerospace structures. In this study, potential issues associated with scale-up of the structure are addressed. TL of single-celled and multi-celled LRAM was measured using an impedance tube setup with systematic variation in geometric parameters to understand the effects of each parameter on acoustic response. Finite element analysis was performed to predict TL as a function of frequency for structures with varying complexity, including stacked structures and multi-celled arrays. Dynamic response of the array structures under discrete frequency excitation was investigated using laser vibrometry to verify negative dynamic mass behavior.
NASA Technical Reports Server (NTRS)
Mohr, Karen Irene; Tao, Wei-Kuo; Chern, Jiun-Dar; Kumar, Sujay V.; Peters-Lidard, Christa D.
2013-01-01
The present generation of general circulation models (GCM) use parameterized cumulus schemes and run at hydrostatic grid resolutions. To improve the representation of cloud-scale moist processes and landeatmosphere interactions, a global, Multi-scale Modeling Framework (MMF) coupled to the Land Information System (LIS) has been developed at NASA-Goddard Space Flight Center. The MMFeLIS has three components, a finite-volume (fv) GCM (Goddard Earth Observing System Ver. 4, GEOS-4), a 2D cloud-resolving model (Goddard Cumulus Ensemble, GCE), and the LIS, representing the large-scale atmospheric circulation, cloud processes, and land surface processes, respectively. The non-hydrostatic GCE model replaces the single-column cumulus parameterization of fvGCM. The model grid is composed of an array of fvGCM gridcells each with a series of embedded GCE models. A horizontal coupling strategy, GCE4fvGCM4Coupler4LIS, offered significant computational efficiency, with the scalability and I/O capabilities of LIS permitting landeatmosphere interactions at cloud-scale. Global simulations of 2007e2008 and comparisons to observations and reanalysis products were conducted. Using two different versions of the same land surface model but the same initial conditions, divergence in regional, synoptic-scale surface pressure patterns emerged within two weeks. The sensitivity of largescale circulations to land surface model physics revealed significant functional value to using a scalable, multi-model land surface modeling system in global weather and climate prediction.
Finite Volume Numerical Methods for Aeroheating Rate Calculations from Infrared Thermographic Data
NASA Technical Reports Server (NTRS)
Daryabeigi, Kamran; Berry, Scott A.; Horvath, Thomas J.; Nowak, Robert J.
2003-01-01
The use of multi-dimensional finite volume numerical techniques with finite thickness models for calculating aeroheating rates from measured global surface temperatures on hypersonic wind tunnel models was investigated. Both direct and inverse finite volume techniques were investigated and compared with the one-dimensional semi -infinite technique. Global transient surface temperatures were measured using an infrared thermographic technique on a 0.333-scale model of the Hyper-X forebody in the Langley Research Center 20-Inch Mach 6 Air tunnel. In these tests the effectiveness of vortices generated via gas injection for initiating hypersonic transition on the Hyper-X forebody were investigated. An array of streamwise orientated heating striations were generated and visualized downstream of the gas injection sites. In regions without significant spatial temperature gradients, one-dimensional techniques provided accurate aeroheating rates. In regions with sharp temperature gradients due to the striation patterns two-dimensional heat transfer techniques were necessary to obtain accurate heating rates. The use of the one-dimensional technique resulted in differences of 20% in the calculated heating rates because it did not account for lateral heat conduction in the model.
Multi-scale heterogeneity of the 2011 Great Tohoku-oki Earthquake from dynamic simulations
NASA Astrophysics Data System (ADS)
Aochi, H.; Ide, S.
2011-12-01
In order to explain the scaling issues of earthquakes of different sizes, multi-scale heterogeneity conception is necessary to characterize earthquake faulting property (Ide and Aochi, JGR, 2005; Aochi and Ide, JGR, 2009).The 2011 Great Tohoku-oki earthquake (M9) is characterized by a slow initial phase of about M7, a M8 class deep rupture, and a M9 main rupture with quite large slip near the trench (e.g. Ide et al., Science, 2011) as well as the presence of foreshocks. We dynamically model these features based on the multi-scale conception. We suppose a significantly large fracture energy (corresponding to slip-weakening distance of 3.2 m) in most of the fault dimension to represent the M9 rupture. However we give local heterogeneity with relatively small circular patches of smaller fracture energy, by assuming the linear scaling relation between the radius and fracture energy. The calculation is carried out using 3D Boundary Integral Equation Method. We first begin only with the mainshock (Aochi and Ide, EPS, 2011), but later we find it important to take into account of a series of foreshocks since the 9th March (M7.4). The smaller patches including the foreshock area are necessary to launch the M9 rupture area of large fracture energy. We then simulate the ground motion in low frequencies using Finite Difference Method. Qualitatively, the observed tendency is consistent with our simulations, in the meaning of the transition from the central part to the southern part in low frequencies (10 - 20 sec). At higher frequencies (1-10 sec), further small asperities are inferred in the observed signals, and this feature matches well with our multi-scale conception.
Space Technology 5 (ST-5) Observations of Field-Aligned Currents: Temporal Variability
NASA Technical Reports Server (NTRS)
Le, Guan
2010-01-01
Space Technology 5 (ST-5) is a three micro-satellite constellation deployed into a 300 x 4500 km, dawn-dusk, sun-synchronous polar orbit from March 22 to June 21, 2006, for technology validations. In this paper, we present a study of the temporal variability of field-aligned currents using multi-point magnetic field measurements from STS. The data demonstrate that masoscale current structures are commonly embedded within large-scale field-aligned current sheets. The meso-scale current structures are very dynamic with highly variable current density and/or polarity in time scales of about 10 min. They exhibit large temporal variations during both quiet and disturbed times in such time scales. On the other hand, the data also shown that the time scales for the currents to be relatively stable are about I min for meso-scale currents and about 10 min for large scale current sheets. These temporal features are obviously associated with dynamic variations of their particle carriers (mainly electrons) as they respond to the variations of the parallel electric field in auroral acceleration region. The characteristic time scales for the temporal variability of meso-scale field-aligned currents are found to be consistent with those of auroral parallel electric field.
NASA Technical Reports Server (NTRS)
Le, Guan
2010-01-01
Space Technology 5 (ST-5) is a three micro-satellite constellation deployed into a 300 x 4500 km, dawn-dusk, sun-synchronous polar orbit from March 22 to June 21, 2006, for technology validations. In this paper, we present a study of the temporal variability of field-aligned currents using multi-point magnetic field measurements from ST5. The data demonstrate that mesoscale current structures are commonly embedded within large-scale field-aligned current sheets. The meso-scale current structures are very dynamic with highly variable current density and/or polarity in time scales of about 10 min. They exhibit large temporal variations during both quiet and disturbed times in such time scales. On the other hand, the data also shown that the time scales for the currents to be relatively stable are about 1 min for meso-scale currents and about 10 min for large scale current sheets. These temporal features are obviously associated with dynamic variations of their particle carriers (mainly electrons) as they respond to the variations of the parallel electric field in auroral acceleration region. The characteristic time scales for the temporal variability of meso-scale field-aligned currents are found to be consistent with those of auroral parallel electric field.
NASA Astrophysics Data System (ADS)
Recent advances in computational fluid dynamics are discussed in reviews and reports. Topics addressed include large-scale LESs for turbulent pipe and channel flows, numerical solutions of the Euler and Navier-Stokes equations on parallel computers, multigrid methods for steady high-Reynolds-number flow past sudden expansions, finite-volume methods on unstructured grids, supersonic wake flow on a blunt body, a grid-characteristic method for multidimensional gas dynamics, and CIC numerical simulation of a wave boundary layer. Consideration is given to vortex simulations of confined two-dimensional jets, supersonic viscous shear layers, spectral methods for compressible flows, shock-wave refraction at air/water interfaces, oscillatory flow in a two-dimensional collapsible channel, the growth of randomness in a spatially developing wake, and an efficient simplex algorithm for the finite-difference and dynamic linear-programming method in optimal potential control.
Coupled multi-disciplinary simulation of composite engine structures in propulsion environment
NASA Technical Reports Server (NTRS)
Chamis, Christos C.; Singhal, Surendra N.
1992-01-01
A computational simulation procedure is described for the coupled response of multi-layered multi-material composite engine structural components which are subjected to simultaneous multi-disciplinary thermal, structural, vibration, and acoustic loadings including the effect of hostile environments. The simulation is based on a three dimensional finite element analysis technique in conjunction with structural mechanics codes and with acoustic analysis methods. The composite material behavior is assessed at the various composite scales, i.e., the laminate/ply/constituents (fiber/matrix), via a nonlinear material characterization model. Sample cases exhibiting nonlinear geometrical, material, loading, and environmental behavior of aircraft engine fan blades, are presented. Results for deformed shape, vibration frequency, mode shapes, and acoustic noise emitted from the fan blade, are discussed for their coupled effect in hot and humid environments. Results such as acoustic noise for coupled composite-mechanics/heat transfer/structural/vibration/acoustic analyses demonstrate the effectiveness of coupled multi-disciplinary computational simulation and the various advantages of composite materials compared to metals.
NASA Astrophysics Data System (ADS)
Clay, M. P.; Yeung, P. K.; Buaria, D.; Gotoh, T.
2017-11-01
Turbulent mixing at high Schmidt number is a multiscale problem which places demanding requirements on direct numerical simulations to resolve fluctuations down the to Batchelor scale. We use a dual-grid, dual-scheme and dual-communicator approach where velocity and scalar fields are computed by separate groups of parallel processes, the latter using a combined compact finite difference (CCD) scheme on finer grid with a static 3-D domain decomposition free of the communication overhead of memory transposes. A high degree of scalability is achieved for a 81923 scalar field at Schmidt number 512 in turbulence with a modest inertial range, by overlapping communication with computation whenever possible. On the Cray XE6 partition of Blue Waters, use of a dedicated thread for communication combined with OpenMP locks and nested parallelism reduces CCD timings by 34% compared to an MPI baseline. The code has been further optimized for the 27-petaflops Cray XK7 machine Titan using GPUs as accelerators with the latest OpenMP 4.5 directives, giving 2.7X speedup compared to CPU-only execution at the largest problem size. Supported by NSF Grant ACI-1036170, the NCSA Blue Waters Project with subaward via UIUC, and a DOE INCITE allocation at ORNL.
NASA Workshop on Computational Structural Mechanics 1987, part 1
NASA Technical Reports Server (NTRS)
Sykes, Nancy P. (Editor)
1989-01-01
Topics in Computational Structural Mechanics (CSM) are reviewed. CSM parallel structural methods, a transputer finite element solver, architectures for multiprocessor computers, and parallel eigenvalue extraction are among the topics discussed.
NASA Technical Reports Server (NTRS)
Keppenne, Christian L.; Rienecker, Michele M.; Koblinsky, Chester (Technical Monitor)
2001-01-01
A multivariate ensemble Kalman filter (MvEnKF) implemented on a massively parallel computer architecture has been implemented for the Poseidon ocean circulation model and tested with a Pacific Basin model configuration. There are about two million prognostic state-vector variables. Parallelism for the data assimilation step is achieved by regionalization of the background-error covariances that are calculated from the phase-space distribution of the ensemble. Each processing element (PE) collects elements of a matrix measurement functional from nearby PEs. To avoid the introduction of spurious long-range covariances associated with finite ensemble sizes, the background-error covariances are given compact support by means of a Hadamard (element by element) product with a three-dimensional canonical correlation function. The methodology and the MvEnKF configuration are discussed. It is shown that the regionalization of the background covariances; has a negligible impact on the quality of the analyses. The parallel algorithm is very efficient for large numbers of observations but does not scale well beyond 100 PEs at the current model resolution. On a platform with distributed memory, memory rather than speed is the limiting factor.
Crustal origin of trench-parallel shear-wave fast polarizations in the Central Andes
NASA Astrophysics Data System (ADS)
Wölbern, I.; Löbl, U.; Rümpker, G.
2014-04-01
In this study, SKS and local S phases are analyzed to investigate variations of shear-wave splitting parameters along two dense seismic profiles across the central Andean Altiplano and Puna plateaus. In contrast to previous observations, the vast majority of the measurements reveal fast polarizations sub-parallel to the subduction direction of the Nazca plate with delay times between 0.3 and 1.2 s. Local phases show larger variations of fast polarizations and exhibit delay times ranging between 0.1 and 1.1 s. Two 70 km and 100 km wide sections along the Altiplano profile exhibit larger delay times and are characterized by fast polarizations oriented sub-parallel to major fault zones. Based on finite-difference wavefield calculations for anisotropic subduction zone models we demonstrate that the observations are best explained by fossil slab anisotropy with fast symmetry axes oriented sub-parallel to the slab movement in combination with a significant component of crustal anisotropy of nearly trench-parallel fast-axis orientation. From the modeling we exclude a sub-lithospheric origin of the observed strong anomalies due to the short-scale variations of the fast polarizations. Instead, our results indicate that anisotropy in the Central Andes generally reflects the direction of plate motion while the observed trench-parallel fast polarizations likely originate in the continental crust above the subducting slab.
GPU Multi-Scale Particle Tracking and Multi-Fluid Simulations of the Radiation Belts
NASA Astrophysics Data System (ADS)
Ziemba, T.; Carscadden, J.; O'Donnell, D.; Winglee, R.; Harnett, E.; Cash, M.
2007-12-01
The properties of the radiation belts can vary dramatically under the influence of magnetic storms and storm-time substorms. The task of understanding and predicting radiation belt properties is made difficult because their properties determined by global processes as well as small-scale wave-particle interactions. A full solution to the problem will require major innovations in technique and computer hardware. The proposed work will demonstrates liked particle tracking codes with new multi-scale/multi-fluid global simulations that provide the first means to include small-scale processes within the global magnetospheric context. A large hurdle to the problem is having sufficient computer hardware that is able to handle the dissipate temporal and spatial scale sizes. A major innovation of the work is that the codes are designed to run of graphics processing units (GPUs). GPUs are intrinsically highly parallelized systems that provide more than an order of magnitude computing speed over a CPU based systems, for little more cost than a high end-workstation. Recent advancements in GPU technologies allow for full IEEE float specifications with performance up to several hundred GFLOPs per GPU and new software architectures have recently become available to ease the transition from graphics based to scientific applications. This allows for a cheap alternative to standard supercomputing methods and should increase the time to discovery. A demonstration of the code pushing more than 500,000 particles faster than real time is presented, and used to provide new insight into radiation belt dynamics.
A parallel finite-difference method for computational aerodynamics
NASA Technical Reports Server (NTRS)
Swisshelm, Julie M.
1989-01-01
A finite-difference scheme for solving complex three-dimensional aerodynamic flow on parallel-processing supercomputers is presented. The method consists of a basic flow solver with multigrid convergence acceleration, embedded grid refinements, and a zonal equation scheme. Multitasking and vectorization have been incorporated into the algorithm. Results obtained include multiprocessed flow simulations from the Cray X-MP and Cray-2. Speedups as high as 3.3 for the two-dimensional case and 3.5 for segments of the three-dimensional case have been achieved on the Cray-2. The entire solver attained a factor of 2.7 improvement over its unitasked version on the Cray-2. The performance of the parallel algorithm on each machine is analyzed.
Efficient Implementation of Multigrid Solvers on Message-Passing Parrallel Systems
NASA Technical Reports Server (NTRS)
Lou, John
1994-01-01
We discuss our implementation strategies for finite difference multigrid partial differential equation (PDE) solvers on message-passing systems. Our target parallel architecture is Intel parallel computers: the Delta and Paragon system.
A Hybrid MPI/OpenMP Approach for Parallel Groundwater Model Calibration on Multicore Computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tang, Guoping; D'Azevedo, Ed F; Zhang, Fan
2010-01-01
Groundwater model calibration is becoming increasingly computationally time intensive. We describe a hybrid MPI/OpenMP approach to exploit two levels of parallelism in software and hardware to reduce calibration time on multicore computers with minimal parallelization effort. At first, HydroGeoChem 5.0 (HGC5) is parallelized using OpenMP for a uranium transport model with over a hundred species involving nearly a hundred reactions, and a field scale coupled flow and transport model. In the first application, a single parallelizable loop is identified to consume over 97% of the total computational time. With a few lines of OpenMP compiler directives inserted into the code,more » the computational time reduces about ten times on a compute node with 16 cores. The performance is further improved by selectively parallelizing a few more loops. For the field scale application, parallelizable loops in 15 of the 174 subroutines in HGC5 are identified to take more than 99% of the execution time. By adding the preconditioned conjugate gradient solver and BICGSTAB, and using a coloring scheme to separate the elements, nodes, and boundary sides, the subroutines for finite element assembly, soil property update, and boundary condition application are parallelized, resulting in a speedup of about 10 on a 16-core compute node. The Levenberg-Marquardt (LM) algorithm is added into HGC5 with the Jacobian calculation and lambda search parallelized using MPI. With this hybrid approach, compute nodes at the number of adjustable parameters (when the forward difference is used for Jacobian approximation), or twice that number (if the center difference is used), are used to reduce the calibration time from days and weeks to a few hours for the two applications. This approach can be extended to global optimization scheme and Monte Carol analysis where thousands of compute nodes can be efficiently utilized.« less
Parallel Newton-Krylov-Schwarz algorithms for the transonic full potential equation
NASA Technical Reports Server (NTRS)
Cai, Xiao-Chuan; Gropp, William D.; Keyes, David E.; Melvin, Robin G.; Young, David P.
1996-01-01
We study parallel two-level overlapping Schwarz algorithms for solving nonlinear finite element problems, in particular, for the full potential equation of aerodynamics discretized in two dimensions with bilinear elements. The overall algorithm, Newton-Krylov-Schwarz (NKS), employs an inexact finite-difference Newton method and a Krylov space iterative method, with a two-level overlapping Schwarz method as a preconditioner. We demonstrate that NKS, combined with a density upwinding continuation strategy for problems with weak shocks, is robust and, economical for this class of mixed elliptic-hyperbolic nonlinear partial differential equations, with proper specification of several parameters. We study upwinding parameters, inner convergence tolerance, coarse grid density, subdomain overlap, and the level of fill-in in the incomplete factorization, and report their effect on numerical convergence rate, overall execution time, and parallel efficiency on a distributed-memory parallel computer.
NASA Astrophysics Data System (ADS)
Chen, M.; Wei, S.
2016-12-01
The serious damage of Mexico City caused by the 1985 Michoacan earthquake 400 km away indicates that urban areas may be affected by remote earthquakes. To asses earthquake risk of urban areas imposed by distant earthquakes, we developed a hybrid Frequency Wavenumber (FK) and Finite Difference (FD) code implemented with MPI, since the computation of seismic wave propagation from a distant earthquake using a single numerical method (e.g. Finite Difference, Finite Element or Spectral Element) is very expensive. In our approach, we compute the incident wave field (ud) at the boundaries of the excitation box, which surrounding the local structure, using a paralleled FK method (Zhu and Rivera, 2002), and compute the total wave field (u) within the excitation box using a parallelled 2D FD method. We apply perfectly matched layer (PML) absorbing condition to the diffracted wave field (u-ud). Compared to previous Generalized Ray Theory and Finite Difference (Wen and Helmberger, 1998), Frequency Wavenumber and Spectral Element (Tong et al., 2014), and Direct Solution Method and Spectral Element hybrid method (Monteiller et al., 2013), our absorbing boundary condition dramatically suppress the numerical noise. The MPI implementation of our method can greatly speed up the calculation. Besides, our hybrid method also has a potential use in high resolution array imaging similar to Tong et al. (2014).
HACC: Extreme Scaling and Performance Across Diverse Architectures
NASA Astrophysics Data System (ADS)
Habib, Salman; Morozov, Vitali; Frontiere, Nicholas; Finkel, Hal; Pope, Adrian; Heitmann, Katrin
2013-11-01
Supercomputing is evolving towards hybrid and accelerator-based architectures with millions of cores. The HACC (Hardware/Hybrid Accelerated Cosmology Code) framework exploits this diverse landscape at the largest scales of problem size, obtaining high scalability and sustained performance. Developed to satisfy the science requirements of cosmological surveys, HACC melds particle and grid methods using a novel algorithmic structure that flexibly maps across architectures, including CPU/GPU, multi/many-core, and Blue Gene systems. We demonstrate the success of HACC on two very different machines, the CPU/GPU system Titan and the BG/Q systems Sequoia and Mira, attaining unprecedented levels of scalable performance. We demonstrate strong and weak scaling on Titan, obtaining up to 99.2% parallel efficiency, evolving 1.1 trillion particles. On Sequoia, we reach 13.94 PFlops (69.2% of peak) and 90% parallel efficiency on 1,572,864 cores, with 3.6 trillion particles, the largest cosmological benchmark yet performed. HACC design concepts are applicable to several other supercomputer applications.
Scalable non-negative matrix tri-factorization.
Čopar, Andrej; Žitnik, Marinka; Zupan, Blaž
2017-01-01
Matrix factorization is a well established pattern discovery tool that has seen numerous applications in biomedical data analytics, such as gene expression co-clustering, patient stratification, and gene-disease association mining. Matrix factorization learns a latent data model that takes a data matrix and transforms it into a latent feature space enabling generalization, noise removal and feature discovery. However, factorization algorithms are numerically intensive, and hence there is a pressing challenge to scale current algorithms to work with large datasets. Our focus in this paper is matrix tri-factorization, a popular method that is not limited by the assumption of standard matrix factorization about data residing in one latent space. Matrix tri-factorization solves this by inferring a separate latent space for each dimension in a data matrix, and a latent mapping of interactions between the inferred spaces, making the approach particularly suitable for biomedical data mining. We developed a block-wise approach for latent factor learning in matrix tri-factorization. The approach partitions a data matrix into disjoint submatrices that are treated independently and fed into a parallel factorization system. An appealing property of the proposed approach is its mathematical equivalence with serial matrix tri-factorization. In a study on large biomedical datasets we show that our approach scales well on multi-processor and multi-GPU architectures. On a four-GPU system we demonstrate that our approach can be more than 100-times faster than its single-processor counterpart. A general approach for scaling non-negative matrix tri-factorization is proposed. The approach is especially useful parallel matrix factorization implemented in a multi-GPU environment. We expect the new approach will be useful in emerging procedures for latent factor analysis, notably for data integration, where many large data matrices need to be collectively factorized.
Expressing Parallelism with ROOT
NASA Astrophysics Data System (ADS)
Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.
2017-10-01
The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
Expressing Parallelism with ROOT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Piparo, D.; Tejedor, E.; Guiraud, E.
The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module inmore » Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.« less
2007-01-01
CONTRACT NUMBER Problems: Finite -Horizon and State-Feedback Cost-Cumulant Control Paradigm (PREPRINT) 5b. GRANT NUMBER 5c. PROGRAM ELEMENT NUMBER...cooperative cost-cumulant control regime for the class of multi-person single-objective decision problems characterized by quadratic random costs and... finite -horizon integral quadratic cost associated with a linear stochastic system . Since this problem formation is parameterized by the number of cost
Simulation of all-scale atmospheric dynamics on unstructured meshes
NASA Astrophysics Data System (ADS)
Smolarkiewicz, Piotr K.; Szmelter, Joanna; Xiao, Feng
2016-10-01
The advance of massively parallel computing in the nineteen nineties and beyond encouraged finer grid intervals in numerical weather-prediction models. This has improved resolution of weather systems and enhanced the accuracy of forecasts, while setting the trend for development of unified all-scale atmospheric models. This paper first outlines the historical background to a wide range of numerical methods advanced in the process. Next, the trend is illustrated with a technical review of a versatile nonoscillatory forward-in-time finite-volume (NFTFV) approach, proven effective in simulations of atmospheric flows from small-scale dynamics to global circulations and climate. The outlined approach exploits the synergy of two specific ingredients: the MPDATA methods for the simulation of fluid flows based on the sign-preserving properties of upstream differencing; and the flexible finite-volume median-dual unstructured-mesh discretisation of the spatial differential operators comprising PDEs of atmospheric dynamics. The paper consolidates the concepts leading to a family of generalised nonhydrostatic NFTFV flow solvers that include soundproof PDEs of incompressible Boussinesq, anelastic and pseudo-incompressible systems, common in large-eddy simulation of small- and meso-scale dynamics, as well as all-scale compressible Euler equations. Such a framework naturally extends predictive skills of large-eddy simulation to the global atmosphere, providing a bottom-up alternative to the reverse approach pursued in the weather-prediction models. Theoretical considerations are substantiated by calculations attesting to the versatility and efficacy of the NFTFV approach. Some prospective developments are also discussed.
NASA Technical Reports Server (NTRS)
Tessler, Alexander; Gherlone, Marco; Versino, Daniele; DiSciuva, Marco
2012-01-01
This paper reviews the theoretical foundation and computational mechanics aspects of the recently developed shear-deformation theory, called the Refined Zigzag Theory (RZT). The theory is based on a multi-scale formalism in which an equivalent single-layer plate theory is refined with a robust set of zigzag local layer displacements that are free of the usual deficiencies found in common plate theories with zigzag kinematics. In the RZT, first-order shear-deformation plate theory is used as the equivalent single-layer plate theory, which represents the overall response characteristics. Local piecewise-linear zigzag displacements are used to provide corrections to these overall response characteristics that are associated with the plate heterogeneity and the relative stiffnesses of the layers. The theory does not rely on shear correction factors and is equally accurate for homogeneous, laminated composite, and sandwich beams and plates. Regardless of the number of material layers, the theory maintains only seven kinematic unknowns that describe the membrane, bending, and transverse shear plate-deformation modes. Derived from the virtual work principle, RZT is well-suited for developing computationally efficient, C(sup 0)-continuous finite elements; formulations of several RZT-based elements are highlighted. The theory and its finite element approximations thus provide a unified and reliable computational platform for the analysis and design of high-performance load-bearing aerospace structures.
NASA Technical Reports Server (NTRS)
Tessler, Alexander; Gherlone, Marco; Versino, Daniele; Di Sciuva, Marco
2012-01-01
This paper reviews the theoretical foundation and computational mechanics aspects of the recently developed shear-deformation theory, called the Refined Zigzag Theory (RZT). The theory is based on a multi-scale formalism in which an equivalent single-layer plate theory is refined with a robust set of zigzag local layer displacements that are free of the usual deficiencies found in common plate theories with zigzag kinematics. In the RZT, first-order shear-deformation plate theory is used as the equivalent single-layer plate theory, which represents the overall response characteristics. Local piecewise-linear zigzag displacements are used to provide corrections to these overall response characteristics that are associated with the plate heterogeneity and the relative stiffnesses of the layers. The theory does not rely on shear correction factors and is equally accurate for homogeneous, laminated composite, and sandwich beams and plates. Regardless of the number of material layers, the theory maintains only seven kinematic unknowns that describe the membrane, bending, and transverse shear plate-deformation modes. Derived from the virtual work principle, RZT is well-suited for developing computationally efficient, C0-continuous finite elements; formulations of several RZT-based elements are highlighted. The theory and its finite elements provide a unified and reliable computational platform for the analysis and design of high-performance load-bearing aerospace structures.
Binary tree eigen solver in finite element analysis
NASA Technical Reports Server (NTRS)
Akl, F. A.; Janetzke, D. C.; Kiraly, L. J.
1993-01-01
This paper presents a transputer-based binary tree eigensolver for the solution of the generalized eigenproblem in linear elastic finite element analysis. The algorithm is based on the method of recursive doubling, which parallel implementation of a number of associative operations on an arbitrary set having N elements is of the order of o(log2N), compared to (N-1) steps if implemented sequentially. The hardware used in the implementation of the binary tree consists of 32 transputers. The algorithm is written in OCCAM which is a high-level language developed with the transputers to address parallel programming constructs and to provide the communications between processors. The algorithm can be replicated to match the size of the binary tree transputer network. Parallel and sequential finite element analysis programs have been developed to solve for the set of the least-order eigenpairs using the modified subspace method. The speed-up obtained for a typical analysis problem indicates close agreement with the theoretical prediction given by the method of recursive doubling.
Progress in fast, accurate multi-scale climate simulations
Collins, W. D.; Johansen, H.; Evans, K. J.; ...
2015-06-01
We present a survey of physical and computational techniques that have the potential to contribute to the next generation of high-fidelity, multi-scale climate simulations. Examples of the climate science problems that can be investigated with more depth with these computational improvements include the capture of remote forcings of localized hydrological extreme events, an accurate representation of cloud features over a range of spatial and temporal scales, and parallel, large ensembles of simulations to more effectively explore model sensitivities and uncertainties. Numerical techniques, such as adaptive mesh refinement, implicit time integration, and separate treatment of fast physical time scales are enablingmore » improved accuracy and fidelity in simulation of dynamics and allowing more complete representations of climate features at the global scale. At the same time, partnerships with computer science teams have focused on taking advantage of evolving computer architectures such as many-core processors and GPUs. As a result, approaches which were previously considered prohibitively costly have become both more efficient and scalable. In combination, progress in these three critical areas is poised to transform climate modeling in the coming decades.« less
NASA Astrophysics Data System (ADS)
Grujicic, M.; Bell, W. C.; Arakere, G.; He, T.; Xie, X.; Cheeseman, B. A.
2010-02-01
A meso-scale ballistic material model for a prototypical plain-woven single-ply flexible armor is developed and implemented in a material user subroutine for the use in commercial explicit finite element programs. The main intent of the model is to attain computational efficiency when calculating the mechanical response of the multi-ply fabric-based flexible-armor material during its impact with various projectiles without significantly sacrificing the key physical aspects of the fabric microstructure, architecture, and behavior. To validate the new model, a comparative finite element method analysis is carried out in which: (a) the plain-woven single-ply fabric is modeled using conventional shell elements and weaving is done in an explicit manner by snaking the yarns through the fabric and (b) the fabric is treated as a planar continuum surface composed of conventional shell elements to which the new meso-scale unit-cell based material model is assigned. The results obtained show that the material model provides a reasonably good description for the fabric deformation and fracture behavior under different combinations of fixed and free boundary conditions. Finally, the model is used in an investigation of the ability of a multi-ply soft-body armor vest to protect the wearer from impact by a 9-mm round nose projectile. The effects of inter-ply friction, projectile/yarn friction, and the far-field boundary conditions are revealed and the results explained using simple wave mechanics principles, high-deformation rate material behavior, and the role of various energy-absorbing mechanisms in the fabric-based armor systems.
Multi-GPU accelerated three-dimensional FDTD method for electromagnetic simulation.
Nagaoka, Tomoaki; Watanabe, Soichi
2011-01-01
Numerical simulation with a numerical human model using the finite-difference time domain (FDTD) method has recently been performed in a number of fields in biomedical engineering. To improve the method's calculation speed and realize large-scale computing with the numerical human model, we adapt three-dimensional FDTD code to a multi-GPU environment using Compute Unified Device Architecture (CUDA). In this study, we used NVIDIA Tesla C2070 as GPGPU boards. The performance of multi-GPU is evaluated in comparison with that of a single GPU and vector supercomputer. The calculation speed with four GPUs was approximately 3.5 times faster than with a single GPU, and was slightly (approx. 1.3 times) slower than with the supercomputer. Calculation speed of the three-dimensional FDTD method using GPUs can significantly improve with an expanding number of GPUs.
Parallel group independent component analysis for massive fMRI data sets.
Chen, Shaojie; Huang, Lei; Qiu, Huitong; Nebel, Mary Beth; Mostofsky, Stewart H; Pekar, James J; Lindquist, Martin A; Eloyan, Ani; Caffo, Brian S
2017-01-01
Independent component analysis (ICA) is widely used in the field of functional neuroimaging to decompose data into spatio-temporal patterns of co-activation. In particular, ICA has found wide usage in the analysis of resting state fMRI (rs-fMRI) data. Recently, a number of large-scale data sets have become publicly available that consist of rs-fMRI scans from thousands of subjects. As a result, efficient ICA algorithms that scale well to the increased number of subjects are required. To address this problem, we propose a two-stage likelihood-based algorithm for performing group ICA, which we denote Parallel Group Independent Component Analysis (PGICA). By utilizing the sequential nature of the algorithm and parallel computing techniques, we are able to efficiently analyze data sets from large numbers of subjects. We illustrate the efficacy of PGICA, which has been implemented in R and is freely available through the Comprehensive R Archive Network, through simulation studies and application to rs-fMRI data from two large multi-subject data sets, consisting of 301 and 779 subjects respectively.
NASA Astrophysics Data System (ADS)
Liu, Jiping; Kang, Xiaochen; Dong, Chun; Xu, Shenghua
2017-12-01
Surface area estimation is a widely used tool for resource evaluation in the physical world. When processing large scale spatial data, the input/output (I/O) can easily become the bottleneck in parallelizing the algorithm due to the limited physical memory resources and the very slow disk transfer rate. In this paper, we proposed a stream tilling approach to surface area estimation that first decomposed a spatial data set into tiles with topological expansions. With these tiles, the one-to-one mapping relationship between the input and the computing process was broken. Then, we realized a streaming framework towards the scheduling of the I/O processes and computing units. Herein, each computing unit encapsulated a same copy of the estimation algorithm, and multiple asynchronous computing units could work individually in parallel. Finally, the performed experiment demonstrated that our stream tilling estimation can efficiently alleviate the heavy pressures from the I/O-bound work, and the measured speedup after being optimized have greatly outperformed the directly parallel versions in shared memory systems with multi-core processors.
FORCE2: A state-of-the-art two-phase code for hydrodynamic calculations
NASA Astrophysics Data System (ADS)
Ding, Jianmin; Lyczkowski, R. W.; Burge, S. W.
1993-02-01
A three-dimensional computer code for two-phase flow named FORCE2 has been developed by Babcock and Wilcox (B & W) in close collaboration with Argonne National Laboratory (ANL). FORCE2 is capable of both transient as well as steady-state simulations. This Cartesian coordinates computer program is a finite control volume, industrial grade and quality embodiment of the pilot-scale FLUFIX/MOD2 code and contains features such as three-dimensional blockages, volume and surface porosities to account for various obstructions in the flow field, and distributed resistance modeling to account for pressure drops caused by baffles, distributor plates and large tube banks. Recently computed results demonstrated the significance of and necessity for three-dimensional models of hydrodynamics and erosion. This paper describes the process whereby ANL's pilot-scale FLUFIX/MOD2 models and numerics were implemented into FORCE2. A description of the quality control to assess the accuracy of the new code and the validation using some of the measured data from Illinois Institute of Technology (UT) and the University of Illinois at Urbana-Champaign (UIUC) are given. It is envisioned that one day, FORCE2 with additional modules such as radiation heat transfer, combustion kinetics and multi-solids together with user-friendly pre- and post-processor software and tailored for massively parallel multiprocessor shared memory computational platforms will be used by industry and researchers to assist in reducing and/or eliminating the environmental and economic barriers which limit full consideration of coal, shale and biomass as energy sources, to retain energy security, and to remediate waste and ecological problems.
Electron Heating at Kinetic Scales in Magnetosheath Turbulence
NASA Technical Reports Server (NTRS)
Chasapis, Alexandros; Matthaeus, W. H.; Parashar, T. N.; Lecontel, O.; Retino, A.; Breuillard, H.; Khotyaintsev, Y.; Vaivads, A.; Lavraud, B.; Eriksson, E.;
2017-01-01
We present a statistical study of coherent structures at kinetic scales, using data from the Magnetospheric Multiscale mission in the Earths magnetosheath. We implemented the multi-spacecraft partial variance of increments (PVI) technique to detect these structures, which are associated with intermittency at kinetic scales. We examine the properties of the electron heating occurring within such structures. We find that, statistically, structures with a high PVI index are regions of significant electron heating. We also focus on one such structure, a current sheet, which shows some signatures consistent with magnetic reconnection. Strong parallel electron heating coincides with whistler emissions at the edges of the current sheet.
NASA Technical Reports Server (NTRS)
Hsieh, Shang-Hsien
1993-01-01
The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.
Parallel Computation of Flow in Heterogeneous Media Modelled by Mixed Finite Elements
NASA Astrophysics Data System (ADS)
Cliffe, K. A.; Graham, I. G.; Scheichl, R.; Stals, L.
2000-11-01
In this paper we describe a fast parallel method for solving highly ill-conditioned saddle-point systems arising from mixed finite element simulations of stochastic partial differential equations (PDEs) modelling flow in heterogeneous media. Each realisation of these stochastic PDEs requires the solution of the linear first-order velocity-pressure system comprising Darcy's law coupled with an incompressibility constraint. The chief difficulty is that the permeability may be highly variable, especially when the statistical model has a large variance and a small correlation length. For reasonable accuracy, the discretisation has to be extremely fine. We solve these problems by first reducing the saddle-point formulation to a symmetric positive definite (SPD) problem using a suitable basis for the space of divergence-free velocities. The reduced problem is solved using parallel conjugate gradients preconditioned with an algebraically determined additive Schwarz domain decomposition preconditioner. The result is a solver which exhibits a good degree of robustness with respect to the mesh size as well as to the variance and to physically relevant values of the correlation length of the underlying permeability field. Numerical experiments exhibit almost optimal levels of parallel efficiency. The domain decomposition solver (DOUG, http://www.maths.bath.ac.uk/~parsoft) used here not only is applicable to this problem but can be used to solve general unstructured finite element systems on a wide range of parallel architectures.
Guo, L-X; Li, J; Zeng, H
2009-11-01
We present an investigation of the electromagnetic scattering from a three-dimensional (3-D) object above a two-dimensional (2-D) randomly rough surface. A Message Passing Interface-based parallel finite-difference time-domain (FDTD) approach is used, and the uniaxial perfectly matched layer (UPML) medium is adopted for truncation of the FDTD lattices, in which the finite-difference equations can be used for the total computation domain by properly choosing the uniaxial parameters. This makes the parallel FDTD algorithm easier to implement. The parallel performance with different number of processors is illustrated for one rough surface realization and shows that the computation time of our parallel FDTD algorithm is dramatically reduced relative to a single-processor implementation. Finally, the composite scattering coefficients versus scattered and azimuthal angle are presented and analyzed for different conditions, including the surface roughness, the dielectric constants, the polarization, and the size of the 3-D object.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Majda, Andrew J.; Xing, Yulong; Mohammadian, Majid
Determining the finite-amplitude preconditioned states in the hurricane embryo, which lead to tropical cyclogenesis, is a central issue in contemporary meteorology. In the embryo there is competition between different preconditioning mechanisms involving hydrodynamics and moist thermodynamics, which can lead to cyclogenesis. Here systematic asymptotic methods from applied mathematics are utilized to develop new simplified moist multi-scale models starting from the moist anelastic equations. Three interesting multi-scale models emerge in the analysis. The balanced mesoscale vortex (BMV) dynamics and the microscale balanced hot tower (BHT) dynamics involve simplified balanced equations without gravity waves for vertical vorticity amplification due to moist heatmore » sources and incorporate nonlinear advective fluxes across scales. The BMV model is the central one for tropical cyclogenesis in the embryo. The moist mesoscale wave (MMW) dynamics involves simplified equations for mesoscale moisture fluctuations, as well as linear hydrostatic waves driven by heat sources from moisture and eddy flux divergences. A simplified cloud physics model for deep convection is introduced here and used to study moist axisymmetric plumes in the BHT model. A simple application in periodic geometry involving the effects of mesoscale vertical shear and moist microscale hot towers on vortex amplification is developed here to illustrate features of the coupled multi-scale models. These results illustrate the use of these models in isolating key mechanisms in the embryo in a simplified content.« less
Decarboxylation of furfural on Pd(111): Ab initio molecular dynamics simulations
NASA Astrophysics Data System (ADS)
Xue, Wenhua; Dang, Hongli; Shields, Darwin; Liu, Yingdi; Jentoft, Friederike; Resasco, Daniel; Wang, Sanwu
2013-03-01
Furfural conversion over metal catalysts plays an important role in the studies of biomass-derived feedstocks. We report ab initio molecular dynamics simulations for the decarboxylation process of furfural on the palladium surface at finite temperatures. We observed and analyzed the atomic-scale dynamics of furfural on the Pd(111) surface and the fluctuations of the bondlengths between the atoms in furfural. We found that the dominant bonding structure is the parallel structure in which the furfural plane, while slightly distorted, is parallel to the Pd surface. Analysis of the bondlength fluctuations indicates that the C-H bond is the aldehyde group of a furfural molecule is likely to be broken first, while the C =O bond has a tendency to be isolated as CO. Our results show that the reaction of decarbonylation dominates, consistent with the experimental measurements. Supported by DOE (DE-SC0004600). Simulations and calculations were performed on XSEDE's and NERSC's supercomputers.
State of the art in electromagnetic modeling for the Compact Linear Collider
DOE Office of Scientific and Technical Information (OSTI.GOV)
Candel, Arno; Kabel, Andreas; Lee, Lie-Quan
SLAC's Advanced Computations Department (ACD) has developed the parallel 3D electromagnetic time-domain code T3P for simulations of wakefields and transients in complex accelerator structures. T3P is based on state-of-the-art Finite Element methods on unstructured grids and features unconditional stability, quadratic surface approximation and up to 6th-order vector basis functions for unprecedented simulation accuracy. Optimized for large-scale parallel processing on leadership supercomputing facilities, T3P allows simulations of realistic 3D structures with fast turn-around times, aiding the design of the next generation of accelerator facilities. Applications include simulations of the proposed two-beam accelerator structures for the Compact Linear Collider (CLIC) - wakefieldmore » damping in the Power Extraction and Transfer Structure (PETS) and power transfer to the main beam accelerating structures are investigated.« less
Parallel Solver for H(div) Problems Using Hybridization and AMG
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Chak S.; Vassilevski, Panayot S.
2016-01-15
In this paper, a scalable parallel solver is proposed for H(div) problems discretized by arbitrary order finite elements on general unstructured meshes. The solver is based on hybridization and algebraic multigrid (AMG). Unlike some previously studied H(div) solvers, the hybridization solver does not require discrete curl and gradient operators as additional input from the user. Instead, only some element information is needed in the construction of the solver. The hybridization results in a H1-equivalent symmetric positive definite system, which is then rescaled and solved by AMG solvers designed for H1 problems. Weak and strong scaling of the method are examinedmore » through several numerical tests. Our numerical results show that the proposed solver provides a promising alternative to ADS, a state-of-the-art solver [12], for H(div) problems. In fact, it outperforms ADS for higher order elements.« less
NASA Technical Reports Server (NTRS)
Chew, W. C.; Song, J. M.; Lu, C. C.; Weedon, W. H.
1995-01-01
In the first phase of our work, we have concentrated on laying the foundation to develop fast algorithms, including the use of recursive structure like the recursive aggregate interaction matrix algorithm (RAIMA), the nested equivalence principle algorithm (NEPAL), the ray-propagation fast multipole algorithm (RPFMA), and the multi-level fast multipole algorithm (MLFMA). We have also investigated the use of curvilinear patches to build a basic method of moments code where these acceleration techniques can be used later. In the second phase, which is mainly reported on here, we have concentrated on implementing three-dimensional NEPAL on a massively parallel machine, the Connection Machine CM-5, and have been able to obtain some 3D scattering results. In order to understand the parallelization of codes on the Connection Machine, we have also studied the parallelization of 3D finite-difference time-domain (FDTD) code with PML material absorbing boundary condition (ABC). We found that simple algorithms like the FDTD with material ABC can be parallelized very well allowing us to solve within a minute a problem of over a million nodes. In addition, we have studied the use of the fast multipole method and the ray-propagation fast multipole algorithm to expedite matrix-vector multiplication in a conjugate-gradient solution to integral equations of scattering. We find that these methods are faster than LU decomposition for one incident angle, but are slower than LU decomposition when many incident angles are needed as in the monostatic RCS calculations.
Numerical simulation using vorticity-vector potential formulation
NASA Technical Reports Server (NTRS)
Tokunaga, Hiroshi
1993-01-01
An accurate and efficient computational method is needed for three-dimensional incompressible viscous flows in engineering applications. On solving the turbulent shear flows directly or using the subgrid scale model, it is indispensable to resolve the small scale fluid motions as well as the large scale motions. From this point of view, the pseudo-spectral method is used so far as the computational method. However, the finite difference or the finite element methods are widely applied for computing the flow with practical importance since these methods are easily applied to the flows with complex geometric configurations. However, there exist several problems in applying the finite difference method to direct and large eddy simulations. Accuracy is one of most important problems. This point was already addressed by the present author on the direct simulations on the instability of the plane Poiseuille flow and also on the transition to turbulence. In order to obtain high efficiency, the multi-grid Poisson solver is combined with the higher-order, accurate finite difference method. The formulation method is also one of the most important problems in applying the finite difference method to the incompressible turbulent flows. The three-dimensional Navier-Stokes equations have been solved so far in the primitive variables formulation. One of the major difficulties of this method is the rigorous satisfaction of the equation of continuity. In general, the staggered grid is used for the satisfaction of the solenoidal condition for the velocity field at the wall boundary. However, the velocity field satisfies the equation of continuity automatically in the vorticity-vector potential formulation. From this point of view, the vorticity-vector potential method was extended to the generalized coordinate system. In the present article, we adopt the vorticity-vector potential formulation, the generalized coordinate system, and the 4th-order accurate difference method as the computational method. We present the computational method and apply the present method to computations of flows in a square cavity at large Reynolds number in order to investigate its effectiveness.
The effect of thermomechanical processing on second phase particle redistribution in U-10 wt%Mo
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hu, Xiaohua; Wang, Xiaowo; Joshi, Vineet V.
2018-03-01
The multi-pass hot-rolling process of an annealed uranium-10 wt% molybdenum coupon was studied by plane-strain compression finite element modeling. Two point correlation function (2PCF) was used to analyze the carbide particle distribution after each rolling reduction. The hot rolling simulation results show that the alignment of UC particles along grain boundaries will rotate during rolling until it is parallel to the rolling direction, to form stringer-like distributions which are typically observed in rolled products that contain inclusions. 2PCF analysis of simulation shows that the interparticle spacing shrinks along the normal direction. The number of major peaks of 2PCF along NDmore » decreases after large reduction. The locations of major peaks indicate the inter-stringer distances.« less
NASA Technical Reports Server (NTRS)
Annett, Martin S.; Horta, Lucas G.; Jackson, Karen E.; Polanco, Michael A.; Littell, Justin D.
2012-01-01
Two full-scale crash tests of an MD-500 helicopter were conducted in 2009 and 2010 at NASA Langley's Landing and Impact Research Facility in support of NASA s Subsonic Rotary Wing Crashworthiness Project. The first crash test was conducted to evaluate the performance of an externally mounted composite deployable energy absorber (DEA) under combined impact conditions. In the second crash test, the energy absorber was removed to establish baseline loads that are regarded as severe but survivable. The presence of this energy absorbing device reduced the peak impact acceleration levels by a factor of three. Accelerations and kinematic data collected from the crash tests were compared to a system-integrated finite element model of the test article developed in parallel with the test program. In preparation for the full-scale crash test, a series of sub-scale and MD-500 mass simulator tests were conducted to evaluate the impact performances of various components and subsystems, including new crush tubes and the DEA blocks. Parameters defined for the system-integrated finite element model were determined from these tests. Results from 19 accelerometers placed throughout the airframe were compared to finite element model responses. The model developed for the purposes of predicting acceleration responses from the first crash test was inadequate when evaluating more severe conditions seen in the second crash test. A newly developed model calibration approach that includes uncertainty estimation, parameter sensitivity, impact shape orthogonality, and numerical optimization was used to calibrate model results for the full-scale crash test without the DEA. This combination of heuristic and quantitative methods identified modeling deficiencies, evaluated parameter importance, and proposed required model changes. The multidimensional calibration techniques presented here are particularly effective in identifying model adequacy. Acceleration results for the calibrated model were compared to test results and the original model results. There was a noticeable improvement in the pilot and copilot region, a slight improvement in the occupant model response, and an over-stiffening effect in the passenger region. One lesson learned was that this approach should be adopted early on, in combination with the building-block approaches that are customarily used, for model development and pretest predictions. Complete crash simulations with validated finite element models can be used to satisfy crash certification requirements, potentially reducing overall development costs.
Parallel-Processing Equalizers for Multi-Gbps Communications
NASA Technical Reports Server (NTRS)
Gray, Andrew; Ghuman, Parminder; Hoy, Scott; Satorius, Edgar H.
2004-01-01
Architectures have been proposed for the design of frequency-domain least-mean-square complex equalizers that would be integral parts of parallel- processing digital receivers of multi-gigahertz radio signals and other quadrature-phase-shift-keying (QPSK) or 16-quadrature-amplitude-modulation (16-QAM) of data signals at rates of multiple gigabits per second. Equalizers as used here denotes receiver subsystems that compensate for distortions in the phase and frequency responses of the broad-band radio-frequency channels typically used to convey such signals. The proposed architectures are suitable for realization in very-large-scale integrated (VLSI) circuitry and, in particular, complementary metal oxide semiconductor (CMOS) application- specific integrated circuits (ASICs) operating at frequencies lower than modulation symbol rates. A digital receiver of the type to which the proposed architecture applies (see Figure 1) would include an analog-to-digital converter (A/D) operating at a rate, fs, of 4 samples per symbol period. To obtain the high speed necessary for sampling, the A/D and a 1:16 demultiplexer immediately following it would be constructed as GaAs integrated circuits. The parallel-processing circuitry downstream of the demultiplexer, including a demodulator followed by an equalizer, would operate at a rate of only fs/16 (in other words, at 1/4 of the symbol rate). The output from the equalizer would be four parallel streams of in-phase (I) and quadrature (Q) samples.
Multiple-length-scale deformation analysis in a thermoplastic polyurethane
Sui, Tan; Baimpas, Nikolaos; Dolbnya, Igor P.; Prisacariu, Cristina; Korsunsky, Alexander M.
2015-01-01
Thermoplastic polyurethane elastomers enjoy an exceptionally wide range of applications due to their remarkable versatility. These block co-polymers are used here as an example of a structurally inhomogeneous composite containing nano-scale gradients, whose internal strain differs depending on the length scale of consideration. Here we present a combined experimental and modelling approach to the hierarchical characterization of block co-polymer deformation. Synchrotron-based small- and wide-angle X-ray scattering and radiography are used for strain evaluation across the scales. Transmission electron microscopy image-based finite element modelling and fast Fourier transform analysis are used to develop a multi-phase numerical model that achieves agreement with the combined experimental data using a minimal number of adjustable structural parameters. The results highlight the importance of fuzzy interfaces, that is, regions of nanometre-scale structure and property gradients, in determining the mechanical properties of hierarchical composites across the scales. PMID:25758945
Parallel iterative solution for h and p approximations of the shallow water equations
Barragy, E.J.; Walters, R.A.
1998-01-01
A p finite element scheme and parallel iterative solver are introduced for a modified form of the shallow water equations. The governing equations are the three-dimensional shallow water equations. After a harmonic decomposition in time and rearrangement, the resulting equations are a complex Helmholz problem for surface elevation, and a complex momentum equation for the horizontal velocity. Both equations are nonlinear and the resulting system is solved using the Picard iteration combined with a preconditioned biconjugate gradient (PBCG) method for the linearized subproblems. A subdomain-based parallel preconditioner is developed which uses incomplete LU factorization with thresholding (ILUT) methods within subdomains, overlapping ILUT factorizations for subdomain boundaries and under-relaxed iteration for the resulting block system. The method builds on techniques successfully applied to linear elements by introducing ordering and condensation techniques to handle uniform p refinement. The combined methods show good performance for a range of p (element order), h (element size), and N (number of processors). Performance and scalability results are presented for a field scale problem where up to 512 processors are used. ?? 1998 Elsevier Science Ltd. All rights reserved.
A comparison of parallel and diverging screw angles in the stability of locked plate constructs.
Wähnert, D; Windolf, M; Brianza, S; Rothstock, S; Radtke, R; Brighenti, V; Schwieger, K
2011-09-01
We investigated the static and cyclical strength of parallel and angulated locking plate screws using rigid polyurethane foam (0.32 g/cm(3)) and bovine cancellous bone blocks. Custom-made stainless steel plates with two conically threaded screw holes with different angulations (parallel, 10° and 20° divergent) and 5 mm self-tapping locking screws underwent pull-out and cyclical pull and bending tests. The bovine cancellous blocks were only subjected to static pull-out testing. We also performed finite element analysis for the static pull-out test of the parallel and 20° configurations. In both the foam model and the bovine cancellous bone we found the significantly highest pull-out force for the parallel constructs. In the finite element analysis there was a 47% more damage in the 20° divergent constructs than in the parallel configuration. Under cyclical loading, the mean number of cycles to failure was significantly higher for the parallel group, followed by the 10° and 20° divergent configurations. In our laboratory setting we clearly showed the biomechanical disadvantage of a diverging locking screw angle under static and cyclical loading.
Multi Length Scale Finite Element Design Framework for Advanced Woven Fabrics
NASA Astrophysics Data System (ADS)
Erol, Galip Ozan
Woven fabrics are integral parts of many engineering applications spanning from personal protective garments to surgical scaffolds. They provide a wide range of opportunities in designing advanced structures because of their high tenacity, flexibility, high strength-to-weight ratios and versatility. These advantages result from their inherent multi scale nature where the filaments are bundled together to create yarns while the yarns are arranged into different weave architectures. Their highly versatile nature opens up potential for a wide range of mechanical properties which can be adjusted based on the application. While woven fabrics are viable options for design of various engineering systems, being able to understand the underlying mechanisms of the deformation and associated highly nonlinear mechanical response is important and necessary. However, the multiscale nature and relationships between these scales make the design process involving woven fabrics a challenging task. The objective of this work is to develop a multiscale numerical design framework using experimentally validated mesoscopic and macroscopic length scale approaches by identifying important deformation mechanisms and recognizing the nonlinear mechanical response of woven fabrics. This framework is exercised by developing mesoscopic length scale constitutive models to investigate plain weave fabric response under a wide range of loading conditions. A hyperelastic transversely isotropic yarn material model with transverse material nonlinearity is developed for woven yarns (commonly used in personal protection garments). The material properties/parameters are determined through an inverse method where unit cell finite element simulations are coupled with experiments. The developed yarn material model is validated by simulating full scale uniaxial tensile, bias extension and indentation experiments, and comparing to experimentally observed mechanical response and deformation mechanisms. Moreover, mesoscopic unit cell finite elements are coupled with a design-of-experiments method to systematically identify the important yarn material properties for the macroscale response of various weave architectures. To demonstrate the macroscopic length scale approach, two new material models for woven fabrics were developed. The Planar Material Model (PMM) utilizes two important deformation mechanisms in woven fabrics: (1) yarn elongation, and (2) relative yarn rotation due to shear loads. The yarns' uniaxial tensile response is modeled with a nonlinear spring using constitutive relations while a nonlinear rotational spring is implemented to define fabric's shear stiffness. The second material model, Sawtooth Material Model (SMM) adopts the sawtooth geometry while recognizing the biaxial nature of woven fabrics by implementing the interactions between the yarns. Material properties/parameters required by both PMM and SMM can be directly determined from standard experiments. Both macroscopic material models are implemented within an explicit finite element code and validated by comparing to the experiments. Then, the developed macroscopic material models are compared under various loading conditions to determine their accuracy. Finally, the numerical models developed in the mesoscopic and macroscopic length scales are linked thus demonstrating the new systematic design framework involving linked mesoscopic and macroscopic length scale modeling approaches. The approach is demonstrated with both Planar and Sawtooth Material Models and the simulation results are verified by comparing the results obtained from meso and macro models.
Mesoscale Effective Property Simulations Incorporating Conductive Binder
DOE Office of Scientific and Technical Information (OSTI.GOV)
Trembacki, Bradley L.; Noble, David R.; Brunini, Victor E.
Lithium-ion battery electrodes are composed of active material particles, binder, and conductive additives that form an electrolyte-filled porous particle composite. The mesoscale (particle-scale) interplay of electrochemistry, mechanical deformation, and transport through this tortuous multi-component network dictates the performance of a battery at the cell-level. Effective electrode properties connect mesoscale phenomena with computationally feasible battery-scale simulations. We utilize published tomography data to reconstruct a large subsection (1000+ particles) of an NMC333 cathode into a computational mesh and extract electrode-scale effective properties from finite element continuum-scale simulations. We present a novel method to preferentially place a composite binder phase throughout the mesostructure,more » a necessary approach due difficulty distinguishing between non-active phases in tomographic data. We compare stress generation and effective thermal, electrical, and ionic conductivities across several binder placement approaches. Isotropic lithiation-dependent mechanical swelling of the NMC particles and the consideration of strain-dependent composite binder conductivity significantly impact the resulting effective property trends and stresses generated. Lastly, our results suggest that composite binder location significantly affects mesoscale behavior, indicating that a binder coating on active particles is not sufficient and that more accurate approaches should be used when calculating effective properties that will inform battery-scale models in this inherently multi-scale battery simulation challenge.« less
Mesoscale Effective Property Simulations Incorporating Conductive Binder
Trembacki, Bradley L.; Noble, David R.; Brunini, Victor E.; ...
2017-07-26
Lithium-ion battery electrodes are composed of active material particles, binder, and conductive additives that form an electrolyte-filled porous particle composite. The mesoscale (particle-scale) interplay of electrochemistry, mechanical deformation, and transport through this tortuous multi-component network dictates the performance of a battery at the cell-level. Effective electrode properties connect mesoscale phenomena with computationally feasible battery-scale simulations. We utilize published tomography data to reconstruct a large subsection (1000+ particles) of an NMC333 cathode into a computational mesh and extract electrode-scale effective properties from finite element continuum-scale simulations. We present a novel method to preferentially place a composite binder phase throughout the mesostructure,more » a necessary approach due difficulty distinguishing between non-active phases in tomographic data. We compare stress generation and effective thermal, electrical, and ionic conductivities across several binder placement approaches. Isotropic lithiation-dependent mechanical swelling of the NMC particles and the consideration of strain-dependent composite binder conductivity significantly impact the resulting effective property trends and stresses generated. Lastly, our results suggest that composite binder location significantly affects mesoscale behavior, indicating that a binder coating on active particles is not sufficient and that more accurate approaches should be used when calculating effective properties that will inform battery-scale models in this inherently multi-scale battery simulation challenge.« less
NASA Astrophysics Data System (ADS)
Baregheh, Mandana; Mezentsev, Vladimir; Schmitz, Holger
2011-06-01
We describe a parallel multi-threaded approach for high performance modelling of wide class of phenomena in ultrafast nonlinear optics. Specific implementation has been performed using the highly parallel capabilities of a programmable graphics processor.
3-Dimensional Marine CSEM Modeling by Employing TDFEM with Parallel Solvers
NASA Astrophysics Data System (ADS)
Wu, X.; Yang, T.
2013-12-01
In this paper, parallel fulfillment is developed for forward modeling of the 3-Dimensional controlled source electromagnetic (CSEM) by using time-domain finite element method (TDFEM). Recently, a greater attention rises on research of hydrocarbon (HC) reservoir detection mechanism in the seabed. Since China has vast ocean resources, seeking hydrocarbon reservoirs become significant in the national economy. However, traditional methods of seismic exploration shown a crucial obstacle to detect hydrocarbon reservoirs in the seabed with a complex structure, due to relatively high acquisition costs and high-risking exploration. In addition, the development of EM simulations typically requires both a deep knowledge of the computational electromagnetics (CEM) and a proper use of sophisticated techniques and tools from computer science. However, the complexity of large-scale EM simulations often requires large memory because of a large amount of data, or solution time to address problems concerning matrix solvers, function transforms, optimization, etc. The objective of this paper is to present parallelized implementation of the time-domain finite element method for analysis of three-dimensional (3D) marine controlled source electromagnetic problems. Firstly, we established a three-dimensional basic background model according to the seismic data, then electromagnetic simulation of marine CSEM was carried out by using time-domain finite element method, which works on a MPI (Message Passing Interface) platform with exact orientation to allow fast detecting of hydrocarbons targets in ocean environment. To speed up the calculation process, SuperLU of an MPI (Message Passing Interface) version called SuperLU_DIST is employed in this approach. Regarding the representation of three-dimension seabed terrain with sense of reality, the region is discretized into an unstructured mesh rather than a uniform one in order to reduce the number of unknowns. Moreover, high-order Whitney vector basis functions are used for spatial discretization within the finite element approach to approximate the electric field. A horizontal electric dipole was used as a source, and an array of the receiver located at the seabed. To capture the presence of the hydrocarbon layer, the forward responses at water depths from 100m to 3000m are calculated. The normalized Magnitude Versus Offset (N-MVO) and Phase Versus Offset (PVO) curve can reflect resistive characteristics of hydrocarbon layers. For future work, Graphics Process Unit (GPU) acceleration algorithm would be carried out to multiply the calculation efficiency greatly.
A 3D staggered-grid finite difference scheme for poroelastic wave equation
NASA Astrophysics Data System (ADS)
Zhang, Yijie; Gao, Jinghuai
2014-10-01
Three dimensional numerical modeling has been a viable tool for understanding wave propagation in real media. The poroelastic media can better describe the phenomena of hydrocarbon reservoirs than acoustic and elastic media. However, the numerical modeling in 3D poroelastic media demands significantly more computational capacity, including both computational time and memory. In this paper, we present a 3D poroelastic staggered-grid finite difference (SFD) scheme. During the procedure, parallel computing is implemented to reduce the computational time. Parallelization is based on domain decomposition, and communication between processors is performed using message passing interface (MPI). Parallel analysis shows that the parallelized SFD scheme significantly improves the simulation efficiency and 3D decomposition in domain is the most efficient. We also analyze the numerical dispersion and stability condition of the 3D poroelastic SFD method. Numerical results show that the 3D numerical simulation can provide a real description of wave propagation.
Parallelization of implicit finite difference schemes in computational fluid dynamics
NASA Technical Reports Server (NTRS)
Decker, Naomi H.; Naik, Vijay K.; Nicoules, Michel
1990-01-01
Implicit finite difference schemes are often the preferred numerical schemes in computational fluid dynamics, requiring less stringent stability bounds than the explicit schemes. Each iteration in an implicit scheme involves global data dependencies in the form of second and higher order recurrences. Efficient parallel implementations of such iterative methods are considerably more difficult and non-intuitive. The parallelization of the implicit schemes that are used for solving the Euler and the thin layer Navier-Stokes equations and that require inversions of large linear systems in the form of block tri-diagonal and/or block penta-diagonal matrices is discussed. Three-dimensional cases are emphasized and schemes that minimize the total execution time are presented. Partitioning and scheduling schemes for alleviating the effects of the global data dependencies are described. An analysis of the communication and the computation aspects of these methods is presented. The effect of the boundary conditions on the parallel schemes is also discussed.
Desmet, Gert
2013-11-01
The finite length parallel zone (FPZ)-model is proposed as an alternative model for the axial- or eddy-dispersion caused by the occurrence of local velocity biases or flow heterogeneities in porous media such as those used in liquid chromatography columns. The mathematical plate height expression evolving from the model shows that the A- and C-term band broadening effects that can originate from a given velocity bias should be coupled in an exponentially decaying way instead of harmonically as proposed in Giddings' coupling theory. In the low and high velocity limit both models converge, while a 12% difference can be observed in the (practically most relevant) intermediate range of reduced velocities. Explicit expressions for the A- and C-constants appearing in the exponential decay-based plate height expression have been derived for each of the different possible velocity bias levels (single through-pore and particle level, multi-particle level and trans-column level). These expressions allow to directly relate the band broadening originating from these different levels to the local fundamental transport parameters, hence offering the possibility to include a velocity-dependent and, if, needed retention factor-dependent transversal dispersion coefficient. Having developed the mathematics for the general case wherein a difference in retention equilibrium establishes between the two parallel zones, the effect of any possible local variations in packing density and/or retention capacity on the eddy-dispersion can be explicitly accounted for as well. It is furthermore also shown that, whereas the lumped transport parameter model used in the basic variant of the FPZ-model only provides a first approximation of the true decay constant, the model can be extended by introducing a constant correction factor to correctly account for the continuous transversal dispersion transport in the velocity bias zones. Copyright © 2013 Elsevier B.V. All rights reserved.
Efficient multitasking of Choleski matrix factorization on CRAY supercomputers
NASA Technical Reports Server (NTRS)
Overman, Andrea L.; Poole, Eugene L.
1991-01-01
A Choleski method is described and used to solve linear systems of equations that arise in large scale structural analysis. The method uses a novel variable-band storage scheme and is structured to exploit fast local memory caches while minimizing data access delays between main memory and vector registers. Several parallel implementations of this method are described for the CRAY-2 and CRAY Y-MP computers demonstrating the use of microtasking and autotasking directives. A portable parallel language, FORCE, is used for comparison with the microtasked and autotasked implementations. Results are presented comparing the matrix factorization times for three representative structural analysis problems from runs made in both dedicated and multi-user modes on both computers. CPU and wall clock timings are given for the parallel implementations and are compared to single processor timings of the same algorithm.
System software for the finite element machine
NASA Technical Reports Server (NTRS)
Crockett, T. W.; Knott, J. D.
1985-01-01
The Finite Element Machine is an experimental parallel computer developed at Langley Research Center to investigate the application of concurrent processing to structural engineering analysis. This report describes system-level software which has been developed to facilitate use of the machine by applications researchers. The overall software design is outlined, and several important parallel processing issues are discussed in detail, including processor management, communication, synchronization, and input/output. Based on experience using the system, the hardware architecture and software design are critiqued, and areas for further work are suggested.
NASA Astrophysics Data System (ADS)
Meng, Luming; Sheong, Fu Kit; Zeng, Xiangze; Zhu, Lizhe; Huang, Xuhui
2017-07-01
Constructing Markov state models from large-scale molecular dynamics simulation trajectories is a promising approach to dissect the kinetic mechanisms of complex chemical and biological processes. Combined with transition path theory, Markov state models can be applied to identify all pathways connecting any conformational states of interest. However, the identified pathways can be too complex to comprehend, especially for multi-body processes where numerous parallel pathways with comparable flux probability often coexist. Here, we have developed a path lumping method to group these parallel pathways into metastable path channels for analysis. We define the similarity between two pathways as the intercrossing flux between them and then apply the spectral clustering algorithm to lump these pathways into groups. We demonstrate the power of our method by applying it to two systems: a 2D-potential consisting of four metastable energy channels and the hydrophobic collapse process of two hydrophobic molecules. In both cases, our algorithm successfully reveals the metastable path channels. We expect this path lumping algorithm to be a promising tool for revealing unprecedented insights into the kinetic mechanisms of complex multi-body processes.
Multi-petascale highly efficient parallel supercomputer
Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen -Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Smith, Brian; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng
2015-07-14
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency.
Efficient development of memory bounded geo-applications to scale on modern supercomputers
NASA Astrophysics Data System (ADS)
Räss, Ludovic; Omlin, Samuel; Licul, Aleksandar; Podladchikov, Yuri; Herman, Frédéric
2016-04-01
Numerical modeling is an actual key tool in the area of geosciences. The current challenge is to solve problems that are multi-physics and for which the length scale and the place of occurrence might not be known in advance. Also, the spatial extend of the investigated domain might strongly vary in size, ranging from millimeters for reactive transport to kilometers for glacier erosion dynamics. An efficient way to proceed is to develop simple but robust algorithms that perform well and scale on modern supercomputers and permit therefore very high-resolution simulations. We propose an efficient approach to solve memory bounded real-world applications on modern supercomputers architectures. We optimize the software to run on our newly acquired state-of-the-art GPU cluster "octopus". Our approach shows promising preliminary results on important geodynamical and geomechanical problematics: we have developed a Stokes solver for glacier flow and a poromechanical solver including complex rheologies for nonlinear waves in stressed rocks porous rocks. We solve the system of partial differential equations on a regular Cartesian grid and use an iterative finite difference scheme with preconditioning of the residuals. The MPI communication happens only locally (point-to-point); this method is known to scale linearly by construction. The "octopus" GPU cluster, which we use for the computations, has been designed to achieve maximal data transfer throughput at minimal hardware cost. It is composed of twenty compute nodes, each hosting four Nvidia Titan X GPU accelerators. These high-density nodes are interconnected with a parallel (dual-rail) FDR InfiniBand network. Our efforts show promising preliminary results for the different physics investigated. The glacier flow solver achieves good accuracy in the relevant benchmarks and the coupled poromechanical solver permits to explain previously unresolvable focused fluid flow as a natural outcome of the porosity setup. In both cases, near peak memory bandwidth transfer is achieved. Our approach allows us to get the best out of the current hardware.
The Programming Language Python In Earth System Simulations
NASA Astrophysics Data System (ADS)
Gross, L.; Imranullah, A.; Mora, P.; Saez, E.; Smillie, J.; Wang, C.
2004-12-01
Mathematical models in earth sciences base on the solution of systems of coupled, non-linear, time-dependent partial differential equations (PDEs). The spatial and time-scale vary from a planetary scale and million years for convection problems to 100km and 10 years for fault systems simulations. Various techniques are in use to deal with the time dependency (e.g. Crank-Nicholson), with the non-linearity (e.g. Newton-Raphson) and weakly coupled equations (e.g. non-linear Gauss-Seidel). Besides these high-level solution algorithms discretization methods (e.g. finite element method (FEM), boundary element method (BEM)) are used to deal with spatial derivatives. Typically, large-scale, three dimensional meshes are required to resolve geometrical complexity (e.g. in the case of fault systems) or features in the solution (e.g. in mantel convection simulations). The modelling environment escript allows the rapid implementation of new physics as required for the development of simulation codes in earth sciences. Its main object is to provide a programming language, where the user can define new models and rapidly develop high-level solution algorithms. The current implementation is linked with the finite element package finley as a PDE solver. However, the design is open and other discretization technologies such as finite differences and boundary element methods could be included. escript is implemented as an extension of the interactive programming environment python (see www.python.org). Key concepts introduced are Data objects, which are holding values on nodes or elements of the finite element mesh, and linearPDE objects, which are defining linear partial differential equations to be solved by the underlying discretization technology. In this paper we will show the basic concepts of escript and will show how escript is used to implement a simulation code for interacting fault systems. We will show some results of large-scale, parallel simulations on an SGI Altix system. Acknowledgements: Project work is supported by Australian Commonwealth Government through the Australian Computational Earth Systems Simulator Major National Research Facility, Queensland State Government Smart State Research Facility Fund, The University of Queensland and SGI.
Evolution of large amplitude Alfven waves in solar wind plasmas: Kinetic-fluid models
NASA Astrophysics Data System (ADS)
Nariyuki, Y.
2014-12-01
Large amplitude Alfven waves are ubiquitously observed in solar wind plasmas. Mjolhus(JPP, 1976) and Mio et al(JPSJ, 1976) found that nonlinear evolution of the uni-directional, parallel propagating Alfven waves can be described by the derivative nonlinear Schrodinger equation (DNLS). Later, the multi-dimensional extension (Mjolhus and Wyller, JPP, 1988; Passot and Sulem, POP, 1993; Gazol et al, POP, 1999) and ion kinetic modification (Mjolhus and Wyller, JPP, 1988; Spangler, POP, 1989; Medvedev and Diamond, POP, 1996; Nariyuki et al, POP, 2013) of DNLS have been reported. Recently, Nariyuki derived multi-dimensional DNLS from an expanding box model of the Hall-MHD system (Nariyuki, submitted). The set of equations including the nonlinear evolution of compressional wave modes (TDNLS) was derived by Hada(GRL, 1993). DNLS can be derived from TDNLS by rescaling of the variables (Mjolhus, Phys. Scr., 2006). Nariyuki and Hada(JPSJ, 2007) derived a kinetically modified TDNLS by using a simple Landau closure (Hammet and Perkins, PRL, 1990; Medvedev and Diamond, POP, 1996). In the present study, we revisit the ion kinetic modification of multi-dimensional TDNLS through more rigorous derivations, which is consistent with the past kinetic modification of DNLS. Although the original TDNLS was derived in the multi-dimensional form, the evolution of waves with finite propagation angles in TDNLS has not been paid much attention. Applicability of the resultant models to solar wind turbulence is discussed.
Aquilina, Peter; Chamoli, Uphar; Parr, William C H; Clausen, Philip D; Wroe, Stephen
2013-06-01
The most stable pattern of internal fixation for fractures of the mandibular condyle is a matter for ongoing discussion. In this study we investigated the stability of three commonly used patterns of plate fixation, and constructed finite element models of a simulated mandibular condylar fracture. The completed models were heterogeneous in the distribution of bony material properties, contained about 1.2 million elements, and incorporated simulated jaw-adducting musculature. Models were run assuming linear elasticity and isotropic material properties for bone. This model was considerably larger and more complex than previous finite element models that have been used to analyse the biomechanical behaviour of differing plating techniques. The use of two parallel 2.0 titanium miniplates gave a more stable configuration with lower mean element stresses and displacements over the use of a single miniplate. In addition, a parallel orientation of two miniplates resulted in lower stresses and displacements than did the use of two miniplates in an offset pattern. The use of two parallel titanium plates resulted in a superior biomechanical result as defined by mean element stresses and relative movement between the fractured fragments in these finite element models. Copyright © 2012 The British Association of Oral and Maxillofacial Surgeons. Published by Elsevier Ltd. All rights reserved.
Xu, Jiajiong; Tang, Wei; Ma, Jun; Wang, Hong
2017-07-01
Drinking water treatment processes remove undesirable chemicals and microorganisms from source water, which is vital to public health protection. The purpose of this study was to investigate the effects of treatment processes and configuration on the microbiome by comparing microbial community shifts in two series of different treatment processes operated in parallel within a full-scale drinking water treatment plant (DWTP) in Southeast China. Illumina sequencing of 16S rRNA genes of water samples demonstrated little effect of coagulation/sedimentation and pre-oxidation steps on bacterial communities, in contrast to dramatic and concurrent microbial community shifts during ozonation, granular activated carbon treatment, sand filtration, and disinfection for both series. A large number of unique operational taxonomic units (OTUs) at these four treatment steps further illustrated their strong shaping power towards the drinking water microbial communities. Interestingly, multidimensional scaling analysis revealed tight clustering of biofilm samples collected from different treatment steps, with Nitrospira, the nitrite-oxidizing bacteria, noted at higher relative abundances in biofilm compared to water samples. Overall, this study provides a snapshot of step-to-step microbial evolvement in multi-step drinking water treatment systems, and the results provide insight to control and manipulation of the drinking water microbiome via optimization of DWTP design and operation.
NASA Astrophysics Data System (ADS)
Draper, Martin; Usera, Gabriel
2015-04-01
The Scale Dependent Dynamic Model (SDDM) has been widely validated in large-eddy simulations using pseudo-spectral codes [1][2][3]. The scale dependency, particularly the potential law, has been proved also in a priori studies [4][5]. To the authors' knowledge there have been only few attempts to use the SDDM in finite difference (FD) and finite volume (FV) codes [6][7], finding some improvements with the dynamic procedures (scale independent or scale dependent approach), but not showing the behavior of the scale-dependence parameter when using the SDDM. The aim of the present paper is to evaluate the SDDM in the open source code caffa3d.MBRi, an updated version of the code presented in [8]. caffa3d.MBRi is a FV code, second-order accurate, parallelized with MPI, in which the domain is divided in unstructured blocks of structured grids. To accomplish this, 2 cases are considered: flow between flat plates and flow over a rough surface with the presence of a model wind turbine, taking for this case the experimental data presented in [9]. In both cases the standard Smagorinsky Model (SM), the Scale Independent Dynamic Model (SIDM) and the SDDM are tested. As presented in [6][7] slight improvements are obtained with the SDDM. Nevertheless, the behavior of the scale-dependence parameter supports the generalization of the dynamic procedure proposed in the SDDM, particularly taking into account that no explicit filter is used (the implicit filter is unknown). [1] F. Porté-Agel, C. Meneveau, M.B. Parlange. "A scale-dependent dynamic model for large-eddy simulation: application to a neutral atmospheric boundary layer". Journal of Fluid Mechanics, 2000, 415, 261-284. [2] E. Bou-Zeid, C. Meneveau, M. Parlante. "A scale-dependent Lagrangian dynamic model for large eddy simulation of complex turbulent flows". Physics of Fluids, 2005, 17, 025105 (18p). [3] R. Stoll, F. Porté-Agel. "Dynamic subgrid-scale models for momentum and scalar fluxes in large-eddy simulations of neutrally stratified atmospheric boundary layers over heterogeneous terrain". Water Resources Research, 2006, 42, WO1409 (18 p). [4] J. Keissl, M. Parlange, C. Meneveau. "Field experimental study of dynamic Smagorinsky models in the atmospheric surface layer". Journal of the Atmospheric Science, 2004, 61, 2296-2307. [5] E. Bou-Zeid, N. Vercauteren, M.B. Parlange, C. Meneveau. "Scale dependence of subgrid-scale model coefficients: An a priori study". Physics of Fluids, 2008, 20, 115106. [6] G. Kirkil, J. Mirocha, E. Bou-Zeid, F.K. Chow, B. Kosovic, "Implementation and evaluation of dynamic subfilter - scale stress models for large - eddy simulation using WRF". Monthly Weather Review, 2012, 140, 266-284. [7] S. Radhakrishnan, U. Piomelli. "Large-eddy simulation of oscillating boundary layers: model comparison and validation". Journal of Geophysical Research, 2008, 113, C02022. [8] G. Usera, A. Vernet, J.A. Ferré. "A parallel block-structured finite volume method for flows in complex geometry with sliding interfaces". Flow, Turbulence and Combustion, 2008, 81, 471-495. [9] Y-T. Wu, F. Porté-Agel. "Large-eddy simulation of wind-turbine wakes: evaluation of turbine parametrisations". BoundaryLayerMeteorology, 2011, 138, 345-366.
NASA Astrophysics Data System (ADS)
Leggett, C.; Binet, S.; Jackson, K.; Levinthal, D.; Tatarkhanov, M.; Yao, Y.
2011-12-01
Thermal limitations have forced CPU manufacturers to shift from simply increasing clock speeds to improve processor performance, to producing chip designs with multi- and many-core architectures. Further the cores themselves can run multiple threads as a zero overhead context switch allowing low level resource sharing (Intel Hyperthreading). To maximize bandwidth and minimize memory latency, memory access has become non uniform (NUMA). As manufacturers add more cores to each chip, a careful understanding of the underlying architecture is required in order to fully utilize the available resources. We present AthenaMP and the Atlas event loop manager, the driver of the simulation and reconstruction engines, which have been rewritten to make use of multiple cores, by means of event based parallelism, and final stage I/O synchronization. However, initial studies on 8 andl6 core Intel architectures have shown marked non-linearities as parallel process counts increase, with as much as 30% reductions in event throughput in some scenarios. Since the Intel Nehalem architecture (both Gainestown and Westmere) will be the most common choice for the next round of hardware procurements, an understanding of these scaling issues is essential. Using hardware based event counters and Intel's Performance Tuning Utility, we have studied the performance bottlenecks at the hardware level, and discovered optimization schemes to maximize processor throughput. We have also produced optimization mechanisms, common to all large experiments, that address the extreme nature of today's HEP code, which due to it's size, places huge burdens on the memory infrastructure of today's processors.
Hoyal Cuthill, Jennifer F.
2015-01-01
Biological variety and major evolutionary transitions suggest that the space of possible morphologies may have varied among lineages and through time. However, most models of phylogenetic character evolution assume that the potential state space is finite. Here, I explore what the morphological state space might be like, by analysing trends in homoplasy (repeated derivation of the same character state). Analyses of ten published character matrices are compared against computer simulations with different state space models: infinite states, finite states, ordered states and an ‘inertial' model, simulating phylogenetic constraints. Of these, only the infinite states model results in evolution without homoplasy, a prediction which is not generally met by real phylogenies. Many authors have interpreted the ubiquity of homoplasy as evidence that the number of evolutionary alternatives is finite. However, homoplasy is also predicted by phylogenetic constraints on the morphological distance that can be traversed between ancestor and descendent. Phylogenetic rarefaction (sub-sampling) shows that finite and inertial state spaces do produce contrasting trends in the distribution of homoplasy. Two clades show trends characteristic of phylogenetic inertia, with decreasing homoplasy (increasing consistency index) as we sub-sample more distantly related taxa. One clade shows increasing homoplasy, suggesting exhaustion of finite states. Different clades may, therefore, show different patterns of character evolution. However, when parsimony uninformative characters are excluded (which may occur without documentation in cladistic studies), it may no longer be possible to distinguish inertial and finite state spaces. Interestingly, inertial models predict that homoplasy should be clustered among comparatively close relatives (parallel evolution), whereas finite state models do not. If morphological evolution is often inertial in nature, then homoplasy (false homology) may primarily occur between close relatives, perhaps being replaced by functional analogy at higher taxonomic scales. PMID:26640650
Multi-scale calculation based on dual domain material point method combined with molecular dynamics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dhakal, Tilak Raj
This dissertation combines the dual domain material point method (DDMP) with molecular dynamics (MD) in an attempt to create a multi-scale numerical method to simulate materials undergoing large deformations with high strain rates. In these types of problems, the material is often in a thermodynamically non-equilibrium state, and conventional constitutive relations are often not available. In this method, the closure quantities, such as stress, at each material point are calculated from a MD simulation of a group of atoms surrounding the material point. Rather than restricting the multi-scale simulation in a small spatial region, such as phase interfaces, or crackmore » tips, this multi-scale method can be used to consider non-equilibrium thermodynamic e ects in a macroscopic domain. This method takes advantage that the material points only communicate with mesh nodes, not among themselves; therefore MD simulations for material points can be performed independently in parallel. First, using a one-dimensional shock problem as an example, the numerical properties of the original material point method (MPM), the generalized interpolation material point (GIMP) method, the convected particle domain interpolation (CPDI) method, and the DDMP method are investigated. Among these methods, only the DDMP method converges as the number of particles increases, but the large number of particles needed for convergence makes the method very expensive especially in our multi-scale method where we calculate stress in each material point using MD simulation. To improve DDMP, the sub-point method is introduced in this dissertation, which provides high quality numerical solutions with a very small number of particles. The multi-scale method based on DDMP with sub-points is successfully implemented for a one dimensional problem of shock wave propagation in a cerium crystal. The MD simulation to calculate stress in each material point is performed in GPU using CUDA to accelerate the computation. The numerical properties of the multiscale method are investigated as well as the results from this multi-scale calculation are compared of particles needed for convergence makes the method very expensive especially in our multi-scale method where we calculate stress in each material point using MD simulation. To improve DDMP, the sub-point method is introduced in this dissertation, which provides high quality numerical solutions with a very small number of particles. The multi-scale method based on DDMP with sub-points is successfully implemented for a one dimensional problem of shock wave propagation in a cerium crystal. The MD simulation to calculate stress in each material point is performed in GPU using CUDA to accelerate the computation. The numerical properties of the multiscale method are investigated as well as the results from this multi-scale calculation are compared with direct MD simulation results to demonstrate the feasibility of the method. Also, the multi-scale method is applied for a two dimensional problem of jet formation around copper notch under a strong impact.« less
NASA Astrophysics Data System (ADS)
Gerke, Kirill; Vasilyev, Roman; Khirevich, Siarhei; Karsanina, Marina; Collins, Daniel; Korost, Dmitry; Mallants, Dirk
2015-04-01
In this contribution we introduce a novel free software which solves the Stokes equation to obtain velocity fields for low Reynolds-number flows within externally generated 3D pore geometries. Provided with velocity fields, one can calculate permeability for known pressure gradient boundary conditions via Darcy's equation. Finite-difference schemes of 2nd and 4th order of accuracy are used together with an artificial compressibility method to iteratively converge to a steady-state solution of Stokes' equation. This numerical approach is much faster and less computationally demanding than the majority of open-source or commercial softwares employing other algorithms (finite elements/volumes, lattice Boltzmann, etc.) The software consists of two parts: 1) a pre and post-processing graphical interface, and 2) a solver. The latter is efficiently parallelized to use any number of available cores (the speedup on 16 threads was up to 10-12 depending on hardware). Due to parallelization and memory optimization our software can be used to obtain solutions for 300x300x300 voxels geometries on modern desktop PCs. The software was successfully verified by testing it against lattice Boltzmann simulations and analytical solutions. To illustrate the software's applicability for numerous problems in Earth Sciences, a number of case studies have been developed: 1) identifying the representative elementary volume for permeability determination within a sandstone sample, 2) derivation of permeability/hydraulic conductivity values for rock and soil samples and comparing those with experimentally obtained values, 3) revealing the influence of the amount of fine-textured material such as clay on filtration properties of sandy soil. This work was partially supported by RSF grant 14-17-00658 (pore-scale modelling) and RFBR grants 13-04-00409-a and 13-05-01176-a.
An overlapped grid method for multigrid, finite volume/difference flow solvers: MaGGiE
NASA Technical Reports Server (NTRS)
Baysal, Oktay; Lessard, Victor R.
1990-01-01
The objective is to develop a domain decomposition method via overlapping/embedding the component grids, which is to be used by upwind, multi-grid, finite volume solution algorithms. A computer code, given the name MaGGiE (Multi-Geometry Grid Embedder) is developed to meet this objective. MaGGiE takes independently generated component grids as input, and automatically constructs the composite mesh and interpolation data, which can be used by the finite volume solution methods with or without multigrid convergence acceleration. Six demonstrative examples showing various aspects of the overlap technique are presented and discussed. These cases are used for developing the procedure for overlapping grids of different topologies, and to evaluate the grid connection and interpolation data for finite volume calculations on a composite mesh. Time fluxes are transferred between mesh interfaces using a trilinear interpolation procedure. Conservation losses are minimal at the interfaces using this method. The multi-grid solution algorithm, using the coaser grid connections, improves the convergence time history as compared to the solution on composite mesh without multi-gridding.
NASA Astrophysics Data System (ADS)
Costantini, Mario; Malvarosa, Fabio; Minati, Federico
2010-03-01
Phase unwrapping and integration of finite differences are key problems in several technical fields. In SAR interferometry and differential and persistent scatterers interferometry digital elevation models and displacement measurements can be obtained after unambiguously determining the phase values and reconstructing the mean velocities and elevations of the observed targets, which can be performed by integrating differential estimates of these quantities (finite differences between neighboring points).In this paper we propose a general formulation for robust and efficient integration of finite differences and phase unwrapping, which includes standard techniques methods as sub-cases. The proposed approach allows obtaining more reliable and accurate solutions by exploiting redundant differential estimates (not only between nearest neighboring points) and multi-dimensional information (e.g. multi-temporal, multi-frequency, multi-baseline observations), or external data (e.g. GPS measurements). The proposed approach requires the solution of linear or quadratic programming problems, for which computationally efficient algorithms exist.The validation tests obtained on real SAR data confirm the validity of the method, which was integrated in our production chain and successfully used also in massive productions.
[Series: Medical Applications of the PHITS Code (2): Acceleration by Parallel Computing].
Furuta, Takuya; Sato, Tatsuhiko
2015-01-01
Time-consuming Monte Carlo dose calculation becomes feasible owing to the development of computer technology. However, the recent development is due to emergence of the multi-core high performance computers. Therefore, parallel computing becomes a key to achieve good performance of software programs. A Monte Carlo simulation code PHITS contains two parallel computing functions, the distributed-memory parallelization using protocols of message passing interface (MPI) and the shared-memory parallelization using open multi-processing (OpenMP) directives. Users can choose the two functions according to their needs. This paper gives the explanation of the two functions with their advantages and disadvantages. Some test applications are also provided to show their performance using a typical multi-core high performance workstation.
Parallel Implementation of a High Order Implicit Collocation Method for the Heat Equation
NASA Technical Reports Server (NTRS)
Kouatchou, Jules; Halem, Milton (Technical Monitor)
2000-01-01
We combine a high order compact finite difference approximation and collocation techniques to numerically solve the two dimensional heat equation. The resulting method is implicit arid can be parallelized with a strategy that allows parallelization across both time and space. We compare the parallel implementation of the new method with a classical implicit method, namely the Crank-Nicolson method, where the parallelization is done across space only. Numerical experiments are carried out on the SGI Origin 2000.
Design, development and use of the finite element machine
NASA Technical Reports Server (NTRS)
Adams, L. M.; Voigt, R. C.
1983-01-01
Some of the considerations that went into the design of the Finite Element Machine, a research asynchronous parallel computer are described. The present status of the system is also discussed along with some indication of the type of results that were obtained.