A Robust and Scalable Software Library for Parallel Adaptive Refinement on Unstructured Meshes
NASA Technical Reports Server (NTRS)
Lou, John Z.; Norton, Charles D.; Cwik, Thomas A.
1999-01-01
The design and implementation of Pyramid, a software library for performing parallel adaptive mesh refinement (PAMR) on unstructured meshes, is described. This software library can be easily used in a variety of unstructured parallel computational applications, including parallel finite element, parallel finite volume, and parallel visualization applications using triangular or tetrahedral meshes. The library contains a suite of well-designed and efficiently implemented modules that perform operations in a typical PAMR process. Among these are mesh quality control during successive parallel adaptive refinement (typically guided by a local-error estimator), parallel load-balancing, and parallel mesh partitioning using the ParMeTiS partitioner. The Pyramid library is implemented in Fortran 90 with an interface to the Message-Passing Interface (MPI) library, supporting code efficiency, modularity, and portability. An EM waveguide filter application, adaptively refined using the Pyramid library, is illustrated.
Parallel, adaptive finite element methods for conservation laws
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Devine, Karen D.; Flaherty, Joseph E.
1994-01-01
We construct parallel finite element methods for the solution of hyperbolic conservation laws in one and two dimensions. Spatial discretization is performed by a discontinuous Galerkin finite element method using a basis of piecewise Legendre polynomials. Temporal discretization utilizes a Runge-Kutta method. Dissipative fluxes and projection limiting prevent oscillations near solution discontinuities. A posteriori estimates of spatial errors are obtained by a p-refinement technique using superconvergence at Radau points. The resulting method is of high order and may be parallelized efficiently on MIMD computers. We compare results using different limiting schemes and demonstrate parallel efficiency through computations on an NCUBE/2 hypercube. We also present results using adaptive h- and p-refinement to reduce the computational cost of the method.
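As an illustration of the piecewise Legendre basis mentioned in this abstract, the following minimal sketch evaluates a polynomial expansion on a single reference element [-1, 1]. The function names `legendre` and `dg_value` are hypothetical and are not taken from the authors' code; limiting, fluxes, and time stepping are omitted.

```python
def legendre(n, x):
    """Evaluate the Legendre polynomial P_n(x) by the three-term recurrence."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        # (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x)
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p


def dg_value(coeffs, x):
    """Value of sum_k coeffs[k] * P_k(x) on the reference element [-1, 1]."""
    return sum(c * legendre(k, x) for k, c in enumerate(coeffs))


# Example: the expansion 1 * P_0 + 2 * P_1 evaluated at x = 0.5.
value = dg_value([1.0, 2.0], 0.5)
```

In a discontinuous Galerkin setting each element would carry its own coefficient list, which is what makes element-local p-refinement straightforward to express.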
A parallel finite element simulator for ion transport through three-dimensional ion channel systems.
Tu, Bin; Chen, Minxin; Xie, Yan; Zhang, Linbo; Eisenberg, Bob; Lu, Benzhuo
2013-09-15
A parallel finite element simulator, ichannel, is developed for ion transport through three-dimensional ion channel systems that consist of protein and membrane. The coordinates of heavy atoms of the protein are taken from the Protein Data Bank and the membrane is represented as a slab. The simulator contains two components: a parallel adaptive finite element solver for a set of Poisson-Nernst-Planck (PNP) equations that describe the electrodiffusion process of ion transport, and a mesh generation tool chain for ion channel systems, which is an essential component for the finite element computations. The finite element method has advantages in modeling irregular geometries and complex boundary conditions. We have built a tool chain to get the surface and volume mesh for ion channel systems, which consists of a set of mesh generation tools. The adaptive finite element solver in our simulator is implemented using the parallel adaptive finite element package Parallel Hierarchical Grid (PHG) developed by one of the authors, which provides the capability of doing large scale parallel computations with high parallel efficiency and the flexibility of choosing high order elements to achieve high order accuracy. The simulator is applied to a real transmembrane protein, the gramicidin A (gA) channel protein, to calculate the electrostatic potential, ion concentrations and I - V curve, with which both primitive and transformed PNP equations are studied and their numerical performances are compared. To further validate the method, we also apply the simulator to two other ion channel systems, the voltage dependent anion channel (VDAC) and α-Hemolysin (α-HL). The simulation results agree well with Brownian dynamics (BD) simulation results and experimental results. Moreover, because ionic finite size effects can be included in PNP model now, we also perform simulations using a size-modified PNP (SMPNP) model on VDAC and α-HL. 
It is shown that the size effects in SMPNP can effectively lead to reduced current in the channel, and the results are closer to BD simulation results. Copyright © 2013 Wiley Periodicals, Inc.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, Cameron W.; Granzow, Brian; Diamond, Gerrett; ...
2017-01-01
Unstructured mesh methods, like finite elements and finite volumes, support the effective analysis of complex physical behaviors modeled by partial differential equations over general three-dimensional domains. The most reliable and efficient methods apply adaptive procedures with a-posteriori error estimators that indicate where and how the mesh is to be modified. Although adaptive meshes can have two to three orders of magnitude fewer elements than a more uniform mesh for the same level of accuracy, there are many complex simulations where the meshes required are so large that they can only be solved on massively parallel systems.
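The idea of an error indicator deciding where the mesh is modified can be sketched in one dimension. The example below is an assumption-laden toy, not the procedure of the work summarized above: the indicator is simply the midpoint gap between a function and its linear interpolant, and the names `refine` and `mesh` are hypothetical.

```python
import math


def refine(f, nodes, tol, max_passes=20):
    """Bisect every interval whose midpoint interpolation error exceeds tol."""
    for _ in range(max_passes):
        new_nodes = [nodes[0]]
        changed = False
        for a, b in zip(nodes, nodes[1:]):
            mid = 0.5 * (a + b)
            # Local indicator: gap between f and its linear interpolant at mid.
            indicator = abs(f(mid) - 0.5 * (f(a) + f(b)))
            if indicator > tol:
                new_nodes.append(mid)
                changed = True
            new_nodes.append(b)
        nodes = new_nodes
        if not changed:
            break
    return nodes


# A steep front at x = 0.5: refinement concentrates there and leaves
# the flat regions near the ends of the interval coarse.
mesh = refine(lambda x: math.tanh(20 * (x - 0.5)), [0.0, 0.25, 0.5, 0.75, 1.0], 1e-2)
```

The resulting node list is much denser near the front than a uniform mesh of the same accuracy would need to be elsewhere, which is the saving the abstract refers to.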
Parallel goal-oriented adaptive finite element modeling for 3D electromagnetic exploration
NASA Astrophysics Data System (ADS)
Zhang, Y.; Key, K.; Ovall, J.; Holst, M.
2014-12-01
We present a parallel goal-oriented adaptive finite element method for accurate and efficient electromagnetic (EM) modeling of complex 3D structures. An unstructured tetrahedral mesh allows this approach to accommodate arbitrarily complex 3D conductivity variations and a priori known boundaries. The total electric field is approximated by the lowest order linear curl-conforming shape functions and the discretized finite element equations are solved by a sparse LU factorization. Accuracy of the finite element solution is achieved through adaptive mesh refinement that is performed iteratively until the solution converges to the desired accuracy tolerance. Refinement is guided by a goal-oriented error estimator that uses a dual-weighted residual method to optimize the mesh for accurate EM responses at the locations of the EM receivers. As a result, the mesh refinement is highly efficient since it only targets the elements where the inaccuracy of the solution corrupts the response at the possibly distant locations of the EM receivers. We compare the accuracy and efficiency of two approaches for estimating the primary residual error required at the core of this method: one uses local element and inter-element residuals and the other relies on solving a global residual system using a hierarchical basis. For computational efficiency our method follows the Bank-Holst algorithm for parallelization, where solutions are computed in subdomains of the original model. To resolve the load-balancing problem, this approach applies a spectral bisection method to divide the entire model into subdomains that have approximately equal error and the same number of receivers. The finite element solutions are then computed in parallel with each subdomain carrying out goal-oriented adaptive mesh refinement independently. 
We validate the newly developed algorithm by comparison with controlled-source EM solutions for 1D layered models and with 2D results from our earlier 2D goal oriented adaptive refinement code named MARE2DEM. We demonstrate the performance and parallel scaling of this algorithm on a medium-scale computing cluster with a marine controlled-source EM example that includes a 3D array of receivers located over a 3D model that includes significant seafloor bathymetry variations and a heterogeneous subsurface.
Parallel processors and nonlinear structural dynamics algorithms and software
NASA Technical Reports Server (NTRS)
Belytschko, Ted; Gilbertsen, Noreen D.; Neal, Mark O.; Plaskacz, Edward J.
1989-01-01
The adaptation of a finite element program with explicit time integration to a massively parallel SIMD (single instruction multiple data) computer, the CONNECTION Machine, is described. The adaptation required the development of a new algorithm, called the exchange algorithm, in which all nodal variables are allocated to the element with an exchange of nodal forces at each time step. The architectural and C* programming language features of the CONNECTION Machine are also summarized. Various alternate data structures and associated algorithms for nonlinear finite element analysis are discussed and compared. Results are presented which demonstrate that the CONNECTION Machine is capable of outperforming the CRAY XMP/14.
NASA Astrophysics Data System (ADS)
Gassmöller, Rene; Bangerth, Wolfgang
2016-04-01
Particle-in-cell methods have a long history and many applications in geodynamic modelling of mantle convection, lithospheric deformation and crustal dynamics. They are primarily used to track material information, the strain a material has undergone, the pressure-temperature history a certain material region has experienced, or the amount of volatiles or partial melt present in a region. However, their efficient parallel implementation - in particular combined with adaptive finite-element meshes - is complicated due to the complex communication patterns and frequent reassignment of particles to cells. Consequently, many current scientific software packages accomplish this efficient implementation by specifically designing particle methods for a single purpose, like the advection of scalar material properties that do not evolve over time (e.g., for chemical heterogeneities). Design choices for particle integration, data storage, and parallel communication are then optimized for this single purpose, making the code relatively rigid to changing requirements. Here, we present the implementation of a flexible, scalable and efficient particle-in-cell method for massively parallel finite-element codes with adaptively changing meshes. Using a modular plugin structure, we allow maximum flexibility of the generation of particles, the carried tracer properties, the advection and output algorithms, and the projection of properties to the finite-element mesh. We present scaling tests ranging up to tens of thousands of cores and tens of billions of particles. Additionally, we discuss efficient load-balancing strategies for particles in adaptive meshes with their strengths and weaknesses, local particle-transfer between parallel subdomains utilizing existing communication patterns from the finite element mesh, and the use of established parallel output algorithms like the HDF5 library. 
Finally, we show some relevant particle application cases, compare our implementation to a modern advection-field approach, and demonstrate under which conditions which method is more efficient. We implemented the presented methods in ASPECT (aspect.dealii.org), a freely available open-source community code for geodynamic simulations. The structure of the particle code is highly modular, and segregated from the PDE solver, and can thus be easily transferred to other programs, or adapted for various application cases.
A framework for grand scale parallelization of the combined finite discrete element method in 2d
NASA Astrophysics Data System (ADS)
Lei, Z.; Rougier, E.; Knight, E. E.; Munjiza, A.
2014-09-01
Within the context of rock mechanics, the Combined Finite-Discrete Element Method (FDEM) has been applied to many complex industrial problems such as block caving, deep mining techniques (tunneling, pillar strength, etc.), rock blasting, seismic wave propagation, packing problems, dam stability, rock slope stability, rock mass strength characterization problems, etc. The reality is that most of these were accomplished in a 2D and/or single processor realm. In this work a hardware independent FDEM parallelization framework has been developed using the Virtual Parallel Machine for FDEM (V-FDEM). With V-FDEM, parallel FDEM software can be adapted to different parallel architecture systems ranging from just a few to thousands of cores.
Developing parallel GeoFEST(P) using the PYRAMID AMR library
NASA Technical Reports Server (NTRS)
Norton, Charles D.; Lyzenga, Greg; Parker, Jay; Tisdale, Robert E.
2004-01-01
The PYRAMID parallel unstructured adaptive mesh refinement (AMR) library has been coupled with the GeoFEST geophysical finite element simulation tool to support parallel active tectonics simulations. Specifically, we have demonstrated modeling of coseismic and postseismic surface displacement due to a simulated earthquake for the Landers system of interacting faults in Southern California. The new software demonstrated a 25-times resolution improvement and a 4-times reduction in time to solution over the sequential baseline milestone case. Simulations on workstations using a few tens of thousands of stress displacement finite elements can now be expanded to multiple millions of elements with greater than 98% scaled efficiency on various parallel platforms over many hundreds of processors. Our most recent work has demonstrated that we can dynamically adapt the computational grid as stress grows on a fault. In this paper, we will describe the major issues and challenges associated with coupling these two programs to create GeoFEST(P). Performance and visualization results will also be described.
Large-scale 3D geoelectromagnetic modeling using parallel adaptive high-order finite element method
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grayver, Alexander V.; Kolev, Tzanio V.
2015-11-01
Here, we have investigated the use of the adaptive high-order finite-element method (FEM) for geoelectromagnetic modeling. Because high-order FEM is challenging from the numerical and computational points of view, most published finite-element studies in geoelectromagnetics use the lowest order formulation. Solution of the resulting large system of linear equations poses the main practical challenge. We have developed a fully parallel and distributed robust and scalable linear solver based on the optimal block-diagonal and auxiliary space preconditioners. The solver was found to be efficient for high finite element orders, unstructured and nonconforming locally refined meshes, a wide range of frequencies, large conductivity contrasts, and number of degrees of freedom (DoFs). Furthermore, the presented linear solver is in essence algebraic; i.e., it acts on the matrix-vector level and thus requires no information about the discretization, boundary conditions, or physical source used, making it readily efficient for a wide range of electromagnetic modeling problems. To get accurate solutions at reduced computational cost, we have also implemented goal-oriented adaptive mesh refinement. The numerical tests indicated that if highly accurate modeling results were required, the high-order FEM in combination with the goal-oriented local mesh refinement required less computational time and DoFs than the lowest order adaptive FEM.
NASA Technical Reports Server (NTRS)
Johnson, C. R., Jr.; Balas, M. J.
1980-01-01
A novel interconnection of distributed parameter system (DPS) identification and adaptive filtering is presented, which culminates in a common statement of coupled autoregressive, moving-average expansion or parallel infinite impulse response configuration adaptive parameterization. The common restricted complexity filter objectives are seen as similar to the reduced-order requirements of the DPS expansion description. The interconnection presents the possibility of an exchange of problem formulations and solution approaches not yet easily addressed in the common finite dimensional lumped-parameter system context. It is concluded that the shared problems raised are nevertheless many and difficult.
Efficient parallel resolution of the simplified transport equations in mixed-dual formulation
NASA Astrophysics Data System (ADS)
Barrault, M.; Lathuilière, B.; Ramet, P.; Roman, J.
2011-03-01
A reactivity computation consists of computing the highest eigenvalue of a generalized eigenvalue problem, for which an inverse power algorithm is commonly used. Very fine models are difficult for our sequential solver, based on the simplified transport equations, to handle in terms of memory consumption and computational time. A first implementation of a Lagrangian-based domain decomposition method led to poor parallel efficiency because of an increase in the number of power iterations [1]. In order to obtain a high parallel efficiency, we improve the parallelization scheme by changing the location of the loop over the subdomains in the overall algorithm and by benefiting from the characteristics of the Raviart-Thomas finite element. The new parallel algorithm still allows us to locally adapt the numerical scheme (mesh, finite element order). However, it can be significantly optimized for the matching grid case. The good behavior of the new parallelization scheme is demonstrated for the matching grid case on several hundreds of nodes for computations based on a pin-by-pin discretization.
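The kind of iteration referred to at the start of this abstract can be sketched on a tiny example. The code below is a hedged illustration, assuming the problem is written as F x = k H x with the largest k sought, so that each step solves a system with H; the 2x2 matrices and the names `H`, `F`, `solve2`, `matvec`, and `inverse_power` are hypothetical and unrelated to the authors' solver.

```python
def solve2(H, b):
    """Solve the 2x2 system H x = b by Cramer's rule."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return [(b[0] * H[1][1] - H[0][1] * b[1]) / det,
            (H[0][0] * b[1] - b[0] * H[1][0]) / det]


def matvec(M, v):
    """Product of a 2x2 matrix with a 2-vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]


def inverse_power(H, F, iters=200, tol=1e-12):
    """Estimate the largest k with F x = k H x by iterating x <- H^{-1} F x."""
    x, k = [1.0, 1.0], 1.0
    for _ in range(iters):
        y = solve2(H, matvec(F, x))        # one solve with H per iteration
        k_new = max(abs(c) for c in y)     # x has max-norm 1, so this estimates k
        x = [c / k_new for c in y]
        converged = abs(k_new - k) < tol
        k = k_new
        if converged:
            break
    return k, x


H = [[2.0, -1.0], [-1.0, 2.0]]   # stands in for the loss/transport operator
F = [[1.0, 0.0], [0.0, 0.5]]     # stands in for the production operator
k, x = inverse_power(H, F)
```

For these matrices the exact dominant value is (3 + sqrt(3)) / 6. The number of such outer iterations is the quantity that, according to the abstract, grew under the first domain decomposition attempt.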
Marine Controlled-Source Electromagnetic 2D Inversion for synthetic models.
NASA Astrophysics Data System (ADS)
Liu, Y.; Li, Y.
2016-12-01
We present a 2D inverse algorithm for frequency domain marine controlled-source electromagnetic (CSEM) data, which is based on the regularized Gauss-Newton approach. As a forward solver, our parallel adaptive finite element forward modeling program is employed. It is a self-adaptive, goal-oriented grid refinement algorithm in which a finite element analysis is performed on a sequence of refined meshes. The mesh refinement process is guided by a dual error estimate weighting to bias refinement towards elements that affect the solution at the EM receiver locations. With the use of the direct solver (MUMPS), we can effectively compute the electromagnetic fields for multi-sources and parametric sensitivities. We also implement the parallel data domain decomposition approach of Key and Ovall (2011), with the goal of being able to compute accurate responses in parallel for complicated models and a full suite of data parameters typical of offshore CSEM surveys. All minimizations are carried out by using the Gauss-Newton algorithm and model perturbations at each iteration step are obtained by using the Inexact Conjugate Gradient iteration method. Synthetic test inversions are presented.
Issues in the digital implementation of control compensators. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Moroney, P.
1979-01-01
Techniques developed for the finite-precision implementation of digital filters were used, adapted, and extended for digital feedback compensators, with particular emphasis on steady state, linear-quadratic-Gaussian compensators. Topics covered include: (1) the linear-quadratic-Gaussian problem; (2) compensator structures; (3) architectural issues: serialism, parallelism, and pipelining; (4) finite wordlength effects: quantization noise, quantizing the coefficients, and limit cycles; and (5) the optimization of structures.
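Coefficient quantization, one of the finite wordlength effects listed in topic (4), can be illustrated with a short sketch. It assumes a plain fixed-point grid with a given number of fractional bits and round-to-nearest; the coefficient values and the helper name `quantize` are made up for the example and do not come from the thesis.

```python
def quantize(value, frac_bits):
    """Round value to the nearest multiple of 2**-frac_bits (fixed-point grid)."""
    step = 2.0 ** -frac_bits
    return round(value / step) * step


# Hypothetical compensator coefficients stored with 8 fractional bits.
coeffs = [0.7071, -0.3333, 0.1]
q8 = [quantize(c, 8) for c in coeffs]
errors = [abs(c - q) for c, q in zip(coeffs, q8)]
```

Each error is bounded by half a quantization step, here 2**-9; how such perturbations move the closed-loop poles depends on the compensator structure, which is why structure selection appears among the topics.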
Adaptive implicit-explicit and parallel element-by-element iteration schemes
NASA Technical Reports Server (NTRS)
Tezduyar, T. E.; Liou, J.; Nguyen, T.; Poole, S.
1989-01-01
Adaptive implicit-explicit (AIE) and grouped element-by-element (GEBE) iteration schemes are presented for the finite element solution of large-scale problems in computational mechanics and physics. The AIE approach is based on the dynamic arrangement of the elements into differently treated groups. The GEBE procedure, which is a way of rewriting the EBE formulation to make its parallel processing potential and implementation more clear, is based on the static arrangement of the elements into groups with no inter-element coupling within each group. Various numerical tests performed demonstrate the savings in the CPU time and memory.
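The static grouping that GEBE relies on, groups with no inter-element coupling inside each group, can be sketched with a greedy pass over the element list. This is an assumed, simplified construction (elements are treated as coupled when they share a node), and `group_elements` is a hypothetical name rather than the authors' routine.

```python
def group_elements(elements):
    """Greedily place elements into groups so that no two elements
    in the same group share a node."""
    groups = []        # element indices per group
    group_nodes = []   # set of nodes already touched by each group
    for idx, nodes in enumerate(elements):
        for g, used in enumerate(group_nodes):
            if used.isdisjoint(nodes):
                groups[g].append(idx)
                used.update(nodes)
                break
        else:
            # No existing group is free of conflicts: open a new one.
            groups.append([idx])
            group_nodes.append(set(nodes))
    return groups


# A strip of four triangles on nodes 0..5; neighbours share an edge.
triangles = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]
groups = group_elements(triangles)
```

Elements within one group can be processed concurrently because they write to disjoint nodal entries; the groups themselves are handled one after another.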
Parallel deterministic neutronics with AMR in 3D
DOE Office of Scientific and Technical Information (OSTI.GOV)
Clouse, C.; Ferguson, J.; Hendrickson, C.
1997-12-31
AMTRAN, a three dimensional Sn neutronics code with adaptive mesh refinement (AMR), has been parallelized over spatial domains and energy groups and runs on the Meiko CS-2 with MPI message passing. Block refined AMR is used with linear finite element representations for the fluxes, which allows for a straightforward interpretation of fluxes at block interfaces with zoning differences. The load balancing algorithm assumes 8 spatial domains, which minimizes idle time among processors.
Piras, P; Sansalone, G; Teresi, L; Kotsakis, T; Colangelo, P; Loy, A
2012-07-01
The shape and mechanical performance in Talpidae humeri were studied by means of Geometric Morphometrics and Finite Element Analysis, including both extinct and extant taxa. The aim of this study was to test whether the ability to dig, quantified by humerus mechanical performance, was characterized by convergent or parallel adaptations in different clades of complex tunnel digger within Talpidae, that is, Talpinae+Condylura (monophyletic) and some complex tunnel diggers not belonging to this clade. Our results suggest that the pattern underlying Talpidae humerus evolution is evolutionary parallelism. However, this insight changed to true convergence when we tested an alternative phylogeny based on molecular data, with Condylura moved to a more basal phylogenetic position. Shape and performance analyses, as well as specific comparative methods, provided strong evidence that the ability to dig complex tunnels reached a functional optimum in distantly related taxa. This was also confirmed by the lower phenotypic variance in complex tunnel digger taxa, compared to non-complex tunnel diggers. Evolutionary rates of phenotypic change showed a smooth deceleration in correspondence with the most recent common ancestor of the Talpinae+Condylura clade. Copyright © 2012 Wiley Periodicals, Inc.
NASA Technical Reports Server (NTRS)
Chung, T. J. (Editor); Karr, Gerald R. (Editor)
1989-01-01
Recent advances in computational fluid dynamics are examined in reviews and reports, with an emphasis on finite-element methods. Sections are devoted to adaptive meshes, atmospheric dynamics, combustion, compressible flows, control-volume finite elements, crystal growth, domain decomposition, EM-field problems, FDM/FEM, and fluid-structure interactions. Consideration is given to free-boundary problems with heat transfer, free surface flow, geophysical flow problems, heat and mass transfer, high-speed flow, incompressible flow, inverse design methods, MHD problems, the mathematics of finite elements, and mesh generation. Also discussed are mixed finite elements, multigrid methods, non-Newtonian fluids, numerical dissipation, parallel vector processing, reservoir simulation, seepage, shallow-water problems, spectral methods, supercomputer architectures, three-dimensional problems, and turbulent flows.
An HP Adaptive Discontinuous Galerkin Method for Hyperbolic Conservation Laws. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Bey, Kim S.
1994-01-01
This dissertation addresses various issues for model classes of hyperbolic conservation laws. The basic approach developed in this work employs a new family of adaptive, hp-version, finite element methods based on a special discontinuous Galerkin formulation for hyperbolic problems. The discontinuous Galerkin formulation admits high-order local approximations on domains of quite general geometry, while providing a natural framework for finite element approximations and for theoretical developments. The use of hp-versions of the finite element method makes possible exponentially convergent schemes with very high accuracies in certain cases; the use of adaptive hp-schemes allows h-refinement in regions of low regularity and p-enrichment to deliver high accuracy, while keeping problem sizes manageable and dramatically smaller than many conventional approaches. The use of discontinuous Galerkin methods is uncommon in applications, but the methods rest on a reasonable mathematical basis for low-order cases and have local approximation features that can be exploited to produce very efficient schemes, especially in a parallel, multiprocessor environment. The place of this work is to first and primarily focus on a model class of linear hyperbolic conservation laws for which concrete mathematical results, methodologies, error estimates, convergence criteria, and parallel adaptive strategies can be developed, and to then briefly explore some extensions to more general cases. Next, we provide preliminaries to the study and a review of some aspects of the theory of hyperbolic conservation laws. We also provide a review of relevant literature on this subject and on the numerical analysis of these types of problems.
NASA Astrophysics Data System (ADS)
Samaké, Abdoulaye; Rampal, Pierre; Bouillon, Sylvain; Ólason, Einar
2017-12-01
We present a parallel implementation framework for a new dynamic/thermodynamic sea-ice model, called neXtSIM, based on the Elasto-Brittle rheology and using an adaptive mesh. The spatial discretisation of the model is done using the finite-element method. The temporal discretisation is semi-implicit and the advection is achieved using either a pure Lagrangian scheme or an Arbitrary Lagrangian Eulerian scheme (ALE). The parallel implementation presented here focuses on the distributed-memory approach using the message-passing library MPI. The efficiency and the scalability of the parallel algorithms are illustrated by the numerical experiments performed using up to 500 processor cores of a cluster computing system. The performance obtained by the proposed parallel implementation of the neXtSIM code is shown to be sufficient to perform simulations for state-of-the-art sea ice forecasting and geophysical process studies over geographical domains of several million square kilometers, like the Arctic region.
Progress in Computational Simulation of Earthquakes
NASA Technical Reports Server (NTRS)
Donnellan, Andrea; Parker, Jay; Lyzenga, Gregory; Judd, Michele; Li, P. Peggy; Norton, Charles; Tisdale, Edwin; Granat, Robert
2006-01-01
GeoFEST(P) is a computer program written for use in the QuakeSim project, which is devoted to development and improvement of means of computational simulation of earthquakes. GeoFEST(P) models interacting earthquake fault systems from the fault-nucleation to the tectonic scale. The development of GeoFEST(P) has involved coupling of two programs: GeoFEST and the Pyramid Adaptive Mesh Refinement Library. GeoFEST is a message-passing-interface-parallel code that utilizes a finite-element technique to simulate evolution of stress, fault slip, and plastic/elastic deformation in realistic materials like those of faulted regions of the crust of the Earth. The products of such simulations are synthetic observable time-dependent surface deformations on time scales from days to decades. The Pyramid Adaptive Mesh Refinement Library is a software library that facilitates the generation of computational meshes for solving physical problems. In an application of GeoFEST(P), a computational grid can be dynamically adapted as stress grows on a fault. Simulations on workstations using a few tens of thousands of stress and displacement finite elements can now be expanded to multiple millions of elements with greater than 98-percent scaled efficiency over many hundreds of parallel processors (see figure).
Fully-Implicit Navier-Stokes (FIN-S)
NASA Technical Reports Server (NTRS)
Kirk, Benjamin S.
2010-01-01
FIN-S is a SUPG finite element code for flow problems under active development at NASA Lyndon B. Johnson Space Center and within PECOS: a) The code is built on top of the libMesh parallel, adaptive finite element library. b) The initial implementation of the code targeted supersonic/hypersonic laminar calorically perfect gas flows & conjugate heat transfer. c) Initial extension to thermochemical nonequilibrium about 9 months ago. d) The technologies in FIN-S have been enhanced through a strongly collaborative research effort with Sandia National Labs.
A Dynamic Finite Element Method for Simulating the Physics of Faults Systems
NASA Astrophysics Data System (ADS)
Saez, E.; Mora, P.; Gross, L.; Weatherley, D.
2004-12-01
We introduce a dynamic Finite Element method using a novel high level scripting language to describe the physical equations, boundary conditions and time integration scheme. The library we use is the parallel Finley library: a finite element kernel library, designed for solving large-scale problems. It is incorporated as a differential equation solver into a more general library called escript, based on the scripting language Python. This library has been developed to facilitate the rapid development of 3D parallel codes, and is optimised for the Australian Computational Earth Systems Simulator Major National Research Facility (ACcESS MNRF) supercomputer, a 208 processor SGI Altix with a peak performance of 1.1 TFlops. Using the scripting approach we obtain a parallel FE code able to take advantage of the computational efficiency of the Altix 3700. We consider faults as material discontinuities (the displacement, velocity, and acceleration fields are discontinuous at the fault), with elastic behavior. The stress continuity at the fault is achieved naturally through the expression of the fault interactions in the weak formulation. The elasticity problem is solved explicitly in time, using the Verlet scheme. Finally, we specify a suitable frictional constitutive relation and numerical scheme to simulate fault behaviour. Our model is based on previous work on modelling fault friction and multi-fault systems using lattice solid-like models. We adapt the 2D model for simulating the dynamics of parallel fault systems described to the Finite-Element method. The approach uses a frictional relation along faults that is slip and slip-rate dependent, and the numerical integration approach introduced by Mora and Place in the lattice solid model. In order to illustrate the new Finite Element model, single and multi-fault simulation examples are presented.
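Assuming the explicit time integrator named in this abstract belongs to the Verlet family, the following sketch shows the velocity form of that scheme on a single unit-mass oscillator. It is only an illustration of the update pattern, it does not use escript or Finley, and the function name `velocity_verlet` is hypothetical.

```python
import math


def velocity_verlet(x, v, accel, dt, steps):
    """Advance position x and velocity v with the velocity Verlet scheme."""
    a = accel(x)
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update
        a_new = accel(x)                     # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity from averaged acceleration
        a = a_new
    return x, v


# Unit oscillator x'' = -x starting at rest from x = 1, integrated to t = 10.
x_end, v_end = velocity_verlet(1.0, 0.0, lambda x: -x, 0.01, 1000)
energy = 0.5 * v_end ** 2 + 0.5 * x_end ** 2
exact = math.cos(10.0)
```

The scheme needs one force evaluation per step and keeps the energy close to its initial value of 0.5 over long runs, which is part of its appeal for explicit elastodynamics.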
Ramses-GPU: Second order MUSCL-Hancock finite volume fluid solver
NASA Astrophysics Data System (ADS)
Kestener, Pierre
2017-10-01
RamsesGPU is a reimplementation of RAMSES (ascl:1011.007) which drops the adaptive mesh refinement (AMR) features to optimize 3D uniform grid algorithms for modern graphics processor units (GPU) to provide an efficient software package for astrophysics applications that do not need AMR features but do require a very large number of integration time steps. RamsesGPU provides a very efficient C++/CUDA/MPI software implementation of a second order MUSCL-Hancock finite volume fluid solver for compressible hydrodynamics as well as a magnetohydrodynamics solver based on the constraint transport technique. Other useful modules include static gravity, dissipative terms (viscosity, resistivity), and a forcing source term for turbulence studies; special care was taken to enhance parallel input/output performance by using state-of-the-art libraries such as HDF5 and parallel-netcdf.
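To give a feel for the second order MUSCL-Hancock update named in the abstract above, here is a minimal sketch for the scalar linear advection equation u_t + a u_x = 0 with a > 0 on a periodic grid. RamsesGPU itself solves the compressible hydrodynamics and MHD equations in C++/CUDA; the scalar model problem, the minmod limiter, and all function names below are assumptions chosen to keep the example short.

```python
def minmod(a, b):
    """Limited slope: zero at extrema, otherwise the smaller-magnitude argument."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b

def muscl_hancock_step(u, c):
    """One MUSCL-Hancock step for u_t + a u_x = 0 with a > 0.

    u is a list of cell averages on a periodic grid and c = a*dt/dx is the
    CFL number (0 < c <= 1).
    """
    n = len(u)
    # Limited slope in each cell (u[i - 1] wraps around for i = 0).
    slope = [minmod(u[i] - u[i - 1], u[(i + 1) % n] - u[i]) for i in range(n)]
    # Right-face value of each cell: reconstruction (+0.5*slope) followed by
    # the Hancock half-step predictor (-0.5*c*slope).
    face = [u[i] + 0.5 * (1.0 - c) * slope[i] for i in range(n)]
    # For a > 0 the upwind flux at each interface comes from the cell to the left.
    return [u[i] - c * (face[i] - face[i - 1]) for i in range(n)]

# Illustrative use: advect a square pulse for 20 steps at CFL number 0.5.
u0 = [1.0 if 4 <= i < 8 else 0.0 for i in range(32)]
u = list(u0)
for _ in range(20):
    u = muscl_hancock_step(u, 0.5)
```

With the limiter active the update conserves the sum of cell averages and does not create new extrema, which is the behaviour one would check first when porting such a kernel to a GPU.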
3D CSEM inversion based on goal-oriented adaptive finite element method
NASA Astrophysics Data System (ADS)
Zhang, Y.; Key, K.
2016-12-01
We present a parallel 3D frequency domain controlled-source electromagnetic inversion code named MARE3DEM. Non-linear inversion of observed data is performed with the Occam variant of regularized Gauss-Newton optimization. The forward operator is based on the goal-oriented finite element method that efficiently calculates the responses and sensitivity kernels in parallel using a data decomposition scheme where independent modeling tasks contain different frequencies and subsets of the transmitters and receivers. To accommodate complex 3D conductivity variation with high flexibility and precision, we adopt the dual-grid approach where the forward mesh conforms to the inversion parameter grid and is adaptively refined until the forward solution converges to the desired accuracy. This dual-grid approach is memory efficient, since the inverse parameter grid remains independent from fine meshing generated around the transmitter and receivers by the adaptive finite element method. Besides, the unstructured inverse mesh efficiently handles multiple scale structures and allows for fine-scale model parameters within the region of interest. Our mesh generation engine keeps track of the refinement hierarchy so that the map of conductivity and sensitivity kernel between the forward and inverse mesh is retained. We employ the adjoint-reciprocity method to calculate the sensitivity kernels which establish a linear relationship between changes in the conductivity model and changes in the modeled responses. Our code uses a direct solver for the linear systems, so the adjoint problem is efficiently computed by re-using the factorization from the primary problem. Further computational efficiency and scalability are obtained in the regularized Gauss-Newton portion of the inversion using parallel dense matrix-matrix multiplication and matrix factorization routines implemented with the ScaLAPACK library.
We show the scalability, reliability and the potential of the algorithm to deal with complex geological scenarios by applying it to the inversion of synthetic marine controlled source EM data generated for a complex 3D offshore model with significant seafloor topography.
NASA Astrophysics Data System (ADS)
Raeli, Alice; Bergmann, Michel; Iollo, Angelo
2018-02-01
We consider problems governed by a linear elliptic equation with varying coefficients across internal interfaces. The solution and its normal derivative can undergo significant variations through these internal boundaries. We present a compact finite-difference scheme on a tree-based adaptive grid that can be efficiently solved using a natively parallel data structure. The main idea is to optimize the truncation error of the discretization scheme as a function of the local grid configuration to achieve second-order accuracy. Numerical illustrations are presented in two- and three-dimensional configurations.
Computational aspects of helicopter trim analysis and damping levels from Floquet theory
NASA Technical Reports Server (NTRS)
Gaonkar, Gopal H.; Achar, N. S.
1992-01-01
Helicopter trim settings of periodic initial state and control inputs are investigated for convergence of Newton iteration in computing the settings sequentially and in parallel. The trim analysis uses a shooting method and a weak version of two temporal finite element methods with displacement formulation and with mixed formulation of displacements and momenta. These three methods broadly represent two main approaches of trim analysis: adaptation of initial-value and finite element boundary-value codes to periodic boundary conditions, particularly for unstable and marginally stable systems. In each method, both the sequential and in-parallel schemes are used and the resulting nonlinear algebraic equations are solved by damped Newton iteration with an optimally selected damping parameter. The impact of damped Newton iteration, including earlier-observed divergence problems in trim analysis, is demonstrated by the maximum condition number of the Jacobian matrices of the iterative scheme and by virtual elimination of divergence. The advantages of the in-parallel scheme over the conventional sequential scheme are also demonstrated.
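The abstract above relies on damped Newton iteration to suppress divergence. The sketch below shows the idea for a single scalar equation: the full Newton step is scaled by a damping factor that is halved until the residual decreases. This simple backtracking rule is a stand-in for the "optimally selected damping parameter" of the paper, whose selection rule is not given in the abstract; the function names and the arctangent test problem are likewise assumptions.

```python
import math

def damped_newton(f, dfdx, x, tol=1e-12, max_iter=50):
    """Solve f(x) = 0 by Newton iteration with a backtracking damping factor."""
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        step = -fx / dfdx(x)          # full Newton step
        lam = 1.0                     # damping factor, 1.0 means undamped
        # Halve the damping factor until the residual actually decreases.
        while abs(f(x + lam * step)) >= abs(fx) and lam > 1e-8:
            lam *= 0.5
        x = x + lam * step
    return x

# Illustrative problem (an assumption): undamped Newton diverges for atan(x)
# when started from x = 3, while the damped iteration converges to the root 0.
root = damped_newton(math.atan, lambda x: 1.0 / (1.0 + x * x), 3.0)
```

For the trim problem the scalar derivative would become the Jacobian matrix of the shooting or finite element equations, and the division would become a linear solve.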
Implicit schemes and parallel computing in unstructured grid CFD
NASA Technical Reports Server (NTRS)
Venkatakrishnan, V.
1995-01-01
The development of implicit schemes for obtaining steady state solutions to the Euler and Navier-Stokes equations on unstructured grids is outlined. Applications are presented that compare the convergence characteristics of various implicit methods. Next, the development of explicit and implicit schemes to compute unsteady flows on unstructured grids is discussed. Next, the issues involved in parallelizing finite volume schemes on unstructured meshes in an MIMD (multiple instruction/multiple data stream) fashion are outlined. Techniques for partitioning unstructured grids among processors and for extracting parallelism in explicit and implicit solvers are discussed. Finally, some dynamic load balancing ideas, which are useful in adaptive transient computations, are presented.
A class of hybrid finite element methods for electromagnetics: A review
NASA Technical Reports Server (NTRS)
Volakis, J. L.; Chatterjee, A.; Gong, J.
1993-01-01
Integral equation methods have generally been the workhorse for antenna and scattering computations. In the case of antennas, they continue to be the prominent computational approach, but for scattering applications the requirement for large-scale computations has turned researchers' attention to near neighbor methods such as the finite element method, which has low O(N) storage requirements and is readily adaptable in modeling complex geometrical features and material inhomogeneities. In this paper, we review three hybrid finite element methods for simulating composite scatterers, conformal microstrip antennas, and finite periodic arrays. Specifically, we discuss the finite element method and its application to electromagnetic problems when combined with the boundary integral, absorbing boundary conditions, and artificial absorbers for terminating the mesh. Particular attention is given to large-scale simulations, methods, and solvers for achieving low memory requirements and code performance on parallel computing architectures.
Towards a large-scale scalable adaptive heart model using shallow tree meshes
NASA Astrophysics Data System (ADS)
Krause, Dorian; Dickopf, Thomas; Potse, Mark; Krause, Rolf
2015-10-01
Electrophysiological heart models are sophisticated computational tools that place high demands on the computing hardware due to the high spatial resolution required to capture the steep depolarization front. To address this challenge, we present a novel adaptive scheme for resolving the depolarization front accurately using adaptivity in space. Our adaptive scheme is based on locally structured meshes. These tensor meshes in space are organized in a parallel forest of trees, which allows us to resolve complicated geometries and to realize high variations in the local mesh sizes with a minimal memory footprint in the adaptive scheme. We discuss both a non-conforming mortar element approximation and a conforming finite element space and present an efficient technique for the assembly of the respective stiffness matrices using matrix representations of the inclusion operators into the product space on the so-called shallow tree meshes. We analyzed the parallel performance and scalability for a two-dimensional ventricle slice as well as for a full large-scale heart model. Our results demonstrate that the method has good performance and high accuracy.
A software platform for continuum modeling of ion channels based on unstructured mesh
NASA Astrophysics Data System (ADS)
Tu, B.; Bai, S. Y.; Chen, M. X.; Xie, Y.; Zhang, L. B.; Lu, B. Z.
2014-01-01
Most traditional continuum molecular modeling adopted finite difference or finite volume methods which were based on a structured mesh (grid). Unstructured meshes were only occasionally used, but an increasing number of applications are emerging in molecular simulations. To facilitate the continuum modeling of biomolecular systems based on unstructured meshes, we are developing a software platform with tools which are particularly beneficial to those approaches. This work describes the software system specifically for the simulation of a typical, complex molecular procedure: ion transport through a three-dimensional channel system that consists of a protein and a membrane. The platform contains three parts: a meshing tool chain for ion channel systems, a parallel finite element solver for the Poisson-Nernst-Planck equations describing the electrodiffusion process of ion transport, and a visualization program for continuum molecular modeling. The meshing tool chain in the platform, which consists of a set of mesh generation tools, is able to generate high-quality surface and volume meshes for ion channel systems. The parallel finite element solver in our platform is based on the parallel adaptive finite element package PHG which was developed by one of the authors [1]. As a featured component of the platform, a new visualization program, VCMM, has specifically been developed for continuum molecular modeling with an emphasis on providing useful facilities for unstructured mesh-based methods and for their output analysis and visualization. VCMM provides a graphic user interface and consists of three modules: a molecular module, a meshing module and a numerical module. A demonstration of the platform is provided with a study of two real proteins, the connexin 26 and hemolysin ion channels.
Earthquake Rupture Dynamics using Adaptive Mesh Refinement and High-Order Accurate Numerical Methods
NASA Astrophysics Data System (ADS)
Kozdon, J. E.; Wilcox, L.
2013-12-01
Our goal is to develop scalable and adaptive (spatial and temporal) numerical methods for coupled, multiphysics problems using high-order accurate numerical methods. To do so, we are developing an open-source, parallel library known as bfam (available at http://bfam.in). The first application to be developed on top of bfam is an earthquake rupture dynamics solver using high-order discontinuous Galerkin methods and summation-by-parts finite difference methods. In earthquake rupture dynamics, wave propagation in the Earth's crust is coupled to frictional sliding on fault interfaces. This coupling is two-way, requiring the simultaneous simulation of both processes. The use of laboratory-measured friction parameters requires near-fault resolution that is 4-5 orders of magnitude higher than that needed to resolve the frequencies of interest in the volume. This, along with earlier simulations using a low-order, finite volume based adaptive mesh refinement framework, suggests that adaptive mesh refinement is ideally suited for this problem. The use of high-order methods is motivated by the high level of resolution required off the fault in the earlier low-order finite volume simulations; we believe this need for resolution is a result of the excessive numerical dissipation of low-order methods. In bfam spatial adaptivity is handled using the p4est library and temporal adaptivity will be accomplished through local time stepping. In this presentation we will present the guiding principles behind the library as well as verification of code against the Southern California Earthquake Center dynamic rupture code validation test problems.
NASA Astrophysics Data System (ADS)
Schwing, Alan Michael
For computational fluid dynamics, the governing equations are solved on a discretized domain of nodes, faces, and cells. The quality of the grid or mesh can be a driving source for error in the results. While refinement studies can help guide the creation of a mesh, grid quality is largely determined by user expertise and understanding of the flow physics. Adaptive mesh refinement is a technique for enriching the mesh during a simulation based on metrics for error, impact on important parameters, or location of important flow features. This can offload from the user some of the difficult and ambiguous decisions necessary when discretizing the domain. This work explores the implementation of adaptive mesh refinement in an implicit, unstructured, finite-volume solver. Consideration is made for applying modern computational techniques in the presence of hanging nodes and refined cells. The approach is developed to be independent of the flow solver in order to provide a path for augmenting existing codes. It is designed to be applicable for unsteady simulations, and refinement and coarsening of the grid do not impact the conservatism of the underlying numerics. The effect on high-order numerical fluxes of fourth- and sixth-order is explored. Provided the criteria for refinement are appropriately selected, solutions obtained using adapted meshes have no additional error when compared to results obtained on traditional, unadapted meshes. In order to leverage large-scale computational resources common today, the methods are parallelized using MPI. Parallel performance is considered for several test problems in order to assess scalability of both adapted and unadapted grids. Dynamic repartitioning of the mesh during refinement is crucial for load balancing an evolving grid. Development of the methods outlined here depends on a dual-memory approach that is described in detail.
Validation of the solver developed here against a number of motivating problems shows favorable comparisons across a range of regimes. Unsteady and steady applications are considered in both subsonic and supersonic flows. Inviscid and viscous simulations achieve similar results at a much reduced cost when employing dynamic mesh adaptation. Several techniques for guiding adaptation are compared. Detailed analysis of statistics from the instrumented solver enables understanding of the costs associated with adaptation. Adaptive mesh refinement shows promise for the test cases presented here. It can be considerably faster than using conventional grids and provides accurate results. The procedures for adapting the grid are light-weight enough to not require significant computational time and yield significant reductions in grid size.
NASA Technical Reports Server (NTRS)
Noor, A. K. (Editor); Hayduk, R. J. (Editor)
1985-01-01
Among the topics discussed are developments in structural engineering hardware and software, computation for fracture mechanics, trends in numerical analysis and parallel algorithms, mechanics of materials, advances in finite element methods, composite materials and structures, determinations of random motion and dynamic response, optimization theory, automotive tire modeling methods and contact problems, the damping and control of aircraft structures, and advanced structural applications. Specific topics covered include structural design expert systems, the evaluation of finite element system architectures, systolic arrays for finite element analyses, nonlinear finite element computations, hierarchical boundary elements, adaptive substructuring techniques in elastoplastic finite element analyses, automatic tracking of crack propagation, a theory of rate-dependent plasticity, the torsional stability of nonlinear eccentric structures, a computation method for fluid-structure interaction, the seismic analysis of three-dimensional soil-structure interaction, a stress analysis for a composite sandwich panel, toughness criterion identification for unidirectional composite laminates, the modeling of submerged cable dynamics, and damping synthesis for flexible spacecraft structures.
Nyx: Adaptive mesh, massively-parallel, cosmological simulation code
NASA Astrophysics Data System (ADS)
Almgren, Ann; Beckner, Vince; Friesen, Brian; Lukic, Zarija; Zhang, Weiqun
2017-12-01
The Nyx code solves the equations of compressible hydrodynamics on an adaptive grid hierarchy coupled with an N-body treatment of dark matter. The gas dynamics in Nyx use a finite volume methodology on an adaptive set of 3-D Eulerian grids; dark matter is represented as discrete particles moving under the influence of gravity. Particles are evolved via a particle-mesh method, using a Cloud-in-Cell deposition/interpolation scheme. Both baryonic and dark matter contribute to the gravitational field. In addition, Nyx includes physics for accurately modeling the intergalactic medium; in optically thin limits and assuming ionization equilibrium, the code calculates heating and cooling processes of the primordial-composition gas in an ionizing ultraviolet background radiation field.
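The Cloud-in-Cell deposition mentioned in the abstract above spreads each particle's mass linearly over the nearest cell centres. The following one-dimensional, periodic sketch illustrates the weighting; Nyx works in three dimensions on an adaptive grid hierarchy, and the function name, argument names, and sample values here are assumptions made for illustration.

```python
import math

def cic_deposit(positions, masses, n_cells, box_size):
    """Deposit particle masses onto a periodic 1-D grid with Cloud-in-Cell weights.

    Returns the mass density in each cell; cell i is centred at (i + 0.5) * dx.
    """
    dx = box_size / n_cells
    density = [0.0] * n_cells
    for x, m in zip(positions, masses):
        s = x / dx - 0.5              # position in cell units, relative to cell centres
        i = math.floor(s)             # index of the cell centre to the left
        w_right = s - i               # linear weight of the right-hand neighbour
        density[i % n_cells] += m * (1.0 - w_right) / dx
        density[(i + 1) % n_cells] += m * w_right / dx
    return density

# Illustrative use: three particles on a 4-cell periodic box of length 4.
rho = cic_deposit([0.5, 1.0, 0.25], [1.0, 2.0, 4.0], 4, 4.0)
```

The same weights, used in reverse, interpolate the grid-based gravitational acceleration back to the particle positions, which keeps deposition and force interpolation consistent.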
GPU-based ultra-fast dose calculation using a finite size pencil beam model.
Gu, Xuejun; Choi, Dongju; Men, Chunhua; Pan, Hubert; Majumdar, Amitava; Jiang, Steve B
2009-10-21
Online adaptive radiation therapy (ART) is an attractive concept that promises the ability to deliver an optimal treatment in response to the inter-fraction variability in patient anatomy. However, it has yet to be realized due to technical limitations. Fast dose deposition coefficient calculation is a critical component of the online planning process that is required for plan optimization of intensity-modulated radiation therapy (IMRT). Computer graphics processing units (GPUs) are well suited to provide the requisite fast performance for the data-parallel nature of dose calculation. In this work, we develop a dose calculation engine based on a finite-size pencil beam (FSPB) algorithm and a GPU parallel computing framework. The developed framework can accommodate any FSPB model. We test our implementation in the case of a water phantom and the case of a prostate cancer patient with varying beamlet and voxel sizes. All testing scenarios achieved speedups ranging from 200 to 400 times when using an NVIDIA Tesla C1060 card in comparison with a 2.27 GHz Intel Xeon CPU. The computational time for calculating dose deposition coefficients for a nine-field prostate IMRT plan with this new framework is less than 1 s. This indicates that the GPU-based FSPB algorithm is well suited for online re-planning for adaptive radiotherapy.
A new parallel-vector finite element analysis software on distributed-memory computers
NASA Technical Reports Server (NTRS)
Qin, Jiangning; Nguyen, Duc T.
1993-01-01
A new parallel-vector finite element analysis software package MPFEA (Massively Parallel-vector Finite Element Analysis) is developed for large-scale structural analysis on massively parallel computers with distributed memory. MPFEA is designed for parallel generation and assembly of the global finite element stiffness matrices as well as parallel solution of the simultaneous linear equations, since these are often the major time-consuming parts of a finite element analysis. A block-skyline storage scheme and vector-unrolling techniques are used to enhance the vector performance. Communications among processors are carried out concurrently with arithmetic operations to reduce the total execution time. Numerical results on the Intel iPSC/860 computers (such as the Intel Gamma with 128 processors and the Intel Touchstone Delta with 512 processors) are presented, including an aircraft structure and some very large truss structures, to demonstrate the efficiency and accuracy of MPFEA.
Parallel processing in finite element structural analysis
NASA Technical Reports Server (NTRS)
Noor, Ahmed K.
1987-01-01
A brief review is made of the fundamental concepts and basic issues of parallel processing. Discussion focuses on parallel numerical algorithms, performance evaluation of machines and algorithms, and parallelism in finite element computations. A computational strategy is proposed for maximizing the degree of parallelism at different levels of the finite element analysis process including: 1) formulation level (through the use of mixed finite element models); 2) analysis level (through additive decomposition of the different arrays in the governing equations into the contributions to a symmetrized response plus correction terms); 3) numerical algorithm level (through the use of operator splitting techniques and application of iterative processes); and 4) implementation level (through the effective combination of vectorization, multitasking and microtasking, whenever available).
NASA Astrophysics Data System (ADS)
Gruber, Ralph; Periaux, Jaques; Shaw, Richard Paul
Recent advances in computational mechanics are discussed in reviews and reports. Topics addressed include spectral superpositions on finite elements for shear banding problems, strain-based finite plasticity, numerical simulation of hypersonic viscous continuum flow, constitutive laws in solid mechanics, dynamics problems, fracture mechanics and damage tolerance, composite plates and shells, contact and friction, metal forming and solidification, coupling problems, and adaptive FEMs. Consideration is given to chemical flows, convection problems, free boundaries and artificial boundary conditions, domain-decomposition and multigrid methods, combustion and thermal analysis, wave propagation, mixed and hybrid FEMs, integral-equation methods, optimization, software engineering, and vector and parallel computing.
An efficient data structure for three-dimensional vertex based finite volume method
NASA Astrophysics Data System (ADS)
Akkurt, Semih; Sahin, Mehmet
2017-11-01
A vertex based three-dimensional finite volume algorithm has been developed using an edge based data structure. The mesh data structure of the given algorithm is similar to ones that exist in the literature. However, the data structures are redesigned and simplified in order to fit the requirements of the vertex based finite volume method. In order to increase the cache efficiency, the data access patterns for the vertex based finite volume method are investigated and the data are packed/allocated so that they are close to each other in memory. The present data structure is not limited to tetrahedra; arbitrary polyhedra are also supported in the mesh without any additional effort. Furthermore, the present data structure also supports adaptive refinement and coarsening. For the implicit and parallel implementation of the FVM algorithm, the PETSc and MPI libraries are employed. The performance and accuracy of the present algorithm are tested on classical benchmark problems by comparing the CPU time with that of open source algorithms.
Heidenreich, Elvio A; Ferrero, José M; Doblaré, Manuel; Rodríguez, José F
2010-07-01
Many problems in biology and engineering are governed by anisotropic reaction-diffusion equations with a very rapidly varying reaction term. This usually implies the use of very fine meshes and small time steps in order to accurately capture the propagating wave while avoiding the appearance of spurious oscillations in the wave front. This work develops a family of macro finite elements suitable for solving anisotropic reaction-diffusion equations with stiff reactive terms. The developed elements are incorporated into a semi-implicit algorithm based on operator splitting that includes adaptive time stepping for handling the stiff reactive term. A linear system is solved at each time step to update the transmembrane potential, whereas the remaining ordinary differential equations are solved uncoupled. The method allows solving the linear system on a coarser mesh thanks to the static condensation of the internal degrees of freedom (DOF) of the macroelements while maintaining the accuracy of the finer mesh. The method and algorithm have been implemented in parallel. The accuracy of the method has been tested on two- and three-dimensional examples demonstrating excellent behavior when compared to standard linear elements. The better performance and scalability of different macro finite elements against standard finite elements have been demonstrated in the simulation of a human heart and a heterogeneous two-dimensional problem with reentrant activity. Results have shown a reduction of up to four times in computational cost for the macro finite elements with respect to equivalent (same number of DOF) standard linear finite elements as well as good scalability properties.
DOUAR: A new three-dimensional creeping flow numerical model for the solution of geological problems
NASA Astrophysics Data System (ADS)
Braun, Jean; Thieulot, Cédric; Fullsack, Philippe; DeKool, Marthijn; Beaumont, Christopher; Huismans, Ritske
2008-12-01
We present a new finite element code for the solution of the Stokes and energy (or heat transport) equations that has been purposely designed to address crustal-scale to mantle-scale flow problems in three dimensions. Although it is based on an Eulerian description of deformation and flow, the code, which we named DOUAR ('Earth' in the Breton language), has the ability to track interfaces and, in particular, the free surface, by using a dual representation based on a set of particles placed on the interface and the computation of a level set function on the nodes of the finite element grid, thus ensuring accuracy and efficiency. The code also makes use of a new method to compute the dynamic Delaunay triangulation connecting the particles based on a non-Euclidean, curvilinear measure of distance, ensuring that the density of particles remains uniform and/or dynamically adapted to the curvature of the interface. The finite element discretization is based on a non-uniform, yet regular octree division of space within a unit cube that allows efficient adaptation of the finite element discretization, i.e. in regions of strong velocity gradient or high interface curvature. The finite elements are cubes (the leaves of the octree) in which a q1-p0 interpolation scheme is used. Nodal incompatibilities across faces separating elements of differing size are dealt with by introducing linear constraints among nodal degrees of freedom. Discontinuities in material properties across the interfaces are accommodated by the use of a novel method (which we called divFEM) to integrate the finite element equations in which the elemental volume is divided by a local octree to an appropriate depth (resolution). A variety of rheologies have been implemented including linear, non-linear and thermally activated creep and brittle (or plastic) frictional deformation.
A simple smoothing operator has been defined to avoid checkerboard oscillations in pressure that tend to develop when using a highly irregular octree discretization and the tri-linear (or q1-p0) finite element. A three-dimensional cloud of particles is used to track material properties that depend on the integrated history of deformation (the integrated strain, for example); its density is variable and dynamically adapted to the computed flow. The large system of algebraic equations that results from the finite element discretization and linearization of the basic partial differential equations is solved using a multi-frontal massively parallel direct solver that can efficiently factorize poorly conditioned systems resulting from the highly non-linear rheology and the presence of the free surface. The code is almost entirely parallelized. We present example results including the onset of a Rayleigh-Taylor instability, the indentation of a rigid-plastic material and the formation of a fold beneath a free eroding surface, that demonstrate the accuracy, efficiency and appropriateness of the new code to solve complex geodynamical problems in three dimensions.
Adaptive multi-resolution 3D Hartree-Fock-Bogoliubov solver for nuclear structure
NASA Astrophysics Data System (ADS)
Pei, J. C.; Fann, G. I.; Harrison, R. J.; Nazarewicz, W.; Shi, Yue; Thornton, S.
2014-08-01
Background: Complex many-body systems, such as triaxial and reflection-asymmetric nuclei, weakly bound halo states, cluster configurations, nuclear fragments produced in heavy-ion fusion reactions, cold Fermi gases, and pasta phases in neutron star crust, are all characterized by large sizes and complex topologies in which many geometrical symmetries characteristic of ground-state configurations are broken. A tool of choice to study such complex forms of matter is an adaptive multi-resolution wavelet analysis. This method has generated much excitement since it provides a common framework linking many diversified methodologies across different fields, including signal processing, data compression, harmonic analysis and operator theory, fractals, and quantum field theory. Purpose: To describe complex superfluid many-fermion systems, we introduce an adaptive pseudospectral method for solving self-consistent equations of nuclear density functional theory in three dimensions, without symmetry restrictions. Methods: The numerical method is based on the multi-resolution and computational harmonic analysis techniques with a multi-wavelet basis. The application of state-of-the-art parallel programming techniques includes sophisticated object-oriented templates which parse the high-level code into distributed parallel tasks with a multi-thread task queue scheduler for each multi-core node. The internode communications are asynchronous. The algorithm is variational and is capable of solving coupled complex-geometric systems of equations adaptively, with functional and boundary constraints, in a finite spatial domain of very large size, limited by existing parallel computer memory. For smooth functions, user-defined finite precision is guaranteed.
Results: The new adaptive multi-resolution Hartree-Fock-Bogoliubov (HFB) solver madness-hfb is benchmarked against a two-dimensional coordinate-space solver hfb-ax that is based on the B-spline technique and a three-dimensional solver hfodd that is based on the harmonic-oscillator basis expansion. Several examples are considered, including the self-consistent HFB problem for spin-polarized trapped cold fermions and the Skyrme-Hartree-Fock (+BCS) problem for triaxial deformed nuclei. Conclusions: The new madness-hfb framework has many attractive features when applied to nuclear and atomic problems involving many-particle superfluid systems. Of particular interest are weakly bound nuclear configurations close to particle drip lines, strongly elongated and dinuclear configurations such as those present in fission and heavy-ion fusion, and exotic pasta phases that appear in neutron star crust.
A new conformal absorbing boundary condition for finite element meshes and parallelization of FEMATS
NASA Technical Reports Server (NTRS)
Chatterjee, A.; Volakis, J. L.; Nguyen, J.; Nurnberger, M.; Ross, D.
1993-01-01
Some of the progress toward the development and parallelization of an improved version of the finite element code FEMATS is described. This is a finite element code for computing the scattering by arbitrarily shaped three dimensional surfaces composite scatterers. The following tasks were worked on during the report period: (1) new absorbing boundary conditions (ABC's) for truncating the finite element mesh; (2) mixed mesh termination schemes; (3) hierarchical elements and multigridding; (4) parallelization; and (5) various modeling enhancements (antenna feeds, anisotropy, and higher order GIBC).
Design of High Field Solenoids made of High Temperature Superconductors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bartalesi, Antonio; /Pisa U.
2010-12-01
This thesis starts from the analytical mechanical analysis of a superconducting solenoid, loaded by self-generated Lorentz forces. Also, a finite element model is proposed and verified with the analytical results. To study the anisotropic behavior of a coil made of layers of superconductor and insulation, a finite element meso-mechanic model is proposed and designed. The resulting material properties are then used in the main solenoid analysis. In parallel, design work is performed as well: an existing Insert Test Facility (ITF) is adapted and structurally verified to support a coil made of YBa₂Cu₃O₇, a High Temperature Superconductor (HTS). Finally, a technological winding process is proposed and the required tooling is designed.
Parallel iterative methods for sparse linear and nonlinear equations
NASA Technical Reports Server (NTRS)
Saad, Youcef
1989-01-01
As three-dimensional models are gaining importance, iterative methods will become almost mandatory. Among these, preconditioned Krylov subspace methods have been viewed as the most efficient and reliable when solving linear as well as nonlinear systems of equations. Several different approaches have been taken to adapt iterative methods for supercomputers. Some of these approaches are discussed, and the methods that deal more specifically with general unstructured sparse matrices, such as those arising from finite element methods, are emphasized.
User's Guide for ENSAERO_FE Parallel Finite Element Solver
NASA Technical Reports Server (NTRS)
Eldred, Lloyd B.; Guruswamy, Guru P.
1999-01-01
A high fidelity parallel static structural analysis capability is created and interfaced to the multidisciplinary analysis package ENSAERO-MPI of Ames Research Center. This new module replaces ENSAERO's lower fidelity simple finite element and modal modules. Full aircraft structures may be more accurately modeled using the new finite element capability. Parallel computation is performed by breaking the full structure into multiple substructures. This approach is conceptually similar to ENSAERO's multizonal fluid analysis capability. The new substructure code is used to solve the structural finite element equations for each substructure in parallel. NASTRAN/COSMIC is utilized as a front end for this code. Its full library of elements can be used to create an accurate and realistic aircraft model. It is used to create the stiffness matrices for each substructure. The new parallel code then uses an iterative preconditioned conjugate gradient method to solve the global structural equations for the substructure boundary nodes.
Vectorization and parallelization of the finite strip method for dynamic Mindlin plate problems
NASA Technical Reports Server (NTRS)
Chen, Hsin-Chu; He, Ai-Fang
1993-01-01
The finite strip method is a semi-analytical finite element process which allows for a discrete analysis of certain types of physical problems by discretizing the domain of the problem into finite strips. This method decomposes a single large problem into m smaller independent subproblems when m harmonic functions are employed, thus yielding natural parallelism at a very high level. In this paper we address vectorization and parallelization strategies for the dynamic analysis of simply-supported Mindlin plate bending problems and show how to prevent potential conflicts in memory access during the assemblage process. The vector and parallel implementations of this method and the performance results of a test problem under scalar, vector, and vector-concurrent execution modes on the Alliant FX/80 are also presented.
NASA Astrophysics Data System (ADS)
Penner, Joyce E.; Andronova, Natalia; Oehmke, Robert C.; Brown, Jonathan; Stout, Quentin F.; Jablonowski, Christiane; van Leer, Bram; Powell, Kenneth G.; Herzog, Michael
2007-07-01
One of the most important advances needed in global climate models is the development of atmospheric General Circulation Models (GCMs) that can reliably treat convection. Such GCMs require high resolution in local convectively active regions, both in the horizontal and vertical directions. During previous research we have developed an Adaptive Mesh Refinement (AMR) dynamical core that can adapt its grid resolution horizontally. Our approach utilizes a finite volume numerical representation of the partial differential equations with floating Lagrangian vertical coordinates and requires resolving dynamical processes on small spatial scales. For the latter it uses a newly developed general-purpose library, which facilitates 3D block-structured AMR on spherical grids. The library manages neighbor information as the blocks adapt, and handles the parallel communication and load balancing, freeing the user to concentrate on the scientific modeling aspects of their code. In particular, this library defines and manages adaptive blocks on the sphere, provides user interfaces for interpolation routines and supports the communication and load-balancing aspects for parallel applications. We have successfully tested the library in a 2-D (longitude-latitude) implementation. During the past year, we have extended the library to treat adaptive mesh refinement in the vertical direction. Preliminary results are discussed. This research project is characterized by an interdisciplinary approach involving atmospheric science, computer science and mathematical/numerical aspects. The work is done in close collaboration between the Atmospheric Science, Computer Science and Aerospace Engineering Departments at the University of Michigan and NOAA GFDL.
Practical aspects of prestack depth migration with finite differences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ober, C.C.; Oldfield, R.A.; Womble, D.E.
1997-07-01
Finite-difference prestack depth migration offers significant improvements over Kirchhoff methods in imaging near or under salt structures. The authors have implemented a finite-difference prestack depth migration algorithm for use on massively parallel computers. The image quality of the finite-difference scheme has been investigated and suggested improvements are discussed. In this presentation, the authors discuss an implicit finite difference migration code, called Salvo, that has been developed through an ACTI (Advanced Computational Technology Initiative) joint project. This code is designed to be efficient on a variety of massively parallel computers. It takes advantage of both frequency and spatial parallelism as well as the use of nodes dedicated to data input/output (I/O). Besides giving an overview of the finite-difference algorithm and some of the parallelism techniques used, migration results using both Kirchhoff and finite-difference migration will be presented and compared. The authors start out with a very simple Cartoon model where one can intuitively see the multiple travel paths and some of the potential problems that will be encountered with Kirchhoff migration. More complex synthetic models as well as results from actual seismic data from the Gulf of Mexico will be shown.
Discontinuous Galerkin Finite Element Method for Parabolic Problems
NASA Technical Reports Server (NTRS)
Kaneko, Hideaki; Bey, Kim S.; Hou, Gene J. W.
2004-01-01
In this paper, we develop a time discretization scheme and its corresponding spatial discretization scheme, based upon the assumption of a certain weak singularity of $\|u_t(t)\|_{L_2(\Omega)} = \|u_t\|_2$, for the discontinuous Galerkin finite element method for one-dimensional parabolic problems. Optimal convergence rates in both time and spatial variables are obtained. A discussion of an automatic time-step control method is also included.
Wakefield Simulation of CLIC PETS Structure Using Parallel 3D Finite Element Time-Domain Solver T3P
DOE Office of Scientific and Technical Information (OSTI.GOV)
Candel, A.; Kabel, A.; Lee, L.
In recent years, SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic time-domain code T3P. Higher-order Finite Element methods on conformal unstructured meshes and massively parallel processing allow unprecedented simulation accuracy for wakefield computations and simulations of transient effects in realistic accelerator structures. Applications include simulation of wakefield damping in the Compact Linear Collider (CLIC) power extraction and transfer structure (PETS).
Iterative algorithms for large sparse linear systems on parallel computers
NASA Technical Reports Server (NTRS)
Adams, L. M.
1982-01-01
Algorithms are developed for assembling in parallel the sparse systems of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed and results of this model for the algorithms are given.
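As an illustration of the preconditioned conjugate gradient idea named in the abstract above, the following is a minimal serial sketch in Python. It is not the report's parallel algorithm: the Jacobi (diagonal) preconditioner, the dense list-of-lists storage, and the names `pcg_jacobi`, `tol`, and `max_iter` are assumptions chosen for brevity.

```python
def pcg_jacobi(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A (dense list of lists).

    Assumption: a Jacobi (diagonal) preconditioner, chosen only for illustration.
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:      # relative residual test
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)                   # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz                        # makes the new direction A-conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Small SPD test case: the tridiagonal (-1, 2, -1) matrix of a 1D model problem.
n = 5
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
x = pcg_jacobi(A, b)
```

In a parallel setting the matrix-vector product and the dot products are the operations that would be distributed across processors; the sketch keeps them as plain local functions.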
Retrieving infinite numbers of patterns in a spin-glass model of immune networks
NASA Astrophysics Data System (ADS)
Agliari, E.; Annibale, A.; Barra, A.; Coolen, A. C. C.; Tantari, D.
2017-01-01
The similarity between neural and (adaptive) immune networks has been known for decades, but so far we did not understand the mechanism that allows the immune system, unlike associative neural networks, to recall and execute a large number of memorized defense strategies in parallel. The explanation turns out to lie in the network topology. Neurons interact typically with a large number of other neurons, whereas interactions among lymphocytes in immune networks are very specific, and described by graphs with finite connectivity. In this paper we use replica techniques to solve a statistical mechanical immune network model with “coordinator branches” (T-cells) and “effector branches” (B-cells), and show how the finite connectivity enables the coordinators to manage an extensive number of effectors simultaneously, even above the percolation threshold (where clonal cross-talk is not negligible). A consequence of its underlying topological sparsity is that the adaptive immune system exhibits only weak ergodicity breaking, so that also spontaneous switch-like effects as bi-stabilities are present: the latter may play a significant role in the maintenance of immune homeostasis.
Error estimation and adaptive mesh refinement for parallel analysis of shell structures
NASA Technical Reports Server (NTRS)
Keating, Scott C.; Felippa, Carlos A.; Park, K. C.
1994-01-01
The formulation and application of element-level, element-independent error indicators is investigated. This research culminates in the development of an error indicator formulation which is derived based on the projection of element deformation onto the intrinsic element displacement modes. The qualifier 'element-level' means that no information from adjacent elements is used for error estimation. This property is ideally suited for obtaining error values and driving adaptive mesh refinements on parallel computers where access to neighboring elements residing on different processors may incur significant overhead. In addition such estimators are insensitive to the presence of physical interfaces and junctures. An error indicator qualifies as 'element-independent' when only visible quantities such as element stiffness and nodal displacements are used to quantify error. Error evaluation at the element level and element independence for the error indicator are highly desired properties for computing error in production-level finite element codes. Four element-level error indicators have been constructed. Two of the indicators are based on variational formulation of the element stiffness and are element-dependent. Their derivations are retained for developmental purposes. The second two indicators mimic and exceed the first two in performance but require no special formulation of the element stiffness mesh refinement which we demonstrate for two dimensional plane stress problems. The parallelizing of substructures and adaptive mesh refinement is discussed and the final error indicator using two-dimensional plane-stress and three-dimensional shell problems is demonstrated.
Constitutive Model Calibration via Autonomous Multiaxial Experimentation (Postprint)
2016-09-17
test machine. Experimental data is reduced and finite element simulations are conducted in parallel with the test based on experimental strain...data is reduced and finite element simulations are conducted in parallel with the test based on experimental strain conditions. Optimization methods...be used directly in finite element simulations of more complex geometries. Keywords Axial/torsional experimentation • Plasticity • Constitutive model
Finite-Time Adaptive Control for a Class of Nonlinear Systems With Nonstrict Feedback Structure.
Sun, Yumei; Chen, Bing; Lin, Chong; Wang, Honghong
2017-09-18
This paper focuses on finite-time adaptive neural tracking control for nonlinear systems in nonstrict feedback form. A semiglobal finite-time practical stability criterion is first proposed. Correspondingly, the finite-time adaptive neural control strategy is given by using this criterion. Unlike the existing results on adaptive neural/fuzzy control, the proposed adaptive neural controller guarantees that the tracking error converges to a sufficiently small domain around the origin in finite time, and other closed-loop signals are bounded. At last, two examples are used to test the validity of our results.
A METHOD FOR IN-SITU CHARACTERIZATION OF RF HEATING IN PARALLEL TRANSMIT MRI
Alon, Leeor; Deniz, Cem Murat; Brown, Ryan; Sodickson, Daniel K.; Zhu, Yudong
2012-01-01
In ultra high field magnetic resonance imaging, parallel radio-frequency (RF) transmission presents both opportunities and challenges for specific absorption rate (SAR) management. On one hand, parallel transmission provides flexibility in tailoring electric fields in the body while facilitating magnetization profile control. On the other hand, it increases the complexity of energy deposition as well as possibly exacerbating local SAR by improper design or delivery of RF pulses. This study shows that the information needed to characterize RF heating in parallel transmission is contained within a local power correlation matrix. Building upon a calibration scheme involving a finite number of magnetic resonance thermometry measurements, the present work establishes a way of estimating the local power correlation matrix. Determination of this matrix allows prediction of temperature change for an arbitrary parallel transmit RF pulse. In the case of a three transmit coil MR experiment in a phantom, determination and validation of the power correlation matrix was conducted in less than 200 minutes with induced temperature changes of <4 degrees C. Further optimization and adaptation are possible, and simulations evaluating potential feasibility for in vivo use are presented. The method allows general characteristics indicative of RF coil/pulse safety to be determined in situ. PMID:22714806
Development of 3D electromagnetic modeling tools for airborne vehicles
NASA Technical Reports Server (NTRS)
Volakis, John L.
1992-01-01
The main goal of this report is to advance the development of methodologies for scattering by airborne composite vehicles. Although the primary focus continues to be the development of a general purpose computer code for analyzing the entire structure as a single unit, a number of other tasks are also being pursued in parallel with this effort. One of these tasks discussed within is on new finite element formulations and mesh termination schemes. The goal here is to decrease computation time while retaining accuracy and geometric adaptability. The second task focuses on the application of wavelets to electromagnetics. Wavelet transformations are shown to be able to reduce a full matrix to a band matrix, thereby reducing the solution's memory requirements. Included within this document are two separate papers on finite element formulations and wavelets.
Adaptive Strategies for Controls of Flexible Arms. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Yuan, Bau-San
1989-01-01
An adaptive controller for a modern manipulator has been designed based on asymptotical stability via the Lyapunov criterion with the output error between the system and a reference model used as the actuating control signal. Computer simulations were carried out to test the design. The combination of the adaptive controller and a system vibration and mode shape estimator shows that the flexible arm should move along a pre-defined trajectory with high-speed motion and fast vibration settling time. An existing computer-controlled prototype two link manipulator, RALF (Robotic Arm, Large Flexible), with a parallel mechanism driven by hydraulic actuators was used to verify the mathematical analysis. The experimental results illustrate that assumed modes found from finite element techniques can be used to derive the equations of motion with acceptable accuracy. The robust adaptive (modal) control is implemented to compensate for unmodelled modes and nonlinearities and is compared with the joint feedback control in additional experiments. Preliminary results show promise for the experimental control algorithm.
Vectorial finite elements for solving the radiative transfer equation
NASA Astrophysics Data System (ADS)
Badri, M. A.; Jolivet, P.; Rousseau, B.; Le Corre, S.; Digonnet, H.; Favennec, Y.
2018-06-01
The discrete ordinate method coupled with the finite element method is often used for the spatio-angular discretization of the radiative transfer equation. In this paper we attempt to improve upon such a discretization technique. Instead of using standard finite elements, we reformulate the radiative transfer equation using vectorial finite elements. In comparison to standard finite elements, this reformulation yields faster timings for the linear system assemblies, as well as for the solution phase when using scattering media. The proposed vectorial finite element discretization for solving the radiative transfer equation is cross-validated against a benchmark problem available in the literature. In addition, we have used the method of manufactured solutions to verify the order of accuracy for our discretization technique within different absorbing, scattering, and emitting media. For solving large problems of radiation on parallel computers, the vectorial finite element method is parallelized using domain decomposition. The proposed domain decomposition method scales on a large number of processes, and its performance is unaffected by changes in the optical thickness of the medium. Our parallel solver is used to solve a large scale radiative transfer problem of the Kelvin-cell radiation.
ALEGRA -- A massively parallel h-adaptive code for solid dynamics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Summers, R.M.; Wong, M.K.; Boucheron, E.A.
1997-12-31
ALEGRA is a multi-material, arbitrary-Lagrangian-Eulerian (ALE) code for solid dynamics designed to run on massively parallel (MP) computers. It combines the features of modern Eulerian shock codes, such as CTH, with modern Lagrangian structural analysis codes using an unstructured grid. ALEGRA is being developed for use on the teraflop supercomputers to conduct advanced three-dimensional (3D) simulations of shock phenomena important to a variety of systems. ALEGRA was designed with the Single Program Multiple Data (SPMD) paradigm, in which the mesh is decomposed into sub-meshes so that each processor gets a single sub-mesh with approximately the same number of elements. Using this approach the authors have been able to produce a single code that can scale from one processor to thousands of processors. A current major effort is to develop efficient, high precision simulation capabilities for ALEGRA, without the computational cost of using a global highly resolved mesh, through flexible, robust h-adaptivity of finite elements. H-adaptivity is the dynamic refinement of the mesh by subdividing elements, thus changing the characteristic element size and reducing numerical error. The authors are working on several major technical challenges that must be met to make effective use of HAMMER on MP computers.
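The definition of h-adaptivity given in the abstract above (subdividing elements to reduce the characteristic element size where the error is large) can be sketched in one dimension. This is a toy illustration, not ALEGRA's or HAMMER's implementation: the function `refine_h`, the indicator `interp_indicator`, and the tolerance are hypothetical names and choices.

```python
import math


def refine_h(nodes, indicator, tol, max_passes=20):
    """Bisect every 1D element whose error indicator exceeds tol.

    nodes: sorted node coordinates; element k spans nodes[k]..nodes[k+1].
    indicator: callable (a, b) -> nonnegative error estimate on [a, b].
    """
    for _ in range(max_passes):
        new_nodes = [nodes[0]]
        changed = False
        for a, b in zip(nodes, nodes[1:]):
            if indicator(a, b) > tol:
                new_nodes.append(0.5 * (a + b))   # subdivide: insert the midpoint
                changed = True
            new_nodes.append(b)
        nodes = new_nodes
        if not changed:                           # every element meets the tolerance
            break
    return nodes


def interp_indicator(a, b):
    """Assumed indicator: h^2 |f''(midpoint)| / 8 for f(x) = exp(-10 x)."""
    h = b - a
    mid = 0.5 * (a + b)
    return h * h * 100.0 * math.exp(-10.0 * mid) / 8.0


mesh = refine_h([0.0, 0.5, 1.0], interp_indicator, tol=1e-2)
```

With this indicator the mesh is refined only near x = 0, where the sample function varies rapidly, while the element next to x = 1 keeps its original size. A production code would additionally have to rebalance the sub-meshes across processors after refinement.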
NASA Technical Reports Server (NTRS)
Nguyen, D. T.; Watson, Willie R. (Technical Monitor)
2005-01-01
The overall objectives of this research work are to formulate and validate efficient parallel algorithms, and to efficiently design and implement computer software for solving large-scale acoustic problems arising from the unified frameworks of the finite element procedures. The adopted parallel Finite Element (FE) Domain Decomposition (DD) procedures should take full advantage of the multiple processing capabilities offered by most modern high performance computing platforms for efficient parallel computation. To achieve this objective, the formulation needs to integrate efficient sparse (and dense) assembly techniques, hybrid (or mixed) direct and iterative equation solvers, proper preconditioning strategies, unrolling strategies, and effective interprocessor communication schemes. Finally, the numerical performance of the developed parallel finite element procedures will be evaluated by solving a series of structural and acoustic (symmetrical and unsymmetrical) problems on different computing platforms. Comparisons with existing "commercialized" and/or "public domain" software are also included, whenever possible.
The effect of anisotropic heat transport on magnetic islands in 3-D configurations
NASA Astrophysics Data System (ADS)
Schlutt, M. G.; Hegna, C. C.
2012-08-01
An analytic theory of nonlinear pressure-induced magnetic island formation using a boundary layer analysis is presented. This theory extends previous work by including the effects of finite parallel heat transport and is applicable to general three dimensional magnetic configurations. In this work, particular attention is paid to the role of finite parallel heat conduction in the context of pressure-induced island physics. It is found that localized currents that require self-consistent deformation of the pressure profile, such as resistive interchange and bootstrap currents, are attenuated by finite parallel heat conduction when the magnetic islands are sufficiently small. However, these anisotropic effects do not change saturated island widths caused by Pfirsch-Schlüter current effects. Implications for finite pressure-induced island healing are discussed.
Simulation of Hypervelocity Impact on Aluminum-Nextel-Kevlar Orbital Debris Shields
NASA Technical Reports Server (NTRS)
Fahrenthold, Eric P.
2000-01-01
An improved hybrid particle-finite element method has been developed for hypervelocity impact simulation. The method combines the general contact-impact capabilities of particle codes with the true Lagrangian kinematics of large strain finite element formulations. Unlike some alternative schemes which couple Lagrangian finite element models with smooth particle hydrodynamics, the present formulation makes no use of slidelines or penalty forces. The method has been implemented in a parallel, three dimensional computer code. Simulations of three dimensional orbital debris impact problems using this parallel hybrid particle-finite element code show good agreement with experiment and good speedup in parallel computation. The simulations included single and multi-plate shields as well as aluminum and composite shielding materials, at an impact velocity of eleven kilometers per second.
Solution of a tridiagonal system of equations on the finite element machine
NASA Technical Reports Server (NTRS)
Bostic, S. W.
1984-01-01
Two parallel algorithms for the solution of tridiagonal systems of equations were implemented on the Finite Element Machine. The Accelerated Parallel Gauss method, an iterative method, and the Buneman algorithm, a direct method, are discussed and execution statistics are presented.
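For orientation, the kind of system addressed in the abstract above can be solved serially with the classical Thomas algorithm, sketched below in Python. This is a swapped-in serial baseline, not either of the two parallel methods the report describes; the function name `thomas` and the array layout are assumptions.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system by forward elimination and back substitution.

    Row i reads: lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i].
    lower[0] and upper[-1] are ignored. No pivoting is performed, so the
    matrix is assumed to be diagonally dominant or otherwise well behaved.
    """
    n = len(diag)
    c = [0.0] * n                     # modified super-diagonal
    d = [0.0] * n                     # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):    # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


solution = thomas([0.0, -1.0, -1.0, -1.0],
                  [2.0, 2.0, 2.0, 2.0],
                  [-1.0, -1.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0, 1.0])
```

The forward sweep is inherently sequential, since each row depends on the previous one, which is one reason parallel machines call for different formulations such as the iterative and direct methods named in the abstract.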
Parallel eigenanalysis of finite element models in a completely connected architecture
NASA Technical Reports Server (NTRS)
Akl, F. A.; Morel, M. R.
1989-01-01
A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis, $K\Phi = M\Phi\Omega$, where $K$ and $M$ are of order $N$, and $\Omega$ is of order $q$. The concurrent solution of the eigenproblem is based on the multifrontal/modified subspace method and is achieved in a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm was successfully implemented on a tightly coupled multiple-instruction multiple-data parallel processing machine, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macrotasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. A parallel finite element dynamic analysis program, p-feda, is documented and the performance of its subroutines in parallel environment is analyzed.
NASA Astrophysics Data System (ADS)
Sheng, Lizeng
The dissertation focuses on one of the major research needs in the area of adaptive/intelligent/smart structures, the development and application of finite element analysis and genetic algorithms for optimal design of large-scale adaptive structures. We first review some basic concepts in finite element method and genetic algorithms, along with the research on smart structures. Then we propose a solution methodology for solving a critical problem in the design of a next generation of large-scale adaptive structures---optimal placements of a large number of actuators to control thermal deformations. After briefly reviewing the three most frequently used general approaches to derive a finite element formulation, the dissertation presents techniques associated with general shell finite element analysis using flat triangular laminated composite elements. The element used here has three nodes and eighteen degrees of freedom and is obtained by combining a triangular membrane element and a triangular plate bending element. The element includes the coupling effect between membrane deformation and bending deformation. The membrane element is derived from the linear strain triangular element using Cook's transformation. The discrete Kirchhoff triangular (DKT) element is used as the plate bending element. For completeness, a complete derivation of the DKT is presented. Geometrically nonlinear finite element formulation is derived for the analysis of adaptive structures under the combined thermal and electrical loads. Next, we solve the optimization problems of placing a large number of piezoelectric actuators to control thermal distortions in a large mirror in the presence of four different thermal loads. We then extend this to a multi-objective optimization problem of determining only one set of piezoelectric actuator locations that can be used to control the deformation in the same mirror under the action of any one of the four thermal loads. 
A series of genetic algorithms, GA Version 1, 2 and 3, were developed to find the optimal locations of piezoelectric actuators from on the order of $10^{21}$ to $10^{56}$ candidate placements. Introducing a variable population approach, we improve the flexibility of the selection operation in genetic algorithms. Incorporating mutation and hill climbing into micro-genetic algorithms, we are able to develop a more efficient genetic algorithm. Through extensive numerical experiments, we find that the design search space for the optimal placements of a large number of actuators is highly multi-modal and that the most distinct nature of genetic algorithms is their robustness. They give results that are random but with only a slight variability. The genetic algorithms can be used to get an adequate solution using a limited number of evaluations. To get the highest quality solution, multiple runs including different random seed generators are necessary. The investigation time can be significantly reduced using very coarse-grained parallel computing. Overall, the methodology of using finite element analysis and genetic algorithm optimization provides a robust solution approach for the challenging problem of optimal placements of a large number of actuators in the design of the next generation of adaptive structures.
Element-topology-independent preconditioners for parallel finite element computations
NASA Technical Reports Server (NTRS)
Park, K. C.; Alexander, Scott
1992-01-01
A family of preconditioners for the solution of finite element equations is presented. The preconditioners are element-topology independent and are thus applicable to element order-free parallel computations. A key feature of the present preconditioners is the repeated use of element connectivity matrices and their left and right inverses. The properties and performance of the present preconditioners are demonstrated via beam and two-dimensional finite element matrices for implicit time integration computations.
NASA Technical Reports Server (NTRS)
Farhat, Charbel; Lesoinne, Michel
1993-01-01
Most of the recently proposed computational methods for solving partial differential equations on multiprocessor architectures stem from the 'divide and conquer' paradigm and involve some form of domain decomposition. For those methods which also require grids of points or patches of elements, it is often necessary to explicitly partition the underlying mesh, especially when working with local memory parallel processors. In this paper, a family of cost-effective algorithms for the automatic partitioning of arbitrary two- and three-dimensional finite element and finite difference meshes is presented and discussed in view of a domain decomposed solution procedure and parallel processing. The influence of the algorithmic aspects of a solution method (implicit/explicit computations), and the architectural specifics of a multiprocessor (SIMD/MIMD, startup/transmission time), on the design of a mesh partitioning algorithm are discussed. The impact of the partitioning strategy on load balancing, operation count, operator conditioning, rate of convergence and processor mapping is also addressed. Finally, the proposed mesh decomposition algorithms are demonstrated with realistic examples of finite element, finite volume, and finite difference meshes associated with the parallel solution of solid and fluid mechanics problems on the iPSC/2 and iPSC/860 multiprocessors.
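To make the notion of explicit mesh partitioning in the abstract above concrete, here is a small Python sketch of recursive coordinate bisection applied to element centroids. It is a generic, well-known technique and is not claimed to be one of the algorithms proposed in the paper; the name `recursive_bisection` and the centroid-based input are assumptions.

```python
def recursive_bisection(points, ids, levels):
    """Split the elements in ids into up to 2**levels parts of near-equal size.

    points: list of centroid coordinate tuples, indexed by element id.
    ids: element ids to partition at this level.
    """
    if levels == 0 or len(ids) < 2:
        return [ids]
    dim = len(points[ids[0]])
    # Cut perpendicular to the coordinate direction with the largest extent,
    # which tends to keep the interface between the two halves small.
    extents = [max(points[i][d] for i in ids) - min(points[i][d] for i in ids)
               for d in range(dim)]
    axis = extents.index(max(extents))
    ordered = sorted(ids, key=lambda i: points[i][axis])
    half = len(ordered) // 2          # equal halves give balanced load
    return (recursive_bisection(points, ordered[:half], levels - 1)
            + recursive_bisection(points, ordered[half:], levels - 1))


# Centroids of a 4 x 4 grid of unit quadrilateral elements, split into 4 parts.
centroids = [(x + 0.5, y + 0.5) for y in range(4) for x in range(4)]
parts = recursive_bisection(centroids, list(range(len(centroids))), levels=2)
```

Equal part sizes address load balancing only; the other criteria listed in the abstract, such as operator conditioning and processor mapping, are not considered by this sketch.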
NASA Astrophysics Data System (ADS)
Frickenhaus, Stephan; Hiller, Wolfgang; Best, Meike
The portable software FoSSI is introduced that, in combination with additional free solver software packages, allows for an efficient and scalable parallel solution of large sparse linear equation systems arising in finite element model codes. FoSSI is intended to support rapid model code development, completely hiding the complexity of the underlying solver packages. In particular, the model developer need not be an expert in parallelization and is yet free to switch between different solver packages by simple modifications of the interface call. FoSSI offers an efficient and easy, yet flexible interface to several parallel solvers, most of them available on the web, such as PETSC, AZTEC, MUMPS, PILUT and HYPRE. FoSSI makes use of the concept of handles for vectors, matrices, preconditioners and solvers, that is frequently used in solver libraries. Hence, FoSSI allows for a flexible treatment of several linear equation systems and associated preconditioners at the same time, even in parallel on separate MPI-communicators. The second special feature in FoSSI is the task specifier, being a combination of keywords, each configuring a certain phase in the solver setup. This enables the user to control a solver over one unique subroutine. Furthermore, FoSSI has rather similar features for all solvers, making a fast solver intercomparison or exchange an easy task. FoSSI is a community software, proven in an adaptive 2D-atmosphere model and a 3D-primitive equation ocean model, both formulated in finite elements. The present paper discusses perspectives of an OpenMP-implementation of parallel iterative solvers based on domain decomposition methods. This approach to OpenMP solvers is rather attractive, as the code for domain-local operations of factorization, preconditioning and matrix-vector product can be readily taken from a sequential implementation that is also suitable to be used in an MPI-variant.
Code development in this direction is in an advanced state under the name ScOPES: the Scalable Open Parallel sparse linear Equations Solver.
Automatic Thread-Level Parallelization in the Chombo AMR Library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Christen, Matthias; Keen, Noel; Ligocki, Terry
2011-05-26
The increasing on-chip parallelism has some substantial implications for HPC applications. Currently, hybrid programming models (typically MPI+OpenMP) are employed for mapping software to the hardware in order to leverage the hardware's architectural features. In this paper, we present an approach that automatically introduces thread-level parallelism into Chombo, a parallel adaptive mesh refinement framework for finite difference type PDE solvers. In Chombo, core algorithms are specified in ChomboFortran, a macro language extension to F77 that is part of the Chombo framework. This domain-specific language forms an already used target language for an automatic migration of the large number of existing algorithms into a hybrid MPI+OpenMP implementation. It also provides access to the auto-tuning methodology that enables tuning certain aspects of an algorithm to hardware characteristics. Performance measurements are presented for a few of the most relevant kernels with respect to a specific application benchmark using this technique as well as benchmark results for the entire application. The kernel benchmarks show that, using auto-tuning, up to a factor of 11 in performance was gained with 4 threads with respect to the serial reference implementation.
NASA Astrophysics Data System (ADS)
Jha, Pradeep Kumar
Capturing the effects of detailed chemistry on turbulent combustion processes is a central challenge faced by the numerical combustion community. However, the inherent complexity and non-linear nature of both turbulence and chemistry require that combustion models rely heavily on engineering approximations to remain computationally tractable. This thesis proposes a computationally efficient algorithm for modelling detailed-chemistry effects in turbulent diffusion flames and numerically predicting the associated flame properties. The cornerstone of this combustion modelling tool is the use of a parallel Adaptive Mesh Refinement (AMR) scheme with the recently proposed Flame Prolongation of Intrinsic low-dimensional manifold (FPI) tabulated-chemistry approach for modelling complex chemistry. The effect of turbulence on the mean chemistry is incorporated using a Presumed Conditional Moment (PCM) approach based on a beta-probability density function (PDF). The two-equation k-ω turbulence model is used for modelling the effects of the unresolved turbulence on the mean flow field. The finite-rate chemistry of methane-air combustion is represented here by using the GRI-Mech 3.0 scheme. This detailed mechanism is used to build the FPI tables. A state-of-the-art numerical scheme based on a parallel block-based solution-adaptive algorithm has been developed to solve the Favre-averaged Navier-Stokes (FANS) and other governing partial-differential equations using a second-order accurate, fully-coupled finite-volume formulation on body-fitted, multi-block, quadrilateral/hexahedral meshes for two-dimensional and three-dimensional flow geometries, respectively. A standard fourth-order Runge-Kutta time-marching scheme is used for time-accurate temporal discretizations. Numerical predictions of three different diffusion flame configurations are considered in the present work: a laminar counter-flow flame; a laminar co-flow diffusion flame; and a Sydney bluff-body turbulent reacting flow. Comparisons are made between the predicted results of the present FPI scheme and the Steady Laminar Flamelet Model (SLFM) approach for diffusion flames. The effects of grid resolution on the predicted overall flame solutions are also assessed. Other non-reacting flows have also been considered to further validate other aspects of the numerical scheme. The present schemes predict results which are in good agreement with published experimental results and significantly reduce the computational cost involved in modelling turbulent diffusion flames, both in terms of storage and processing time.
NASA Technical Reports Server (NTRS)
Dongarra, Jack (Editor); Messina, Paul (Editor); Sorensen, Danny C. (Editor); Voigt, Robert G. (Editor)
1990-01-01
Attention is given to such topics as an evaluation of block algorithm variants in LAPACK, a large-grain parallel sparse system solver, a multiprocessor method for the solution of the generalized eigenvalue problem on an interval, and a parallel QR algorithm for iterative subspace methods on the CM2. A discussion of numerical methods includes the topics of asynchronous numerical solutions of PDEs on parallel computers, parallel homotopy curve tracking on a hypercube, and solving Navier-Stokes equations on the Cedar Multi-Cluster system. A section on differential equations includes a discussion of a six-color procedure for the parallel solution of elliptic systems using the finite quadtree structure, data parallel algorithms for the finite element method, and domain decomposition methods in aerodynamics. Topics dealing with massively parallel computing include hypercube vs. 2-dimensional meshes and massively parallel computation of conservation laws. Performance and tools are also discussed.
GANDALF - Graphical Astrophysics code for N-body Dynamics And Lagrangian Fluids
NASA Astrophysics Data System (ADS)
Hubber, D. A.; Rosotti, G. P.; Booth, R. A.
2018-01-01
GANDALF is a new hydrodynamics and N-body dynamics code designed for investigating planet formation, star formation and star cluster problems. GANDALF is written in C++, parallelized with both OPENMP and MPI and contains a PYTHON library for analysis and visualization. The code has been written with a fully object-oriented approach to easily allow user-defined implementations of physics modules or other algorithms. The code currently contains implementations of smoothed particle hydrodynamics, meshless finite-volume and collisional N-body schemes, but can easily be adapted to include additional particle schemes. We present in this paper the details of its implementation, results from the test suite, serial and parallel performance results and discuss the planned future development. The code is freely available as an open source project on the code-hosting website github at https://github.com/gandalfcode/gandalf and is available under the GPLv2 license.
Assignment Of Finite Elements To Parallel Processors
NASA Technical Reports Server (NTRS)
Salama, Moktar A.; Flower, Jon W.; Otto, Steve W.
1990-01-01
Elements assigned approximately optimally to subdomains. Mapping algorithm based on simulated-annealing concept used to minimize approximate time required to perform finite-element computation on hypercube computer or other network of parallel data processors. Mapping algorithm needed when shape of domain complicated or otherwise not obvious what allocation of elements to subdomains minimizes cost of computation.
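The mapping idea in this abstract can be illustrated with a small simulated-annealing sketch. Everything specific here is an assumption made for illustration, not the authors' algorithm: the mesh is a hypothetical 4 x 4 grid of elements, the cost is the largest per-processor element count plus the number of cut element-to-element edges, and the cooling schedule is a plain geometric one.

```python
import math
import random

def grid_edges(nx, ny):
    """Neighbour pairs of an nx-by-ny grid of elements (hypothetical test mesh)."""
    edges = []
    for j in range(ny):
        for i in range(nx):
            e = j * nx + i
            if i + 1 < nx:
                edges.append((e, e + 1))
            if j + 1 < ny:
                edges.append((e, e + nx))
    return edges

def cost(assign, edges, n_procs, comm_weight=1.0):
    """Assumed cost model: heaviest processor load plus weighted cut edges."""
    loads = [0] * n_procs
    for p in assign:
        loads[p] += 1
    cut = sum(1 for a, b in edges if assign[a] != assign[b])
    return max(loads) + comm_weight * cut

def anneal(n_elems, edges, n_procs, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Metropolis annealing over element-to-processor assignments."""
    rng = random.Random(seed)
    assign = [rng.randrange(n_procs) for _ in range(n_elems)]
    current = cost(assign, edges, n_procs)
    initial_cost = current
    best, best_cost = assign[:], current
    for k in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (k / (steps - 1))
        e = rng.randrange(n_elems)
        old = assign[e]
        assign[e] = rng.randrange(n_procs)
        trial = cost(assign, edges, n_procs)
        if trial <= current or rng.random() < math.exp((current - trial) / t):
            current = trial
            if current < best_cost:
                best, best_cost = assign[:], current
        else:
            assign[e] = old  # reject the move
    return best, best_cost, initial_cost

edges = grid_edges(4, 4)
best, best_cost, initial_cost = anneal(16, edges, n_procs=2)
print("initial cost:", initial_cost, "best cost:", best_cost)
```

Uphill moves are accepted with a probability that shrinks as the temperature falls, which is what lets the search escape poor local assignments early on. The best assignment seen is returned separately because the final state of an annealing run is not guaranteed to be the best one visited.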
Nemesis I: Parallel Enhancements to ExodusII
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hennigan, Gary L.; John, Matthew S.; Shadid, John N.
2006-03-28
NEMESIS I is an enhancement to the EXODUS II finite element database model used to store and retrieve data for unstructured parallel finite element analyses. NEMESIS I adds data structures which facilitate the partitioning of a scalar (standard serial) EXODUS II file onto parallel disk systems found on many parallel computers. Since the NEMESIS I application programming interface (API) can be used to append information to an existing EXODUS II database, any existing software that reads EXODUS II files can be used on files which contain NEMESIS I information. The NEMESIS I information is written and read via C or C++ callable functions which comprise the NEMESIS I API.
NASA Astrophysics Data System (ADS)
Cai, Yong; Cui, Xiangyang; Li, Guangyao; Liu, Wenyang
2018-04-01
The edge-smooth finite element method (ES-FEM) can improve the computational accuracy of triangular shell elements and the mesh partition efficiency of complex models. In this paper, an approach is developed to perform explicit finite element simulations of contact-impact problems with a graphical processing unit (GPU) using a special edge-smooth triangular shell element based on ES-FEM. Of critical importance for this problem is achieving finer-grained parallelism to enable efficient data loading and to minimize communication between the device and host. Four kinds of parallel strategies are then developed to efficiently solve these ES-FEM based shell element formulas, and various optimization methods are adopted to ensure aligned memory access. Special focus is dedicated to developing an approach for the parallel construction of edge systems. A parallel hierarchy-territory contact-searching algorithm (HITA) and a parallel penalty function calculation method are embedded in this parallel explicit algorithm. Finally, the program flow is well designed, and a GPU-based simulation system is developed, using Nvidia's CUDA. Several numerical examples are presented to illustrate the high quality of the results obtained with the proposed methods. In addition, the GPU-based parallel computation is shown to significantly reduce the computing time.
An object-oriented, coprocessor-accelerated model for ice sheet simulations
NASA Astrophysics Data System (ADS)
Seddik, H.; Greve, R.
2013-12-01
Recently, numerous models capable of modeling the thermo-dynamics of ice sheets have been developed within the ice sheet modeling community. Their capabilities have been characterized by a wide range of features with different numerical methods (finite difference or finite element), different implementations of the ice flow mechanics (shallow-ice, higher-order, full Stokes) and different treatments for the basal and coastal areas (basal hydrology, basal sliding, ice shelves). Shallow-ice models (SICOPOLIS, IcIES, PISM, etc) have been widely used for modeling whole ice sheets (Greenland and Antarctica) due to the relatively low computational cost of the shallow-ice approximation but higher order (ISSM, AIF) and full Stokes (Elmer/Ice) models have been recently used to model the Greenland ice sheet. The advance in processor speed and the decrease in cost for accessing large amount of memory and storage have undoubtedly been the driving force in the commoditization of models with higher capabilities, and the popularity of Elmer/Ice (http://elmerice.elmerfem.com) with an active user base is a notable representation of this trend. Elmer/Ice is a full Stokes model built on top of the multi-physics package Elmer (http://www.csc.fi/english/pages/elmer) which provides the full machinery for the complex finite element procedure and is fully parallel (mesh partitioning with OpenMPI communication). Elmer is mainly written in Fortran 90 and targets essentially traditional processors as the code base was not initially written to run on modern coprocessors (yet adding support for the recently introduced x86 based coprocessors is possible). Furthermore, a truly modular and object-oriented implementation is required for quick adaptation to fast evolving capabilities in hardware (Fortran 2003 provides an object-oriented programming model while not being clean and requiring a tricky refactoring of Elmer code). 
In this work, the object-oriented, coprocessor-accelerated finite element code Sainou is introduced. Sainou is an Elmer fork which is reimplemented in Objective C and used for experimenting with ice sheet models running on coprocessors, essentially GPU devices. GPUs are highly parallel processors that provide opportunities for fine-grained parallelization of the full Stokes problem using the standard OpenCL language (http://www.khronos.org/opencl/) to access the device. Sainou is built upon a collection of Objective C base classes that service a modular kernel (itself a base class) which provides the core methods to solve the finite element problem. An early implementation of Sainou will be presented with emphasis on the object architecture and the parallelization strategies. The computation of a simple heat conduction problem is used to test the implementation, which also provides experimental support for running the global matrix assembly on the GPU.
Eigensolution of finite element problems in a completely connected parallel architecture
NASA Technical Reports Server (NTRS)
Akl, F.; Morel, M.
1989-01-01
A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis. The algorithm is based on a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm is successfully implemented on a tightly coupled MIMD parallel processor. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor or to a logical processor (task) if the number of domains exceeds the number of physical processors. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts, and the dimension of the subspace on the performance of the algorithm is investigated. For a 64-element rectangular plate, speed-ups of 1.86, 3.13, 3.18, and 3.61 are achieved on two, four, six, and eight processors, respectively.
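The speed-ups quoted above can be converted into parallel efficiencies, E = S / p, to make the scaling trend explicit. This is a minimal sketch that uses only the four figures reported in the abstract; the function and variable names are illustrative.

```python
# Speed-ups reported in the abstract for the 64-element rectangular plate.
reported_speedup = {2: 1.86, 4: 3.13, 6: 3.18, 8: 3.61}

def parallel_efficiency(speedup, processors):
    """Efficiency E = S / p: the fraction of ideal linear speed-up achieved."""
    return speedup / processors

efficiencies = {p: parallel_efficiency(s, p) for p, s in reported_speedup.items()}

for p, e in efficiencies.items():
    print(f"{p} processors: speed-up {reported_speedup[p]:.2f}, efficiency {e:.1%}")
```

Efficiency falls from about 93% on two processors to about 45% on eight, so for this small model most of the benefit is already obtained with four processors.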
SAPNEW: Parallel finite element code for thin shell structures on the Alliant FX/80
NASA Astrophysics Data System (ADS)
Kamat, Manohar P.; Watson, Brian C.
1992-02-01
The results of a research activity aimed at providing a finite element capability for analyzing turbo-machinery bladed-disk assemblies in a vector/parallel processing environment are summarized. Analysis of aircraft turbofan engines is very computationally intensive. The performance limit of modern-day computers with a single processing unit was estimated at 3 billion floating-point operations per second (3 gigaflops). In view of this limit of a sequential unit, performance rates higher than 3 gigaflops can be achieved only through vectorization and/or parallelization, as on the Alliant FX/80. Accordingly, the efforts of this critically needed research were geared towards developing and evaluating parallel finite element methods for static and vibration analysis. A special-purpose code, named with the acronym SAPNEW, performs static and eigen analysis of multi-degree-of-freedom blade models built up from flat thin shell elements.
SAPNEW: Parallel finite element code for thin shell structures on the Alliant FX/80
NASA Technical Reports Server (NTRS)
Kamat, Manohar P.; Watson, Brian C.
1992-01-01
The results of a research activity aimed at providing a finite element capability for analyzing turbo-machinery bladed-disk assemblies in a vector/parallel processing environment are summarized. Analysis of aircraft turbofan engines is very computationally intensive. The performance limit of modern-day computers with a single processing unit was estimated at 3 billion floating-point operations per second (3 gigaflops). In view of this limit of a sequential unit, performance rates higher than 3 gigaflops can be achieved only through vectorization and/or parallelization, as on the Alliant FX/80. Accordingly, the efforts of this critically needed research were geared towards developing and evaluating parallel finite element methods for static and vibration analysis. A special-purpose code, named with the acronym SAPNEW, performs static and eigen analysis of multi-degree-of-freedom blade models built up from flat thin shell elements.
Adaptive finite element methods for two-dimensional problems in computational fracture mechanics
NASA Technical Reports Server (NTRS)
Min, J. B.; Bass, J. M.; Spradley, L. W.
1994-01-01
Some recent results obtained using solution-adaptive finite element methods in two-dimensional problems in linear elastic fracture mechanics are presented. The focus is on the basic issue of adaptive finite element methods for validating the new methodology by computing demonstration problems and comparing the stress intensity factors to analytical results.
NASA Astrophysics Data System (ADS)
Xing, F.; Masson, R.; Lopez, S.
2017-09-01
This paper introduces a new discrete fracture model accounting for non-isothermal compositional multiphase Darcy flows and complex networks of fractures with intersecting, immersed and non-immersed fractures. The so called hybrid-dimensional model using a 2D model in the fractures coupled with a 3D model in the matrix is first derived rigorously starting from the equi-dimensional matrix fracture model. Then, it is discretized using a fully implicit time integration combined with the Vertex Approximate Gradient (VAG) finite volume scheme which is adapted to polyhedral meshes and anisotropic heterogeneous media. The fully coupled systems are assembled and solved in parallel using the Single Program Multiple Data (SPMD) paradigm with one layer of ghost cells. This strategy allows for a local assembly of the discrete systems. An efficient preconditioner is implemented to solve the linear systems at each time step and each Newton type iteration of the simulation. The numerical efficiency of our approach is assessed on different meshes, fracture networks, and physical settings in terms of parallel scalability, nonlinear convergence and linear convergence.
Parallel 3D Finite Element Numerical Modelling of DC Electron Guns
DOE Office of Scientific and Technical Information (OSTI.GOV)
Prudencio, E.; Candel, A.; Ge, L.
2008-02-04
In this paper we present Gun3P, a parallel 3D finite element application that the Advanced Computations Department at the Stanford Linear Accelerator Center is developing for the analysis of beam formation in DC guns and beam transport in klystrons. Gun3P is targeted specially to complex geometries that cannot be described by 2D models and cannot be easily handled by finite difference discretizations. Its parallel capability allows simulations with more accuracy and less processing time than packages currently available. We present simulation results for the L-band Sheet Beam Klystron DC gun, in which case Gun3P is able to reduce simulation time from days to some hours.
Solution-adaptive finite element method in computational fracture mechanics
NASA Technical Reports Server (NTRS)
Min, J. B.; Bass, J. M.; Spradley, L. W.
1993-01-01
Some recent results obtained using a solution-adaptive finite element method in linear elastic two-dimensional fracture mechanics problems are presented. The focus is on the basic issue of validating the application of the adaptive finite element methodology to fracture mechanics problems by computing demonstration problems and comparing the stress intensity factors to analytical results.
Phase-space finite elements in a least-squares solution of the transport equation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Drumm, C.; Fan, W.; Pautz, S.
2013-07-01
The linear Boltzmann transport equation is solved using a least-squares finite element approximation in the space, angular and energy phase-space variables. The method is applied to both neutral particle transport and also to charged particle transport in the presence of an electric field, where the angular and energy derivative terms are handled with the energy/angular finite elements approximation, in a manner analogous to the way the spatial streaming term is handled. For multi-dimensional problems, a novel approach is used for the angular finite elements: mapping the surface of a unit sphere to a two-dimensional planar region and using a meshing tool to generate a mesh. In this manner, much of the spatial finite-elements machinery can be easily adapted to handle the angular variable. The energy variable and the angular variable for one-dimensional problems make use of edge/beam elements, also building upon the spatial finite elements capabilities. The methods described here can make use of either continuous or discontinuous finite elements in space, angle and/or energy, with the use of continuous finite elements resulting in a smaller problem size and the use of discontinuous finite elements resulting in more accurate solutions for certain types of problems. The work described in this paper makes use of continuous finite elements, so that the resulting linear system is symmetric positive definite and can be solved with a highly efficient parallel preconditioned conjugate gradients algorithm. The phase-space finite elements capability has been built into the Sceptre code and applied to several test problems, including a simple one-dimensional problem with an analytic solution available, a two-dimensional problem with an isolated source term, showing how the method essentially eliminates ray effects encountered with discrete ordinates, and a simple one-dimensional charged-particle transport problem in the presence of an electric field. (authors)
Finite-Time Stabilization and Adaptive Control of Memristor-Based Delayed Neural Networks.
Wang, Leimin; Shen, Yi; Zhang, Guodong
Finite-time stability problem has been a hot topic in control and system engineering. This paper deals with the finite-time stabilization issue of memristor-based delayed neural networks (MDNNs) via two control approaches. First, in order to realize the stabilization of MDNNs in finite time, a delayed state feedback controller is proposed. Then, a novel adaptive strategy is applied to the delayed controller, and finite-time stabilization of MDNNs can also be achieved by using the adaptive control law. Some easily verified algebraic criteria are derived to ensure the stabilization of MDNNs in finite time, and the estimation of the settling time functional is given. Moreover, several finite-time stability results as our special cases for both memristor-based neural networks (MNNs) without delays and neural networks are given. Finally, three examples are provided for the illustration of the theoretical results.
Dynamic load balancing of applications
Wheat, Stephen R.
1997-01-01
An application-level method for dynamically maintaining global load balance on a parallel computer, particularly on massively parallel MIMD computers. Global load balancing is achieved by overlapping neighborhoods of processors, where each neighborhood performs local load balancing. The method supports a large class of finite element and finite difference based applications and provides an automatic element management system to which applications are easily integrated.
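The abstract does not spell out the balancing rule, so the following is only a generic diffusion-style sketch of how overlapping neighborhoods can produce a global balance from purely local exchanges. The ring topology, the three-processor neighborhoods and the simple averaging rule are all assumptions made for illustration.

```python
def balance_step(loads):
    """One sweep: each processor averages its load with its two ring neighbours.

    Neighbourhoods overlap (processor i belongs to the neighbourhoods centred
    on i-1, i and i+1), so local averaging spreads load around the whole ring.
    """
    n = len(loads)
    return [(loads[(i - 1) % n] + loads[i] + loads[(i + 1) % n]) / 3.0
            for i in range(n)]

def balance(loads, steps):
    """Apply a fixed number of local balancing sweeps."""
    for _ in range(steps):
        loads = balance_step(loads)
    return loads

# Hypothetical start: all work sits on processor 0 of an 8-processor ring.
initial = [80.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
final = balance(initial, steps=100)
print([round(x, 3) for x in final])
```

Each sweep conserves the total load because every processor's load is shared equally among the three neighborhoods it belongs to. A real element-based code would move whole elements rather than fractional loads, so the balance reached would be approximate.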
Self-consistent field theory simulations of polymers on arbitrary domains
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ouaknin, Gaddiel; Laachi, Nabil; Delaney, Kris
2016-12-15
We introduce a framework for simulating the mesoscale self-assembly of block copolymers in arbitrary confined geometries subject to Neumann boundary conditions. We employ a hybrid finite difference/volume approach to discretize the mean-field equations on an irregular domain represented implicitly by a level-set function. The numerical treatment of the Neumann boundary conditions is sharp, i.e. it avoids an artificial smearing in the irregular domain boundary. This strategy enables the study of self-assembly in confined domains and enables the computation of physically meaningful quantities at the domain interface. In addition, we employ adaptive grids encoded with Quad-/Oc-trees in parallel to automatically refine the grid where the statistical fields vary rapidly as well as at the boundary of the confined domain. This approach results in a significant reduction in the number of degrees of freedom and makes the simulations in arbitrary domains using effective boundary conditions computationally efficient in terms of both speed and memory requirement. Finally, in the case of regular periodic domains, where pseudo-spectral approaches are superior to finite differences in terms of CPU time and accuracy, we use the adaptive strategy to store chain propagators, reducing the memory footprint without loss of accuracy in computed physical observables.
Parallel aeroelastic computations for wing and wing-body configurations
NASA Technical Reports Server (NTRS)
Byun, Chansup
1994-01-01
The objective of this research is to develop computationally efficient methods for solving fluid-structural interaction problems by directly coupling finite difference Euler/Navier-Stokes equations for fluids and finite element dynamics equations for structures on parallel computers. This capability will significantly impact many aerospace projects of national importance such as Advanced Subsonic Civil Transport (ASCT), where the structural stability margin becomes very critical at the transonic region. This research effort will have direct impact on the High Performance Computing and Communication (HPCC) Program of NASA in the area of parallel computing.
A mixed parallel strategy for the solution of coupled multi-scale problems at finite strains
NASA Astrophysics Data System (ADS)
Lopes, I. A. Rodrigues; Pires, F. M. Andrade; Reis, F. J. P.
2018-02-01
A mixed parallel strategy for the solution of homogenization-based multi-scale constitutive problems undergoing finite strains is proposed. The approach aims to reduce the computational time and memory requirements of non-linear coupled simulations that use finite element discretization at both scales (FE^2). In the first level of the algorithm, a non-conforming domain decomposition technique, based on the FETI method combined with a mortar discretization at the interface of macroscopic subdomains, is employed. A master-slave scheme, which distributes tasks by macroscopic element and adopts dynamic scheduling, is then used for each macroscopic subdomain composing the second level of the algorithm. This strategy allows the parallelization of FE^2 simulations in computers with either shared memory or distributed memory architectures. The proposed strategy preserves the quadratic rates of asymptotic convergence that characterize the Newton-Raphson scheme. Several examples are presented to demonstrate the robustness and efficiency of the proposed parallel strategy.
Parallelized implicit propagators for the finite-difference Schrödinger equation
NASA Astrophysics Data System (ADS)
Parker, Jonathan; Taylor, K. T.
1995-08-01
We describe the application of block Gauss-Seidel and block Jacobi iterative methods to the design of implicit propagators for finite-difference models of the time-dependent Schrödinger equation. The block-wise iterative methods discussed here are mixed direct-iterative methods for solving simultaneous equations, in the sense that direct methods (e.g. LU decomposition) are used to invert certain block sub-matrices, and iterative methods are used to complete the solution. We describe parallel variants of the basic algorithm that are well suited to the medium- to coarse-grained parallelism of work-station clusters, and MIMD supercomputers, and we show that under a wide range of conditions, fine-grained parallelism of the computation can be achieved. Numerical tests are conducted on a typical one-electron atom Hamiltonian. The methods converge robustly to machine precision (15 significant figures), in some cases in as few as 6 or 7 iterations. The rate of convergence is nearly independent of the finite-difference grid-point separations.
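The mixed direct-iterative idea described above can be sketched on a small model problem. The assumptions are loud ones: the system below is a real, diagonally dominant tridiagonal matrix standing in for an implicit finite-difference propagator (the paper's systems come from the time-dependent Schrödinger equation and would be complex-valued), the diagonal blocks are solved directly with the Thomas algorithm, and the coupling between blocks is taken from the previous iterate, which is the block Jacobi variant.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Direct solve of a tridiagonal block (Thomas algorithm, LU without pivoting).

    lower[i] multiplies x[i-1] and upper[i] multiplies x[i+1] in row i;
    lower[0] and upper[-1] are ignored.
    """
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def block_jacobi(lower, diag, upper, rhs, block_size, tol=1e-12, max_iter=500):
    """Block Jacobi: direct solves inside blocks, iteration across blocks."""
    n = len(diag)
    x = [0.0] * n
    for iteration in range(1, max_iter + 1):
        x_new = [0.0] * n
        # Blocks are independent within a sweep, so they could run in parallel.
        for start in range(0, n, block_size):
            end = min(start + block_size, n)
            local_rhs = rhs[start:end]  # slicing copies the data
            if start > 0:
                local_rhs[0] -= lower[start] * x[start - 1]
            if end < n:
                local_rhs[-1] -= upper[end - 1] * x[end]
            x_new[start:end] = solve_tridiagonal(
                lower[start:end], diag[start:end], upper[start:end], local_rhs)
        change = max(abs(a - b) for a, b in zip(x_new, x))
        x = x_new
        if change < tol:
            return x, iteration
    return x, max_iter

# Model system: rows (-1, 4, -1), right-hand side chosen so the solution is all ones.
n = 12
lower = [-1.0] * n
diag = [4.0] * n
upper = [-1.0] * n
rhs = [3.0] + [2.0] * (n - 2) + [3.0]
solution, iterations = block_jacobi(lower, diag, upper, rhs, block_size=4)
print("converged in", iterations, "iterations")
```

Using the newest values from already-updated blocks instead of the previous iterate would turn this into block Gauss-Seidel, at the price of making the block solves within a sweep sequential.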
Adaptive Mesh Refinement for Microelectronic Device Design
NASA Technical Reports Server (NTRS)
Cwik, Tom; Lou, John; Norton, Charles
1999-01-01
Finite element and finite volume methods are used in a variety of design simulations when it is necessary to compute fields throughout regions that contain varying materials or geometry. Convergence of the simulation can be assessed by uniformly increasing the mesh density until an observable quantity stabilizes. Depending on the electrical size of the problem, uniform refinement of the mesh may be computationally infeasible due to memory limitations. Similarly, depending on the geometric complexity of the object being modeled, uniform refinement can be inefficient since regions that do not need refinement add to the computational expense. In either case, convergence to the correct (measured) solution is not guaranteed. Adaptive mesh refinement methods attempt to selectively refine the region of the mesh that is estimated to contain proportionally higher solution errors. The refinement may be obtained by decreasing the element size (h-refinement), by increasing the order of the element (p-refinement) or by a combination of the two (h-p refinement). A successful adaptive strategy refines the mesh to produce an accurate solution measured against the correct fields without undue computational expense. This is accomplished by the use of a) reliable a posteriori error estimates, b) hierarchal elements, and c) automatic adaptive mesh generation. Adaptive methods are also useful when problems with multi-scale field variations are encountered. These occur in active electronic devices that have thin doped layers and also when mixed physics is used in the calculation. The mesh needs to be fine at and near the thin layer to capture rapid field or charge variations, but can coarsen away from these layers where field variations smoothen and charge densities are uniform. This poster will present an adaptive mesh refinement package that runs on parallel computers and is applied to specific microelectronic device simulations. 
Passive sensors that operate in the infrared portion of the spectrum as well as active device simulations that model charge transport and Maxwell's equations will be presented.
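The h-refinement loop described in this abstract can be sketched in one dimension. This is a serial toy under stated assumptions, not the package itself: a known function with a thin layer at x = 0.5 stands in for the computed field, and the midpoint interpolation error of a linear element stands in for an a posteriori error estimate.

```python
import math

def field(x):
    """Stand-in solution with a thin layer at x = 0.5 (hypothetical test function)."""
    return math.tanh(50.0 * (x - 0.5))

def indicator(a, b):
    """Assumed error indicator: midpoint error of linear interpolation on [a, b]."""
    mid = 0.5 * (a + b)
    return abs(field(mid) - 0.5 * (field(a) + field(b)))

def refine(nodes, tol, max_passes=30):
    """Bisect every element whose indicator exceeds tol until none does."""
    for _ in range(max_passes):
        new_nodes = [nodes[0]]
        refined = False
        for a, b in zip(nodes, nodes[1:]):
            if indicator(a, b) > tol:
                new_nodes.append(0.5 * (a + b))  # h-refinement: split the element
                refined = True
            new_nodes.append(b)
        nodes = new_nodes
        if not refined:
            break
    return nodes

coarse = [i / 4 for i in range(5)]  # four uniform elements on [0, 1]
mesh = refine(coarse, tol=1e-3)
sizes = [b - a for a, b in zip(mesh, mesh[1:])]
print(len(sizes), "elements, smallest", min(sizes), "largest", max(sizes))
```

The elements next to the layer are bisected repeatedly while those far from it keep their original size, which is the saving over uniform refinement that the abstract describes. The coarse mesh deliberately places a node at x = 0.5, because an element centred on the layer would give a zero indicator for this antisymmetric test function.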
Dynamic load balancing of applications
Wheat, S.R.
1997-05-13
An application-level method for dynamically maintaining global load balance on a parallel computer, particularly on massively parallel MIMD computers is disclosed. Global load balancing is achieved by overlapping neighborhoods of processors, where each neighborhood performs local load balancing. The method supports a large class of finite element and finite difference based applications and provides an automatic element management system to which applications are easily integrated. 13 figs.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hasan, IIftekhar; Husain, Tausif; Uddin, Md Wasi
2015-08-24
This paper presents a nonlinear analytical model of a novel double-sided flux concentrating Transverse Flux Machine (TFM) based on the Magnetic Equivalent Circuit (MEC) model. The analytical model uses a series-parallel combination of flux tubes to predict the flux paths through different parts of the machine including air gaps, permanent magnets, stator, and rotor. The two-dimensional MEC model approximates the complex three-dimensional flux paths of the TFM and includes the effects of magnetic saturation. The model is capable of adapting to any geometry that makes it a good alternative for evaluating prospective designs of TFM compared to finite element solvers that are numerically intensive and require more computation time. A single-phase, 1-kW, 400-rpm machine is analytically modeled, and its resulting flux distribution, no-load EMF, and torque are verified with finite element analysis. The results are found to be in agreement, with less than 5% error, while reducing the computation time by 25 times.
Analytical Modeling of a Novel Transverse Flux Machine for Direct Drive Wind Turbine Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hasan, IIftekhar; Husain, Tausif; Uddin, Md Wasi
2015-09-02
This paper presents a nonlinear analytical model of a novel double sided flux concentrating Transverse Flux Machine (TFM) based on the Magnetic Equivalent Circuit (MEC) model. The analytical model uses a series-parallel combination of flux tubes to predict the flux paths through different parts of the machine including air gaps, permanent magnets (PM), stator, and rotor. The two-dimensional MEC model approximates the complex three-dimensional flux paths of the TFM and includes the effects of magnetic saturation. The model is capable of adapting to any geometry which makes it a good alternative for evaluating prospective designs of TFM as compared to finite element solvers which are numerically intensive and require more computation time. A single phase, 1 kW, 400 rpm machine is analytically modeled and its resulting flux distribution, no-load EMF and torque, verified with Finite Element Analysis (FEA). The results are found to be in agreement with less than 5% error, while reducing the computation time by 25 times.
NASA Astrophysics Data System (ADS)
Barajas-Solano, D. A.; Tartakovsky, A. M.
2017-12-01
We present a multiresolution method for the numerical simulation of flow and reactive transport in porous, heterogeneous media, based on the hybrid Multiscale Finite Volume (h-MsFV) algorithm. The h-MsFV algorithm allows us to couple high-resolution (fine scale) flow and transport models with lower resolution (coarse) models to locally refine both spatial resolution and transport models. The fine scale problem is decomposed into various "local" problems solved independently in parallel and coordinated via a "global" problem. This global problem is then coupled with the coarse model to strictly ensure domain-wide coarse-scale mass conservation. The proposed method provides an alternative to adaptive mesh refinement (AMR), due to its capacity to rapidly refine spatial resolution beyond what's possible with state-of-the-art AMR techniques, and the capability to locally swap transport models. We illustrate our method by applying it to groundwater flow and reactive transport of multiple species.
Efficient partitioning and assignment on programs for multiprocessor execution
NASA Technical Reports Server (NTRS)
Standley, Hilda M.
1993-01-01
The general problem studied is that of segmenting or partitioning programs for distribution across a multiprocessor system. Efficient partitioning and the assignment of program elements are of great importance since the time consumed in this overhead activity may easily dominate the computation, effectively eliminating any gains made by the use of the parallelism. In this study, the partitioning of sequentially structured programs (written in FORTRAN) is evaluated. Heuristics developed for similar applications are examined. Finally, a model for queueing networks with finite queues is developed which may be used to analyze multiprocessor system architectures with a shared memory approach to the problem of partitioning. The properties of sequentially written programs form obstacles to large scale (at the procedure or subroutine level) parallelization. Data dependencies of even the minutest nature, reflecting the sequential development of the program, severely limit parallelism. The design of heuristic algorithms is tied to the experience gained in the parallel splitting. Parallelism obtained through the physical separation of data has seen some success, especially at the data element level. Data parallelism on a grander scale requires models that accurately reflect the effects of blocking caused by finite queues. A model for the approximation of the performance of finite queueing networks is developed. This model makes use of the decomposition approach combined with the efficiency of product form solutions.
NASA Technical Reports Server (NTRS)
Wang, P.; Li, P.
1998-01-01
A high-resolution numerical study on parallel systems is reported on three-dimensional, time-dependent, thermal convective flows. A parallel implementation of the finite volume method with a multigrid scheme is discussed, and a parallel visualization system is developed on distributed systems for visualizing the flow.
Transient Finite Element Computations on a Variable Transputer System
NASA Technical Reports Server (NTRS)
Smolinski, Patrick J.; Lapczyk, Ireneusz
1993-01-01
A parallel program to analyze transient finite element problems was written and implemented on a system of transputer processors. The program uses the explicit time integration algorithm which eliminates the need for equation solving, making it more suitable for parallel computations. An interprocessor communication scheme was developed for arbitrary two dimensional grid processor configurations. Several 3-D problems were analyzed on a system with a small number of processors.
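The remark that explicit time integration eliminates equation solving can be illustrated with a short sketch. It assumes a fixed-free chain of identical springs with a lumped (diagonal) mass, and uses a central-difference (leapfrog) update with velocities staggered by half a step. The function name `explicit_step` and the model problem are hypothetical and are not taken from the program described.

```python
def explicit_step(u, v, m, k, dt):
    """Advance a fixed-free spring chain by one central-difference (leapfrog) step."""
    n = len(u)
    a = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0   # node 0 is tied to a fixed wall
        force = -k * (u[i] - left)
        if i < n - 1:
            force += k * (u[i + 1] - u[i])
        a.append(force / m[i])              # diagonal mass: a division, not a solve
    v = [vi + dt * ai for vi, ai in zip(v, a)]   # velocity at the half step
    u = [ui + dt * vi for ui, vi in zip(u, v)]   # displacement at the full step
    return u, v

# Assumed data: four unit masses, unit springs, free end displaced by 0.1.
u, v = [0.0, 0.0, 0.0, 0.1], [0.0] * 4
m = [1.0] * 4
for _ in range(1000):
    u, v = explicit_step(u, v, m, k=1.0, dt=0.1)
```

Each node's acceleration depends only on its immediate neighbors, so in a parallel setting a processor would need only the boundary displacements of adjacent subdomains at each step. The price is a stability limit on `dt`, which here is comfortably satisfied.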
Using the GeoFEST Faulted Region Simulation System
NASA Technical Reports Server (NTRS)
Parker, Jay W.; Lyzenga, Gregory A.; Donnellan, Andrea; Judd, Michele A.; Norton, Charles D.; Baker, Teresa; Tisdale, Edwin R.; Li, Peggy
2004-01-01
GeoFEST (the Geophysical Finite Element Simulation Tool) simulates stress evolution, fault slip and plastic/elastic processes in realistic materials, and so is suitable for earthquake cycle studies in regions such as Southern California. Many new capabilities and means of access for GeoFEST are now supported. New abilities include MPI-based cluster parallel computing using automatic PYRAMID/Parmetis-based mesh partitioning, automatic mesh generation for layered media with rectangular faults, and results visualization that is integrated with remote sensing data. The parallel GeoFEST application has been successfully run on over a half-dozen computers, including Intel Xeon clusters, Itanium II and Altix machines, and the Apple G5 cluster. It is not separately optimized for different machines, but relies on good domain partitioning for load-balance and low communication, and careful writing of the parallel diagonally preconditioned conjugate gradient solver to keep communication overhead low. Demonstrated thousand-step solutions for over a million finite elements on 64 processors require under three hours, and scaling tests show high efficiency when using more than (order of) 4000 elements per processor. The source code and documentation for GeoFEST are available at no cost from Open Channel Foundation. In addition, GeoFEST may be used through a browser-based portal environment available to approved users. That environment includes semi-automated geometry creation and mesh generation tools, GeoFEST, and RIVA-based visualization tools that include the ability to generate a flyover animation showing deformations and topography. Work is in progress to support simulation of a region with several faults using 16 million elements, using a strain energy metric to adapt the mesh to faithfully represent the solution in a region of widely varying strain.
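The diagonally preconditioned conjugate gradient solver mentioned in this abstract follows a standard pattern, shown below as a serial sketch. This is the textbook algorithm, not GeoFEST's code. The name `jacobi_pcg`, the tridiagonal test matrix, and the comments on where communication would occur in a distributed version are assumptions.

```python
def jacobi_pcg(matvec, diag, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A with diagonal preconditioning."""
    x = [0.0] * len(b)
    r = list(b)                                   # residual for the initial guess x = 0
    z = [ri / di for ri, di in zip(r, diag)]      # preconditioner: purely local
    p = list(z)
    rz = sum(ri * zi for ri, zi in zip(r, z))     # dot product: global reduction
    for _ in range(max_iter):
        if sum(ri * ri for ri in r) ** 0.5 <= tol:
            break
        Ap = matvec(p)                            # mat-vec: neighbor exchange only
        alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

# Assumed test problem: tridiagonal, diagonally dominant, symmetric matrix.
n = 10
diag = [2.0 + i for i in range(n)]

def matvec(x):
    y = []
    for i in range(n):
        s = diag[i] * x[i]
        if i > 0:
            s -= x[i - 1]
        if i < n - 1:
            s -= x[i + 1]
        y.append(s)
    return y

x_true = [float(i + 1) for i in range(n)]
x = jacobi_pcg(matvec, diag, matvec(x_true))
```

The diagonal preconditioner needs no data from other processors, which is one plausible reason it suits a solver designed to keep communication overhead low. The dot products are the only steps that would require a global reduction.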
NASA Technical Reports Server (NTRS)
Nguyen, D. T.; Al-Nasra, M.; Zhang, Y.; Baddourah, M. A.; Agarwal, T. K.; Storaasli, O. O.; Carmona, E. A.
1991-01-01
Several parallel-vector computational improvements to the unconstrained optimization procedure are described which speed up the structural analysis-synthesis process. A fast parallel-vector Choleski-based equation solver, pvsolve, is incorporated into the well-known SAP-4 general-purpose finite-element code. The new code, denoted PV-SAP, is tested for static structural analysis. Initial results on a four processor CRAY 2 show that using pvsolve reduces the equation solution time by a factor of 14-16 over the original SAP-4 code. In addition, parallel-vector procedures for the Golden Block Search technique and the BFGS method are developed and tested for nonlinear unconstrained optimization. A parallel version of an iterative solver and the pvsolve direct solver are incorporated into the BFGS method. Preliminary results on nonlinear unconstrained optimization test problems, using pvsolve in the analysis, show excellent parallel-vector performance indicating that these parallel-vector algorithms can be used in a new generation of finite-element based structural design/analysis-synthesis codes.
Wang, Yujuan; Song, Yongduan; Ren, Wei
2017-07-06
This paper presents a distributed adaptive finite-time control solution to the formation-containment problem for multiple networked systems with uncertain nonlinear dynamics and directed communication constraints. By integrating the special topology feature of the newly constructed symmetrical matrix, the technical difficulty in finite-time formation-containment control arising from the asymmetrical Laplacian matrix under single-way directed communication is circumvented. Based upon fractional power feedback of the local error, an adaptive distributed control scheme is established to drive the leaders into the prespecified formation configuration in finite time. Meanwhile, a distributed adaptive control scheme, independent of the unavailable inputs of the leaders, is designed to keep the followers within a bounded distance from the moving leaders and then to make the followers enter the convex hull shaped by the formation of the leaders in finite time. The effectiveness of the proposed control scheme is confirmed by the simulation.
Dendritic Growth with Fluid Flow for Pure Materials
NASA Technical Reports Server (NTRS)
Jeong, Jun-Ho; Dantzig, Jonathan A.; Goldenfeld, Nigel
2003-01-01
We have developed a three-dimensional, adaptive, parallel finite element code to examine solidification of pure materials under conditions of forced flow. We have examined the effect of undercooling, surface tension anisotropy and imposed flow velocity on the growth. The flow significantly alters the growth process, producing dendrites that grow faster, and with greater tip curvature, into the flow. The selection constant decreases slightly with flow velocity in our calculations. The results of the calculations agree well with the transport solution of Saville and Beaghton at high undercooling and high anisotropy. At low undercooling, significant deviations are found. We attribute this difference to the influence of other parts of the dendrite, removed from the tip, on the flow field.
Progress on the development of FullWave, a Hot and Cold Plasma Parallel Full Wave Code
NASA Astrophysics Data System (ADS)
Spencer, J. Andrew; Svidzinski, Vladimir; Zhao, Liangji; Kim, Jin-Soo
2017-10-01
FullWave is being developed at FAR-TECH, Inc. to simulate RF waves in hot inhomogeneous magnetized plasmas without making small orbit approximations. FullWave is based on a meshless formulation in configuration space on non-uniform clouds of computational points (CCP) adapted to better resolve plasma resonances, antenna structures and complex boundaries. The linear frequency domain wave equation is formulated using two approaches: for cold plasmas the local cold plasma dielectric tensor is used (resolving resonances by particle collisions), while for hot plasmas the conductivity kernel is calculated. The details of FullWave and some preliminary results will be presented, including: 1) a monitor function based on analytic solutions of the cold-plasma dispersion relation; 2) an adaptive CCP based on the monitor function; 3) construction of the finite differences for approximation of derivatives on adaptive CCP; 4) results of 2-D full wave simulations in the cold plasma model in tokamak geometry using the formulated approach for ECRH, ICRH and Lower Hybrid range of frequencies. Work is supported by the U.S. DOE SBIR program.
NASA Astrophysics Data System (ADS)
Wu, Yun-jie; Li, Guo-fei
2018-01-01
Based on sliding mode extended state observer (SMESO) technique, an adaptive disturbance compensation finite control set optimal control (FCS-OC) strategy is proposed for permanent magnet synchronous motor (PMSM) system driven by voltage source inverter (VSI). So as to improve robustness of finite control set optimal control strategy, a SMESO is proposed to estimate the output-effect disturbance. The estimated value is fed back to finite control set optimal controller for implementing disturbance compensation. It is indicated through theoretical analysis that the designed SMESO could converge in finite time. The simulation results illustrate that the proposed adaptive disturbance compensation FCS-OC possesses better dynamical response behavior in the presence of disturbance.
NASA Astrophysics Data System (ADS)
Farengo, R.; Guzdar, P. N.; Lee, Y. C.
1989-08-01
The effect of finite parallel wavenumber and electron temperature gradients on the lower hybrid drift instability is studied in the parameter regime corresponding to the TRX-2 device [Fusion Technol. 9, 48 (1986)]. Perturbations in the electrostatic potential and all three components of the vector potential are considered and finite beta electron orbit modifications are included. The electron temperature gradient decreases the growth rate of the instability but, for kz=0, unstable modes exist for ηe(=T'en0/Ten0)>6. Since finite kz effects completely stabilize the mode at small values of kz/ky (≈5×10^-3), magnetic shear could be responsible for stabilizing the lower hybrid drift instability in field-reversed configurations.
Optimal mapping of irregular finite element domains to parallel processors
NASA Technical Reports Server (NTRS)
Flower, J.; Otto, S.; Salama, M.
1987-01-01
Mapping the solution domain of n-finite elements into N-subdomains that may be processed in parallel by N-processors is an optimal one if the subdomain decomposition results in a well-balanced workload distribution among the processors. The problem is discussed in the context of irregular finite element domains as an important aspect of the efficient utilization of the capabilities of emerging multiprocessor computers. Finding the optimal mapping is an intractable combinatorial optimization problem, for which a satisfactory approximate solution is obtained here by analogy to a method used in statistical mechanics for simulating the annealing process in solids. The simulated annealing analogy and algorithm are described, and numerical results are given for mapping an irregular two-dimensional finite element domain containing a singularity onto the Hypercube computer.
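The simulated annealing analogy can be sketched as follows. This is a generic annealing loop under assumed choices, not the authors' algorithm. The cost function (squared load imbalance plus the number of cut element adjacencies), the linear cooling schedule, and the names `cost` and `anneal` are all hypothetical.

```python
import math
import random

def cost(assign, edges, n_procs, weight=1.0):
    """Squared load imbalance plus weighted count of adjacencies cut by the mapping."""
    counts = [0] * n_procs
    for p in assign:
        counts[p] += 1
    mean = len(assign) / n_procs
    imbalance = sum((c - mean) ** 2 for c in counts)
    cut = sum(1 for a, b in edges if assign[a] != assign[b])
    return imbalance + weight * cut

def anneal(n_elems, edges, n_procs, steps=5000, t0=2.0, seed=0):
    """Move one element at a time; accept uphill moves with Boltzmann probability."""
    rng = random.Random(seed)
    assign = [rng.randrange(n_procs) for _ in range(n_elems)]
    current = cost(assign, edges, n_procs)
    best, best_cost = list(assign), current
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-9   # assumed linear cooling schedule
        e = rng.randrange(n_elems)
        old = assign[e]
        assign[e] = rng.randrange(n_procs)
        new = cost(assign, edges, n_procs)
        if new <= current or rng.random() < math.exp((current - new) / temp):
            current = new
            if new < best_cost:
                best, best_cost = list(assign), new
        else:
            assign[e] = old                       # reject the move
    return best, best_cost

# Assumed test mesh: a 4 x 4 grid of elements with edge adjacency.
side = 4
edges = [(r * side + c, r * side + c + 1) for r in range(side) for c in range(side - 1)]
edges += [(r * side + c, (r + 1) * side + c) for r in range(side - 1) for c in range(side)]
best, best_cost = anneal(side * side, edges, n_procs=4)
```

Accepting some cost-increasing moves at high temperature is what lets the search escape poor local arrangements, mirroring the slow cooling of a solid. The result is an approximate solution, consistent with the abstract's point that the exact problem is intractable.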
A parallel algorithm for generation and assembly of finite element stiffness and mass matrices
NASA Technical Reports Server (NTRS)
Storaasli, O. O.; Carmona, E. A.; Nguyen, D. T.; Baddourah, M. A.
1991-01-01
A new algorithm is proposed for parallel generation and assembly of the finite element stiffness and mass matrices. The proposed assembly algorithm is based on a node-by-node approach rather than the more conventional element-by-element approach. The new algorithm's generality and computation speed-up when using multiple processors are demonstrated for several practical applications on multi-processor Cray Y-MP and Cray 2 supercomputers.
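One way to read the contrast between node-by-node and element-by-element assembly is sketched below for a one-dimensional bar. This is an interpretation under stated assumptions, not the algorithm of the paper. The two-node bar element and the function names are hypothetical.

```python
def element_matrix(k):
    """Stiffness matrix of a two-node bar element with stiffness k."""
    return [[k, -k], [-k, k]]

def assemble_by_element(n_nodes, elems, stiff):
    """Conventional assembly: each element scatters into shared global rows."""
    K = [[0.0] * n_nodes for _ in range(n_nodes)]
    for conn, k in zip(elems, stiff):
        ke = element_matrix(k)
        for a, i in enumerate(conn):
            for b, j in enumerate(conn):
                K[i][j] += ke[a][b]   # two elements may update the same row
    return K

def assemble_by_node(n_nodes, elems, stiff):
    """Node-oriented assembly: each node gathers from the elements that touch it."""
    touching = [[] for _ in range(n_nodes)]
    for e, conn in enumerate(elems):
        for a, i in enumerate(conn):
            touching[i].append((e, a))
    K = []
    for i in range(n_nodes):          # each iteration writes only row i
        row = [0.0] * n_nodes
        for e, a in touching[i]:
            ke = element_matrix(stiff[e])
            for b, j in enumerate(elems[e]):
                row[j] += ke[a][b]
        K.append(row)
    return K

elems = [(0, 1), (1, 2), (2, 3)]
stiff = [1.0, 2.0, 3.0]
K_elem = assemble_by_element(4, elems, stiff)
K_node = assemble_by_node(4, elems, stiff)
```

In the node-oriented loop every row has a single writer, so rows could be distributed across processors without synchronizing updates. The trade-off is that an element matrix shared by several nodes is recomputed once per node.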
SAPNEW: Parallel finite element code for thin shell structures on the Alliant FX-80
NASA Astrophysics Data System (ADS)
Kamat, Manohar P.; Watson, Brian C.
1992-11-01
The finite element method has proven to be an invaluable tool for analysis and design of complex, high performance systems, such as bladed-disk assemblies in aircraft turbofan engines. However, as the problem size increases, the computation time required by conventional computers can be prohibitively high. Parallel processing computers provide the means to overcome these computation time limits. This report summarizes the results of a research activity aimed at providing a finite element capability for analyzing turbomachinery bladed-disk assemblies in a vector/parallel processing environment. A special purpose code, named with the acronym SAPNEW, has been developed to perform static and eigen analysis of multi-degree-of-freedom blade models built-up from flat thin shell elements. SAPNEW provides a stand-alone capability for static and eigen analysis on the Alliant FX/80, a parallel processing computer. A preprocessor, named with the acronym NTOS, has been developed to accept NASTRAN input decks and convert them to the SAPNEW format to make SAPNEW more readily used by researchers at NASA Lewis Research Center.
Mofid, Omid; Mobayen, Saleh
2018-01-01
Adaptive control methods are developed for stability and tracking control of flight systems in the presence of parametric uncertainties. This paper offers a design technique of adaptive sliding mode control (ASMC) for finite-time stabilization of unmanned aerial vehicle (UAV) systems with parametric uncertainties. Applying the Lyapunov stability concept and the finite-time convergence idea, the proposed control method guarantees that the states of the quad-rotor UAV converge to the origin with a finite-time convergence rate. Furthermore, an adaptive tuning scheme is proposed to estimate the unknown parameters of the quad-rotor UAV at any moment. Finally, simulation results are presented to demonstrate the effectiveness of the proposed technique compared to previous methods. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
Settgast, Randolph R.; Fu, Pengcheng; Walsh, Stuart D. C.; ...
2016-09-18
This study describes a fully coupled finite element/finite volume approach for simulating field-scale hydraulically driven fractures in three dimensions, using massively parallel computing platforms. The proposed method is capable of capturing realistic representations of local heterogeneities, layering and natural fracture networks in a reservoir. A detailed description of the numerical implementation is provided, along with numerical studies comparing the model with both analytical solutions and experimental results. The results demonstrate the effectiveness of the proposed method for modeling large-scale problems involving hydraulically driven fractures in three dimensions.
Wakefield Computations for the CLIC PETS using the Parallel Finite Element Time-Domain Code T3P
DOE Office of Scientific and Technical Information (OSTI.GOV)
Candel, A; Kabel, A.; Lee, L.
In recent years, SLAC's Advanced Computations Department (ACD) has developed the high-performance parallel 3D electromagnetic time-domain code, T3P, for simulations of wakefields and transients in complex accelerator structures. T3P is based on advanced higher-order Finite Element methods on unstructured grids with quadratic surface approximation. Optimized for large-scale parallel processing on leadership supercomputing facilities, T3P allows simulations of realistic 3D structures with unprecedented accuracy, aiding the design of the next generation of accelerator facilities. Applications to the Compact Linear Collider (CLIC) Power Extraction and Transfer Structure (PETS) are presented.
Li, J; Guo, L-X; Zeng, H; Han, X-B
2009-06-01
A message-passing-interface (MPI)-based parallel finite-difference time-domain (FDTD) algorithm for the electromagnetic scattering from a 1-D randomly rough sea surface is presented. The uniaxial perfectly matched layer (UPML) medium is adopted for truncation of FDTD lattices, in which the finite-difference equations can be used for the total computation domain by properly choosing the uniaxial parameters. This makes the parallel FDTD algorithm easier to implement. The parallel performance with different processors is illustrated for one sea surface realization, and the computation time of the parallel FDTD algorithm is dramatically reduced compared to a single-process implementation. Finally, some numerical results are shown, including the backscattering characteristics of sea surface for different polarization and the bistatic scattering from a sea surface with large incident angle and large wind speed.
1983-03-01
An Analysis of a Finite Element Method for Convection-Diffusion Problems. Part II: A Posteriori Error Estimates and Adaptivity, by W. G. Szymczak and I. Babuška. Laboratory for ...
NASA Astrophysics Data System (ADS)
Cheng, Lin; Yang, Yongqing; Li, Li; Sui, Xin
2018-06-01
This paper studies the finite-time hybrid projective synchronization of the drive-response complex networks. In the model, general transmission delays and distributed delays are also considered. By designing the adaptive intermittent controllers, the response network can achieve hybrid projective synchronization with the drive system in finite time. Based on finite-time stability theory and several differential inequalities, some simple finite-time hybrid projective synchronization criteria are derived. Two numerical examples are given to illustrate the effectiveness of the proposed method.
Parallel DSMC Solution of Three-Dimensional Flow Over a Finite Flat Plate
NASA Technical Reports Server (NTRS)
Nance, Robert P.; Wilmoth, Richard G.; Moon, Bongki; Hassan, H. A.; Saltz, Joel
1994-01-01
This paper describes a parallel implementation of the direct simulation Monte Carlo (DSMC) method. Runtime library support is used for scheduling and execution of communication between nodes, and domain decomposition is performed dynamically to maintain a good load balance. Performance tests are conducted using the code to evaluate various remapping and remapping-interval policies, and it is shown that a one-dimensional chain-partitioning method works best for the problems considered. The parallel code is then used to simulate the Mach 20 nitrogen flow over a finite-thickness flat plate. It is shown that the parallel algorithm produces results which compare well with experimental data. Moreover, it yields significantly faster execution times than the scalar code, as well as very good load-balance characteristics.
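A one-dimensional chain partition, the approach this abstract reports as working best, splits an ordered line of cells into contiguous segments of roughly equal work. The sketch below uses a simple prefix-sum heuristic. The function name `chain_partition`, the cut rule, and the example weights are assumptions, not the remapping policy evaluated in the paper.

```python
from bisect import bisect_left
from itertools import accumulate

def chain_partition(weights, n_parts):
    """Split a chain of weighted cells into n_parts contiguous segments."""
    prefix = list(accumulate(weights))
    total = prefix[-1]
    cuts = [0]
    for p in range(1, n_parts):
        target = total * p / n_parts
        # First cell whose running total reaches the target ends segment p.
        cut = bisect_left(prefix, target) + 1
        cuts.append(max(cut, cuts[-1]))
    cuts.append(len(weights))
    return [(cuts[i], cuts[i + 1]) for i in range(n_parts)]

# Assumed per-cell work, for example particle counts along the chain.
weights = [4, 1, 1, 2, 6, 2, 1, 3, 2, 2]
segments = chain_partition(weights, 3)
loads = [sum(weights[a:b]) for a, b in segments]
```

Here each segment is a half-open index range, and rerunning the function with updated weights gives a new partition, which is the essence of remapping at intervals. A heuristic this simple can leave segments unbalanced or even empty when a single cell dominates the load.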
Eigensolution of finite element problems in a completely connected parallel architecture
NASA Technical Reports Server (NTRS)
Akl, Fred A.; Morel, Michael R.
1989-01-01
A parallel algorithm for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi)=(M)(phi)(omega), where (K) and (M) are of order N, and (omega) is of order q is presented. The parallel algorithm is based on a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm has been successfully implemented on a tightly coupled multiple-instruction-multiple-data (MIMD) parallel processing computer, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor, or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macro-tasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. For a 64-element rectangular plate, speed-ups of 1.86, 3.13, 3.18 and 3.61 are achieved on two, four, six and eight processors, respectively.
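The reported speed-ups can be converted to parallel efficiency using the usual definition, efficiency = speed-up / number of processors. The short calculation below applies that definition to the four figures quoted for the 64-element plate. The variable names are illustrative.

```python
# Speed-ups quoted in the abstract, keyed by processor count.
speedups = {2: 1.86, 4: 3.13, 6: 3.18, 8: 3.61}

# Parallel efficiency: fraction of ideal linear speed-up actually achieved.
efficiency = {p: s / p for p, s in speedups.items()}

for p, e in efficiency.items():
    print(f"{p} processors: efficiency {e:.2f}")
```

The efficiencies come out at roughly 0.93, 0.78, 0.53 and 0.45, so for this small model most of the benefit is obtained by four processors.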
student, he developed a parallel spectral finite element method for treating the interaction of large mechanics of fluids, structures, and their interaction|Spectral finite-element methods for time-dependent
Lautenschlager, Stephan
2014-06-22
Therizinosaurs are a group of herbivorous theropod dinosaurs from the Cretaceous of North America and Asia, best known for their iconically large and elongate manual claws. However, among Therizinosauria, ungual morphology is highly variable, reflecting a general trend found in derived theropod dinosaurs (Maniraptoriformes). A combined approach of shape analysis to characterize changes in manual ungual morphology across theropods and finite-element analysis to assess the biomechanical properties of different ungual shapes in therizinosaurs reveals a functional diversity related to ungual morphology. While some therizinosaur taxa used their claws in a generalist fashion, other taxa were functionally adapted to use the claws as grasping hooks during foraging. Results further indicate that maniraptoriform dinosaurs deviated from the plesiomorphic theropod ungual morphology resulting in increased functional diversity. This trend parallels modifications of the cranial skeleton in derived theropods in response to dietary adaptation, suggesting that dietary diversification was a major driver for morphological and functional disparity in theropod evolution.
NASA Workshop on Computational Structural Mechanics 1987, part 1
NASA Technical Reports Server (NTRS)
Sykes, Nancy P. (Editor)
1989-01-01
Topics in Computational Structural Mechanics (CSM) are reviewed. CSM parallel structural methods, a transputer finite element solver, architectures for multiprocessor computers, and parallel eigenvalue extraction are among the topics discussed.
Adaptive parallel logic networks
NASA Technical Reports Server (NTRS)
Martinez, Tony R.; Vidal, Jacques J.
1988-01-01
Adaptive, self-organizing concurrent systems (ASOCS) that combine self-organization with massive parallelism for such applications as adaptive logic devices, robotics, process control, and system malfunction management, are presently discussed. In ASOCS, an adaptive network composed of many simple computing elements operating in combinational and asynchronous fashion is used and problems are specified by presenting if-then rules to the system in the form of Boolean conjunctions. During data processing, which is a different operational phase from adaptation, the network acts as a parallel hardware circuit.
Adaptive finite element method for turbulent flow near a propeller
NASA Astrophysics Data System (ADS)
Pelletier, Dominique; Ilinca, Florin; Hetu, Jean-Francois
1994-11-01
This paper presents an adaptive finite element method based on remeshing to solve incompressible turbulent free shear flow near a propeller. Solutions are obtained in primitive variables using a highly accurate finite element approximation on unstructured grids. Turbulence is modeled by a mixing length formulation. Two general purpose error estimators, which take into account swirl and the variation of the eddy viscosity, are presented and applied to the turbulent wake of a propeller. Predictions compare well with experimental measurements. The proposed adaptive scheme is robust, reliable and cost effective.
A parallel finite-difference method for computational aerodynamics
NASA Technical Reports Server (NTRS)
Swisshelm, Julie M.
1989-01-01
A finite-difference scheme for solving complex three-dimensional aerodynamic flow on parallel-processing supercomputers is presented. The method consists of a basic flow solver with multigrid convergence acceleration, embedded grid refinements, and a zonal equation scheme. Multitasking and vectorization have been incorporated into the algorithm. Results obtained include multiprocessed flow simulations from the Cray X-MP and Cray-2. Speedups as high as 3.3 for the two-dimensional case and 3.5 for segments of the three-dimensional case have been achieved on the Cray-2. The entire solver attained a factor of 2.7 improvement over its unitasked version on the Cray-2. The performance of the parallel algorithm on each machine is analyzed.
Efficient Implementation of Multigrid Solvers on Message-Passing Parallel Systems
NASA Technical Reports Server (NTRS)
Lou, John
1994-01-01
We discuss our implementation strategies for finite difference multigrid partial differential equation (PDE) solvers on message-passing systems. Our target parallel architecture is Intel parallel computers: the Delta and Paragon systems.
Parallel Newton-Krylov-Schwarz algorithms for the transonic full potential equation
NASA Technical Reports Server (NTRS)
Cai, Xiao-Chuan; Gropp, William D.; Keyes, David E.; Melvin, Robin G.; Young, David P.
1996-01-01
We study parallel two-level overlapping Schwarz algorithms for solving nonlinear finite element problems, in particular, for the full potential equation of aerodynamics discretized in two dimensions with bilinear elements. The overall algorithm, Newton-Krylov-Schwarz (NKS), employs an inexact finite-difference Newton method and a Krylov space iterative method, with a two-level overlapping Schwarz method as a preconditioner. We demonstrate that NKS, combined with a density upwinding continuation strategy for problems with weak shocks, is robust and economical for this class of mixed elliptic-hyperbolic nonlinear partial differential equations, with proper specification of several parameters. We study upwinding parameters, inner convergence tolerance, coarse grid density, subdomain overlap, and the level of fill-in in the incomplete factorization, and report their effect on numerical convergence rate, overall execution time, and parallel efficiency on a distributed-memory parallel computer.
NASA Astrophysics Data System (ADS)
Chen, M.; Wei, S.
2016-12-01
The serious damage to Mexico City caused by the 1985 Michoacan earthquake 400 km away indicates that urban areas may be affected by remote earthquakes. To assess the earthquake risk imposed on urban areas by distant earthquakes, we developed a hybrid Frequency Wavenumber (FK) and Finite Difference (FD) code implemented with MPI, since the computation of seismic wave propagation from a distant earthquake using a single numerical method (e.g. Finite Difference, Finite Element or Spectral Element) is very expensive. In our approach, we compute the incident wave field (ud) at the boundaries of the excitation box, which surrounds the local structure, using a parallelized FK method (Zhu and Rivera, 2002), and compute the total wave field (u) within the excitation box using a parallelized 2D FD method. We apply a perfectly matched layer (PML) absorbing condition to the diffracted wave field (u-ud). Compared to the previous Generalized Ray Theory and Finite Difference (Wen and Helmberger, 1998), Frequency Wavenumber and Spectral Element (Tong et al., 2014), and Direct Solution Method and Spectral Element (Monteiller et al., 2013) hybrid methods, our absorbing boundary condition dramatically suppresses the numerical noise. The MPI implementation of our method can greatly speed up the calculation. In addition, our hybrid method has potential use in high-resolution array imaging similar to Tong et al. (2014).
NASA Astrophysics Data System (ADS)
Stone, Christopher P.; Alferman, Andrew T.; Niemeyer, Kyle E.
2018-05-01
Accurate and efficient methods for solving stiff ordinary differential equations (ODEs) are a critical component of turbulent combustion simulations with finite-rate chemistry. The ODEs governing the chemical kinetics at each mesh point are decoupled by operator-splitting allowing each to be solved concurrently. An efficient ODE solver must then take into account the available thread and instruction-level parallelism of the underlying hardware, especially on many-core coprocessors, as well as the numerical efficiency. A stiff Rosenbrock and a nonstiff Runge-Kutta ODE solver are both implemented using the single instruction, multiple thread (SIMT) and single instruction, multiple data (SIMD) paradigms within OpenCL. Both methods solve multiple ODEs concurrently within the same instruction stream. The performance of these parallel implementations was measured on three chemical kinetic models of increasing size across several multicore and many-core platforms. Two separate benchmarks were conducted to clearly determine any performance advantage offered by either method. The first benchmark measured the run-time of evaluating the right-hand-side source terms in parallel and the second benchmark integrated a series of constant-pressure, homogeneous reactors using the Rosenbrock and Runge-Kutta solvers. The right-hand-side evaluations with SIMD parallelism on the host multicore Xeon CPU and many-core Xeon Phi co-processor performed approximately three times faster than the baseline multithreaded C++ code. The SIMT parallel model on the host and Phi was 13%-35% slower than the baseline while the SIMT model on the NVIDIA Kepler GPU provided approximately the same performance as the SIMD model on the Phi. The runtimes for both ODE solvers decreased significantly with the SIMD implementations on the host CPU (2.5-2.7 ×) and Xeon Phi coprocessor (4.7-4.9 ×) compared to the baseline parallel code. 
The SIMT implementations on the GPU ran 1.5-1.6 times faster than the baseline multithreaded CPU code; however, this was significantly slower than the SIMD versions on the host CPU or the Xeon Phi. The performance difference between the three platforms was attributed to thread divergence caused by the adaptive step-sizes within the ODE integrators. Analysis showed that the wider vector width of the GPU incurs a higher level of divergence than the narrower Sandy Bridge or Xeon Phi. The significant performance improvement provided by the SIMD parallel strategy motivates further research into more ODE solver methods that are both SIMD-friendly and computationally efficient.
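The divergence effect described above can be illustrated with a toy lockstep model. This is a minimal sketch and not the authors' analysis. It assumes that every lane in a vector group must wait for the lane needing the most adaptive steps. The step counts and the widths 4, 8, and 32 are invented for illustration, and `lockstep_efficiency` is a hypothetical helper name.

```python
def lockstep_efficiency(steps_per_ode, width):
    """Fraction of issued lane-steps that do useful work when `width`
    ODEs advance in lockstep and each group runs until its slowest
    member has finished."""
    useful = sum(steps_per_ode)
    issued = 0
    for start in range(0, len(steps_per_ode), width):
        group = steps_per_ode[start:start + width]
        # Every lane in the group is occupied for max(group) steps.
        issued += max(group) * width
    return useful / issued

# Hypothetical adaptive step counts for 32 independent ODEs.
step_counts = [10 + (7 * i) % 23 for i in range(32)]

# Illustrative vector widths (assumed values, not taken from the abstract).
efficiency = {w: lockstep_efficiency(step_counts, w) for w in (4, 8, 32)}
for w, e in efficiency.items():
    print(f"width {w:2d}: efficiency {e:.3f}")
```

In this model, merging narrow groups into wider ones can only raise the per-group maximum, so efficiency never improves as the width grows.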
PARAMESH: A Parallel Adaptive Mesh Refinement Community Toolkit
NASA Technical Reports Server (NTRS)
MacNeice, Peter; Olson, Kevin M.; Mobarry, Clark; deFainchtein, Rosalinda; Packer, Charles
1999-01-01
In this paper, we describe a community toolkit which is designed to provide parallel support with adaptive mesh capability for a large and important class of computational models, those using structured, logically cartesian meshes. The package of Fortran 90 subroutines, called PARAMESH, is designed to provide an application developer with an easy route to extend an existing serial code which uses a logically cartesian structured mesh into a parallel code with adaptive mesh refinement. Alternatively, in its simplest use, and with minimal effort, it can operate as a domain decomposition tool for users who want to parallelize their serial codes, but who do not wish to use adaptivity. The package can provide them with an incremental evolutionary path for their code, converting it first to uniformly refined parallel code, and then later if they so desire, adding adaptivity.
An object-oriented approach for parallel self adaptive mesh refinement on block structured grids
NASA Technical Reports Server (NTRS)
Lemke, Max; Witsch, Kristian; Quinlan, Daniel
1993-01-01
Self-adaptive mesh refinement dynamically matches the computational demands of a solver for partial differential equations to the activity in the application's domain. In this paper we present two C++ class libraries, P++ and AMR++, which significantly simplify the development of sophisticated adaptive mesh refinement codes on (massively) parallel distributed memory architectures. The development is based on our previous research in this area. The C++ class libraries provide abstractions to separate the issues of developing parallel adaptive mesh refinement applications into those of parallelism, abstracted by P++, and adaptive mesh refinement, abstracted by AMR++. P++ is a parallel array class library to permit efficient development of architecture independent codes for structured grid applications, and AMR++ provides support for self-adaptive mesh refinement on block-structured grids of rectangular non-overlapping blocks. Using these libraries, the application programmers' work is greatly simplified to primarily specifying the serial single grid application and obtaining the parallel and self-adaptive mesh refinement code with minimal effort. Initial results for simple singular perturbation problems solved by self-adaptive multilevel techniques (FAC, AFAC), being implemented on the basis of prototypes of the P++/AMR++ environment, are presented. Singular perturbation problems frequently arise in large applications, e.g. in the area of computational fluid dynamics. They usually have solutions with layers which require adaptive mesh refinement and fast basic solvers in order to be resolved efficiently.
The influence of the self-consistent mode structure on the Coriolis pinch effect
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peeters, A. G.; Camenen, Y.; Casson, F. J.
This paper discusses the effect of the mode structure on the Coriolis pinch effect [A. G. Peeters, C. Angioni, and D. Strintzi, Phys. Rev. Lett. 98, 265003 (2007)]. It is shown that the Coriolis drift effect can be compensated for by a finite parallel wave vector, resulting in a reduced momentum pinch velocity. Gyrokinetic simulations in full toroidal geometry reveal that parallel dynamics effectively removes the Coriolis pinch for the case of adiabatic electrons, while the compensation due to the parallel dynamics is incomplete for the case of kinetic electrons, resulting in a finite pinch velocity. The finite flux in the case of kinetic electrons is interpreted to be related to the electron trapping, which prevents a strong asymmetry in the electrostatic potential with respect to the low field side position. The physics picture developed here leads to the discovery and explanation of two unexpected effects: first, the pinch velocity scales with the trapped particle fraction (root of the inverse aspect ratio), and second, there is no strong collisionality dependence. The latter is related to the role of the trapped electrons, which retain some symmetry in the eigenmode, but play no role in the perturbed parallel velocity.
A Dual Super-Element Domain Decomposition Approach for Parallel Nonlinear Finite Element Analysis
NASA Astrophysics Data System (ADS)
Jokhio, G. A.; Izzuddin, B. A.
2015-05-01
This article presents a new domain decomposition method for nonlinear finite element analysis introducing the concept of dual partition super-elements. The method extends ideas from the displacement frame method and is ideally suited for parallel nonlinear static/dynamic analysis of structural systems. In the new method, domain decomposition is realized by replacing one or more subdomains in a "parent system," each with a placeholder super-element, where the subdomains are processed separately as "child partitions," each wrapped by a dual super-element along the partition boundary. The analysis of the overall system, including the satisfaction of equilibrium and compatibility at all partition boundaries, is realized through direct communication between all pairs of placeholder and dual super-elements. The proposed method has particular advantages for matrix solution methods based on the frontal scheme, and can be readily implemented for existing finite element analysis programs to achieve parallelization on distributed memory systems with minimal intervention, thus overcoming memory bottlenecks typically faced in the analysis of large-scale problems. Several examples are presented in this article which demonstrate the computational benefits of the proposed parallel domain decomposition approach and its applicability to the nonlinear structural analysis of realistic structural systems.
Parallel adaptive wavelet collocation method for PDEs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nejadmalayeri, Alireza, E-mail: Alireza.Nejadmalayeri@gmail.com; Vezolainen, Alexei, E-mail: Alexei.Vezolainen@Colorado.edu; Brown-Dymkoski, Eric, E-mail: Eric.Browndymkoski@Colorado.edu
2015-10-01
A parallel adaptive wavelet collocation method for solving a large class of Partial Differential Equations is presented. The parallelization is achieved by developing an asynchronous parallel wavelet transform, which allows one to perform parallel wavelet transform and derivative calculations with only one data synchronization at the highest level of resolution. The data are stored using a tree-like structure with tree roots starting at an a priori defined level of resolution. Both static and dynamic domain partitioning approaches are developed. For the dynamic domain partitioning, trees are considered to be the minimum quanta of data to be migrated between the processes. This allows fully automated and efficient handling of non-simply connected partitioning of a computational domain. Dynamic load balancing is achieved via domain repartitioning during the grid adaptation step and reassigning trees to the appropriate processes to ensure approximately the same number of grid points on each process. The parallel efficiency of the approach is discussed based on parallel adaptive wavelet-based Coherent Vortex Simulations of homogeneous turbulence with linear forcing at effective non-adaptive resolutions up to 2048^3 using as many as 2048 CPU cores.
The Feasibility of Adaptive Unstructured Computations On Petaflops Systems
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Oliker, Leonid; Heber, Gerd; Gao, Guang; Saini, Subhash (Technical Monitor)
1999-01-01
This viewgraph presentation covers the advantages of mesh adaptation, unstructured grids, and dynamic load balancing. It illustrates parallel adaptive communications, and explains PLUM (Parallel dynamic load balancing for adaptive unstructured meshes), and PSAW (Proper Self Avoiding Walks).
Binary tree eigen solver in finite element analysis
NASA Technical Reports Server (NTRS)
Akl, F. A.; Janetzke, D. C.; Kiraly, L. J.
1993-01-01
This paper presents a transputer-based binary tree eigensolver for the solution of the generalized eigenproblem in linear elastic finite element analysis. The algorithm is based on the method of recursive doubling, whose parallel implementation of a number of associative operations on an arbitrary set having N elements takes on the order of O(log2 N) steps, compared to (N-1) steps if implemented sequentially. The hardware used in the implementation of the binary tree consists of 32 transputers. The algorithm is written in OCCAM, which is a high-level language developed with the transputers to address parallel programming constructs and to provide the communications between processors. The algorithm can be replicated to match the size of the binary tree transputer network. Parallel and sequential finite element analysis programs have been developed to solve for the set of the least-order eigenpairs using the modified subspace method. The speed-up obtained for a typical analysis problem indicates close agreement with the theoretical prediction given by the method of recursive doubling.
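The step count quoted for recursive doubling can be checked with a small serial simulation. This is a generic sketch of the technique and not the OCCAM/transputer code from the paper. Each pass of the outer loop stands for one parallel step, and `recursive_doubling_scan` is a hypothetical name chosen here.

```python
import operator

def recursive_doubling_scan(values, op=operator.add):
    """Inclusive prefix combination of `values` under an associative `op`.

    One pass of the while loop models one parallel step: position i
    combines with the partial result `stride` places to its left, and
    all positions update simultaneously (hence the copy into `nxt`).
    """
    data = list(values)
    n = len(data)
    steps = 0
    stride = 1
    while stride < n:
        nxt = data[:]
        for i in range(stride, n):
            nxt[i] = op(data[i - stride], data[i])
        data = nxt
        stride *= 2
        steps += 1
    return data, steps

# 32 elements, matching the 32 transputers mentioned in the abstract.
prefix, steps = recursive_doubling_scan(range(1, 33))
print(prefix[-1], steps)
```

For 32 elements the loop runs 5 times (log2 32), where a sequential combination would need 31 operations in a row.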
DOE Office of Scientific and Technical Information (OSTI.GOV)
Besse, Nicolas; Latu, Guillaume; Ghizzo, Alain
In this paper we present a new method for the numerical solution of the relativistic Vlasov-Maxwell system on a phase-space grid using an adaptive semi-Lagrangian method. The adaptivity is performed through a wavelet multiresolution analysis, which gives a powerful and natural refinement criterion based on the local measurement of the approximation error and regularity of the distribution function. Therefore, the multiscale expansion of the distribution function yields a sparse representation of the data and thus saves memory space and CPU time. We apply this numerical scheme to reduced Vlasov-Maxwell systems arising in laser-plasma physics. Interaction of relativistically strong laser pulses with overdense plasma slabs is investigated. These Vlasov simulations revealed a rich variety of phenomena associated with the fast particle dynamics induced by electromagnetic waves, such as electron trapping, particle acceleration, and electron plasma wavebreaking. However, the wavelet-based adaptive method that we developed here does not yield significant improvements compared to Vlasov solvers on a uniform mesh, due to the substantial overhead that the method introduces. Nonetheless, it might be a first step towards more efficient adaptive solvers based on different ideas for the grid refinement or on a more efficient implementation. Here the Vlasov simulations are performed in a two-dimensional phase-space where the development of thin filaments, strongly amplified by relativistic effects, requires a large increase in the total number of points of the phase-space grid as the filaments get finer over time. The adaptive method could be more useful in cases where the thin filaments that need to be resolved are a very small fraction of the hyper-volume, which arises in higher dimensions because of the surface-to-volume scaling and the essentially one-dimensional structure of the filaments.
Moreover, the main way to improve the efficiency of the adaptive method is to increase the local character in phase-space of the numerical scheme, by considering multiscale reconstruction with more compact support and by replacing the semi-Lagrangian method with a numerical scheme that is more local in space, such as compact finite difference schemes, the discontinuous Galerkin method, or finite element residual schemes, which are well suited for parallel domain decomposition techniques.
NASA Technical Reports Server (NTRS)
Hsieh, Shang-Hsien
1993-01-01
The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.
Parallel Computation of Flow in Heterogeneous Media Modelled by Mixed Finite Elements
NASA Astrophysics Data System (ADS)
Cliffe, K. A.; Graham, I. G.; Scheichl, R.; Stals, L.
2000-11-01
In this paper we describe a fast parallel method for solving highly ill-conditioned saddle-point systems arising from mixed finite element simulations of stochastic partial differential equations (PDEs) modelling flow in heterogeneous media. Each realisation of these stochastic PDEs requires the solution of the linear first-order velocity-pressure system comprising Darcy's law coupled with an incompressibility constraint. The chief difficulty is that the permeability may be highly variable, especially when the statistical model has a large variance and a small correlation length. For reasonable accuracy, the discretisation has to be extremely fine. We solve these problems by first reducing the saddle-point formulation to a symmetric positive definite (SPD) problem using a suitable basis for the space of divergence-free velocities. The reduced problem is solved using parallel conjugate gradients preconditioned with an algebraically determined additive Schwarz domain decomposition preconditioner. The result is a solver which exhibits a good degree of robustness with respect to the mesh size as well as to the variance and to physically relevant values of the correlation length of the underlying permeability field. Numerical experiments exhibit almost optimal levels of parallel efficiency. The domain decomposition solver (DOUG, http://www.maths.bath.ac.uk/~parsoft) used here not only is applicable to this problem but can be used to solve general unstructured finite element systems on a wide range of parallel architectures.
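The solver structure named above, conjugate gradients with a preconditioner applied to an SPD system, can be sketched serially. This is a minimal illustration under loud assumptions: a diagonal (Jacobi) preconditioner stands in for the additive Schwarz preconditioner, the test matrix is a 1D diffusion operator with an alternating coefficient rather than the reduced mixed finite element system, and nothing here uses the DOUG package. All function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def make_operator(coeffs):
    """Tridiagonal SPD operator for -(k u')' on a uniform 1D grid (h = 1)
    with zero Dirichlet ends; coeffs[i] is the coefficient k on cell i."""
    n = len(coeffs) - 1
    diag = [coeffs[i] + coeffs[i + 1] for i in range(n)]

    def matvec(x):
        y = [diag[i] * x[i] for i in range(n)]
        for i in range(n - 1):
            y[i] -= coeffs[i + 1] * x[i + 1]
            y[i + 1] -= coeffs[i + 1] * x[i]
        return y

    return matvec, diag

def pcg(matvec, b, apply_prec, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradients for an SPD system."""
    x = [0.0] * len(b)
    r = b[:]
    z = apply_prec(r)
    p = z[:]
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    for k in range(1, max_iter + 1):
        ap = matvec(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, k
        z = apply_prec(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    raise RuntimeError("PCG did not converge")

# Alternating coefficient as a crude stand-in for a variable permeability.
coeffs = [1.0 if i % 2 == 0 else 100.0 for i in range(17)]
matvec, diag = make_operator(coeffs)
b = [1.0] * len(diag)

# Jacobi preconditioner (assumption: stand-in for additive Schwarz).
x, iterations = pcg(matvec, b, lambda r: [ri / di for ri, di in zip(r, diag)])
print(iterations)
```

An additive Schwarz preconditioner would replace the diagonal scaling with independent subdomain solves whose results are summed, which is what makes the preconditioning step parallel.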
Guo, L-X; Li, J; Zeng, H
2009-11-01
We present an investigation of the electromagnetic scattering from a three-dimensional (3-D) object above a two-dimensional (2-D) randomly rough surface. A Message Passing Interface-based parallel finite-difference time-domain (FDTD) approach is used, and the uniaxial perfectly matched layer (UPML) medium is adopted for truncation of the FDTD lattices, in which the finite-difference equations can be used for the total computation domain by properly choosing the uniaxial parameters. This makes the parallel FDTD algorithm easier to implement. The parallel performance with different numbers of processors is illustrated for one rough surface realization and shows that the computation time of our parallel FDTD algorithm is dramatically reduced relative to a single-processor implementation. Finally, the composite scattering coefficients versus scattered and azimuthal angle are presented and analyzed for different conditions, including the surface roughness, the dielectric constants, the polarization, and the size of the 3-D object.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zheng, Xiang; Yang, Chao; State Key Laboratory of Computer Science, Chinese Academy of Sciences, Beijing 100190
2015-03-15
We present a numerical algorithm for simulating the spinodal decomposition described by the three dimensional Cahn–Hilliard–Cook (CHC) equation, which is a fourth-order stochastic partial differential equation with a noise term. The equation is discretized in space and time based on a fully implicit, cell-centered finite difference scheme, with an adaptive time-stepping strategy designed to accelerate the progress to equilibrium. At each time step, a parallel Newton–Krylov–Schwarz algorithm is used to solve the nonlinear system. We discuss various numerical and computational challenges associated with the method. The numerical scheme is validated by a comparison with an explicit scheme of high accuracy (and unreasonably high cost). We present steady state solutions of the CHC equation in two and three dimensions. The effect of the thermal fluctuation on the spinodal decomposition process is studied. We show that the existence of the thermal fluctuation accelerates the spinodal decomposition process and that the final steady morphology is sensitive to the stochastic noise. We also show the evolution of the energies and statistical moments. In terms of the parallel performance, it is found that the implicit domain decomposition approach scales well on supercomputers with a large number of processors.
Quality assessment and control of finite element solutions
NASA Technical Reports Server (NTRS)
Noor, Ahmed K.; Babuska, Ivo
1987-01-01
Status and some recent developments in the techniques for assessing the reliability of finite element solutions are summarized. Discussion focuses on a number of aspects including: the major types of errors in the finite element solutions; techniques used for a posteriori error estimation and the reliability of these estimators; the feedback and adaptive strategies for improving the finite element solutions; and postprocessing approaches used for improving the accuracy of stresses and other important engineering data. Also, future directions for research needed to make error estimation and adaptive movement practical are identified.
A comparison of parallel and diverging screw angles in the stability of locked plate constructs.
Wähnert, D; Windolf, M; Brianza, S; Rothstock, S; Radtke, R; Brighenti, V; Schwieger, K
2011-09-01
We investigated the static and cyclical strength of parallel and angulated locking plate screws using rigid polyurethane foam (0.32 g/cm³) and bovine cancellous bone blocks. Custom-made stainless steel plates with two conically threaded screw holes with different angulations (parallel, 10° and 20° divergent) and 5 mm self-tapping locking screws underwent pull-out and cyclical pull and bending tests. The bovine cancellous blocks were only subjected to static pull-out testing. We also performed finite element analysis for the static pull-out test of the parallel and 20° configurations. In both the foam model and the bovine cancellous bone, the pull-out force was significantly higher for the parallel constructs than for the divergent ones. In the finite element analysis there was 47% more damage in the 20° divergent constructs than in the parallel configuration. Under cyclical loading, the mean number of cycles to failure was significantly higher for the parallel group, followed by the 10° and 20° divergent configurations. In our laboratory setting we clearly showed the biomechanical disadvantage of a diverging locking screw angle under static and cyclical loading.
Finite-time master-slave synchronization and parameter identification for uncertain Lurie systems.
Wang, Tianbo; Zhao, Shouwei; Zhou, Wuneng; Yu, Weiqin
2014-07-01
This paper investigates the finite-time master-slave synchronization and parameter identification problem for uncertain Lurie systems based on the finite-time stability theory and the adaptive control method. Finite-time master-slave synchronization means that the state of a slave system follows that of a master system in finite time, which is more reasonable than asymptotic synchronization in applications. The uncertainties include the unknown parameters and noise disturbances. An adaptive controller and update laws that ensure the synchronization and parameter identification are realized in finite time are constructed. Finally, two numerical examples are given to show the effectiveness of the proposed method. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.
A High Order, Locally-Adaptive Method for the Navier-Stokes Equations
NASA Astrophysics Data System (ADS)
Chan, Daniel
1998-11-01
I have extended the FOSLS method of Cai, Manteuffel and McCormick (1997) and implemented it within the framework of a spectral element formulation using the Legendre polynomial basis function. The FOSLS method solves the Navier-Stokes equations as a system of coupled first-order equations and provides the ellipticity that is needed for fast iterative matrix solvers like multigrid to operate efficiently. Each element is treated as an object and its properties are self-contained. Only C^0 continuity is imposed across element interfaces; this design allows local grid refinement and coarsening without the burden of having an elaborate data structure, since only information along element boundaries is needed. With the FORTRAN 90 programming environment, I can maintain a high computational efficiency by employing a hybrid parallel processing model. The OpenMP directives provide parallelism at the loop level, which is executed on a shared-memory SMP, and the MPI protocol allows the distribution of elements to a cluster of SMPs connected via a commodity network. This talk will provide timing results and a comparison with a second order finite difference method.
Three dimensional modelling of earthquake rupture cycles on frictional faults
NASA Astrophysics Data System (ADS)
Simpson, Guy; May, Dave
2017-04-01
We are developing an efficient MPI-parallel numerical method to simulate earthquake sequences on preexisting faults embedded within a three dimensional viscoelastic half-space. We solve the velocity form of the elasto(visco)dynamic equations using a continuous Galerkin Finite Element Method on an unstructured pentahedral mesh, which thus permits local spatial refinement in the vicinity of the fault. Frictional sliding is coupled to the viscoelastic solid via rate- and state-dependent friction laws using the split-node technique. Our coupled formulation employs a Picard-type non-linear solver with a fully implicit, first order accurate time integrator that utilises an adaptive time step that efficiently evolves the system through multiple seismic cycles. The implementation leverages advanced parallel solvers, preconditioners and linear algebra from the Portable Extensible Toolkit for Scientific Computing (PETSc) library. The model can treat heterogeneous frictional properties and stress states on the fault and surrounding solid as well as non-planar fault geometries. Preliminary tests show that the model successfully reproduces dynamic rupture on a vertical strike-slip fault in a half-space governed by rate-state friction with the ageing law.
On nonlinear finite element analysis in single-, multi- and parallel-processors
NASA Technical Reports Server (NTRS)
Utku, S.; Melosh, R.; Islam, M.; Salama, M.
1982-01-01
Numerical solution of nonlinear equilibrium problems of structures by means of Newton-Raphson type iterations is reviewed. Each step of the iteration is shown to correspond to the solution of a linear problem, therefore the feasibility of the finite element method for nonlinear analysis is established. Organization and flow of data for various types of digital computers, such as single-processor/single-level memory, single-processor/two-level-memory, vector-processor/two-level-memory, and parallel-processors, with and without sub-structuring (i.e. partitioning) are given. The effect of the relative costs of computation, memory and data transfer on substructuring is shown. The idea of assigning comparable size substructures to parallel processors is exploited. Under Cholesky type factorization schemes, the efficiency of parallel processing is shown to decrease due to the occasional shared data, just as that due to the shared facilities.
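The statement that each Newton-Raphson step amounts to a linear solve can be shown on a single degree of freedom. This is a minimal sketch and not the formulation from the report. The hardening spring law k*u + c*u**3 = f and its parameter values are assumptions chosen so that the exact answer is u = 1.

```python
def newton_raphson(residual, tangent, u0, tol=1e-10, max_iter=50):
    """Solve residual(u) = 0 by repeated linearization.

    Each iteration solves the linear problem tangent(u) * du = -residual(u),
    the scalar analogue of solving with the tangent stiffness matrix.
    """
    u = u0
    for iteration in range(max_iter):
        r = residual(u)
        if abs(r) < tol:
            return u, iteration
        du = -r / tangent(u)  # linear solve at the current state
        u += du
    raise RuntimeError("Newton-Raphson did not converge")

# Hypothetical hardening spring: internal force k*u + c*u**3, load f.
k, c, f = 10.0, 5.0, 15.0
residual = lambda u: k * u + c * u**3 - f
tangent = lambda u: k + 3.0 * c * u**2  # derivative of the internal force

u, iterations = newton_raphson(residual, tangent, u0=0.0)
print(u, iterations)
```

In a finite element setting the division by `tangent(u)` becomes a factorization and solve with the assembled tangent stiffness matrix, which is the step the report distributes across substructures.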
A 3D staggered-grid finite difference scheme for poroelastic wave equation
NASA Astrophysics Data System (ADS)
Zhang, Yijie; Gao, Jinghuai
2014-10-01
Three dimensional numerical modeling has been a viable tool for understanding wave propagation in real media. Poroelastic media can better describe the phenomena of hydrocarbon reservoirs than acoustic and elastic media. However, numerical modeling in 3D poroelastic media demands significantly more computational capacity, including both computational time and memory. In this paper, we present a 3D poroelastic staggered-grid finite difference (SFD) scheme. During the procedure, parallel computing is implemented to reduce the computational time. Parallelization is based on domain decomposition, and communication between processors is performed using the message passing interface (MPI). Parallel analysis shows that the parallelized SFD scheme significantly improves the simulation efficiency and that a 3D domain decomposition is the most efficient. We also analyze the numerical dispersion and stability condition of the 3D poroelastic SFD method. Numerical results show that the 3D numerical simulation can provide a realistic description of wave propagation.
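One plausible reason a 3D decomposition comes out ahead is that it minimizes the surface each subdomain must exchange with its neighbours. The abstract does not state this reasoning, so the sketch below is an assumption, as are the grid size, the process count, and the helper name `halo_cells`. It counts one-cell-deep halo faces for an interior subdomain only.

```python
def halo_cells(n, procs_per_axis):
    """Cells on the exchanged faces of an interior subdomain when an
    n x n x n grid is split into px x py x pz equal blocks."""
    px, py, pz = procs_per_axis
    lx, ly, lz = n // px, n // py, n // pz
    cells = 0
    if px > 1:
        cells += 2 * ly * lz  # two faces normal to x
    if py > 1:
        cells += 2 * lx * lz  # two faces normal to y
    if pz > 1:
        cells += 2 * lx * ly  # two faces normal to z
    return cells

n = 256  # hypothetical grid size per axis
layouts = {
    "1D slabs": (64, 1, 1),
    "2D pencils": (8, 8, 1),
    "3D blocks": (4, 4, 4),
}
halo = {name: halo_cells(n, layout) for name, layout in layouts.items()}
for name, cells in halo.items():
    print(f"{name}: {cells} halo cells per subdomain")
```

All three layouts use 64 processes and give each the same number of interior cells, so the halo count alone separates them in this model.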
Parallelization of implicit finite difference schemes in computational fluid dynamics
NASA Technical Reports Server (NTRS)
Decker, Naomi H.; Naik, Vijay K.; Nicoules, Michel
1990-01-01
Implicit finite difference schemes are often the preferred numerical schemes in computational fluid dynamics, requiring less stringent stability bounds than the explicit schemes. Each iteration in an implicit scheme involves global data dependencies in the form of second and higher order recurrences. Efficient parallel implementations of such iterative methods are considerably more difficult and non-intuitive. The parallelization of the implicit schemes that are used for solving the Euler and the thin layer Navier-Stokes equations and that require inversions of large linear systems in the form of block tri-diagonal and/or block penta-diagonal matrices is discussed. Three-dimensional cases are emphasized and schemes that minimize the total execution time are presented. Partitioning and scheduling schemes for alleviating the effects of the global data dependencies are described. An analysis of the communication and the computation aspects of these methods is presented. The effect of the boundary conditions on the parallel schemes is also discussed.
Genomics of parallel adaptation at two timescales in Drosophila
Begun, David J.
2017-01-01
Two interesting unanswered questions are the extent to which both the broad patterns and genetic details of adaptive divergence are repeatable across species, and the timescales over which parallel adaptation may be observed. Drosophila melanogaster is a key model system for population and evolutionary genomics. Findings from genetics and genomics suggest that recent adaptation to latitudinal environmental variation (on the timescale of hundreds or thousands of years) associated with Out-of-Africa colonization plays an important role in maintaining biological variation in the species. Additionally, studies of interspecific differences between D. melanogaster and its sister species D. simulans have revealed that a substantial proportion of proteins and amino acid residues exhibit adaptive divergence on a timescale of roughly a few million years. Here we use population genomic approaches to attack the problem of parallelism between D. melanogaster and a highly diverged congener, D. hydei, on two timescales. D. hydei, a member of the repleta group of Drosophila, is similar to D. melanogaster in that it too appears to be a recently cosmopolitan species and recent colonizer of high latitude environments. We observed parallelism both for genes exhibiting latitudinal allele frequency differentiation within species and for genes exhibiting recurrent adaptive protein divergence between species. Greater parallelism was observed for long-term adaptive protein evolution, and this parallelism includes not only the specific genes/proteins that exhibit adaptive evolution, but extends even to the magnitudes of the selective effects on interspecific protein differences. Thus, despite the roughly 50 million years of time separating D. melanogaster and D. hydei, and despite their considerably divergent biology, they exhibit substantial parallelism, suggesting the existence of a fundamental predictability of adaptive evolution in the genus. PMID:28968391
System software for the finite element machine
NASA Technical Reports Server (NTRS)
Crockett, T. W.; Knott, J. D.
1985-01-01
The Finite Element Machine is an experimental parallel computer developed at Langley Research Center to investigate the application of concurrent processing to structural engineering analysis. This report describes system-level software which has been developed to facilitate use of the machine by applications researchers. The overall software design is outlined, and several important parallel processing issues are discussed in detail, including processor management, communication, synchronization, and input/output. Based on experience using the system, the hardware architecture and software design are critiqued, and areas for further work are suggested.
Finite Element Analysis of Adaptive-Stiffening and Shape-Control SMA Hybrid Composites
NASA Technical Reports Server (NTRS)
Gao, Xiujie; Burton, Deborah; Turner, Travis L.; Brinson, Catherine
2005-01-01
Shape memory alloy hybrid composites with adaptive-stiffening or morphing functions are simulated using finite element analysis. The composite structure is a laminated fiber-polymer composite beam with embedded SMA ribbons at various positions with respect to the neutral axis of the beam. Adaptive stiffening or morphing is activated via selective resistance heating of the SMA ribbons or uniform thermal loads on the beam. The thermomechanical behavior of these composites was simulated in ABAQUS using user-defined SMA elements. The examples demonstrate the usefulness of the methods for the design and simulation of SMA hybrid composites. Keywords: shape memory alloys, Nitinol, ABAQUS, finite element analysis, post-buckling control, shape control, deflection control, adaptive stiffening, morphing, constitutive modeling, user element
MARE2DEM: a 2-D inversion code for controlled-source electromagnetic and magnetotelluric data
NASA Astrophysics Data System (ADS)
Key, Kerry
2016-10-01
This work presents MARE2DEM, a freely available code for 2-D anisotropic inversion of magnetotelluric (MT) data and frequency-domain controlled-source electromagnetic (CSEM) data from onshore and offshore surveys. MARE2DEM parametrizes the inverse model using a grid of arbitrarily shaped polygons, where unstructured triangular or quadrilateral grids are typically used due to their ease of construction. Unstructured grids provide significantly more geometric flexibility and parameter efficiency than the structured rectangular grids commonly used by most other inversion codes. Transmitter and receiver components located on topographic slopes can be tilted parallel to the boundary so that the simulated electromagnetic fields accurately reproduce the real survey geometry. The forward solution is implemented with a goal-oriented adaptive finite-element method that automatically generates and refines unstructured triangular element grids that conform to the inversion parameter grid, ensuring accurate responses as the model conductivity changes. This dual-grid approach is significantly more efficient than the conventional use of a single grid for both the forward and inverse meshes since the more detailed finite-element meshes required for accurate responses do not increase the memory requirements of the inverse problem. Forward solutions are computed in parallel with a highly efficient scaling by partitioning the data into smaller independent modeling tasks consisting of subsets of the input frequencies, transmitters and receivers. Non-linear inversion is carried out with a new Occam inversion approach that requires fewer forward calls. Dense matrix operations are optimized for memory and parallel scalability using the ScaLAPACK parallel library. Free parameters can be bounded using a new non-linear transformation that leaves the transformed parameters nearly the same as the original parameters within the bounds, thereby reducing non-linear smoothing effects. 
Data balancing normalization weights for the joint inversion of two or more data sets encourage the inversion to fit each data type equally well. A synthetic joint inversion of marine CSEM and MT data illustrates the algorithm's performance and parallel scaling on up to 480 processing cores. CSEM inversion of data from the Middle America Trench offshore Nicaragua demonstrates a real world application. The source code and MATLAB interface tools are freely available at http://mare2dem.ucsd.edu.
Aquilina, Peter; Chamoli, Uphar; Parr, William C H; Clausen, Philip D; Wroe, Stephen
2013-06-01
The most stable pattern of internal fixation for fractures of the mandibular condyle is a matter for ongoing discussion. In this study we investigated the stability of three commonly used patterns of plate fixation, and constructed finite element models of a simulated mandibular condylar fracture. The completed models were heterogeneous in the distribution of bony material properties, contained about 1.2 million elements, and incorporated simulated jaw-adducting musculature. Models were run assuming linear elasticity and isotropic material properties for bone. This model was considerably larger and more complex than previous finite element models that have been used to analyse the biomechanical behaviour of differing plating techniques. The use of two parallel 2.0 titanium miniplates gave a more stable configuration with lower mean element stresses and displacements over the use of a single miniplate. In addition, a parallel orientation of two miniplates resulted in lower stresses and displacements than did the use of two miniplates in an offset pattern. The use of two parallel titanium plates resulted in a superior biomechanical result as defined by mean element stresses and relative movement between the fractured fragments in these finite element models. Copyright © 2012 The British Association of Oral and Maxillofacial Surgeons. Published by Elsevier Ltd. All rights reserved.
Adaptive Finite Element Methods for Continuum Damage Modeling
NASA Technical Reports Server (NTRS)
Min, J. B.; Tworzydlo, W. W.; Xiques, K. E.
1995-01-01
The paper presents an application of adaptive finite element methods to the modeling of low-cycle continuum damage and life prediction of high-temperature components. The major objective is to provide automated and accurate modeling of damaged zones through adaptive mesh refinement and adaptive time-stepping methods. The damage modeling methodology is implemented in the usual way by embedding damage evolution in the transient nonlinear solution of elasto-viscoplastic deformation problems. This nonlinear boundary-value problem is discretized by adaptive finite element methods. The automated h-adaptive mesh refinements are driven by error indicators, based on selected principal variables in the problem (stresses, non-elastic strains, damage, etc.). In the time domain, adaptive time-stepping is used, combined with a predictor-corrector time marching algorithm. The time step selection is controlled by the required time accuracy. In order to take into account the strong temperature dependency of material parameters, the nonlinear structural solution is coupled with thermal analyses (one-way coupling). Several test examples illustrate the importance and benefits of adaptive mesh refinements in accurate prediction of damage levels and failure time.
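The combination of adaptive time-stepping with a predictor-corrector march can be illustrated with a minimal sketch. The abstract does not specify the scheme, so everything here is an assumption for illustration: an explicit Euler predictor, a trapezoidal (Heun) corrector, the predictor-corrector difference as the error indicator, and the hypothetical names `adaptive_heun`, `tol`, and the step-size safety factors.

```python
import math

def adaptive_heun(f, t0, y0, t_end, tol=1e-5, dt=0.1):
    """Integrate y' = f(t, y) from t0 to t_end with step-size control.

    Illustrative only: Euler predictor, trapezoidal corrector, and the
    difference between the two used as a local error indicator.
    """
    t, y = t0, y0
    accepted = 0
    while t < t_end:
        dt = min(dt, t_end - t)              # do not step past the end time
        k1 = f(t, y)
        y_pred = y + dt * k1                 # predictor (explicit Euler)
        k2 = f(t + dt, y_pred)
        y_corr = y + 0.5 * dt * (k1 + k2)    # corrector (trapezoidal rule)
        err = abs(y_corr - y_pred)           # local error indicator
        if err <= tol:                       # accept the step
            t += dt
            y = y_corr
            accepted += 1
        # Grow or shrink the step based on the indicator (assumed safety factors).
        factor = 0.9 * math.sqrt(tol / err) if err > 0.0 else 2.0
        dt *= min(2.0, max(0.2, factor))
    return y, accepted

# Usage: exponential decay y' = -y with y(0) = 1, integrated to t = 1.
y_final, n_steps = adaptive_heun(lambda t, y: -y, 0.0, 1.0, 1.0)
```

A rejected step is simply retried with a smaller `dt`, which is the basic mechanism by which the required time accuracy controls the step selection.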
DOE Office of Scientific and Technical Information (OSTI.GOV)
Turner, C. David; Kotulski, Joseph Daniel; Pasik, Michael Francis
This report investigates the feasibility of applying Adaptive Mesh Refinement (AMR) techniques to a vector finite element formulation for the wave equation in three dimensions. Possible error estimators are considered first. Next, approaches for refining tetrahedral elements are reviewed. AMR capabilities within the Nevada framework are then evaluated. We summarize our conclusions on the feasibility of AMR for time-domain vector finite elements and identify a path forward.
Global Load Balancing with Parallel Mesh Adaption on Distributed-Memory Systems
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Oliker, Leonid; Sohn, Andrew
1996-01-01
Dynamic mesh adaption on unstructured grids is a powerful tool for efficiently computing unsteady problems to resolve solution features of interest. Unfortunately, this causes load imbalance among processors on a parallel machine. This paper describes the parallel implementation of a tetrahedral mesh adaption scheme and a new global load balancing method. A heuristic remapping algorithm is presented that assigns partitions to processors such that the redistribution cost is minimized. Results indicate that the parallel performance of the mesh adaption code depends on the nature of the adaption region and show a 35.5X speedup on 64 processors of an SP2 when 35% of the mesh is randomly adapted. For large-scale scientific computations, our load balancing strategy gives almost a sixfold reduction in solver execution times over non-balanced loads. Furthermore, our heuristic remapper yields processor assignments that are less than 3% off the optimal solutions but requires only 1% of the computational time.
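The idea of assigning new partitions to processors so that little data has to move can be sketched as follows. This is not the authors' algorithm: it is a simple greedy heuristic under the assumption that a hypothetical `similarity[p][j]` matrix records how much data already on processor `p` belongs to new partition `j`, and that there is exactly one partition per processor.

```python
def greedy_remap(similarity):
    """Assign each new partition to one processor, greedily keeping data in place.

    similarity[p][j] is the amount of data currently on processor p that
    falls into new partition j (a hypothetical input for this sketch).
    Returns proc_of_part, where proc_of_part[j] is the processor for partition j.
    """
    n = len(similarity)
    # Visit (processor, partition) pairs from largest to smallest overlap.
    entries = sorted(
        ((similarity[p][j], p, j) for p in range(n) for j in range(n)),
        key=lambda e: -e[0],
    )
    proc_of_part = [None] * n
    proc_used = [False] * n
    for _, p, j in entries:
        if proc_of_part[j] is None and not proc_used[p]:
            proc_of_part[j] = p
            proc_used[p] = True
    return proc_of_part

def moved_data(similarity, proc_of_part):
    """Total amount of data that must be redistributed under an assignment."""
    total = sum(sum(row) for row in similarity)
    kept = sum(similarity[proc_of_part[j]][j] for j in range(len(similarity)))
    return total - kept

# Usage: three processors, three new partitions.
S = [[10, 2, 1],
     [3, 8, 0],
     [1, 4, 9]]
assignment = greedy_remap(S)
cost = moved_data(S, assignment)
```

A greedy pass like this is cheap compared with solving the assignment problem exactly, which is the kind of trade-off between solution quality and computational time that the abstract reports.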
Parallel Implementation of a High Order Implicit Collocation Method for the Heat Equation
NASA Technical Reports Server (NTRS)
Kouatchou, Jules; Halem, Milton (Technical Monitor)
2000-01-01
We combine a high order compact finite difference approximation and collocation techniques to numerically solve the two dimensional heat equation. The resulting method is implicit and can be parallelized with a strategy that allows parallelization across both time and space. We compare the parallel implementation of the new method with a classical implicit method, namely the Crank-Nicolson method, where the parallelization is done across space only. Numerical experiments are carried out on the SGI Origin 2000.
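For reference, the classical Crank-Nicolson method used as the baseline can be sketched in a few lines. The sketch below is a simplification, not the paper's setup: it is serial, treats the one-dimensional heat equation u_t = alpha * u_xx with zero Dirichlet boundaries rather than the two-dimensional problem, and uses hypothetical function names.

```python
import math

def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system (lower[0] and upper[-1] are ignored)."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):                       # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):              # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def crank_nicolson_step(u, r):
    """Advance u one time step; r = alpha * dt / dx**2, u[0] = u[-1] = 0."""
    m = len(u) - 2                              # number of interior points
    rhs = [0.5 * r * u[i] + (1.0 - r) * u[i + 1] + 0.5 * r * u[i + 2]
           for i in range(m)]
    interior = thomas_solve([-0.5 * r] * m, [1.0 + r] * m, [-0.5 * r] * m, rhs)
    return [0.0] + interior + [0.0]

# Usage: decay of a sine mode on [0, 1] with alpha = 1.
nx, dt, steps = 51, 1e-3, 100
dx = 1.0 / (nx - 1)
u = [math.sin(math.pi * i * dx) for i in range(nx)]
u[-1] = 0.0                                     # enforce the boundary value exactly
for _ in range(steps):
    u = crank_nicolson_step(u, dt / dx**2)
```

Each time step depends on the previous one, so in this form only the spatial work within a step can be distributed, which is the sense in which the baseline is parallelized across space only.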
Design, development and use of the finite element machine
NASA Technical Reports Server (NTRS)
Adams, L. M.; Voigt, R. C.
1983-01-01
Some of the considerations that went into the design of the Finite Element Machine, a research asynchronous parallel computer, are described. The present status of the system is also discussed along with some indication of the type of results that were obtained.
Arbitrary-level hanging nodes for adaptive hp-FEM approximations in 3D
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pavel Kus; Pavel Solin; David Andrs
2014-11-01
In this paper we discuss constrained approximation with arbitrary-level hanging nodes in adaptive higher-order finite element methods (hp-FEM) for three-dimensional problems. This technique enables using highly irregular meshes, and it greatly simplifies the design of adaptive algorithms as it prevents refinements from propagating recursively through the finite element mesh. The technique makes it possible to design efficient adaptive algorithms for purely hexahedral meshes. We present a detailed mathematical description of the method and illustrate it with numerical examples.
A multilevel correction adaptive finite element method for Kohn-Sham equation
NASA Astrophysics Data System (ADS)
Hu, Guanghui; Xie, Hehu; Xu, Fei
2018-02-01
In this paper, an adaptive finite element method is proposed for solving the Kohn-Sham equation with the multilevel correction technique. In the method, the Kohn-Sham equation is solved on a fixed and appropriately coarse mesh with the finite element method, in which the finite element space is continually improved by solving the derived boundary value problems on a series of adaptively and successively refined meshes. A main feature of the method is that solving the large-scale Kohn-Sham system is effectively avoided, and the derived boundary value problems can be handled efficiently by classical methods such as the multigrid method. Hence, significant acceleration can be obtained in solving the Kohn-Sham equation with the proposed multilevel correction technique. The performance of the method is examined by a variety of numerical experiments.
Exploiting parallel computing with limited program changes using a network of microcomputers
NASA Technical Reports Server (NTRS)
Rogers, J. L., Jr.; Sobieszczanski-Sobieski, J.
1985-01-01
Network computing and multiprocessor computers are two discernible trends in parallel processing. The computational behavior of an iterative distributed process in which some subtasks are completed later than others because of an imbalance in computational requirements is of significant interest. The effects of asynchronous processing were studied. A small existing program was converted to perform finite element analysis by distributing substructure analysis over a network of four Apple IIe microcomputers connected to a shared disk, simulating a parallel computer. The substructure analysis uses an iterative, fully stressed, structural resizing procedure. A framework of beams divided into three substructures is used as the finite element model. The effects of asynchronous processing on the convergence of the design variables are determined by not resizing particular substructures on various iterations.
Kumar, Neelesh
2014-10-01
Finite element analysis has been universally employed for the stress and strain analysis in lower extremity prosthetics. The socket adapter was the principal subject of interest due to its importance in deciding the knee motion range. This article focused on the static and dynamic stress analysis of the designed hybrid adapter developed by the authors. A standard mechanical design validation approach using von Mises stress was followed. Four materials were considered for the analysis, namely, carbon fiber, oil-filled nylon, Al-6061, and mild steel. The paper analyses the static and dynamic stress on the designed hybrid adapter, which incorporates features of conventional male and female socket adapters. The finite element analysis was carried out for different possible angles of knee flexion, simulating static and dynamic gait situations. Research was carried out on available designs of socket adapter. The mechanical design of the hybrid adapter was conceptualized and a CAD model was generated using Inventor modelling software. Static and dynamic stress analysis was carried out on different materials for optimization. The finite element analysis was carried out in Autodesk Inventor Professional Ver. 2011. The peak value of von Mises stress occurred in the neck region of the adapter and in the lower face region at the rod eye-adapter junction in static and dynamic analyses, respectively. Oil-filled nylon was found to be the best material among the four with respect to strength, weight, and cost. Research investigations on newer materials for development of improved prostheses will immensely benefit amputees. The study analyzes the static and dynamic stress on the knee joint adapter to identify a better material for the hybrid adapter design. © The International Society for Prosthetics and Orthotics 2013.
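The von Mises criterion referred to above reduces a full stress state to a single equivalent stress that can be compared with a material's yield strength. A minimal sketch of the standard formula follows; the function name and the argument order are illustrative choices, and nothing here reproduces the study's own post-processing.

```python
import math

def von_mises(sx, sy, sz, txy=0.0, tyz=0.0, tzx=0.0):
    """Equivalent (von Mises) stress from normal and shear stress components.

    All inputs share one unit (for example MPa); the result has the same unit.
    """
    normal_part = 0.5 * ((sx - sy) ** 2 + (sy - sz) ** 2 + (sz - sx) ** 2)
    shear_part = 3.0 * (txy ** 2 + tyz ** 2 + tzx ** 2)
    return math.sqrt(normal_part + shear_part)

# Usage: a uniaxial stress state and a pure shear state.
uniaxial = von_mises(100.0, 0.0, 0.0)        # equals the applied axial stress
pure_shear = von_mises(0.0, 0.0, 0.0, txy=10.0)
```

In a design check of this kind, the peak equivalent stress found in the model is compared with the yield strength of each candidate material.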
Multithreaded Model for Dynamic Load Balancing Parallel Adaptive PDE Computations
NASA Technical Reports Server (NTRS)
Chrisochoides, Nikos
1995-01-01
We present a multithreaded model for the dynamic load-balancing of numerical, adaptive computations required for the solution of Partial Differential Equations (PDEs) on multiprocessors. Multithreading is used as a means of exploring concurrency at the processor level in order to tolerate synchronization costs inherent to traditional (non-threaded) parallel adaptive PDE solvers. Our preliminary analysis for parallel, adaptive PDE solvers indicates that multithreading can be used as a mechanism to mask overheads required for the dynamic balancing of processor workloads with computations required for the actual numerical solution of the PDEs. Also, multithreading can simplify the implementation of dynamic load-balancing algorithms, a task that is very difficult for traditional data parallel adaptive PDE computations. Unfortunately, multithreading does not always simplify program complexity, often makes code re-usability difficult, and increases software complexity.
Applications of Parallel Computation in Micro-Mechanics and Finite Element Method
NASA Technical Reports Server (NTRS)
Tan, Hui-Qian
1996-01-01
This project discusses the application of parallel computation to material analyses. Briefly, we analyze a material through element-wise computations. We call an element a cell here. A cell is divided into a number of subelements called subcells, and all subcells in a cell have an identical structure. The detailed structure will be given later in this paper. The problem is clearly "well-structured", so a SIMD machine is a natural choice. In this paper we look into the potential of SIMD machines for finite element computation by developing appropriate algorithms on MasPar, a SIMD parallel machine. In section 2, the architecture of MasPar is discussed, together with a brief review of the parallel programming language MPL. In section 3, some general parallel algorithms which might be useful to the project are proposed and, in combination with the algorithms, some features of MPL are discussed in more detail. In section 4, the computational structure of the cell/subcell model is given and the idea behind the design of the parallel algorithm for the model is demonstrated. Finally, section 5 gives a summary.
Convergence of an hp-Adaptive Finite Element Strategy in Two and Three Space-Dimensions
NASA Astrophysics Data System (ADS)
Bürg, Markus; Dörfler, Willy
2010-09-01
We show convergence of an automatic hp-adaptive refinement strategy for the finite element method on the elliptic boundary value problem. The strategy is a generalization of a refinement strategy proposed for one-dimensional situations to problems in two and three space-dimensions.
NASA Technical Reports Server (NTRS)
Stapleton, Scott; Gries, Thomas; Waas, Anthony M.; Pineda, Evan J.
2014-01-01
Enhanced finite elements are elements with an embedded analytical solution that can capture detailed local fields, enabling more efficient, mesh independent finite element analysis. The shape functions are determined based on the analytical model rather than prescribed. This method was applied to adhesively bonded joints to model joint behavior with one element through the thickness. This study demonstrates two methods of maintaining the fidelity of such elements during adhesive non-linearity and cracking without increasing the mesh needed for an accurate solution. The first method uses adaptive shape functions, where the shape functions are recalculated at each load step based on the softening of the adhesive. The second method is internal mesh adaption, where cracking of the adhesive within an element is captured by further discretizing the element internally to represent the partially cracked geometry. By keeping mesh adaptations within an element, a finer mesh can be used during the analysis without affecting the global finite element model mesh. Examples are shown which highlight when each method is most effective in reducing the number of elements needed to capture adhesive nonlinearity and cracking. These methods are validated against analogous finite element models utilizing cohesive zone elements.
NASA Technical Reports Server (NTRS)
Noor, Ahmed K. (Editor)
1986-01-01
The papers contained in this volume provide an overview of the advances made in a number of aspects of computational mechanics, identify some of the anticipated industry needs in this area, discuss the opportunities provided by new hardware and parallel algorithms, and outline some of the current government programs in computational mechanics. Papers are included on advances and trends in parallel algorithms, supercomputers for engineering analysis, material modeling in nonlinear finite-element analysis, the Navier-Stokes computer, and future finite-element software systems.
An Element-Based Concurrent Partitioner for Unstructured Finite Element Meshes
NASA Technical Reports Server (NTRS)
Ding, Hong Q.; Ferraro, Robert D.
1996-01-01
A concurrent partitioner for partitioning unstructured finite element meshes on distributed memory architectures is developed. The partitioner uses an element-based partitioning strategy. Its main advantage over the more conventional node-based partitioning strategy is its modular programming approach to the development of parallel applications. The partitioner first partitions element centroids using a recursive inertial bisection algorithm. Elements and nodes then migrate according to the partitioned centroids, using a data request communication template for unpredictable incoming messages. Our scalable implementation is contrasted to a non-scalable implementation which is a straightforward parallelization of a sequential partitioner.
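The first stage described above, partitioning element centroids by recursive inertial bisection, can be sketched serially as follows. This is an illustration under stated assumptions, not the authors' concurrent implementation: it works on 2-D centroids only, splits at the median so both halves have nearly equal size, and uses hypothetical function names.

```python
import math

def inertial_bisect(points, ids):
    """Split ids into two halves along the principal axis of their centroids."""
    n = len(ids)
    cx = sum(points[i][0] for i in ids) / n
    cy = sum(points[i][1] for i in ids) / n
    sxx = sum((points[i][0] - cx) ** 2 for i in ids)
    syy = sum((points[i][1] - cy) ** 2 for i in ids)
    sxy = sum((points[i][0] - cx) * (points[i][1] - cy) for i in ids)
    # Direction of largest spread (major eigenvector of the 2x2 inertia matrix).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ax, ay = math.cos(theta), math.sin(theta)
    # Order centroids by their projection on that axis and cut at the median.
    order = sorted(ids, key=lambda i: (points[i][0] - cx) * ax
                                      + (points[i][1] - cy) * ay)
    half = n // 2
    return order[:half], order[half:]

def recursive_inertial_bisection(points, levels):
    """Return 2**levels lists of point indices."""
    parts = [list(range(len(points)))]
    for _ in range(levels):
        next_parts = []
        for ids in parts:
            next_parts.extend(inertial_bisect(points, ids))
        parts = next_parts
    return parts

# Usage: centroids on an 8 x 2 grid, split into four parts.
centroids = [(float(x), float(y)) for x in range(8) for y in range(2)]
parts = recursive_inertial_bisection(centroids, levels=2)
```

Once every centroid carries a partition label, elements and their nodes can be migrated to the owning processor, which corresponds to the second stage mentioned in the abstract.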
Fast adaptive composite grid methods on distributed parallel architectures
NASA Technical Reports Server (NTRS)
Lemke, Max; Quinlan, Daniel
1992-01-01
The fast adaptive composite (FAC) grid method is compared with the adaptive composite method (AFAC) under a variety of conditions including vectorization and parallelization. Results are given for distributed memory multiprocessor architectures (SUPRENUM, Intel iPSC/2 and iPSC/860). It is shown that the good performance of AFAC and its superiority over FAC in a parallel environment is a property of the algorithm and not dependent on peculiarities of any machine.
2011-09-01
optimized building blocks such as a parallelized tri-diagonal linear solver (used in the “implicit finite differences” and split-step Pade PE models...and Ding Lee. “A finite-difference treatment of interface conditions for the parabolic wave equation: The horizontal interface.” The Journal of the...Acoustical Society of America, 71(4):855, 1982. 3. Ding Lee and Suzanne T. McDaniel. “A finite-difference treatment of interface conditions for
NASA Astrophysics Data System (ADS)
Roche-Lima, Abiel; Thulasiram, Ruppa K.
2012-02-01
Finite automata in which each transition is augmented with an output label in addition to the familiar input label are called finite-state transducers. Transducers have been used to analyze some fundamental issues in bioinformatics. Weighted finite-state transducers have been proposed for pairwise alignment of DNA and protein sequences, as well as for developing kernels for computational biology. Machine learning algorithms for conditional transducers have been implemented and used for DNA sequence analysis. Transducer learning algorithms are based on conditional probability computation, which relies on techniques such as pair-database creation, normalization (with Maximum-Likelihood normalization) and parameter optimization (with Expectation-Maximization, EM). These techniques are intrinsically costly to compute, and even more so when applied to bioinformatics, because the databases are large. In this work, we describe a parallel implementation of an algorithm to learn conditional transducers using these techniques. The algorithm is oriented to bioinformatics applications, such as alignments, phylogenetic trees, and other genome evolution studies. Several experiments were run using the parallel and sequential algorithms on Westgrid (specifically, on the Breezy cluster). The results show that our parallel algorithm is scalable, because execution times are reduced considerably when the data size parameter is increased. Another experiment varies the precision parameter; in this case, we obtain smaller execution times using the parallel algorithm. Finally, the number of threads used to execute the parallel algorithm on the Breezy cluster is varied. In this last experiment, speedup increases considerably when more threads are used; however, it levels off for 16 or more threads.
Synchronization Of Parallel Discrete Event Simulations
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S.
1992-01-01
Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages. Algorithm processes events optimistically in time cycles adapting while simulation in progress. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.
Finite element methodology for integrated flow-thermal-structural analysis
NASA Technical Reports Server (NTRS)
Thornton, Earl A.; Ramakrishnan, R.; Vemaganti, G. R.
1988-01-01
Papers entitled, An Adaptive Finite Element Procedure for Compressible Flows and Strong Viscous-Inviscid Interactions, and An Adaptive Remeshing Method for Finite Element Thermal Analysis, were presented at the June 27 to 29, 1988, meeting of the AIAA Thermophysics, Plasma Dynamics and Lasers Conference, San Antonio, Texas. The papers describe research work supported under NASA/Langley Research Grant NsG-1321, and are submitted in fulfillment of the progress report requirement on the grant for the period ending February 29, 1988.
Parallel Tetrahedral Mesh Adaptation with Dynamic Load Balancing
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak; Gabow, Harold N.
1999-01-01
The ability to dynamically adapt an unstructured grid is a powerful tool for efficiently solving computational problems with evolving physical features. In this paper, we report on our experience parallelizing an edge-based adaptation scheme, called 3D_TAG, using message passing. Results show excellent speedup when a realistic helicopter rotor mesh is randomly refined. However, performance deteriorates when the mesh is refined using a solution-based error indicator since mesh adaptation for practical problems occurs in a localized region, creating a severe load imbalance. To address this problem, we have developed PLUM, a global dynamic load balancing framework for adaptive numerical computations. Even though PLUM primarily balances processor workloads for the solution phase, it reduces the load imbalance problem within mesh adaptation by repartitioning the mesh after targeting edges for refinement but before the actual subdivision. This dramatically improves the performance of parallel 3D_TAG since refinement occurs in a more load balanced fashion. We also present optimal and heuristic algorithms that, when applied to the default mapping of a parallel repartitioner, significantly reduce the data redistribution overhead. Finally, portability is examined by comparing performance on three state-of-the-art parallel machines.
A curved ultrasonic actuator optimized for spherical motors: design and experiments.
Leroy, Edouard; Lozada, José; Hafez, Moustapha
2014-08-01
Multi-degree-of-freedom angular actuators are commonly used in numerous mechatronic areas such as omnidirectional robots, robot articulations or inertially stabilized platforms. The conventional method to design these devices consists in placing multiple actuators in parallel or series using gimbals, which are bulky and difficult to miniaturize. Motors using a spherical rotor are interesting for miniature multi-degree-of-freedom actuators. In this paper, a new actuator is proposed. It is based on a curved piezoelectric element which has its inner contact surface adapted to the diameter of the rotor. This adaptation makes it possible to build spherical motors with a fully constrained rotor and without the need for an additional guiding system. The work presents a design methodology based on modal finite element analysis. A methodology for mode selection is proposed and a sensitivity analysis of the final geometry to uncertainties and added masses is discussed. Finally, experimental results that validate the actuator concept on a single degree-of-freedom ultrasonic motor set-up are presented. Copyright © 2014 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Yang, Haijian; Sun, Shuyu; Yang, Chao
2017-03-01
Most existing methods for solving two-phase flow problems in porous media do not take the physically feasible saturation fractions between 0 and 1 into account, which often destroys the numerical accuracy and physical interpretability of the simulation. To calculate the solution without the loss of this basic requirement, we introduce a variational inequality formulation of the saturation equilibrium with a box inequality constraint, and use a conservative finite element method for the spatial discretization and a backward differentiation formula with adaptive time stepping for the temporal integration. The resulting variational inequality system at each time step is solved by using a semismooth Newton algorithm. To accelerate the Newton convergence and improve the robustness, we employ a family of adaptive nonlinear elimination methods as a nonlinear preconditioner. Some numerical results are presented to demonstrate the robustness and efficiency of the proposed algorithm. A comparison is also included to show the superiority of the proposed fully implicit approach over the classical IMplicit Pressure-Explicit Saturation (IMPES) method in terms of the time step size and the total execution time measured on a parallel computer.
Global Load Balancing with Parallel Mesh Adaption on Distributed-Memory Systems
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Oliker, Leonid; Sohn, Andrew
1996-01-01
Dynamic mesh adaptation on unstructured grids is a powerful tool for efficiently computing unsteady problems to resolve solution features of interest. Unfortunately, this causes load imbalances among processors on a parallel machine. This paper describes the parallel implementation of a tetrahedral mesh adaption scheme and a new global load balancing method. A heuristic remapping algorithm is presented that assigns partitions to processors such that the redistribution cost is minimized. Results indicate that the parallel performance of the mesh adaption code depends on the nature of the adaption region and show a 35.5X speedup on 64 processors of an SP2 when 35 percent of the mesh is randomly adapted. For large scale scientific computations, our load balancing strategy gives an almost sixfold reduction in solver execution times over non-balanced loads. Furthermore, our heuristic remapper yields processor assignments that are less than 3 percent off the optimal solutions, but requires only 1 percent of the computational time.
Adaptive mixed finite element methods for Darcy flow in fractured porous media
NASA Astrophysics Data System (ADS)
Chen, Huangxin; Salama, Amgad; Sun, Shuyu
2016-10-01
In this paper, we propose adaptive mixed finite element methods for simulating the single-phase Darcy flow in two-dimensional fractured porous media. The reduced model that we use for the simulation is a discrete fracture model coupling Darcy flows in the matrix and the fractures, and the fractures are modeled by one-dimensional entities. The Raviart-Thomas mixed finite element methods are utilized for the solution of the coupled Darcy flows in the matrix and the fractures. In order to improve the efficiency of the simulation, we use adaptive mixed finite element methods based on novel residual-based a posteriori error estimators. In addition, we develop an efficient upscaling algorithm to compute the effective permeability of the fractured porous media. Several interesting examples of Darcy flow in the fractured porous media are presented to demonstrate the robustness of the algorithm.
Hierarchical Parallelism in Finite Difference Analysis of Heat Conduction
NASA Technical Reports Server (NTRS)
Padovan, Joseph; Krishna, Lala; Gute, Douglas
1997-01-01
Based on the concept of hierarchical parallelism, this research effort resulted in highly efficient parallel solution strategies for very large scale heat conduction problems. Overall, the method of hierarchical parallelism involves the partitioning of thermal models into several substructured levels wherein an optimal balance into various associated bandwidths is achieved. The details are described in this report. Overall, the report is organized into two parts. Part 1 describes the parallel modelling methodology and associated multilevel direct, iterative and mixed solution schemes. Part 2 establishes both the formal and computational properties of the scheme.
NASA Technical Reports Server (NTRS)
Frank, Andreas O.; Twombly, I. Alexander; Barth, Timothy J.; Smith, Jeffrey D.; Dalton, Bonnie P. (Technical Monitor)
2001-01-01
We have applied the linear elastic finite element method to compute haptic force feedback and domain deformations of soft tissue models for use in virtual reality simulators. Our results show that, for virtual object models of high-resolution 3D data (>10,000 nodes), haptic real time computations (>500 Hz) are not currently possible using traditional methods. Current research efforts are focused in the following areas: 1) efficient implementation of fully adaptive multi-resolution methods and 2) multi-resolution methods with specialized basis functions to capture the singularity at the haptic interface (point loading). To achieve real time computations, we propose parallel processing of a Jacobi preconditioned conjugate gradient method applied to a reduced system of equations resulting from surface domain decomposition. This can effectively be achieved using reconfigurable computing systems such as field programmable gate arrays (FPGA), thereby providing a flexible solution that allows for new FPGA implementations as improved algorithms become available. The resulting soft tissue simulation system would meet NASA Virtual Glovebox requirements and, at the same time, provide a generalized simulation engine for any immersive environment application, such as biomedical/surgical procedures or interactive scientific applications.
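The Jacobi preconditioned conjugate gradient method named above can be sketched in plain Python. This is a serial, dense-matrix illustration with hypothetical names; it assumes a symmetric positive definite system and says nothing about the proposed FPGA implementation or the surface domain decomposition that produces the reduced system.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(A, v):
    return [dot(row, v) for row in A]

def jacobi_pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.

    The preconditioner is the inverse of the diagonal of A (Jacobi).
    Returns the solution and the number of iterations performed.
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                   # residual for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iter):
        if math.sqrt(dot(r, r)) < tol:
            return x, it
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter

# Usage: a small symmetric positive definite system.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [6.0, 10.0, 8.0]
solution, iterations = jacobi_pcg(A, b)
```

The work per iteration is dominated by one matrix-vector product and a few vector updates, all of which are independent per entry, which is one reason the method is a common candidate for parallel hardware.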
TransCut: interactive rendering of translucent cutouts.
Li, Dongping; Sun, Xin; Ren, Zhong; Lin, Stephen; Tong, Yiying; Guo, Baining; Zhou, Kun
2013-03-01
We present TransCut, a technique for interactive rendering of translucent objects undergoing fracturing and cutting operations. As the object is fractured or cut open, the user can directly examine and intuitively understand the complex translucent interior, as well as edit material properties through painting on cross sections and recombining the broken pieces—all with immediate and realistic visual feedback. This new mode of interaction with translucent volumes is made possible with two technical contributions. The first is a novel solver for the diffusion equation (DE) over a tetrahedral mesh that produces high-quality results comparable to the state-of-art finite element method (FEM) of Arbree et al. but at substantially higher speeds. This accuracy and efficiency is obtained by computing the discrete divergences of the diffusion equation and constructing the DE matrix using analytic formulas derived for linear finite elements. The second contribution is a multiresolution algorithm to significantly accelerate our DE solver while adapting to the frequent changes in topological structure of dynamic objects. The entire multiresolution DE solver is highly parallel and easily implemented on the GPU. We believe TransCut provides a novel visual effect for heterogeneous translucent objects undergoing fracturing and cutting operations.
Slices: A Scalable Partitioner for Finite Element Meshes
NASA Technical Reports Server (NTRS)
Ding, H. Q.; Ferraro, R. D.
1995-01-01
A parallel partitioner for partitioning unstructured finite element meshes on distributed memory architectures is developed. The element based partitioner can handle mixtures of different element types. All algorithms adopted in the partitioner are scalable, including a communication template for unpredictable incoming messages, as shown in actual timing measurements.
Dynamic grid refinement for partial differential equations on parallel computers
NASA Technical Reports Server (NTRS)
Mccormick, S.; Quinlan, D.
1989-01-01
The fast adaptive composite grid method (FAC) is an algorithm that uses various levels of uniform grids to provide adaptive resolution and fast solution of PDEs. An asynchronous version of FAC, called AFAC, that completely eliminates the bottleneck to parallelism is presented. This paper describes the advantage that this algorithm has in adaptive refinement for moving singularities on multiprocessor computers. This work is applicable to the parallel solution of two- and three-dimensional shock tracking problems.
ISSM: Ice Sheet System Model
NASA Technical Reports Server (NTRS)
Larour, Eric; Schiermeier, John E.; Seroussi, Helene; Morlinghem, Mathieu
2013-01-01
In order to have the capability to use satellite data from its own missions to inform future sea-level rise projections, JPL needed a full-fledged ice-sheet/ice-shelf flow model, capable of modeling the mass balance of Antarctica and Greenland into the near future. ISSM was developed with such a goal in mind, as a massively parallelized, multi-purpose finite-element framework dedicated to ice-sheet modeling. ISSM features unstructured meshes (Tria in 2D, and Penta in 3D) along with corresponding finite elements for both types of meshes. Each finite element can carry out diagnostic, prognostic, transient, thermal 3D, surface, and bed slope simulations. Anisotropic meshing enables adaptation of meshes to a certain metric, and the 2D Shelfy-Stream, 3D Blatter/Pattyn, and 3D Full-Stokes formulations capture the bulk of the ice-flow physics. These elements can be coupled together, based on the Arlequin method, so that on a large scale model such as Antarctica, each type of finite element is used in the most efficient manner. For each finite element referenced above, ISSM implements an adjoint. This adjoint can be used to carry out model inversions of unknown model parameters, typically ice rheology and basal drag at the ice/bedrock interface, using a metric such as the observed InSAR surface velocity. This data assimilation capability is crucial to allow spinning up of ice flow models using available satellite data. ISSM relies on the PETSc library for its vectors, matrices, and solvers. This allows ISSM to run efficiently on any parallel platform, whether shared or distributed. It can run on the largest clusters, and is fully scalable. This allows ISSM to tackle models the size of continents. ISSM is embedded into MATLAB and Python, both open scientific platforms. This improves its outreach within the science community.
It is entirely written in C/C++, which gives it flexibility in its design, and the power/speed that C/C++ allows. ISSM is svn (subversion) hosted, on a JPL repository, to facilitate its development and maintenance. ISSM can also model propagation of rifts using contact mechanics and mesh splitting, and can interface to the Dakota software. To carry out sensitivity analysis, mesh partitioning algorithms are available, based on the Scotch, Chaco, and Metis partitioners that ensure equal area mesh partitions can be done, which are then usable for sampling and local reliability methods.
Parallel Anisotropic Tetrahedral Adaptation
NASA Technical Reports Server (NTRS)
Park, Michael A.; Darmofal, David L.
2008-01-01
An adaptive method that robustly produces high aspect ratio tetrahedra to a general 3D metric specification without introducing hybrid semi-structured regions is presented. The elemental operators and higher-level logic are described with their respective domain-decomposed parallelizations. An anisotropic tetrahedral grid adaptation scheme is demonstrated for 1000-1 stretching for a simple cube geometry. This form of adaptation is applicable to more complex domain boundaries via a cut-cell approach as demonstrated by a parallel 3D supersonic simulation of a complex fighter aircraft. To avoid the assumptions and approximations required to form a metric to specify adaptation, an approach is introduced that directly evaluates interpolation error. The grid is adapted to reduce and equidistribute this interpolation error calculation without the use of an intervening anisotropic metric. Direct interpolation error adaptation is illustrated for 1D and 3D domains.
Peng, Kuan; He, Ling; Zhu, Ziqiang; Tang, Jingtian; Xiao, Jiaying
2013-12-01
Compared with commonly used analytical reconstruction methods, the frequency-domain finite element method (FEM) based approach has proven to be an accurate and flexible algorithm for photoacoustic tomography. However, the FEM-based algorithm is computationally demanding, especially for three-dimensional cases. To enhance the algorithm's efficiency, in this work a parallel computational strategy is implemented in the framework of the FEM-based reconstruction algorithm using a graphic-processing-unit parallel frame named the "compute unified device architecture." A series of simulation experiments is carried out to test the accuracy and accelerating effect of the improved method. The results obtained indicate that the parallel calculation does not change the accuracy of the reconstruction algorithm, while its computational cost is significantly reduced by a factor of 38.9 with a GTX 580 graphics card using the improved method.
Parallel computation of transverse wakes in linear colliders
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhan, Xiaowei; Ko, Kwok
1996-11-01
SLAC has proposed the detuned structure (DS) as one possible design to control the emittance growth of long bunch trains due to transverse wakefields in the Next Linear Collider (NLC). The DS consists of 206 cells with tapering from cell to cell of the order of a few microns to provide Gaussian detuning of the dipole modes. The decoherence of these modes leads to two orders of magnitude reduction in wakefield experienced by the trailing bunch. To model such a large heterogeneous structure realistically is impractical with finite-difference codes using structured grids. The authors have calculated the wakefield in the DS on a parallel computer with a finite-element code using an unstructured grid. The parallel implementation issues are presented along with simulation results that include contributions from higher dipole bands and wall dissipation.
Aeroelasticity of wing and wing-body configurations on parallel computers
NASA Technical Reports Server (NTRS)
Byun, Chansup
1995-01-01
The objective of this research is to develop computationally efficient methods for solving aeroelasticity problems on parallel computers. Both uncoupled and coupled methods are studied in this research. For the uncoupled approach, the conventional U-g method is used to determine the flutter boundary. The generalized aerodynamic forces required are obtained by the pulse transfer-function analysis method. For the coupled approach, the fluid-structure interaction is obtained by directly coupling finite difference Euler/Navier-Stokes equations for fluids and finite element dynamics equations for structures. This capability will significantly impact many aerospace projects of national importance such as Advanced Subsonic Civil Transport (ASCT), where the structural stability margin becomes very critical at the transonic region. This research effort will have direct impact on the High Performance Computing and Communication (HPCC) Program of NASA in the area of parallel computing.
Time-dependent density functional theory with twist-averaged boundary conditions
NASA Astrophysics Data System (ADS)
Schuetrumpf, B.; Nazarewicz, W.; Reinhard, P.-G.
2016-05-01
Background: Time-dependent density functional theory is widely used to describe excitations of many-fermion systems. In its many applications, three-dimensional (3D) coordinate-space representation is used, and infinite-domain calculations are limited to a finite volume represented by a spatial box. For finite quantum systems (atoms, molecules, nuclei, hadrons), the commonly used periodic or reflecting boundary conditions introduce spurious quantization of the continuum states and artificial reflections from boundary; hence, an incorrect treatment of evaporated particles. Purpose: The finite-volume artifacts for finite systems can be practically cured by invoking an absorbing potential in a certain boundary region sufficiently far from the described system. However, such absorption cannot be applied in the calculations of infinite matter (crystal electrons, quantum fluids, neutron star crust), which suffer from unphysical effects stemming from a finite computational box used. Here, twist-averaged boundary conditions (TABC) have been used successfully to diminish the finite-volume effects. In this work, we extend TABC to time-dependent modes. Method: We use the 3D time-dependent density functional framework with the Skyrme energy density functional. The practical calculations are carried out for small- and large-amplitude electric dipole and quadrupole oscillations of 16O. We apply and compare three kinds of boundary conditions: periodic, absorbing, and twist-averaged. Results: Calculations employing absorbing boundary conditions (ABC) and TABC are superior to those based on periodic boundary conditions. For low-energy excitations, TABC and ABC variants yield very similar results. With only four twist phases per spatial direction in TABC, one obtains an excellent reduction of spurious fluctuations. In the nonlinear regime, one has to deal with evaporated particles. 
In TABC, the floating nucleon gas remains in the box; the amount of nucleons in the gas is found to be roughly the same as the number of absorbed particles in ABC. Conclusion: We demonstrate that by using TABC, one can reduce finite-volume effects drastically without adding any additional parameters associated with absorption at large distances. Moreover, TABC are an obvious choice for time-dependent calculations for infinite systems. Since TABC calculations for different twists can be performed independently, the method is trivially adapted to parallel computing.
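The twist-averaged boundary conditions discussed in this abstract can be stated compactly. The notation below is assumed for illustration and is not taken from the paper: a cubic box of side $L$, twist angles $\theta_i$ per spatial direction, $N_\theta$ sampled twist vectors, and a generic observable $O$.

```latex
% Bloch (twisted) boundary condition for each single-particle wave function
\psi_{\boldsymbol{\theta}}(\mathbf{r} + L\,\hat{\mathbf{e}}_i)
  = e^{\mathrm{i}\theta_i}\,\psi_{\boldsymbol{\theta}}(\mathbf{r}),
  \qquad i \in \{x, y, z\}

% Twist average, approximated by a finite sum over sampled twists
\langle O \rangle_{\mathrm{TABC}}
  \approx \frac{1}{N_\theta} \sum_{n=1}^{N_\theta} O(\boldsymbol{\theta}_n)
```

Setting every $\theta_i = 0$ recovers ordinary periodic boundary conditions. Because each term $O(\boldsymbol{\theta}_n)$ comes from an independent calculation, the sum maps directly onto the trivially parallel execution mentioned at the end of the abstract.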
NASA Astrophysics Data System (ADS)
Grayver, Alexander V.
2015-07-01
This paper presents a distributed magnetotelluric inversion scheme based on an adaptive finite-element method (FEM). The key novel aspect of the introduced algorithm is the use of automatic mesh refinement techniques for both forward and inverse modelling. These techniques alleviate the tedious and subjective procedure of choosing a suitable model parametrization. To avoid overparametrization, meshes for forward and inverse problems were decoupled. For calculation of accurate electromagnetic (EM) responses, an automatic mesh refinement algorithm based on a goal-oriented error estimator has been adopted. For further efficiency gain, EM fields for each frequency were calculated using independent meshes in order to account for substantially different spatial behaviour of the fields over a wide range of frequencies. An automatic approach for efficient initial mesh design in inverse problems based on the linearized model resolution matrix was developed. To make this algorithm suitable for large-scale problems, it was proposed to use a low-rank approximation of the linearized model resolution matrix. In order to fill a gap between initial and true model complexities and resolve emerging 3-D structures better, an algorithm for adaptive inverse mesh refinement was derived. Within this algorithm, spatial variations of the imaged parameter are calculated and the mesh is refined in the neighborhoods of points with the largest variations. A series of numerical tests were performed to demonstrate the utility of the presented algorithms. Adaptive mesh refinement based on the model resolution estimates provides an efficient tool to derive initial meshes which account for arbitrary survey layouts, data types, frequency content and measurement uncertainties. Furthermore, the algorithm is capable of delivering meshes suitable to resolve features on multiple scales while keeping the number of unknowns low. However, such meshes exhibit dependency on an initial model guess.
Additionally, it is demonstrated that the adaptive mesh refinement can be particularly efficient in resolving complex shapes. The implemented inversion scheme was able to resolve a hemisphere object with sufficient resolution starting from a coarse discretization and refining mesh adaptively in a fully automatic process. The code is able to harness the computational power of modern distributed platforms and is shown to work with models consisting of millions of degrees of freedom. Significant computational savings were achieved by using locally refined decoupled meshes.
High Resolution DNS of Turbulent Flows using an Adaptive, Finite Volume Method
NASA Astrophysics Data System (ADS)
Trebotich, David
2014-11-01
We present a new computational capability for high resolution simulation of incompressible viscous flows. Our approach is based on cut cell methods where an irregular geometry such as a bluff body is intersected with a rectangular Cartesian grid resulting in cut cells near the boundary. In the cut cells we use a conservative discretization based on a discrete form of the divergence theorem to approximate fluxes for elliptic and hyperbolic terms in the Navier-Stokes equations. Away from the boundary the method reduces to a finite difference method. The algorithm is implemented in the Chombo software framework which supports adaptive mesh refinement and massively parallel computations. The code is scalable to 200,000 + processor cores on DOE supercomputers, resulting in DNS studies at unprecedented scale and resolution. For flow past a cylinder in transition (Re = 300) we observe a number of secondary structures in the far wake in 2D where the wake is over 120 cylinder diameters in length. These are compared with the more regularized wake structures in 3D at the same scale. For flow past a sphere (Re = 600) we resolve an arrowhead structure in the velocity in the near wake. The effectiveness of AMR is further highlighted in a simulation of turbulent flow (Re = 6000) in the contraction of an oil well blowout preventer. This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under Contract Number DE-AC02-05-CH11231.
Stochastic Inversion of 2D Magnetotelluric Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Jinsong
2010-07-01
The algorithm is developed to invert 2D magnetotelluric (MT) data based on sharp boundary parametrization using a Bayesian framework. Within the algorithm, we consider the locations and the resistivity of regions formed by the interfaces as unknowns. We use a parallel, adaptive finite-element algorithm to forward simulate frequency-domain MT responses of 2D conductivity structure. Those unknown parameters are spatially correlated and are described by a geostatistical model. The joint posterior probability distribution function is explored by Markov Chain Monte Carlo (MCMC) sampling methods. The developed stochastic model is effective for estimating the interface locations and resistivity. Most importantly, it provides detailed uncertainty information on each unknown parameter. Hardware requirements: PC, Supercomputer, Multi-platform, Workstation; Software requirements: C and Fortran; Operating system/version: Linux/Unix or Windows.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vay, Jean-Luc, E-mail: jlvay@lbl.gov; Haber, Irving; Godfrey, Brendan B.
Pseudo-spectral electromagnetic solvers (i.e. representing the fields in Fourier space) have extraordinary precision. In particular, Haber et al. presented in 1973 a pseudo-spectral solver that integrates analytically the solution over a finite time step, under the usual assumption that the source is constant over that time step. Yet, pseudo-spectral solvers have not been widely used, due in part to the difficulty for efficient parallelization owing to global communications associated with global FFTs on the entire computational domains. A method for the parallelization of electromagnetic pseudo-spectral solvers is proposed and tested on single electromagnetic pulses, and on Particle-In-Cell simulations of the wakefield formation in a laser plasma accelerator. The method takes advantage of the properties of the Discrete Fourier Transform, the linearity of Maxwell’s equations and the finite speed of light for limiting the communications of data within guard regions between neighboring computational domains. Although this requires a small approximation, test results show that no significant error is made on the test cases that have been presented. The proposed method opens the way to solvers combining the favorable parallel scaling of standard finite-difference methods with the accuracy advantages of pseudo-spectral methods.
A Parallel, Finite-Volume Algorithm for Large-Eddy Simulation of Turbulent Flows
NASA Technical Reports Server (NTRS)
Bui, Trong T.
1999-01-01
A parallel, finite-volume algorithm has been developed for large-eddy simulation (LES) of compressible turbulent flows. This algorithm includes piecewise linear least-square reconstruction, trilinear finite-element interpolation, Roe flux-difference splitting, and second-order MacCormack time marching. Parallel implementation is done using the message-passing programming model. In this paper, the numerical algorithm is described. To validate the numerical method for turbulence simulation, LES of fully developed turbulent flow in a square duct is performed for a Reynolds number of 320 based on the average friction velocity and the hydraulic diameter of the duct. Direct numerical simulation (DNS) results are available for this test case, and the accuracy of this algorithm for turbulence simulations can be ascertained by comparing the LES solutions with the DNS results. The effects of grid resolution, upwind numerical dissipation, and subgrid-scale dissipation on the accuracy of the LES are examined. Comparison with DNS results shows that the standard Roe flux-difference splitting dissipation adversely affects the accuracy of the turbulence simulation. For accurate turbulence simulations, only 3-5 percent of the standard Roe flux-difference splitting dissipation is needed.
Computational mechanics analysis tools for parallel-vector supercomputers
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.; Nguyen, Duc T.; Baddourah, Majdi; Qin, Jiangning
1993-01-01
Computational algorithms for structural analysis on parallel-vector supercomputers are reviewed. These parallel algorithms, developed by the authors, are for the assembly of structural equations, 'out-of-core' strategies for linear equation solution, massively distributed-memory equation solution, unsymmetric equation solution, general eigensolution, geometrically nonlinear finite element analysis, design sensitivity analysis for structural dynamics, optimization search analysis and domain decomposition. The source code for many of these algorithms is available.
NASA Astrophysics Data System (ADS)
Ghosh, Diptesh; Chakrabarti, Anindya S.
2017-10-01
In this paper, we study a large-scale distributed coordination problem and propose efficient adaptive strategies to solve the problem. The basic problem is to allocate a finite number of resources to individual agents in the absence of a central planner such that there is as little congestion as possible and the fraction of unutilized resources is reduced as far as possible. In the absence of a central planner and global information, agents can employ adaptive strategies that use only finite knowledge about the competitors. In this paper, we show that a combination of finite information sets and reinforcement learning can increase the utilization fraction of resources substantially.
Exploiting Symmetry on Parallel Architectures.
NASA Astrophysics Data System (ADS)
Stiller, Lewis Benjamin
1995-01-01
This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry-exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and it discovered a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.
Quadrilateral/hexahedral finite element mesh coarsening
Staten, Matthew L; Dewey, Mark W; Scott, Michael A; Benzley, Steven E
2012-10-16
A technique for coarsening a finite element mesh ("FEM") is described. This technique includes identifying a coarsening region within the FEM to be coarsened. Perimeter chords running along perimeter boundaries of the coarsening region are identified. The perimeter chords are redirected to create an adaptive chord separating the coarsening region from a remainder of the FEM. The adaptive chord runs through mesh elements residing along the perimeter boundaries of the coarsening region. The adaptive chord is then extracted to coarsen the FEM.
The effect of selection environment on the probability of parallel evolution.
Bailey, Susan F; Rodrigue, Nicolas; Kassen, Rees
2015-06-01
Across the great diversity of life, there are many compelling examples of parallel and convergent evolution: similar evolutionary changes arising in independently evolving populations. Parallel evolution is often taken to be strong evidence of adaptation occurring in populations that are highly constrained in their genetic variation. Theoretical models suggest a few potential factors driving the probability of parallel evolution, but experimental tests are needed. In this study, we quantify the degree of parallel evolution in 15 replicate populations of Pseudomonas fluorescens evolved in five different environments that varied in resource type and arrangement. We identified repeat changes across multiple levels of biological organization from phenotype, to gene, to nucleotide, and tested the impact of 1) selection environment, 2) the degree of adaptation, and 3) the degree of heterogeneity in the environment on the degree of parallel evolution at the gene-level. We saw, as expected, that parallel evolution occurred more often between populations evolved in the same environment; however, the extent of parallel evolution varied widely. The degree of adaptation did not significantly explain variation in the extent of parallelism in our system but the number of available beneficial mutations correlated negatively with parallel evolution. In addition, the degree of parallel evolution was significantly higher in populations evolved in a spatially structured, multiresource environment, suggesting that environmental heterogeneity may be an important factor constraining adaptation. Overall, our results stress the importance of environment in driving parallel evolutionary changes and point to a number of avenues for future work for understanding when evolution is predictable. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Efficient Preconditioning for the p-Version Finite Element Method in Two Dimensions
1989-10-01
paper, we study fast parallel preconditioners for systems of equations arising from the p-version finite element method. The p-version finite element...computations and the solution of a relatively small global auxiliary problem. We study two different methods. In the first (Section 3), the global...20], will be studied in the next section. Problem (3.12) is obviously much more easily solved than the original problem, and the procedure is highly
Programming Probabilistic Structural Analysis for Parallel Processing Computer
NASA Technical Reports Server (NTRS)
Sues, Robert H.; Chen, Heh-Chyun; Twisdale, Lawrence A.; Chamis, Christos C.; Murthy, Pappu L. N.
1991-01-01
The ultimate goal of this research program is to make Probabilistic Structural Analysis (PSA) computationally efficient and hence practical for the design environment by achieving large scale parallelism. The paper identifies the multiple levels of parallelism in PSA, identifies methodologies for exploiting this parallelism, describes the development of a parallel stochastic finite element code, and presents results of two example applications. It is demonstrated that speeds within five percent of those theoretically possible can be achieved. A special-purpose numerical technique, the stochastic preconditioned conjugate gradient method, is also presented and demonstrated to be extremely efficient for certain classes of PSA problems.
Optimal Design of Passive Power Filters Based on Pseudo-parallel Genetic Algorithm
NASA Astrophysics Data System (ADS)
Li, Pei; Li, Hongbo; Gao, Nannan; Niu, Lin; Guo, Liangfeng; Pei, Ying; Zhang, Yanyan; Xu, Minmin; Chen, Kerui
2017-05-01
Economic cost and filter efficiency are taken together as the objectives for optimizing the parameters of a passive filter. The method adopted in this paper combines a pseudo-parallel genetic algorithm with an adaptive genetic algorithm. In the early stages, the pseudo-parallel genetic algorithm is used to increase population diversity, and in the late stages the adaptive genetic algorithm is used to reduce the workload. At the same time, the migration rate of the pseudo-parallel genetic algorithm is modified so that it changes adaptively with population diversity. Simulation results show that the filter designed by the proposed method has a better filtering effect at lower economic cost, and can be used in engineering.
An Artificial Neural Networks Method for Solving Partial Differential Equations
NASA Astrophysics Data System (ADS)
Alharbi, Abir
2010-09-01
While there already exist many analytical and numerical techniques for solving PDEs, this paper introduces an approach using artificial neural networks. The approach consists of a technique developed by combining the standard numerical method, finite-difference, with the Hopfield neural network. The method is denoted Hopfield-finite-difference (HFD). The architecture of the nets, energy function, updating equations, and algorithms are developed for the method. The HFD method has been used successfully to approximate the solution of classical PDEs, such as the Wave, Heat, Poisson and the Diffusion equations, and on a system of PDEs. The software Matlab is used to obtain the results in both tabular and graphical form. The results are similar in terms of accuracy to those obtained by standard numerical methods. In terms of speed, the parallel nature of the Hopfield nets methods makes them easier to implement on fast parallel computers while some numerical methods need extra effort for parallelization.
DOE Office of Scientific and Technical Information (OSTI.GOV)
McGhee, J.M.; Roberts, R.M.; Morel, J.E.
1997-06-01
A spherical harmonics research code (DANTE) has been developed which is compatible with parallel computer architectures. DANTE provides 3-D, multi-material, deterministic, transport capabilities using an arbitrary finite element mesh. The linearized Boltzmann transport equation is solved in a second order self-adjoint form utilizing a Galerkin finite element spatial differencing scheme. The core solver utilizes a preconditioned conjugate gradient algorithm. Other distinguishing features of the code include options for discrete-ordinates and simplified spherical harmonics angular differencing, an exact Marshak boundary treatment for arbitrarily oriented boundary faces, in-line matrix construction techniques to minimize memory consumption, and an effective diffusion based preconditioner for scattering dominated problems. Algorithm efficiency is demonstrated for a massively parallel SIMD architecture (CM-5), and compatibility with MPP multiprocessor platforms or workstation clusters is anticipated.
Computational mechanics analysis tools for parallel-vector supercomputers
NASA Technical Reports Server (NTRS)
Storaasli, O. O.; Nguyen, D. T.; Baddourah, M. A.; Qin, J.
1993-01-01
Computational algorithms for structural analysis on parallel-vector supercomputers are reviewed. These parallel algorithms, developed by the authors, are for the assembly of structural equations, 'out-of-core' strategies for linear equation solution, massively distributed-memory equation solution, unsymmetric equation solution, general eigen-solution, geometrically nonlinear finite element analysis, design sensitivity analysis for structural dynamics, optimization algorithm and domain decomposition. The source code for many of these algorithms is available from NASA Langley.
Graphics applications utilizing parallel processing
NASA Technical Reports Server (NTRS)
Rice, John R.
1990-01-01
The results are presented of research conducted to develop a parallel graphic application algorithm to depict the numerical solution of the 1-D wave equation, the vibrating string. The research was conducted on a Flexible Flex/32 multiprocessor and a Sequent Balance 21000 multiprocessor. The wave equation is implemented using the finite difference method. The synchronization issues that arose from the parallel implementation and the strategies used to alleviate the effects of the synchronization overhead are discussed.
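The finite difference treatment of the 1-D wave equation mentioned above can be illustrated with the standard explicit central-difference (leapfrog) scheme. This is a serial sketch under assumed parameters (fixed ends, a single sine mode as the initial shape, zero initial velocity, Courant number 0.9); the abstract does not specify the scheme or the problem data, and the function name `vibrating_string` is hypothetical.

```python
import numpy as np

def vibrating_string(n_points=101, c=1.0, length=1.0, courant=0.9, t_end=1.0):
    """Explicit central-difference solution of u_tt = c^2 u_xx with fixed ends."""
    x = np.linspace(0.0, length, n_points)
    dx = x[1] - x[0]
    dt = courant * dx / c                 # stable when courant <= 1
    n_steps = int(round(t_end / dt))
    c2 = courant ** 2

    u_prev = np.sin(np.pi * x / length)   # assumed initial displacement
    u_prev[0] = u_prev[-1] = 0.0          # fixed (Dirichlet) ends
    # First step uses the zero initial velocity condition.
    u = u_prev.copy()
    u[1:-1] = u_prev[1:-1] + 0.5 * c2 * (
        u_prev[2:] - 2.0 * u_prev[1:-1] + u_prev[:-2])

    for _ in range(n_steps - 1):
        u_next = np.zeros_like(u)
        # Each interior point needs only its two neighbours from the current
        # step, so the interior can be split into blocks across processors.
        u_next[1:-1] = (2.0 * u[1:-1] - u_prev[1:-1]
                        + c2 * (u[2:] - 2.0 * u[1:-1] + u[:-2]))
        u_prev, u = u, u_next
    return x, u, n_steps * dt

x, u, t = vibrating_string()
exact = np.sin(np.pi * x) * np.cos(np.pi * t)   # analytic solution for this mode
max_error = np.max(np.abs(u - exact))
```

In a parallel version, every processor must finish time step n before any neighbour starts step n + 1 at the shared block boundaries, which is one source of the synchronization overhead the abstract refers to.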
Ciaccio, Edward J; Micheli-Tzanakou, Evangelia
2007-07-01
Common-mode noise degrades cardiovascular signal quality and diminishes measurement accuracy. Filtering to remove noise components in the frequency domain often distorts the signal. Two adaptive noise canceling (ANC) algorithms were tested to adjust weighted reference signals for optimal subtraction from a primary signal. Update of weight w was based upon the gradient term of the steepest descent equation: [see text], where the error ε is the difference between primary and weighted reference signals. ∇ was estimated from Δε² and Δw without using a variable Δw in the denominator, which can cause instability. The Parallel Comparison (PC) algorithm computed Δε² using fixed finite differences ±Δw in parallel during each discrete time k. The ALOPEX algorithm computed Δε² × Δw from time k to k + 1 to estimate ∇, with a random number added to account for Δε² · Δw → 0 near the optimal weighting. Using simulated data, both algorithms stably converged to the optimal weighting within 50-2000 discrete sample points k even with a SNR = 1:8 and weights which were initialized far from the optimal. Using a sharply pulsatile cardiac electrogram signal with added noise so that the SNR = 1:5, both algorithms exhibited stable convergence within 100 ms (100 sample points). Fourier spectral analysis revealed minimal distortion when comparing the signal without added noise to the ANC restored signal. ANC algorithms based upon difference calculations can rapidly and stably converge to the optimal weighting in simulated and real cardiovascular data. Signal quality is restored with minimal distortion, increasing the accuracy of biophysical measurement.
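One possible reading of the Parallel Comparison idea is sketched below: at each sample the squared error is evaluated at w + Δw and w − Δw with a fixed Δw, and the weight is moved down the resulting finite-difference gradient. The update rule, the step size `mu`, the function name, and the synthetic signals are all assumptions made for illustration; the abstract elides the actual update equation, and this sketch does not claim to reproduce it.

```python
import numpy as np

def parallel_comparison_anc(primary, reference, w0=0.0, dw=0.01, mu=0.02):
    """Single-weight adaptive noise canceller (hypothetical sketch).
    The gradient of the squared error is estimated from two trial
    weights w + dw and w - dw, with dw held fixed."""
    w = w0
    weights = np.empty(len(primary))
    restored = np.empty(len(primary))
    for k in range(len(primary)):
        # The two trial errors are independent and could run in parallel.
        err_plus = (primary[k] - (w + dw) * reference[k]) ** 2
        err_minus = (primary[k] - (w - dw) * reference[k]) ** 2
        grad = (err_plus - err_minus) / (2.0 * dw)   # fixed denominator
        w -= mu * grad                                # steepest-descent step
        weights[k] = w
        restored[k] = primary[k] - w * reference[k]  # noise-cancelled output
    return restored, weights

# Assumed synthetic data: a slow sinusoidal "signal" plus scaled reference noise.
rng = np.random.default_rng(0)
n = 2000
t = np.arange(n) / 1000.0
signal = np.sin(2.0 * np.pi * 1.2 * t)
noise = rng.standard_normal(n)
true_weight = 3.0
primary = signal + true_weight * noise
restored, weights = parallel_comparison_anc(primary, noise)
```

For a single weight and a quadratic error, this central difference reproduces the exact gradient, so the sketch behaves like a least-mean-squares update; the weight starts at zero, far from the assumed optimum of 3.0, and settles near it.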
Evaluating the performance of the particle finite element method in parallel architectures
NASA Astrophysics Data System (ADS)
Gimenez, Juan M.; Nigro, Norberto M.; Idelsohn, Sergio R.
2014-05-01
This paper presents a high performance implementation for the particle-mesh based method called particle finite element method two (PFEM-2). It consists of a material derivative based formulation of the equations with a hybrid spatial discretization which uses an Eulerian mesh and Lagrangian particles. The main aim of PFEM-2 is to solve transport equations as fast as possible keeping some level of accuracy. The method was found to be competitive with classical Eulerian alternatives for these targets, even in their range of optimal application. To evaluate the goodness of the method with large simulations, it is imperative to use parallel environments. Parallel strategies for the Finite Element Method have been widely studied and many libraries can be used to solve Eulerian stages of PFEM-2. However, Lagrangian stages, such as streamline integration, must be developed considering the parallel strategy selected. The main drawback of PFEM-2 is the large amount of memory needed, which limits its application to large problems with only one computer. Therefore, a distributed-memory implementation is urgently needed. Unlike a shared-memory approach, using domain decomposition the memory is automatically isolated, thus avoiding race conditions; however, new issues appear due to data distribution over the processes. Thus, a domain decomposition strategy for both particle and mesh is adopted, which minimizes the communication between processes. Finally, performance analyses running over multicore and multinode architectures are presented. The Courant-Friedrichs-Lewy number used influences the efficiency of the parallelization and, in some cases, a weighted partitioning can be used to improve the speed-up. However, the total CPU time for the cases presented is lower than that obtained when using classical Eulerian strategies.
Adapting high-level language programs for parallel processing using data flow
NASA Technical Reports Server (NTRS)
Standley, Hilda M.
1988-01-01
EASY-FLOW, a very high-level data flow language, is introduced for the purpose of adapting programs written in a conventional high-level language to a parallel environment. The level of parallelism provided is of the large-grained variety in which parallel activities take place between subprograms or processes. A program written in EASY-FLOW is a set of subprogram calls as units, structured by iteration, branching, and distribution constructs. A data flow graph may be deduced from an EASY-FLOW program.
A Discontinuous Galerkin Finite Element Method for Hamilton-Jacobi Equations
NASA Technical Reports Server (NTRS)
Hu, Changqing; Shu, Chi-Wang
1998-01-01
In this paper, we present a discontinuous Galerkin finite element method for solving the nonlinear Hamilton-Jacobi equations. This method is based on the Runge-Kutta discontinuous Galerkin finite element method for solving conservation laws. The method has the flexibility of treating complicated geometry by using arbitrary triangulation, can achieve high order accuracy with a local, compact stencil, and is suited for efficient parallel implementation. One and two dimensional numerical examples are given to illustrate the capability of the method.
Parallel Adaptive Mesh Refinement Library
NASA Technical Reports Server (NTRS)
Mac-Neice, Peter; Olson, Kevin
2005-01-01
Parallel Adaptive Mesh Refinement Library (PARAMESH) is a package of Fortran 90 subroutines designed to provide a computer programmer with an easy route to extension of (1) a previously written serial code that uses a logically Cartesian structured mesh into (2) a parallel code with adaptive mesh refinement (AMR). Alternatively, in its simplest use, and with minimal effort, PARAMESH can operate as a domain-decomposition tool for users who want to parallelize their serial codes but who do not wish to utilize adaptivity. The package builds a hierarchy of sub-grids to cover the computational domain of a given application program, with spatial resolution varying to satisfy the demands of the application. The sub-grid blocks form the nodes of a tree data structure (a quad-tree in two or an oct-tree in three dimensions). Each grid block has a logically Cartesian mesh. The package supports one-, two- and three-dimensional models.
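The tree-of-blocks idea described above can be illustrated with a small two-dimensional quad-tree. This is a conceptual sketch only and does not reflect the PARAMESH Fortran 90 interface: the class `Block`, the function `refine_where`, and the refinement criterion are hypothetical names introduced for the example.

```python
class Block:
    """One square grid block in a 2-D quad-tree (hypothetical sketch)."""

    def __init__(self, x0, y0, size, level=0):
        self.x0, self.y0, self.size, self.level = x0, y0, size, level
        self.children = []

    def is_leaf(self):
        return not self.children

    def refine(self):
        """Split this block into four children of half the size."""
        if self.children:
            return
        half = self.size / 2
        self.children = [
            Block(self.x0 + dx * half, self.y0 + dy * half, half, self.level + 1)
            for dy in (0, 1) for dx in (0, 1)
        ]

    def leaves(self):
        """Return the leaf blocks, which together cover the domain."""
        if self.is_leaf():
            return [self]
        out = []
        for child in self.children:
            out.extend(child.leaves())
        return out


def refine_where(block, needs_refinement, max_level):
    """Recursively refine blocks flagged by a user-supplied criterion."""
    if block.level < max_level and needs_refinement(block):
        block.refine()
        for child in block.children:
            refine_wheres = refine_where(child, needs_refinement, max_level)


# Refine toward the lower-left corner of the unit square, two levels deep.
root = Block(0.0, 0.0, 1.0)
refine_where(root, lambda b: b.x0 == 0.0 and b.y0 == 0.0, max_level=2)
leaf_blocks = root.leaves()
```

Here the leaves are three level-1 blocks and four level-2 blocks; in a block-structured AMR code each leaf would carry its own logically Cartesian mesh, and leaves would be distributed across processors.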
Hierarchical parallel computer architecture defined by computational multidisciplinary mechanics
NASA Technical Reports Server (NTRS)
Padovan, Joe; Gute, Doug; Johnson, Keith
1989-01-01
The goal is to develop an architecture for parallel processors enabling optimal handling of multi-disciplinary computation of fluid-solid simulations employing finite element and difference schemes. The goals, philosophical and modeling directions, static and dynamic poly trees, example problems, interpolative reduction, and the impact on solvers are shown in viewgraph form.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhao, Xujun; Li, Jiyuan; Jiang, Xikai
An efficient parallel Stokes’s solver is developed towards the complete inclusion of hydrodynamic interactions of Brownian particles in any geometry. A Langevin description of the particle dynamics is adopted, where the long-range interactions are included using a Green’s function formalism. We present a scalable parallel computational approach, where the general geometry Stokeslet is calculated following a matrix-free algorithm using the General geometry Ewald-like method. Our approach employs a highly-efficient iterative finite element Stokes’ solver for the accurate treatment of long-range hydrodynamic interactions within arbitrary confined geometries. A combination of mid-point time integration of the Brownian stochastic differential equation, the parallel Stokes’ solver, and a Chebyshev polynomial approximation for the fluctuation-dissipation theorem result in an O(N) parallel algorithm. We also illustrate the new algorithm in the context of the dynamics of confined polymer solutions in equilibrium and non-equilibrium conditions. Our method is extended to treat suspended finite size particles of arbitrary shape in any geometry using an Immersed Boundary approach.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Norman, Matthew R
2014-01-01
The novel ADER-DT time discretization is applied to two-dimensional transport in a quadrature-free, WENO- and FCT-limited, Finite-Volume context. Emphasis is placed on (1) the serial and parallel computational properties of ADER-DT and this framework and (2) the flexibility of ADER-DT and this framework in efficiently balancing accuracy with other constraints important to transport applications. This study demonstrates a range of choices for the user when approaching their specific application while maintaining good parallel properties. In this method, genuine multi-dimensionality, single-step and single-stage time stepping, strict positivity, and a flexible range of limiting are all achieved with only one parallel synchronization and data exchange per time step. In terms of parallel data transfers per simulated time interval, this improves upon multi-stage time stepping and post-hoc filtering techniques such as hyperdiffusion. This method is evaluated with standard transport test cases over a range of limiting options to demonstrate quantitatively and qualitatively what a user should expect when employing this method in their application.
Parallel Simulation of Three-Dimensional Free-Surface Fluid Flow Problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
BAER,THOMAS A.; SUBIA,SAMUEL R.; SACKINGER,PHILIP A.
2000-01-18
We describe parallel simulations of viscous, incompressible, free surface, Newtonian fluid flow problems that include dynamic contact lines. The Galerkin finite element method was used to discretize the fully-coupled governing conservation equations and a "pseudo-solid" mesh mapping approach was used to determine the shape of the free surface. In this approach, the finite element mesh is allowed to deform to satisfy quasi-static solid mechanics equations subject to geometric or kinematic constraints on the boundaries. As a result, nodal displacements must be included in the set of problem unknowns. Issues concerning the proper constraints along the solid-fluid dynamic contact line in three dimensions are discussed. Parallel computations are carried out for an example taken from the coating flow industry, flow in the vicinity of a slot coater edge. This is a three-dimensional free-surface problem possessing a contact line that advances at the web speed in one region but transitions to static behavior in another part of the flow domain. Discussion focuses on parallel speedups for fixed problem size, a class of problems of immediate practical importance.
Parallel Finite Element Domain Decomposition for Structural/Acoustic Analysis
NASA Technical Reports Server (NTRS)
Nguyen, Duc T.; Tungkahotara, Siroj; Watson, Willie R.; Rajan, Subramaniam D.
2005-01-01
A domain decomposition (DD) formulation for solving sparse linear systems of equations resulting from finite element analysis is presented. The formulation incorporates mixed direct and iterative equation solving strategies and other novel algorithmic ideas that are optimized to take advantage of sparsity and exploit modern computer architecture, such as memory and parallel computing. The most time consuming part of the formulation is identified and the critical roles of direct sparse and iterative solvers within the framework of the formulation are discussed. Experiments on several computer platforms using several complex test matrices are conducted using software based on the formulation. Small-scale structural examples are used to validate the steps in the formulation and large-scale (1,000,000+ unknowns) duct acoustic examples are used to evaluate the ORIGIN 2000 processors, and a cluster of 6 PCs (running under the Windows environment). Statistics show that the formulation is efficient in both sequential and parallel computing environments and that the formulation is significantly faster and consumes less memory than that based on one of the best available commercialized parallel sparse solvers.
Zhao, Xujun; Li, Jiyuan; Jiang, Xikai; ...
2017-06-29
An efficient parallel Stokes’s solver is developed towards the complete inclusion of hydrodynamic interactions of Brownian particles in any geometry. A Langevin description of the particle dynamics is adopted, where the long-range interactions are included using a Green’s function formalism. We present a scalable parallel computational approach, where the general geometry Stokeslet is calculated following a matrix-free algorithm using the General geometry Ewald-like method. Our approach employs a highly-efficient iterative finite element Stokes’ solver for the accurate treatment of long-range hydrodynamic interactions within arbitrary confined geometries. A combination of mid-point time integration of the Brownian stochastic differential equation, the parallel Stokes’ solver, and a Chebyshev polynomial approximation for the fluctuation-dissipation theorem result in an O(N) parallel algorithm. We also illustrate the new algorithm in the context of the dynamics of confined polymer solutions in equilibrium and non-equilibrium conditions. Our method is extended to treat suspended finite size particles of arbitrary shape in any geometry using an Immersed Boundary approach.
Parallel computations and control of adaptive structures
NASA Technical Reports Server (NTRS)
Park, K. C.; Alvin, Kenneth F.; Belvin, W. Keith; Chong, K. P. (Editor); Liu, S. C. (Editor); Li, J. C. (Editor)
1991-01-01
The equations of motion for structures with adaptive elements for vibration control are presented for parallel computations to be used as a software package for real-time control of flexible space structures. A brief introduction to the state-of-the-art parallel computational capability is also presented. Time marching strategies are developed for an effective use of massive parallel mapping, partitioning, and the necessary arithmetic operations. An example is offered for the simulation of control-structure interaction on a parallel computer, and the impact of the approach presented on applications in disciplines other than the aerospace industry is assessed.
Efficient Geometric Sound Propagation Using Visibility Culling
NASA Astrophysics Data System (ADS)
Chandak, Anish
2011-07-01
Simulating propagation of sound can improve the sense of realism in interactive applications such as video games and can lead to better designs in engineering applications such as architectural acoustics. In this thesis, we present geometric sound propagation techniques which are faster than prior methods and map well to upcoming parallel multi-core CPUs. We model specular reflections by using the image-source method and model finite-edge diffraction by using the well-known Biot-Tolstoy-Medwin (BTM) model. We accelerate the computation of specular reflections by applying novel visibility algorithms, FastV and AD-Frustum, which compute visibility from a point. We accelerate finite-edge diffraction modeling by applying a novel visibility algorithm which computes visibility from a region. Our visibility algorithms are based on frustum tracing and exploit recent advances in fast ray-hierarchy intersections, data-parallel computations, and scalable, multi-core algorithms. The AD-Frustum algorithm adapts its computation to the scene complexity and allows small errors in computing specular reflection paths for higher computational efficiency. FastV and our visibility algorithm from a region are general, object-space, conservative visibility algorithms that together significantly reduce the number of image sources compared to other techniques while preserving the same accuracy. Our geometric propagation algorithms are an order of magnitude faster than prior approaches for modeling specular reflections and two to ten times faster for modeling finite-edge diffraction. Our algorithms are interactive, scale almost linearly on multi-core CPUs, and can handle large, complex, and dynamic scenes. We also compare the accuracy of our sound propagation algorithms with other methods. Once sound propagation is performed, it is desirable to listen to the propagated sound in interactive and engineering applications. 
We can generate smooth, artifact-free output audio signals by applying efficient audio-processing algorithms. We also present the first efficient audio-processing algorithm for scenarios with simultaneously moving source and moving receiver (MS-MR) which incurs less than 25% overhead compared to static source and moving receiver (SS-MR) or moving source and static receiver (MS-SR) scenario.
Modeling and Control of the Redundant Parallel Adjustment Mechanism on a Deployable Antenna Panel
Tian, Lili; Bao, Hong; Wang, Meng; Duan, Xuechao
2016-01-01
With the aim of developing multiple input and multiple output (MIMO) coupling systems with a redundant parallel adjustment mechanism on the deployable antenna panel, a structural control integrated design methodology is proposed in this paper. Firstly, the modal information from the finite element model of the structure of the antenna panel is extracted, and then the mathematical model is established with the Hamilton principle. Secondly, the discrete Linear Quadratic Regulator (LQR) controller is added to the model in order to control the actuators and adjust the shape of the panel. Finally, the engineering practicality of the modeling and control method based on finite element analysis simulation is verified. PMID:27706076
Albedo of an irradiated plane-parallel atmosphere with finite optical depth
NASA Astrophysics Data System (ADS)
Fukue, Jun
2018-03-01
We analytically derive albedo for a plane-parallel atmosphere with finite optical depth, irradiated by an external source, under the local thermodynamic equilibrium approximation. Albedo is expressed as a function of the photon destruction probability $\epsilon$ and optical depth $\tau$, with several parameters such as dilution factors of the external source. In the particular case of the infinite optical depth, albedo $A$ is expressed as $A = \left[1 + (1 - W_J/W_H)\sqrt{3\epsilon}/3\right] / \left(1 + \sqrt{3\epsilon}\right)$, where $W_J$ and $W_H$ are the dilution factors for the mean intensity and Eddington flux, respectively. An example of a model atmosphere is also presented under a gray approximation.
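The closed-form expression quoted for the infinite-optical-depth case is simple to evaluate numerically. The helper below is a direct transcription of that formula; the function name and argument names are assumptions introduced for the example, and the finite-depth expression from the paper is not reproduced here.

```python
import math


def albedo_infinite_depth(eps, w_j, w_h):
    """Albedo for infinite optical depth, as quoted in the abstract.

    eps : photon destruction probability (0 <= eps <= 1)
    w_j : dilution factor for the mean intensity
    w_h : dilution factor for the Eddington flux
    """
    root = math.sqrt(3.0 * eps)
    return (1.0 + (1.0 - w_j / w_h) * root / 3.0) / (1.0 + root)
```

Two limiting cases follow directly from the formula: with `eps = 0` (pure scattering) the albedo is 1 for any dilution factors, and with `w_j == w_h` it reduces to `1 / (1 + sqrt(3 * eps))`.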
Parallel solution of high-order numerical schemes for solving incompressible flows
NASA Technical Reports Server (NTRS)
Milner, Edward J.; Lin, Avi; Liou, May-Fun; Blech, Richard A.
1993-01-01
A new parallel numerical scheme for solving incompressible steady-state flows is presented. The algorithm uses a finite-difference approach to solving the Navier-Stokes equations. The algorithms are scalable and expandable. They may be used with only two processors or with as many processors as are available. The code is general and expandable. Any size grid may be used. Four processors of the NASA LeRC Hypercluster were used to solve for steady-state flow in a driven square cavity. The Hypercluster was configured in a distributed-memory, hypercube-like architecture. By using a 50-by-50 finite-difference solution grid, an efficiency of 74 percent (a speedup of 2.96) was obtained.
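The two figures quoted at the end of this abstract are related by the standard definition of parallel efficiency, speedup divided by processor count. A short check, with a hypothetical helper name:

```python
def parallel_efficiency(speedup, n_processors):
    """Parallel efficiency = speedup / number of processors."""
    return speedup / n_processors


# Figures quoted in the abstract: speedup 2.96 on four processors.
efficiency = parallel_efficiency(2.96, 4)
```

The result is 0.74, matching the reported 74 percent.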
NASA Technical Reports Server (NTRS)
Barnard, Stephen T.; Simon, Horst; Lasinski, T. A. (Technical Monitor)
1994-01-01
The design of a parallel implementation of multilevel recursive spectral bisection is described. The goal is to implement a code that is fast enough to enable dynamic repartitioning of adaptive meshes.
Hellander, Andreas; Lawson, Michael J; Drawert, Brian; Petzold, Linda
2015-01-01
The efficiency of exact simulation methods for the reaction-diffusion master equation (RDME) is severely limited by the large number of diffusion events if the mesh is fine or if diffusion constants are large. Furthermore, inherent properties of exact kinetic-Monte Carlo simulation methods limit the efficiency of parallel implementations. Several approximate and hybrid methods have appeared that enable more efficient simulation of the RDME. A common feature to most of them is that they rely on splitting the system into its reaction and diffusion parts and updating them sequentially over a discrete timestep. This use of operator splitting enables more efficient simulation but it comes at the price of a temporal discretization error that depends on the size of the timestep. So far, existing methods have not attempted to estimate or control this error in a systematic manner. This makes the solvers hard to use for practitioners since they must guess an appropriate timestep. It also makes the solvers potentially less efficient than if the timesteps are adapted to control the error. Here, we derive estimates of the local error and propose a strategy to adaptively select the timestep when the RDME is simulated via a first order operator splitting. While the strategy is general and applicable to a wide range of approximate and hybrid methods, we exemplify it here by extending a previously published approximate method, the Diffusive Finite-State Projection (DFSP) method, to incorporate temporal adaptivity. PMID:26865735
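A generic way to turn a local error estimate into a timestep choice is sketched below. It is not the estimator or strategy derived in the paper: it only assumes that the local error of a first-order splitting scales like the square of the timestep, and the function name, safety factor, and growth limits are illustrative assumptions.

```python
import math


def next_timestep(dt, local_error, tolerance,
                  safety=0.9, max_growth=2.0, min_shrink=0.1):
    """Propose the next timestep from an estimated local splitting error.

    Assumes local_error ~ C * dt**2 (first-order splitting), so the step
    that would just meet the tolerance is dt * sqrt(tolerance / error).
    Generic controller sketch; all parameter values are assumptions.
    """
    if local_error <= 0.0:
        return dt * max_growth  # no measurable error: grow as fast as allowed
    factor = safety * math.sqrt(tolerance / local_error)
    factor = max(min_shrink, min(max_growth, factor))  # limit abrupt changes
    return dt * factor
```

For example, if the estimated error is four times the tolerance, the proposed step is 0.45 times the current one (a factor of one half from the square-root rule, times the safety factor).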
THE PLUTO CODE FOR ADAPTIVE MESH COMPUTATIONS IN ASTROPHYSICAL FLUID DYNAMICS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mignone, A.; Tzeferacos, P.; Zanni, C.
We present a description of the adaptive mesh refinement (AMR) implementation of the PLUTO code for solving the equations of classical and special relativistic magnetohydrodynamics (MHD and RMHD). The current release exploits, in addition to the static grid version of the code, the distributed infrastructure of the CHOMBO library for multidimensional parallel computations over block-structured, adaptively refined grids. We employ a conservative finite-volume approach where primary flow quantities are discretized at the cell center in a dimensionally unsplit fashion using the Corner Transport Upwind method. Time stepping relies on a characteristic tracing step where piecewise parabolic method, weighted essentially non-oscillatory, or slope-limited linear interpolation schemes can be handily adopted. A characteristic decomposition-free version of the scheme is also illustrated. The solenoidal condition of the magnetic field is enforced by augmenting the equations with a generalized Lagrange multiplier providing propagation and damping of divergence errors through a mixed hyperbolic/parabolic explicit cleaning step. Among the novel features, we describe an extension of the scheme to include non-ideal dissipative processes, such as viscosity, resistivity, and anisotropic thermal conduction without operator splitting. Finally, we illustrate an efficient treatment of point-local, potentially stiff source terms over hierarchical nested grids by taking advantage of the adaptivity in time. Several multidimensional benchmarks and applications to problems of astrophysical relevance assess the potentiality of the AMR version of PLUTO in resolving flow features separated by large spatial and temporal disparities.
Hellander, Andreas; Lawson, Michael J; Drawert, Brian; Petzold, Linda
2014-06-01
The efficiency of exact simulation methods for the reaction-diffusion master equation (RDME) is severely limited by the large number of diffusion events if the mesh is fine or if diffusion constants are large. Furthermore, inherent properties of exact kinetic-Monte Carlo simulation methods limit the efficiency of parallel implementations. Several approximate and hybrid methods have appeared that enable more efficient simulation of the RDME. A common feature to most of them is that they rely on splitting the system into its reaction and diffusion parts and updating them sequentially over a discrete timestep. This use of operator splitting enables more efficient simulation but it comes at the price of a temporal discretization error that depends on the size of the timestep. So far, existing methods have not attempted to estimate or control this error in a systematic manner. This makes the solvers hard to use for practitioners since they must guess an appropriate timestep. It also makes the solvers potentially less efficient than if the timesteps are adapted to control the error. Here, we derive estimates of the local error and propose a strategy to adaptively select the timestep when the RDME is simulated via a first order operator splitting. While the strategy is general and applicable to a wide range of approximate and hybrid methods, we exemplify it here by extending a previously published approximate method, the Diffusive Finite-State Projection (DFSP) method, to incorporate temporal adaptivity.
Architecture-Adaptive Computing Environment: A Tool for Teaching Parallel Programming
NASA Technical Reports Server (NTRS)
Dorband, John E.; Aburdene, Maurice F.
2002-01-01
Recently, networked and cluster computation have become very popular. This paper is an introduction to a new C based parallel language for architecture-adaptive programming, aCe C. The primary purpose of aCe (Architecture-adaptive Computing Environment) is to encourage programmers to implement applications on parallel architectures by providing them the assurance that future architectures will be able to run their applications with a minimum of modification. A secondary purpose is to encourage computer architects to develop new types of architectures by providing an easily implemented software development environment and a library of test applications. This new language should be an ideal tool to teach parallel programming. In this paper, we will focus on some fundamental features of aCe C.
Particle-in-cell simulations of the critical ionization velocity effect in finite size clouds
NASA Technical Reports Server (NTRS)
Moghaddam-Taaheri, E.; Lu, G.; Goertz, C. K.; Nishikawa, K. - I.
1994-01-01
The critical ionization velocity (CIV) mechanism in a finite size cloud is studied with a series of electrostatic particle-in-cell simulations. It is observed that an initial seed ionization, produced by non-CIV mechanisms, generates a cross-field ion beam which excites a modified beam-plasma instability (MBPI) with frequency in the range of the lower hybrid frequency. The excited waves accelerate electrons along the magnetic field up to the ion drift energy that exceeds the ionization energy of the neutral atoms. The heated electrons in turn enhance the ion beam by electron-neutral impact ionization, which establishes a positive feedback loop in maintaining the CIV process. It is also found that the efficiency of the CIV mechanism depends on the finite size of the gas cloud in the following ways: (1) Along the ambient magnetic field the finite size of the cloud, $L_\parallel$, restricts the growth of the fastest growing mode, with a wavelength $\lambda_{m\parallel}$, of the MBPI. The parallel electron heating at wave saturation scales approximately as $(L_\parallel/\lambda_{m\parallel})^{1/2}$; (2) Momentum coupling between the cloud and the ambient plasma via the Alfven waves occurs as a result of the finite size of the cloud in the direction perpendicular to both the ambient magnetic field and the neutral drift. This reduces exponentially with time the relative drift between the ambient plasma and the neutrals. The timescale is inversely proportional to the Alfven velocity. (3) The transverse charge separation field across the cloud was found to result in the modulation of the beam velocity which reduces the parallel heating of electrons and increases the transverse acceleration of electrons. (4) Some energetic electrons are lost from the cloud along the magnetic field at a rate characterized by the acoustic velocity, instead of the electron thermal velocity.
The loss of energetic electrons from the cloud seems to be larger in the direction of plasma drift relative to the neutrals, where the loss rate is characterized by the neutral drift velocity. It is also shown that a factor of 4 increase in the ambient plasma density increases the CIV ionization yield by almost 2 orders of magnitude at the end of a typical run. It is concluded that a larger ambient plasma density can result in a larger CIV yield because of (1) larger seed ion production by non-CIV mechanisms, (2) smaller Alfven velocity and hence weak momentum coupling, and (3) smaller ratio of the ion beam density to the ambient ion density, and therefore a weaker modulation of the beam velocity. The simulation results are used to interpret various chemical release experiments in space.
Engel, Philipp; Salzburger, Walter; Liesch, Marius; Chang, Chao-Chin; Maruyama, Soichi; Lanz, Christa; Calteau, Alexandra; Lajus, Aurélie; Médigue, Claudine; Schuster, Stephan C; Dehio, Christoph
2011-02-10
Adaptive radiation is the rapid origination of multiple species from a single ancestor as the result of concurrent adaptation to disparate environments. This fundamental evolutionary process is considered to be responsible for the genesis of a great portion of the diversity of life. Bacteria have evolved enormous biological diversity by exploiting an exceptional range of environments, yet diversification of bacteria via adaptive radiation has been documented in a few cases only and the underlying molecular mechanisms are largely unknown. Here we show a compelling example of adaptive radiation in pathogenic bacteria and reveal their genetic basis. Our evolutionary genomic analyses of the α-proteobacterial genus Bartonella uncover two parallel adaptive radiations within these host-restricted mammalian pathogens. We identify a horizontally-acquired protein secretion system, which has evolved to target specific bacterial effector proteins into host cells as the evolutionary key innovation triggering these parallel adaptive radiations. We show that the functional versatility and adaptive potential of the VirB type IV secretion system (T4SS), and thereby translocated Bartonella effector proteins (Beps), evolved in parallel in the two lineages prior to their radiations. Independent chromosomal fixation of the virB operon and consecutive rounds of lineage-specific bep gene duplications followed by their functional diversification characterize these parallel evolutionary trajectories. Whereas most Beps maintained their ancestral domain constitution, strikingly, a novel type of effector protein emerged convergently in both lineages. This resulted in similar arrays of host cell-targeted effector proteins in the two lineages of Bartonella as the basis of their independent radiation. The parallel molecular evolution of the VirB/Bep system displays a striking example of a key innovation involved in independent adaptive processes and the emergence of bacterial pathogens. 
Furthermore, our study highlights the remarkable evolvability of T4SSs and their effector proteins, explaining their broad application in bacterial interactions with the environment.
Engel, Philipp; Salzburger, Walter; Liesch, Marius; Chang, Chao-Chin; Maruyama, Soichi; Lanz, Christa; Calteau, Alexandra; Lajus, Aurélie; Médigue, Claudine; Schuster, Stephan C.; Dehio, Christoph
2011-01-01
Adaptive radiation is the rapid origination of multiple species from a single ancestor as the result of concurrent adaptation to disparate environments. This fundamental evolutionary process is considered to be responsible for the genesis of a great portion of the diversity of life. Bacteria have evolved enormous biological diversity by exploiting an exceptional range of environments, yet diversification of bacteria via adaptive radiation has been documented in a few cases only and the underlying molecular mechanisms are largely unknown. Here we show a compelling example of adaptive radiation in pathogenic bacteria and reveal their genetic basis. Our evolutionary genomic analyses of the α-proteobacterial genus Bartonella uncover two parallel adaptive radiations within these host-restricted mammalian pathogens. We identify a horizontally-acquired protein secretion system, which has evolved to target specific bacterial effector proteins into host cells as the evolutionary key innovation triggering these parallel adaptive radiations. We show that the functional versatility and adaptive potential of the VirB type IV secretion system (T4SS), and thereby translocated Bartonella effector proteins (Beps), evolved in parallel in the two lineages prior to their radiations. Independent chromosomal fixation of the virB operon and consecutive rounds of lineage-specific bep gene duplications followed by their functional diversification characterize these parallel evolutionary trajectories. Whereas most Beps maintained their ancestral domain constitution, strikingly, a novel type of effector protein emerged convergently in both lineages. This resulted in similar arrays of host cell-targeted effector proteins in the two lineages of Bartonella as the basis of their independent radiation. The parallel molecular evolution of the VirB/Bep system displays a striking example of a key innovation involved in independent adaptive processes and the emergence of bacterial pathogens. 
Furthermore, our study highlights the remarkable evolvability of T4SSs and their effector proteins, explaining their broad application in bacterial interactions with the environment. PMID:21347280
Resistance of a plate in parallel flow at low Reynolds numbers
NASA Technical Reports Server (NTRS)
Janour, Zbynek
1951-01-01
The present paper gives the results of measurements of the resistance of a plate placed parallel to the flow in the range of Reynolds numbers from 10 to 2300; in this range the resistance deviates from the formula of Blasius. The lower limit of validity of the Blasius formula is determined and also the increase in resistance at the edges parallel to the flow in the case of a plate of finite width.
Haptic adaptation to slant: No transfer between exploration modes
van Dam, Loes C. J.; Plaisier, Myrthe A.; Glowania, Catharina; Ernst, Marc O.
2016-01-01
Human touch is an inherently active sense: to estimate an object’s shape humans often move their hand across its surface. This way the object is sampled both in a serial (sampling different parts of the object across time) and parallel fashion (sampling using different parts of the hand simultaneously). Both the serial (moving a single finger) and parallel (static contact with the entire hand) exploration modes provide reliable and similar global shape information, suggesting the possibility that this information is shared early in the sensory cortex. In contrast, we here show the opposite. Using an adaptation-and-transfer paradigm, a change in haptic perception was induced by slant-adaptation using either the serial or parallel exploration mode. A unified shape-based coding would predict that this would equally affect perception using other exploration modes. However, we found that adaptation-induced perceptual changes did not transfer between exploration modes. Instead, serial and parallel exploration components adapted simultaneously, but to different kinaesthetic aspects of exploration behaviour rather than object-shape per se. These results indicate that a potential combination of information from different exploration modes can only occur at down-stream cortical processing stages, at which adaptation is no longer effective. PMID:27698392
Real-time adaptive finite element solution of time-dependent Kohn-Sham equation
NASA Astrophysics Data System (ADS)
Bao, Gang; Hu, Guanghui; Liu, Di
2015-01-01
In our previous paper (Bao et al., 2012 [1]), a general framework of using adaptive finite element methods to solve the Kohn-Sham equation has been presented. This work is concerned with solving the time-dependent Kohn-Sham equations. The numerical methods are studied in the time domain, which can be employed to explain both the linear and the nonlinear effects. A Crank-Nicolson scheme and linear finite element space are employed for the temporal and spatial discretizations, respectively. To resolve the trouble regions in the time-dependent simulations, a heuristic error indicator is introduced for the mesh adaptive methods. An algebraic multigrid solver is developed to efficiently solve the complex-valued system derived from the semi-implicit scheme. A mask function is employed to remove or reduce the boundary reflection of the wavefunction. The effectiveness of our method is verified by numerical simulations for both linear and nonlinear phenomena, in which the effectiveness of the mesh adaptive methods is clearly demonstrated.
CosmosDG: An hp-adaptive Discontinuous Galerkin Code for Hyper-resolved Relativistic MHD
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anninos, Peter; Lau, Cheuk; Bryant, Colton
We have extended Cosmos++, a multidimensional unstructured adaptive mesh code for solving the covariant Newtonian and general relativistic radiation magnetohydrodynamic (MHD) equations, to accommodate both discrete finite volume and arbitrarily high-order finite element structures. The new finite element implementation, called CosmosDG, is based on a discontinuous Galerkin (DG) formulation, using both entropy-based artificial viscosity and slope limiting procedures for the regularization of shocks. High-order multistage forward Euler and strong-stability preserving Runge–Kutta time integration options complement high-order spatial discretization. We have also added flexibility in the code infrastructure allowing for both adaptive mesh and adaptive basis order refinement to be performed separately or simultaneously in a local (cell-by-cell) manner. We discuss in this report the DG formulation and present tests demonstrating the robustness, accuracy, and convergence of our numerical methods applied to special and general relativistic MHD, although we note that an equivalent capability currently also exists in CosmosDG for Newtonian systems.
CosmosDG: An hp-adaptive Discontinuous Galerkin Code for Hyper-resolved Relativistic MHD
NASA Astrophysics Data System (ADS)
Anninos, Peter; Bryant, Colton; Fragile, P. Chris; Holgado, A. Miguel; Lau, Cheuk; Nemergut, Daniel
2017-08-01
We have extended Cosmos++, a multidimensional unstructured adaptive mesh code for solving the covariant Newtonian and general relativistic radiation magnetohydrodynamic (MHD) equations, to accommodate both discrete finite volume and arbitrarily high-order finite element structures. The new finite element implementation, called CosmosDG, is based on a discontinuous Galerkin (DG) formulation, using both entropy-based artificial viscosity and slope limiting procedures for the regularization of shocks. High-order multistage forward Euler and strong-stability preserving Runge-Kutta time integration options complement high-order spatial discretization. We have also added flexibility in the code infrastructure allowing for both adaptive mesh and adaptive basis order refinement to be performed separately or simultaneously in a local (cell-by-cell) manner. We discuss in this report the DG formulation and present tests demonstrating the robustness, accuracy, and convergence of our numerical methods applied to special and general relativistic MHD, although we note that an equivalent capability currently also exists in CosmosDG for Newtonian systems.
Parallelized modelling and solution scheme for hierarchically scaled simulations
NASA Technical Reports Server (NTRS)
Padovan, Joe
1995-01-01
This two-part paper presents the results of a benchmarked analytical-numerical investigation into the operational characteristics of a unified parallel processing strategy for implicit fluid mechanics formulations. This hierarchical poly tree (HPT) strategy is based on multilevel substructural decomposition. The tree morphology is chosen to minimize memory, communications and computational effort. The methodology is general enough to apply to existing finite difference (FD), finite element (FEM), finite volume (FV) or spectral element (SE) based computer programs without an extensive rewrite of code. In addition to finding large reductions in memory, communications, and computational effort associated with a parallel computing environment, substantial reductions are generated in the sequential mode of application. Such improvements grow with increasing problem size. Along with a theoretical development of general 2-D and 3-D HPT, several techniques for expanding the problem size that the current generation of computers is capable of solving are presented and discussed. Among these techniques are several interpolative reduction methods. It was found that, by combining several of these techniques, a relatively small interpolative reduction resulted in substantial performance gains. Several other unique features/benefits are discussed in this paper. Along with Part 1's theoretical development, Part 2 presents a numerical approach to the HPT along with four prototype CFD applications. These demonstrate the potential of the HPT strategy.
A Numerical Study of Scalable Cardiac Electro-Mechanical Solvers on HPC Architectures
Colli Franzone, Piero; Pavarino, Luca F.; Scacchi, Simone
2018-01-01
We introduce and study some scalable domain decomposition preconditioners for cardiac electro-mechanical 3D simulations on parallel HPC (High Performance Computing) architectures. The electro-mechanical model of the cardiac tissue is composed of four coupled sub-models: (1) the static finite elasticity equations for the transversely isotropic deformation of the cardiac tissue; (2) the active tension model describing the dynamics of the intracellular calcium, cross-bridge binding and myofilament tension; (3) the anisotropic Bidomain model describing the evolution of the intra- and extra-cellular potentials in the deforming cardiac tissue; and (4) the ionic membrane model describing the dynamics of ionic currents, gating variables, ionic concentrations and stretch-activated channels. This strongly coupled electro-mechanical model is discretized in time with a splitting semi-implicit technique and in space with isoparametric finite elements. The resulting scalable parallel solver is based on Multilevel Additive Schwarz preconditioners for the solution of the Bidomain system and on BDDC preconditioned Newton-Krylov solvers for the non-linear finite elasticity system. The results of several 3D parallel simulations show the scalability of both linear and non-linear solvers and their application to the study of both physiological excitation-contraction cardiac dynamics and re-entrant waves in the presence of different mechano-electrical feedbacks. PMID:29674971
Thorpe, Roger S; Barlow, Axel; Malhotra, Anita; Surget-Groba, Yann
2015-03-01
Global warming will impact species in a number of ways, and it is important to know the extent to which natural populations can adapt to anthropogenic climate change by natural selection. Parallel microevolution within separate species can demonstrate natural selection, but several studies of homoplasy have not yet revealed examples of widespread parallel evolution in a generic radiation. Taking into account primary phylogeographic divisions, we investigate numerous quantitative traits (size, shape, scalation, colour pattern and hue) in anole radiations from the mountainous Lesser Antillean islands. Adaptation to climatic differences can lead to very pronounced differences between spatially close populations with all studied traits showing some evidence of parallel evolution. Traits from shape, scalation, pattern and hue (particularly the latter) show widespread evolutionary parallels within these species in response to altitudinal climate variation greater than extreme anthropogenic climate change predicted for 2080. This gives strong evidence of the ability to adapt to climate variation by natural selection throughout this radiation. As anoles can evolve very rapidly, it suggests anthropogenic climate change is likely to be less of a conservation threat than other factors, such as habitat loss and invasive species, in this, Lesser Antillean, biodiversity hot spot. © 2015 John Wiley & Sons Ltd.
Regularization with numerical extrapolation for finite and UV-divergent multi-loop integrals
NASA Astrophysics Data System (ADS)
de Doncker, E.; Yuasa, F.; Kato, K.; Ishikawa, T.; Kapenga, J.; Olagbemi, O.
2018-03-01
We give numerical integration results for Feynman loop diagrams such as those covered by Laporta (2000) and by Baikov and Chetyrkin (2010), and which may give rise to loop integrals with UV singularities. We explore automatic adaptive integration using multivariate techniques from the PARINT package for multivariate integration, as well as iterated integration with programs from the QUADPACK package, and a trapezoidal method based on a double exponential transformation. PARINT is layered over MPI (Message Passing Interface), and incorporates advanced parallel/distributed techniques including load balancing among processes that may be distributed over a cluster or a network/grid of nodes. Results are included for 2-loop vertex and box diagrams and for sets of 2-, 3- and 4-loop self-energy diagrams with or without UV terms. Numerical regularization of integrals with singular terms is achieved by linear and non-linear extrapolation methods.
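As an illustration of the "trapezoidal method based on a double exponential transformation" mentioned in this abstract, the following is a minimal one-dimensional sketch, not the authors' code. It assumes the tanh-sinh form of the transformation; the function name `de_trapezoid` and the parameters `h` and `t_max` are hypothetical choices made for this example.

```python
import math


def de_trapezoid(f, a, b, h=0.125, t_max=3.0):
    """Approximate the integral of f over [a, b] with the trapezoidal rule
    applied after the tanh-sinh (double exponential) change of variable
    x = mid + half * tanh((pi / 2) * sinh(t)).

    h and t_max are illustrative defaults (assumptions), not tuned values.
    """
    half = 0.5 * (b - a)
    mid = 0.5 * (a + b)
    n = int(round(t_max / h))
    total = 0.0
    for k in range(-n, n + 1):
        t = k * h
        u = 0.5 * math.pi * math.sinh(t)
        x = mid + half * math.tanh(u)
        # Jacobian of the change of variable; decays double exponentially in |t|.
        w = 0.5 * math.pi * math.cosh(t) / math.cosh(u) ** 2
        if a < x < b:  # skip nodes that round onto an endpoint
            total += w * f(x)
    return half * h * total


# Example: an integrand with an integrable endpoint singularity at x = 0.
approx = de_trapezoid(lambda x: 1.0 / math.sqrt(x), 0.0, 1.0)
```

Because the transformed integrand decays double exponentially, a plain equally spaced trapezoidal sum can cope with integrable endpoint singularities, which is presumably why such rules are attractive for the integrals discussed here.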
Higher-order adaptive finite-element methods for Kohn–Sham density functional theory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Motamarri, P.; Nowak, M.R.; Leiter, K.
2013-11-15
We present an efficient computational approach to perform real-space electronic structure calculations using an adaptive higher-order finite-element discretization of Kohn–Sham density-functional theory (DFT). To this end, we develop an a priori mesh-adaption technique to construct a close to optimal finite-element discretization of the problem. We further propose an efficient solution strategy for solving the discrete eigenvalue problem by using spectral finite-elements in conjunction with Gauss–Lobatto quadrature, and a Chebyshev acceleration technique for computing the occupied eigenspace. The proposed approach has been observed to provide a staggering 100–200-fold computational advantage over the solution of a generalized eigenvalue problem. Using the proposed solution procedure, we investigate the computational efficiency afforded by higher-order finite-element discretizations of the Kohn–Sham DFT problem. Our studies suggest that staggering computational savings, of the order of 1000-fold relative to linear finite-elements, can be realized, for both all-electron and local pseudopotential calculations, by using higher-order finite-element discretizations. On all the benchmark systems studied, we observe diminishing returns in computational savings beyond the sixth-order for accuracies commensurate with chemical accuracy, suggesting that the hexic spectral-element may be an optimal choice for the finite-element discretization of the Kohn–Sham DFT problem. A comparative study of the computational efficiency of the proposed higher-order finite-element discretizations suggests that the performance of finite-element basis is competing with the plane-wave discretization for non-periodic local pseudopotential calculations, and compares to the Gaussian basis for all-electron calculations to within an order of magnitude. 
Further, we demonstrate the capability of the proposed approach to compute the electronic structure of a metallic system containing 1688 atoms using modest computational resources, and good scalability of the present implementation up to 192 processors.
An M-step preconditioned conjugate gradient method for parallel computation
NASA Technical Reports Server (NTRS)
Adams, L.
1983-01-01
This paper describes a preconditioned conjugate gradient method that can be effectively implemented on both vector machines and parallel arrays to solve sparse symmetric and positive definite systems of linear equations. The implementation on the CYBER 203/205 and on the Finite Element Machine is discussed and results obtained using the method on these machines are given.
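The abstract does not say how the preconditioner is built. The sketch below assumes one common reading of "m-step": the preconditioner applies m sweeps of a stationary iteration (here Jacobi) to the residual equation. It is a small serial illustration in plain Python, not the CYBER or Finite Element Machine implementation, and all function names are hypothetical.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def jacobi_m_step(A, r, m):
    """Approximate A^{-1} r by m Jacobi sweeps started from zero (assumed form)."""
    d = [A[i][i] for i in range(len(A))]
    z = [0.0] * len(r)
    for _ in range(m):
        Az = matvec(A, z)
        z = [zi + (ri - azi) / di for zi, ri, azi, di in zip(z, r, Az, d)]
    return z


def pcg(A, b, m=3, tol=1e-10, max_iter=200):
    """Conjugate gradients for SPD A, preconditioned by m Jacobi sweeps."""
    x = [0.0] * len(b)
    r = list(b)
    z = jacobi_m_step(A, r, m)
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, it
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = jacobi_m_step(A, r, m)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def laplacian_1d(n):
    """Tridiagonal (-1, 2, -1) test matrix, symmetric positive definite."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]


A = laplacian_1d(8)
b = [1.0] * 8
x, iterations = pcg(A, b)
```

Each Jacobi sweep needs only a matrix-vector product and a diagonal scaling, which are the kinds of operations that map well onto vector and parallel hardware; this is offered as a plausible motivation, not as a statement of the paper's argument.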
NASA Astrophysics Data System (ADS)
Kim, Jae Wook
2013-05-01
This paper proposes a novel systematic approach for the parallelization of pentadiagonal compact finite-difference schemes and filters based on domain decomposition. The proposed approach allows a pentadiagonal banded matrix system to be split into quasi-disjoint subsystems by using a linear-algebraic transformation technique. As a result the inversion of pentadiagonal matrices can be implemented within each subdomain in an independent manner subject to a conventional halo-exchange process. The proposed matrix transformation leads to new subdomain boundary (SB) compact schemes and filters that require three halo terms to exchange with neighboring subdomains. The internode communication overhead in the present approach is equivalent to that of standard explicit schemes and filters based on seven-point discretization stencils. The new SB compact schemes and filters demand additional arithmetic operations compared to the original serial ones. However, it is shown that the additional cost becomes sufficiently low by choosing optimal sizes of their discretization stencils. Compared to earlier published results, the proposed SB compact schemes and filters successfully reduce parallelization artifacts arising from subdomain boundaries to a level sufficiently negligible for sophisticated aeroacoustic simulations without degrading parallel efficiency. The overall performance and parallel efficiency of the proposed approach are demonstrated by stringent benchmark tests.
Generalizing the TRAPRG and TRAPAX finite elements
NASA Technical Reports Server (NTRS)
Hurwitz, M. M.
1983-01-01
The NASTRAN TRAPRG and TRAPAX finite elements are very restrictive as to shape and grid point numbering. The elements must be trapezoidal with two sides parallel to the radial axis. In addition, the ordering of the grid points on the element connection card must follow strict rules. The paper describes the generalization of these elements so that these restrictions no longer apply.
Interactions of waves on electron streams or plasmas are studied for several geometric configurations of finite cross section in a finite magnetic...velocity parallel to the magnetic field. It is further assumed that either macroscopic neutrality exists or static space-charge forces are negligible. For...the most part the quasi-static analysis is used. For the case of two drifting streams cyclotron waves act to give instabilities which are either
Scalable Implementation of Finite Elements by NASA - Implicit (ScIFEi)
NASA Technical Reports Server (NTRS)
Warner, James E.; Bomarito, Geoffrey F.; Heber, Gerd; Hochhalter, Jacob D.
2016-01-01
Scalable Implementation of Finite Elements by NASA (ScIFEN) is a parallel finite element analysis code written in C++. ScIFEN is designed to provide scalable solutions to computational mechanics problems. It supports a variety of finite element types, nonlinear material models, and boundary conditions. This report provides an overview of ScIFEi ("Sci-Fi"), the implicit solid mechanics driver within ScIFEN. A description of ScIFEi's capabilities is provided, including an overview of the tools and features that accompany the software as well as a description of the input and output file formats. Results from several problems are included, demonstrating the efficiency and scalability of ScIFEi by comparing to finite element analysis using a commercial code.
Seismic imaging using finite-differences and parallel computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ober, C.C.
1997-12-31
A key to reducing the risks and costs associated with oil and gas exploration is the fast, accurate imaging of complex geologies, such as salt domes in the Gulf of Mexico and overthrust regions in US onshore regions. Prestack depth migration generally yields the most accurate images, and one approach to this is to solve the scalar wave equation using finite differences. As part of an ongoing ACTI project funded by the US Department of Energy, a finite difference, 3-D prestack, depth migration code has been developed. The goal of this work is to demonstrate that massively parallel computers can be used efficiently for seismic imaging, and that sufficient computing power exists (or soon will exist) to make finite difference, prestack, depth migration practical for oil and gas exploration. Several problems had to be addressed to get an efficient code for the Intel Paragon. These include efficient I/O, efficient parallel tridiagonal solves, and high single-node performance. Furthermore, to provide portable code the author has been restricted to the use of high-level programming languages (C and Fortran) and interprocessor communications using MPI. He has been using the SUNMOS operating system, which has affected many of his programming decisions. He will present images created from two verification datasets (the Marmousi Model and the SEG/EAEG 3D Salt Model). Also, he will show recent images from real datasets, and point out locations of improved imaging. Finally, he will discuss areas of current research which will hopefully improve the image quality and reduce computational costs.
NASA Astrophysics Data System (ADS)
Vasko, I.; Agapitov, O. V.; Mozer, F.; Bonnell, J. W.; Krasnoselskikh, V.; Artemyev, A.; Drake, J. F.
2017-12-01
Chorus waves observed in the Earth's inner magnetosphere sometimes exhibit a significantly distorted (nonharmonic) parallel electric field waveform. In spectrograms these waveform features show up as overtones of the chorus wave. In this work we show that the chorus wave parallel electric field is distorted due to the finite temperature of electrons. The distortion of the parallel electric field is described analytically and reproduced in numerical fluid simulations. Due to this effect the chorus energy is transferred to higher frequencies, making possible efficient scattering of low-energy (a few keV) electrons.
NASA Technical Reports Server (NTRS)
Tezduyar, Tayfun E.
1998-01-01
This is a final report as far as our work at University of Minnesota is concerned. The report describes our research progress and accomplishments in development of high performance computing methods and tools for 3D finite element computation of aerodynamic characteristics and fluid-structure interactions (FSI) arising in airdrop systems, namely ram-air parachutes and round parachutes. This class of simulations involves complex geometries, flexible structural components, deforming fluid domains, and unsteady flow patterns. The key components of our simulation toolkit are a stabilized finite element flow solver, a nonlinear structural dynamics solver, an automatic mesh moving scheme, and an interface between the fluid and structural solvers; all of these have been developed within a parallel message-passing paradigm.
A New Approach to Parallel Dynamic Partitioning for Adaptive Unstructured Meshes
NASA Technical Reports Server (NTRS)
Heber, Gerd; Biswas, Rupak; Gao, Guang R.
1999-01-01
Classical mesh partitioning algorithms were designed for rather static situations, and their straightforward application in a dynamical framework may lead to unsatisfactory results, e.g., excessive data migration among processors. Furthermore, special attention should be paid to their amenability to parallelization. In this paper, a novel parallel method for the dynamic partitioning of adaptive unstructured meshes is described. It is based on a linear representation of the mesh using self-avoiding walks.
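Once mesh elements are laid out in a linear order (in the paper, by a self-avoiding walk), partitioning reduces to cutting a one-dimensional sequence into contiguous pieces of similar weight. The following is a minimal sketch of that cutting step only, assuming the walk order and per-element weights are already available. The function name `partition_walk` and the weights are hypothetical, and the construction of the walk itself is not shown.

```python
def partition_walk(weights, num_parts):
    """Assign elements, listed in walk order, to contiguous parts of
    roughly equal total weight.

    weights[i] is the (assumed positive) work estimate of the i-th element
    along the walk. Returns one owner index per element.
    """
    total = float(sum(weights))
    owners = []
    prefix = 0.0
    for w in weights:
        centre = prefix + 0.5 * w  # midpoint of this element on the weight axis
        owners.append(min(num_parts - 1, int(centre * num_parts / total)))
        prefix += w
    return owners


# Uniform mesh: eight unit-weight elements on four processors.
before = partition_walk([1, 1, 1, 1, 1, 1, 1, 1], 4)

# After local refinement the first element carries four times the work.
after = partition_walk([4, 1, 1, 1, 1], 2)
```

Because owners are non-decreasing along the walk, every part is a contiguous segment, and repartitioning after adaptation amounts to moving the cut points.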
What is adaptive about adaptive decision making? A parallel constraint satisfaction account.
Glöckner, Andreas; Hilbig, Benjamin E; Jekel, Marc
2014-12-01
There is broad consensus that human cognition is adaptive. However, the vital question of how exactly this adaptivity is achieved has remained largely open. Herein, we contrast two frameworks which account for adaptive decision making, namely broad and general single-mechanism accounts vs. multi-strategy accounts. We propose and fully specify a single-mechanism model for decision making based on parallel constraint satisfaction processes (PCS-DM) and contrast it theoretically and empirically against a multi-strategy account. To achieve sufficiently sensitive tests, we rely on a multiple-measure methodology including choice, reaction time, and confidence data as well as eye-tracking. Results show that manipulating the environmental structure produces clear adaptive shifts in choice patterns - as both frameworks would predict. However, results on the process level (reaction time, confidence), in information acquisition (eye-tracking), and from cross-predicting choice consistently corroborate single-mechanisms accounts in general, and the proposed parallel constraint satisfaction model for decision making in particular. Copyright © 2014 Elsevier B.V. All rights reserved.
Improved finite-element methods for rotorcraft structures
NASA Technical Reports Server (NTRS)
Hinnant, Howard E.
1991-01-01
An overview of the research directed at improving finite-element methods for rotorcraft airframes is presented. The development of a modification to the finite element method which eliminates interelement discontinuities is covered. The following subject areas are discussed: geometric entities, interelement continuity, dependent rotational degrees of freedom, and adaptive numerical integration. This new methodology is being implemented as an anisotropic, curvilinear, p-version, beam, shell, and brick finite element program.
Computation of free energy profiles with parallel adaptive dynamics
NASA Astrophysics Data System (ADS)
Lelièvre, Tony; Rousset, Mathias; Stoltz, Gabriel
2007-04-01
We propose a formulation of an adaptive computation of free energy differences, in the adaptive biasing force or nonequilibrium metadynamics spirit, using conditional distributions of samples of configurations which evolve in time. This allows us to present a truly unifying framework for these methods, and to prove convergence results for certain classes of algorithms. From a numerical viewpoint, a parallel implementation of these methods is very natural, the replicas interacting through the reconstructed free energy. We demonstrate how to improve this parallel implementation by resorting to some selection mechanism on the replicas. This is illustrated by computations on a model system of conformational changes.
Dual-thread parallel control strategy for ophthalmic adaptive optics.
Yu, Yongxin; Zhang, Yuhua
To improve ophthalmic adaptive optics speed and compensate for ocular wavefront aberration of high temporal frequency, the adaptive optics wavefront correction has been implemented with a control scheme including 2 parallel threads; one is dedicated to wavefront detection and the other conducts wavefront reconstruction and compensation. With a custom Shack-Hartmann wavefront sensor that measures the ocular wave aberration with 193 subapertures across the pupil, adaptive optics has achieved a closed loop updating frequency up to 110 Hz, and demonstrated robust compensation for ocular wave aberration up to 50 Hz in an adaptive optics scanning laser ophthalmoscope.
Dual-thread parallel control strategy for ophthalmic adaptive optics
Yu, Yongxin; Zhang, Yuhua
2015-01-01
To improve ophthalmic adaptive optics speed and compensate for ocular wavefront aberration of high temporal frequency, the adaptive optics wavefront correction has been implemented with a control scheme including 2 parallel threads; one is dedicated to wavefront detection and the other conducts wavefront reconstruction and compensation. With a custom Shack-Hartmann wavefront sensor that measures the ocular wave aberration with 193 subapertures across the pupil, adaptive optics has achieved a closed loop updating frequency up to 110 Hz, and demonstrated robust compensation for ocular wave aberration up to 50 Hz in an adaptive optics scanning laser ophthalmoscope. PMID:25866498
Kinnison, Michael T.
2017-01-01
Phenotypic plasticity is often an adaptation of organisms to cope with temporally or spatially heterogeneous landscapes. Like other adaptations, one would predict that different species, populations, or sexes might thus show some degree of parallel evolution of plasticity, in the form of parallel reaction norms, when exposed to analogous environmental gradients. Indeed, one might even expect parallelism of plasticity to repeatedly evolve in multiple traits responding to the same gradient, resulting in integrated parallelism of plasticity. In this study, we experimentally tested for parallel patterns of predator-mediated plasticity of size, shape, and behavior of 2 species and sexes of mosquitofish. Examination of behavioral trials indicated that the 2 species showed unique patterns of behavioral plasticity, whereas the 2 sexes in each species showed parallel responses. Fish shape showed parallel patterns of plasticity for both sexes and species, albeit males showed evidence of unique plasticity related to reproductive anatomy. Moreover, patterns of shape plasticity due to predator exposure were broadly parallel to what has been depicted for predator-mediated population divergence in other studies (slender bodies, expanded caudal regions, ventrally located eyes, and reduced male gonopodia). We did not find evidence of phenotypic plasticity in fish size for either species or sex. Hence, our findings support broadly integrated parallelism of plasticity for sexes within species and less integrated parallelism for species. We interpret these findings with respect to their potential broader implications for the interacting roles of adaptation and constraint in the evolutionary origins of parallelism of plasticity in general. PMID:29491997
Numerical simulation of aerothermal loads in hypersonic engine inlets due to shock impingement
NASA Technical Reports Server (NTRS)
Ramakrishnan, R.
1992-01-01
The effect of shock impingement on an axial corner simulating the inlet of a hypersonic vehicle engine is modeled using a finite-difference procedure. A three-dimensional dynamic grid adaptation procedure is utilized to move the grids to regions with strong flow gradients. The adaptation procedure uses a grid relocation stencil that is valid at both the interior and boundary points of the finite-difference grid. A linear combination of spatial derivatives of specific flow variables, calculated with finite-element interpolation functions, is used as the adaptation measure. This computational procedure is used to study laminar and turbulent Mach 6 flows in the axial corner. The description of flow physics and qualitative measures of heat transfer distributions on cowl and strut surfaces obtained from the analysis are compared with experimental observations. Conclusions are drawn regarding the capability of the numerical scheme for enhanced modeling of high-speed compressible flows.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castellana, Vito G.; Tumeo, Antonino; Ferrandi, Fabrizio
Emerging applications such as data mining, bioinformatics, knowledge discovery, and social network analysis are irregular. They use data structures based on pointers or linked lists, such as graphs, unbalanced trees or unstructured grids, which generate unpredictable memory accesses. These data structures are usually large but difficult to partition. These applications are mostly memory-bandwidth bound and have high synchronization intensity. However, they also have large amounts of inherent dynamic parallelism, because they potentially perform a task for each of the elements they are exploring. Several efforts are looking at accelerating these applications on hybrid architectures, which integrate general purpose processors with reconfigurable devices. Some solutions, which demonstrated significant speedups, include custom hand-tuned accelerators or even full processor architectures on the reconfigurable logic. In this paper we present an approach for the automatic synthesis of accelerators from C, targeted at irregular applications. In contrast to typical High Level Synthesis paradigms, which construct a centralized Finite State Machine, our approach generates dynamically scheduled hardware components. While parallelism exploitation in typical HLS-generated accelerators is usually bound within a single execution flow, our solution allows concurrently running multiple execution flows, thus also exploiting the coarser grain task parallelism of irregular applications. Our approach supports multiple, multi-ported and distributed memories, and atomic memory operations. Its main objective is parallelizing as many memory operations as possible, independently from their execution time, to maximize the memory bandwidth utilization. This significantly differs from current HLS flows, which usually consider a single memory port and require precise scheduling of memory operations. 
A key innovation of our approach is the generation of a memory interface controller, which dynamically maps concurrent memory accesses to multiple ports. We present a case study on a typical irregular kernel, Graph Breadth First search (BFS), exploring different tradeoffs in terms of parallelism and number of memories.
NASA Astrophysics Data System (ADS)
Gan, Chee Kwan; Challacombe, Matt
2003-05-01
Recently, early onset linear scaling computation of the exchange-correlation matrix has been achieved using hierarchical cubature [J. Chem. Phys. 113, 10037 (2000)]. Hierarchical cubature differs from other methods in that the integration grid is adaptive and purely Cartesian, which allows for a straightforward domain decomposition in parallel computations; the volume enclosing the entire grid may be simply divided into a number of nonoverlapping boxes. In our data parallel approach, each box requires only a fraction of the total density to perform the necessary numerical integrations due to the finite extent of Gaussian-orbital basis sets. This inherent data locality may be exploited to reduce communications between processors as well as to avoid memory and copy overheads associated with data replication. Although the hierarchical cubature grid is Cartesian, naive boxing leads to irregular work loads due to strong spatial variations of the grid and the electron density. In this paper we describe equal time partitioning, which employs time measurement of the smallest sub-volumes (corresponding to the primitive cubature rule) to load balance grid-work for the next self-consistent-field iteration. After start-up from a heuristic center of mass partitioning, equal time partitioning exploits smooth variation of the density and grid between iterations to achieve load balance. With the 3-21G basis set and a medium quality grid, equal time partitioning applied to taxol (62 heavy atoms) attained a speedup of 61 out of 64 processors, while for a 110 molecule water cluster at standard density it achieved a speedup of 113 out of 128. The efficiency of equal time partitioning applied to hierarchical cubature improves as the grid work per processor increases. With a fine grid and the 6-311G(df,p) basis set, calculations on the 26 atom molecule α-pinene achieved a parallel efficiency better than 99% with 64 processors. 
For more coarse grained calculations, superlinear speedups are found to result from reduced computational complexity associated with data parallelism.
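The idea behind equal time partitioning can be illustrated with a minimal sketch. It assumes a simplified one-dimensional ordering of sub-volumes rather than the three-dimensional boxes of the paper, and the function names `equal_time_partition` and `parallel_efficiency` are hypothetical, not taken from the authors' code. Times measured in the previous iteration are cut into contiguous chunks of roughly equal total time:

```python
from itertools import accumulate


def equal_time_partition(times, n_parts):
    """Split an ordered list of measured times into n_parts contiguous,
    non-empty chunks with roughly equal total time (greedy prefix-sum cuts).
    Assumes len(times) >= n_parts."""
    total = sum(times)
    prefix = list(accumulate(times))
    bounds = [0]
    for k in range(1, n_parts):
        target = total * k / n_parts
        # first position whose running total reaches the k-th target
        cut = next((i + 1 for i, s in enumerate(prefix) if s >= target),
                   len(times))
        cut = max(cut, bounds[-1] + 1)              # no empty chunk
        cut = min(cut, len(times) - (n_parts - k))  # leave room for the rest
        bounds.append(cut)
    bounds.append(len(times))
    return [times[a:b] for a, b in zip(bounds, bounds[1:])]


def parallel_efficiency(speedup, n_procs):
    """Parallel efficiency as speedup divided by processor count."""
    return speedup / n_procs


# Hypothetical per-sub-volume timings from a previous iteration.
chunks = equal_time_partition([4, 1, 1, 1, 1, 4, 2, 2], 3)
print([sum(c) for c in chunks])
print(parallel_efficiency(61, 64), parallel_efficiency(113, 128))
```

The efficiency helper applied to the figures quoted in the abstract gives 61/64 ≈ 95% for taxol and 113/128 ≈ 88% for the water cluster.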
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shadid, John Nicolas; Elman, Howard; Shuttleworth, Robert R.
2007-04-01
In recent years, considerable effort has been placed on developing efficient and robust solution algorithms for the incompressible Navier-Stokes equations based on preconditioned Krylov methods. These include physics-based methods, such as SIMPLE, and purely algebraic preconditioners based on the approximation of the Schur complement. All these techniques can be represented as approximate block factorization (ABF) type preconditioners. The goal is to decompose the application of the preconditioner into simplified sub-systems in which scalable multi-level type solvers can be applied. In this paper we develop a taxonomy of these ideas based on an adaptation of a generalized approximate factorization of the Navier-Stokes system first presented in [25]. This taxonomy illuminates the similarities and differences among these preconditioners and the central role played by efficient approximation of certain Schur complement operators. We then present a parallel computational study that examines the performance of these methods and compares them to an additive Schwarz domain decomposition (DD) algorithm. Results are presented for two- and three-dimensional steady state problems for enclosed domains and inflow/outflow systems on both structured and unstructured meshes. The numerical experiments are performed using MPSalsa, a stabilized finite element code.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rouet, François-Henry; Li, Xiaoye S.; Ghysels, Pieter; ...
2016-06-30
In this paper, we present a distributed-memory library for computations with dense structured matrices. A matrix is considered structured if its off-diagonal blocks can be approximated by a rank-deficient matrix with low numerical rank. Here, we use Hierarchically Semi-Separable (HSS) representations. Such matrices appear in many applications, for example, finite-element methods, boundary element methods, and so on. Exploiting this structure allows for fast solution of linear systems and/or fast computation of matrix-vector products, which are the two main building blocks of matrix computations. The compression algorithm that we use, which computes the HSS form of an input dense matrix, relies on randomized sampling with a novel adaptive sampling mechanism. We discuss the parallelization of this algorithm and also present the parallelization of structured matrix-vector product, structured factorization, and solution routines. The efficiency of the approach is demonstrated on large problems from different academic and industrial applications, on up to 8,000 cores. Finally, this work is part of a more global effort, the STRUctured Matrices PACKage (STRUMPACK) software package for computations with sparse and dense structured matrices. Hence, although useful in their own right, the routines also represent a step in the direction of a distributed-memory sparse solver.
Tile-based Level of Detail for the Parallel Age
DOE Office of Scientific and Technical Information (OSTI.GOV)
Niski, K; Cohen, J D
Today's PCs incorporate multiple CPUs and GPUs and are easily arranged in clusters for high-performance, interactive graphics. We present an approach based on hierarchical, screen-space tiles to parallelizing rendering with level of detail. Adapt tiles, render tiles, and machine tiles are associated with CPUs, GPUs, and PCs, respectively, to efficiently parallelize the workload with good resource utilization. Adaptive tile sizes provide load balancing while our level of detail system allows total and independent management of the load on CPUs and GPUs. We demonstrate our approach on parallel configurations consisting of both single PCs and a cluster of PCs.
AdiosStMan: Parallelizing Casacore Table Data System using Adaptive IO System
NASA Astrophysics Data System (ADS)
Wang, R.; Harris, C.; Wicenec, A.
2016-07-01
In this paper, we investigate the Casacore Table Data System (CTDS) used in the casacore and CASA libraries, and methods to parallelize it. CTDS provides a storage manager plugin mechanism for third-party developers to design and implement their own CTDS storage managers. With this in mind, we looked into various storage backend techniques that can possibly enable parallel I/O for CTDS by implementing new storage managers. After carrying out benchmarks showing the excellent parallel I/O throughput of the Adaptive IO System (ADIOS), we implemented an ADIOS-based parallel CTDS storage manager. We then applied the CASA MSTransform frequency split task to verify the ADIOS Storage Manager. We also ran a series of performance tests to examine the I/O throughput in a massively parallel scenario.
NASA Astrophysics Data System (ADS)
Pathak, Harshavardhana S.; Shukla, Ratnesh K.
2016-08-01
A high-order adaptive finite-volume method is presented for simulating inviscid compressible flows on time-dependent redistributed grids. The method achieves dynamic adaptation through a combination of time-dependent mesh node clustering in regions characterized by strong solution gradients and an optimal selection of the order of accuracy and the associated reconstruction stencil in a conservative finite-volume framework. This combined approach maximizes spatial resolution in discontinuous regions that require low-order approximations for oscillation-free shock capturing. Over smooth regions, high-order discretization through finite-volume WENO schemes minimizes numerical dissipation and provides excellent resolution of intricate flow features. The method including the moving mesh equations and the compressible flow solver is formulated entirely on a transformed time-independent computational domain discretized using a simple uniform Cartesian mesh. Approximations for the metric terms that enforce the discrete geometric conservation law while preserving the fourth-order accuracy of the two-point Gaussian quadrature rule are developed. Spurious Cartesian grid induced shock instabilities such as carbuncles that feature in a local one-dimensional contact capturing treatment along the cell face normals are effectively eliminated through upwind flux calculation using a rotated Harten-Lax-van Leer contact resolving (HLLC) approximate Riemann solver for the Euler equations in generalized coordinates. Numerical experiments with the fifth and ninth-order WENO reconstructions at the two-point Gaussian quadrature nodes, over a range of challenging test cases, indicate that the redistributed mesh effectively adapts to the dynamic flow gradients thereby improving the solution accuracy substantially even when the initial starting mesh is non-adaptive.
The high adaptivity combined with the fifth and especially the ninth-order WENO reconstruction allows remarkably sharp capture of discontinuous propagating shocks with simultaneous resolution of smooth yet complex small scale unsteady flow features in exceptional detail.
Analysis of composite ablators using massively parallel computation
NASA Technical Reports Server (NTRS)
Shia, David
1995-01-01
In this work, the feasibility of using massively parallel computation to study the response of ablative materials is investigated. Explicit and implicit finite difference methods are used on a massively parallel computer, the Thinking Machines CM-5. The governing equations are a set of nonlinear partial differential equations. The governing equations are developed for three sample problems: (1) transpiration cooling, (2) ablative composite plate, and (3) restrained thermal growth testing. The transpiration cooling problem is solved using a solution scheme based solely on the explicit finite difference method. The results are compared with available analytical steady-state through-thickness temperature and pressure distributions and good agreement between the numerical and analytical solutions is found. It is also found that a solution scheme based on the explicit finite difference method has the following advantages: incorporates complex physics easily, results in a simple algorithm, and is easily parallelizable. However, a solution scheme of this kind needs very small time steps to maintain stability. A solution scheme based on the implicit finite difference method has the advantage that it does not require very small time steps to maintain stability. However, this kind of solution scheme has the disadvantages that complex physics cannot be easily incorporated into the algorithm and that the solution scheme is difficult to parallelize. A hybrid solution scheme is then developed to combine the strengths of the explicit and implicit finite difference methods and minimize their weaknesses. This is achieved by identifying the critical time scale associated with the governing equations and applying the appropriate finite difference method according to this critical time scale. The hybrid solution scheme is then applied to the ablative composite plate and restrained thermal growth problems.
The gas storage term is included in the explicit pressure calculation of both problems. Results from ablative composite plate problems are compared with previous numerical results which did not include the gas storage term. It is found that the through-thickness temperature distribution is not affected much by the gas storage term. However, the through-thickness pressure and stress distributions, and the extent of chemical reactions are different from the previous numerical results. Two types of chemical reaction models are used in the restrained thermal growth testing problem: (1) pressure-independent Arrhenius type rate equations and (2) pressure-dependent Arrhenius type rate equations. The numerical results are compared to experimental results and the pressure-dependent model is able to capture the trend better than the pressure-independent one. Finally, a performance study is done on the hybrid algorithm using the ablative composite plate problem. It is found that there is a good speedup of performance on the CM-5. For 32 CPUs, the speedup of performance is 20. The efficiency of the algorithm is found to be a function of the size and execution time of a given problem and the effective parallelization of the algorithm. It also seems that there is an optimum number of CPUs to use for a given problem.
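The time-step restriction of explicit schemes mentioned in this abstract can be seen in a minimal sketch. It uses the linear one-dimensional heat equation u_t = alpha * u_xx as a stand-in for the nonlinear ablation equations, which is an assumption made purely for illustration; the function name `ftcs_max_amplitude` is hypothetical. For this model problem the forward-time centred-space scheme is stable only when r = alpha * dt / dx**2 does not exceed 1/2:

```python
def ftcs_max_amplitude(r, n_cells=21, n_steps=200):
    """Advance u_t = alpha * u_xx with the explicit FTCS scheme, where
    r = alpha * dt / dx**2, boundary values are held at zero, and the
    initial condition is a unit spike. Returns max |u| after n_steps."""
    u = [0.0] * n_cells
    u[n_cells // 2] = 1.0
    for _ in range(n_steps):
        new = u[:]  # boundary entries stay at zero
        for i in range(1, n_cells - 1):
            new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
        u = new
    return max(abs(v) for v in u)


print(ftcs_max_amplitude(0.4))  # within the stability limit: decays
print(ftcs_max_amplitude(0.6))  # beyond the limit: grows without bound
```

An implicit scheme would remove this restriction at the price of a linear solve per step, which is the trade-off the hybrid scheme is designed to balance. As a side note, the reported speedup of 20 on 32 CPUs corresponds to a parallel efficiency of 20/32 = 62.5%.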
A scalable parallel black oil simulator on distributed memory parallel computers
NASA Astrophysics Data System (ADS)
Wang, Kun; Liu, Hui; Chen, Zhangxin
2015-11-01
This paper presents our work on developing a parallel black oil simulator for distributed memory computers based on our in-house parallel platform. The parallel simulator is designed to overcome the performance issues of common simulators that are implemented for personal computers and workstations. The finite difference method is applied to discretize the black oil model. In addition, some advanced techniques are employed to strengthen the robustness and parallel scalability of the simulator, including an inexact Newton method, matrix decoupling methods, and algebraic multigrid methods. A new multi-stage preconditioner is proposed to accelerate the solution of linear systems from the Newton methods. Numerical experiments show that our simulator is scalable and efficient, and is capable of simulating extremely large-scale black oil problems with tens of millions of grid blocks using thousands of MPI processes on parallel computers.
Cameron, Chris; Ewara, Emmanuel; Wilson, Florence R; Varu, Abhishek; Dyrda, Peter; Hutton, Brian; Ingham, Michael
2017-11-01
Adaptive trial designs present a methodological challenge when performing network meta-analysis (NMA), as data from such adaptive trial designs differ from conventional parallel design randomized controlled trials (RCTs). We aim to illustrate the importance of considering study design when conducting an NMA. Three NMAs comparing anti-tumor necrosis factor drugs for ulcerative colitis were compared and the analyses replicated using Bayesian NMA. The NMA comprised 3 RCTs comparing 4 treatments (adalimumab 40 mg, golimumab 50 mg, golimumab 100 mg, infliximab 5 mg/kg) and placebo. We investigated the impact of incorporating differences in the study design among the 3 RCTs and presented 3 alternative methods on how to convert outcome data derived from one form of adaptive design to more conventional parallel RCTs. Combining RCT results without considering variations in study design resulted in effect estimates that were biased against golimumab. In contrast, using the 3 alternative methods to convert outcome data from one form of adaptive design to a format more consistent with conventional parallel RCTs facilitated more transparent consideration of differences in study design. This approach is more likely to yield appropriate estimates of comparative efficacy when conducting an NMA, which includes treatments that use an alternative study design. RCTs based on adaptive study designs should not be combined with traditional parallel RCT designs in NMA. We have presented potential approaches to convert data from one form of adaptive design to more conventional parallel RCTs to facilitate transparent and less-biased comparisons.
NASA Astrophysics Data System (ADS)
Vera, N. C.; GMMC
2013-05-01
In this paper we present the results of macrohybrid mixed Darcian flow in porous media in a general three-dimensional domain. The global problem is solved as a set of local subproblems which are posed using a domain decomposition method. The unknown fields of the local problems, velocity and pressure, are approximated using mixed finite elements. For this application, a general three-dimensional domain is considered which is discretized using tetrahedra. The discrete domain is decomposed into subdomains and the original problem is reformulated as a set of subproblems that communicate through their interfaces. To solve this set of subproblems, we use mixed finite elements and parallel computing. Parallelizing a problem with this methodology can, in principle, fully exploit the available computing equipment and also provide results in less time, two very important elements in modeling.
Spilker, R L; de Almeida, E S; Donzelli, P S
1992-01-01
This chapter addresses computationally demanding numerical formulations in the biomechanics of soft tissues. The theory of mixtures can be used to represent soft hydrated tissues in the human musculoskeletal system as a two-phase continuum consisting of an incompressible solid phase (collagen and proteoglycan) and an incompressible fluid phase (interstitial water). We first consider the finite deformation of soft hydrated tissues in which the solid phase is represented as hyperelastic. A finite element formulation of the governing nonlinear biphasic equations is presented based on a mixed-penalty approach and derived using the weighted residual method. Fluid and solid phase deformation, velocity, and pressure are interpolated within each element, and the pressure variables within each element are eliminated at the element level. A system of nonlinear, first-order differential equations in the fluid and solid phase deformation and velocity is obtained. In order to solve these equations, the contributions of the hyperelastic solid phase are incrementally linearized, a finite difference rule is introduced for temporal discretization, and an iterative scheme is adopted to achieve equilibrium at the end of each time increment. We demonstrate the accuracy and adequacy of the procedure using a six-node, isoparametric axisymmetric element, and we present an example problem for which independent numerical solution is available. Next, we present an automated, adaptive environment for the simulation of soft tissue continua in which the finite element analysis is coupled with automatic mesh generation, error indicators, and projection methods. Mesh generation and updating, including both refinement and coarsening, for the two-dimensional examples examined in this study are performed using the finite quadtree approach. 
The adaptive analysis is based on an error indicator which is the L2 norm of the difference between the finite element solution and a projected finite element solution. Total stress, calculated as the sum of the solid and fluid phase stresses, is used in the error indicator. To allow the finite difference algorithm to proceed in time using an updated mesh, solution values must be transferred to the new nodal locations. This rezoning is accomplished using a projected field for the primary variables. The accuracy and effectiveness of this adaptive finite element analysis is demonstrated using a linear, two-dimensional, axisymmetric problem corresponding to the indentation of a thin sheet of soft tissue. The method is shown to effectively capture the steep gradients and to produce solutions in good agreement with independent, converged, numerical solutions.
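A minimal one-dimensional sketch of such an error indicator follows. It assumes a piecewise-constant element quantity (standing in for the total stress) and a projection obtained by averaging adjacent element values at the nodes; the abstract does not specify the projection, so this choice and the name `l2_error_indicators` are illustrative assumptions only:

```python
import math


def l2_error_indicators(nodes, elem_values):
    """Element-wise L2 norm of (projected field - element value) on a 1D mesh.

    nodes       : node coordinates, len(elem_values) + 1 entries, increasing
    elem_values : one constant value per element
    The projected field is linear per element between nodal averages.
    """
    n_elem = len(elem_values)
    # Nodal projection: average of the elements touching each node.
    nodal = []
    for j in range(n_elem + 1):
        adjacent = elem_values[max(j - 1, 0):min(j + 1, n_elem)]
        nodal.append(sum(adjacent) / len(adjacent))
    indicators = []
    for e in range(n_elem):
        h = nodes[e + 1] - nodes[e]
        d0 = nodal[e] - elem_values[e]
        d1 = nodal[e + 1] - elem_values[e]
        # Exact integral of a squared linear function over the element.
        indicators.append(math.sqrt(h / 3.0 * (d0 * d0 + d0 * d1 + d1 * d1)))
    return indicators


# Hypothetical data: a jump in the last element produces large indicators nearby.
print(l2_error_indicators([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 5.0]))
```

Elements with large indicators would be candidates for refinement and elements with small ones for coarsening.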
Guo, Zongyi; Chang, Jing; Guo, Jianguo; Zhou, Jun
2018-06-01
This paper focuses on the adaptive twisting sliding mode control for the Hypersonic Reentry Vehicles (HRVs) attitude tracking issue. The HRV attitude tracking model is transformed into the error dynamics in matched structure, whereas an unmeasurable state is redefined by lumping the existing unmatched disturbance with the angular rate. Hence, an adaptive finite-time observer is used to estimate the unknown state. Then, an adaptive twisting algorithm is proposed for systems subject to disturbances with unknown bounds. The stability of the proposed observer-based adaptive twisting approach is guaranteed, and the case of noisy measurement is analyzed. Also, the developed control law avoids the aggressive chattering phenomenon of the existing adaptive twisting approaches because the adaptive gains decrease close to the disturbance once the trajectories reach the sliding surface. Finally, numerical simulations on the attitude control of the HRV are conducted to verify the effectiveness and benefit of the proposed approach. Copyright © 2018 ISA. Published by Elsevier Ltd. All rights reserved.
A new parallelization scheme for adaptive mesh refinement
Loffler, Frank; Cao, Zhoujian; Brandt, Steven R.; ...
2016-05-06
Here, we present a new method for parallelization of adaptive mesh refinement called Concurrent Structured Adaptive Mesh Refinement (CSAMR). This new method offers the lower computational cost (i.e. wall time x processor count) of subcycling in time, but with the runtime performance (i.e. smaller wall time) of evolving all levels at once using the time step of the finest level (which does more work than subcycling but has less parallelism). We demonstrate our algorithm's effectiveness using an adaptive mesh refinement code, AMSS-NCKU, and show performance on Blue Waters and other high performance clusters. For the class of problem considered in this paper, our algorithm achieves a speedup of 1.7-1.9 when the processor count for a given AMR run is doubled, consistent with our theoretical predictions.
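The difference in work between subcycling and evolving every level with the finest time step can be counted with a short sketch. It is not the CSAMR algorithm itself; it assumes a temporal refinement ratio of 2 between levels, and the function name `level_updates` and the cell counts are hypothetical:

```python
def level_updates(cells_per_level, subcycling, ratio=2):
    """Count cell updates needed to advance one coarse time step.

    cells_per_level : number of cells on each level, coarsest first
    subcycling      : if True, level l takes ratio**l substeps;
                      if False, every level takes the finest level's step count
    """
    finest = len(cells_per_level) - 1
    total = 0
    for level, cells in enumerate(cells_per_level):
        steps = ratio ** level if subcycling else ratio ** finest
        total += cells * steps
    return total


cells = [100, 100, 100]  # hypothetical three-level hierarchy
print(level_updates(cells, subcycling=True))
print(level_updates(cells, subcycling=False))
```

For this example subcycling needs 700 cell updates against 1200 without it. The speedup of 1.7-1.9 per doubling of the processor count quoted above corresponds to 85-95% efficiency for each doubling.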
Using Multi-threading for the Automatic Load Balancing of 2D Adaptive Finite Element Meshes
NASA Technical Reports Server (NTRS)
Heber, Gerd; Biswas, Rupak; Thulasiraman, Parimala; Gao, Guang R.; Saini, Subhash (Technical Monitor)
1998-01-01
In this paper, we present a multi-threaded approach for the automatic load balancing of adaptive finite element (FE) meshes. The platform of our choice is the EARTH multi-threaded system which offers sufficient capabilities to tackle this problem. We implement the adaption phase of FE applications on triangular meshes and exploit the EARTH token mechanism to automatically balance the resulting irregular and highly nonuniform workload. We discuss the results of our experiments on EARTH-SP2, an implementation of EARTH on the IBM SP2, with different load balancing strategies that are built into the runtime system.
McCorquodale, Peter; Ullrich, Paul; Johansen, Hans; ...
2015-09-04
We present a high-order finite-volume approach for solving the shallow-water equations on the sphere, using multiblock grids on the cubed-sphere. This approach combines a Runge-Kutta time discretization with a fourth-order accurate spatial discretization, and includes adaptive mesh refinement and refinement in time. Results of tests show fourth-order convergence for the shallow-water equations as well as for advection in a highly deformational flow. Hierarchical adaptive mesh refinement allows solution error to be achieved that is comparable to that obtained with uniform resolution of the most refined level of the hierarchy, but with many fewer operations.
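A convergence claim of this kind is usually checked by computing the observed order from errors at two resolutions. The sketch below shows that standard calculation; the error values are hypothetical and are not taken from the paper, and the function name `observed_order` is an assumption:

```python
import math


def observed_order(err_coarse, err_fine, refinement_ratio=2.0):
    """Observed convergence order p from errors on two grids whose spacing
    differs by refinement_ratio, assuming error ~ C * h**p."""
    return math.log(err_coarse / err_fine) / math.log(refinement_ratio)


# Hypothetical errors: halving h reduces the error by a factor of 16.
print(observed_order(1.6e-3, 1.0e-4))
```

A fourth-order scheme should reduce the error by a factor of about 16 each time the grid spacing is halved, which gives an observed order close to 4.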
Parallel Element Agglomeration Algebraic Multigrid and Upscaling Library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barker, Andrew T.; Benson, Thomas R.; Lee, Chak Shing
ParELAG is a parallel C++ library for numerical upscaling of finite element discretizations and element-based algebraic multigrid solvers. It provides optimal complexity algorithms to build multilevel hierarchies and solvers that can be used for solving a wide class of partial differential equations (elliptic, hyperbolic, saddle point problems) on general unstructured meshes. Additionally, a novel multilevel solver for saddle point problems with divergence constraint is implemented.
Implementation of a 3D mixing layer code on parallel computers
NASA Technical Reports Server (NTRS)
Roe, K.; Thakur, R.; Dang, T.; Bogucz, E.
1995-01-01
This paper summarizes our progress and experience in the development of a Computational-Fluid-Dynamics code on parallel computers to simulate three-dimensional spatially-developing mixing layers. In this initial study, the three-dimensional time-dependent Euler equations are solved using a finite-volume explicit time-marching algorithm. The code was first programmed in Fortran 77 for sequential computers. The code was then converted for use on parallel computers using the conventional message-passing technique. We have not been able to compile the code with the present version of HPF compilers.
Array-based Hierarchical Mesh Generation in Parallel
Ray, Navamita; Grindeanu, Iulian; Zhao, Xinglin; ...
2015-11-03
In this paper, we describe an array-based hierarchical mesh generation capability through uniform refinement of unstructured meshes for efficient solution of PDEs using finite element methods and multigrid solvers. A multi-degree, multi-dimensional and multi-level framework is designed to generate the nested hierarchies from an initial mesh that can be used for a number of purposes, from multi-level methods to generating large meshes. The capability is developed under the parallel mesh framework “Mesh Oriented dAtaBase” (MOAB). We describe the underlying data structures and algorithms to generate such hierarchies and present numerical results for computational efficiency and mesh quality. In conclusion, we also present results to demonstrate the applicability of the developed capability to a multigrid finite-element solver.
NASA Astrophysics Data System (ADS)
Liu, Ying; Xu, Zhenhuan; Li, Yuguo
2018-04-01
We present a goal-oriented adaptive finite element (FE) modelling algorithm for 3-D magnetotelluric fields in generally anisotropic conductivity media. The model consists of a background layered structure, containing anisotropic blocks. Each block and layer can be made anisotropic by assigning it a 3 × 3 conductivity tensor. The second-order partial differential equations are solved using the adaptive finite element method (FEM). The computational domain is subdivided into unstructured tetrahedral elements, which allow for complex geometries including bathymetry and dipping interfaces. The grid refinement process is guided by a global a posteriori error estimator and is performed iteratively. The system of linear FE equations for electric field E is solved with the direct solver MUMPS. Then the magnetic field H can be found, in which the required derivatives are computed numerically using cubic spline interpolation. The 3-D FE algorithm has been validated by comparisons with both the 3-D finite-difference solution and 2-D FE results. Two model types are used to demonstrate the effects of anisotropy upon 3-D magnetotelluric responses: horizontal and dipping anisotropy. Finally, a 3-D sea hill model is modelled to study the effect of oblique interfaces and the dipping anisotropy.
IOPA: I/O-aware parallelism adaption for parallel programs
Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei
2017-01-01
With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads. PMID:28278236
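The trade-off described here, where too many I/O threads saturate the I/O sub-system and too few leave it idle, can be illustrated with a minimal tuning sketch. This is not the IOPA mechanism or its programming interface, neither of which is detailed in the abstract; the function `tune_io_threads`, the doubling search, and the synthetic throughput model are all assumptions for illustration:

```python
def tune_io_threads(measure_throughput, start=1, max_threads=64):
    """Double the I/O thread count while measured throughput keeps improving.

    measure_throughput : callable taking a thread count and returning the
                         observed throughput (any consistent unit)
    Returns the thread count with the best throughput seen before the first
    non-improving measurement.
    """
    best_n = start
    best_tp = measure_throughput(start)
    n = start * 2
    while n <= max_threads:
        tp = measure_throughput(n)
        if tp <= best_tp:
            break  # more threads no longer help: the I/O sub-system is saturated
        best_n, best_tp = n, tp
        n *= 2
    return best_n


def synthetic_throughput(n_threads):
    """Hypothetical device: scales to 800 units, minus a per-thread overhead."""
    return min(n_threads * 100, 800) - 5 * n_threads


print(tune_io_threads(synthetic_throughput))
```

In a real application the measurement callable would time actual reads or writes, and the search would be repeated when the workload changes.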
Scalable direct Vlasov solver with discontinuous Galerkin method on unstructured mesh.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xu, J.; Ostroumov, P. N.; Mustapha, B.
2010-12-01
This paper presents the development of parallel direct Vlasov solvers with the discontinuous Galerkin (DG) method for beam and plasma simulations in four dimensions. Both physical and velocity spaces are in two dimensions (2P2V) with unstructured mesh. Contrary to the standard particle-in-cell (PIC) approach for kinetic space plasma simulations, i.e., solving Vlasov-Maxwell equations, a direct method has been used in this paper. There are several benefits to solving a Vlasov equation directly, such as avoiding noise associated with a finite number of particles and the capability to capture fine structure in the plasma. The most challenging part of a direct Vlasov solver comes from higher dimensions, as the computational cost increases as N^{2d}, where d is the dimension of the physical space. Recently, due to the fast development of supercomputers, the possibility has become more realistic. Many efforts have been made to solve Vlasov equations in low dimensions before; now more interest has focused on higher dimensions. Different numerical methods have been tried so far, such as the finite difference method, Fourier spectral method, finite volume method, and spectral element method. This paper is based on our previous efforts to use the DG method. The DG method has been proven to be very successful in solving Maxwell equations, and this paper is our first effort in applying the DG method to Vlasov equations. DG has shown several advantages, such as local mass matrix, strong stability, and easy parallelization. These are particularly suitable for Vlasov equations. Domain decomposition in high dimensions has been used for parallelization; these include a highly scalable parallel two-dimensional Poisson solver. Benchmark results have been shown and simulation results will be reported.
Discrete maximum principle for the P1 - P0 weak Galerkin finite element approximations
NASA Astrophysics Data System (ADS)
Wang, Junping; Ye, Xiu; Zhai, Qilong; Zhang, Ran
2018-06-01
This paper presents two discrete maximum principles (DMP) for the numerical solution of second order elliptic equations arising from the weak Galerkin finite element method. The results are established by assuming an h-acute angle condition for the underlying finite element triangulations. The mathematical theory is based on the well-known De Giorgi technique adapted in the finite element context. Some numerical results are reported to validate the theory of DMP.
A parallel adaptive mesh refinement algorithm
NASA Technical Reports Server (NTRS)
Quirk, James J.; Hanebutte, Ulf R.
1993-01-01
Over recent years, Adaptive Mesh Refinement (AMR) algorithms which dynamically match the local resolution of the computational grid to the numerical solution being sought have emerged as powerful tools for solving problems that contain disparate length and time scales. In particular, several workers have demonstrated the effectiveness of employing an adaptive, block-structured hierarchical grid system for simulations of complex shock wave phenomena. Unfortunately, from the parallel algorithm developer's viewpoint, this class of scheme is quite involved; these schemes cannot be distilled down to a small kernel upon which various parallelizing strategies may be tested. However, because of their block-structured nature such schemes are inherently parallel, so all is not lost. In this paper we describe the method by which Quirk's AMR algorithm has been parallelized. This method is built upon just a few simple message passing routines and so it may be implemented across a broad class of MIMD machines. Moreover, the method of parallelization is such that the original serial code is left virtually intact, and so we are left with just a single product to support. The importance of this fact should not be underestimated given the size and complexity of the original algorithm.
Toward automatic finite element analysis
NASA Technical Reports Server (NTRS)
Kela, Ajay; Perucchio, Renato; Voelcker, Herbert
1987-01-01
Two problems must be solved if the finite element method is to become a reliable and affordable blackbox engineering tool. Finite element meshes must be generated automatically from computer aided design databases and mesh analysis must be made self-adaptive. The experimental system described solves both problems in 2-D through spatial and analytical substructuring techniques that are now being extended into 3-D.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wiley, J.C.
The author describes a general `hp` finite element method with adaptive grids. The code was based on the work of Oden, et al. The term `hp` refers to the method of spatial refinement (h), in conjunction with the order of polynomials used as a part of the finite element discretization (p). This finite element code seems to handle well the different mesh grid sizes occurring between abutted grids with different resolutions.
2.5D complex resistivity modeling and inversion using unstructured grids
NASA Astrophysics Data System (ADS)
Xu, Kaijun; Sun, Jie
2016-04-01
The complex resistivity of rocks and ores has long been recognized. The Cole-Cole model (CCM) is generally used to describe complex resistivity. It has been shown that the electrical anomaly of a geologic body can be quantitatively estimated from CCM parameters such as direct resistivity (ρ0), chargeability (m), time constant (τ) and frequency dependence (c). Thus it is very important to obtain the complex parameters of a geologic body. It is difficult to approximate complex structures and terrain using a traditional rectangular grid. In order to enhance the numerical accuracy and rationality of modeling and inversion, we use an adaptive finite-element algorithm for forward modeling of the frequency-domain 2.5D complex resistivity and implement the conjugate gradient algorithm in the inversion of 2.5D complex resistivity. An adaptive finite element method is applied for solving the 2.5D complex resistivity forward modeling of a horizontal electric dipole source. First, the CCM is introduced into Maxwell's equations to calculate the complex resistivity electromagnetic fields. Next, the pseudo delta function is used to distribute the electric dipole source. Then the electromagnetic fields can be expressed in terms of the primary fields caused by the layered structure and the secondary fields caused by anomalous conductivity inhomogeneities. Finally, we calculate the electromagnetic field response of complex geoelectric structures such as anticlines, synclines, and faults. The modeling results show that adaptive finite-element methods can automatically improve mesh generation and simulate complex geoelectric models using unstructured grids. The 2.5D complex resistivity inversion is implemented based on the conjugate gradient algorithm. The conjugate gradient algorithm does not need to compute the sensitivity matrix explicitly but directly computes the product of the sensitivity matrix or its transpose with a vector. In addition, the inversion target zones are discretized with fine grids and the background zones with coarse grids, which reduces the number of grid cells in the inversion and helps improve computational efficiency. The inversion results verify the validity and stability of the conjugate gradient inversion algorithm. The results of theoretical calculation indicate that the modeling and inversion of 2.5D complex resistivity using unstructured grids are feasible. Using unstructured grids can improve the accuracy of modeling, but inversion with a large number of grid cells is extremely time-consuming, so parallel computation for the inversion is necessary. Acknowledgments: We thank the National Natural Science Foundation of China (41304094) for its support.
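As a rough illustration of how the four CCM parameters named in the abstract (ρ0, m, τ, c) determine a frequency-dependent complex resistivity, the sketch below evaluates the commonly cited Pelton form of the Cole-Cole model. The abstract does not say which variant of the model the authors use, so the formula, the function name `cole_cole_resistivity`, and the sample values are assumptions made for illustration only.

```python
def cole_cole_resistivity(omega, rho0, m, tau, c):
    """Complex resistivity from the Pelton form of the Cole-Cole model.

    omega : angular frequency in rad/s
    rho0  : direct (zero-frequency) resistivity
    m     : chargeability, between 0 and 1
    tau   : time constant in seconds
    c     : frequency dependence, between 0 and 1
    The choice of this particular variant is an assumption, not taken
    from the abstract.
    """
    relaxation = 1.0 / (1.0 + (1j * omega * tau) ** c)
    return rho0 * (1.0 - m * (1.0 - relaxation))


# Hypothetical parameter values, chosen only to exercise the function.
rho_dc = cole_cole_resistivity(0.0, 100.0, 0.5, 0.01, 0.5)
rho_mid = cole_cole_resistivity(100.0, 100.0, 0.5, 0.01, 0.5)
```

At zero frequency the expression reduces to ρ0, and at very high frequency it tends to ρ0(1 − m), which is the usual interpretation of chargeability in this form of the model.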
Parallel Simulation of Three-Dimensional Free Surface Fluid Flow Problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
BAER,THOMAS A.; SACKINGER,PHILIP A.; SUBIA,SAMUEL R.
1999-10-14
Simulation of viscous three-dimensional fluid flow typically involves a large number of unknowns. When free surfaces are included, the number of unknowns increases dramatically. Consequently, this class of problem is an obvious application of parallel high performance computing. We describe parallel computation of viscous, incompressible, free surface, Newtonian fluid flow problems that include dynamic contact lines. The Galerkin finite element method was used to discretize the fully-coupled governing conservation equations and a "pseudo-solid" mesh mapping approach was used to determine the shape of the free surface. In this approach, the finite element mesh is allowed to deform to satisfy quasi-static solid mechanics equations subject to geometric or kinematic constraints on the boundaries. As a result, nodal displacements must be included in the set of unknowns. Other issues discussed are the proper constraints appearing along the dynamic contact line in three dimensions. Issues affecting efficient parallel simulations include problem decomposition to equally distribute computational work across an SPMD computer and determination of robust, scalable preconditioners for the distributed matrix systems that must be solved. Solution continuation strategies important for serial simulations have an enhanced relevance in a parallel computing environment due to the difficulty of solving large scale systems. Parallel computations will be demonstrated on an example taken from the coating flow industry: flow in the vicinity of a slot coater edge. This is a three dimensional free surface problem possessing a contact line that advances at the web speed in one region but transitions to static behavior in another region. As such, a significant fraction of the computational time is devoted to processing boundary data. Discussion focuses on parallel speed ups for fixed problem size, a class of problems of immediate practical importance.
Surface sampling techniques for 3D object inspection
NASA Astrophysics Data System (ADS)
Shih, Chihhsiong S.; Gerhardt, Lester A.
1995-03-01
While the uniform sampling method is quite popular for pointwise measurement of manufactured parts, this paper proposes three novel sampling strategies which emphasize 3D non-uniform inspection capability. They are: (a) the adaptive sampling, (b) the local adjustment sampling, and (c) the finite element centroid sampling techniques. The adaptive sampling strategy is based on a recursive surface subdivision process. Two different approaches are described for this adaptive sampling strategy. One uses triangle patches while the other uses rectangle patches. Several real world objects were tested using these two algorithms. Preliminary results show that sample points are distributed more closely around edges, corners, and vertices as desired for many classes of objects. Adaptive sampling using triangle patches is shown to generally perform better than both uniform and adaptive sampling using rectangle patches. The local adjustment sampling strategy uses a set of predefined starting points and then finds the local optimum position of each nodal point. This method approximates the object by moving the points toward object edges and corners. In a hybrid approach, uniform point sets and non-uniform point sets, first preprocessed by the adaptive sampling algorithm on a real world object, were then tested using the local adjustment sampling method. The results show that the initial point sets, when preprocessed by adaptive sampling using triangle patches, are moved the least distance by the subsequently applied local adjustment method, again showing the superiority of this method. The finite element sampling technique samples the centroids of the surface triangle meshes produced from the finite element method. The performance of this algorithm was compared to that of the adaptive sampling using triangular patches. The adaptive sampling with triangular patches was once again shown to be better on different classes of objects.
Self-balanced modulation and magnetic rebalancing method for parallel multilevel inverters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Hui; Shi, Yanjun
A self-balanced modulation method and a closed-loop magnetic flux rebalancing control method for parallel multilevel inverters. The combination of the two methods provides for balancing of the magnetic flux of the inter-cell transformers (ICTs) of the parallel multilevel inverters without deteriorating the quality of the output voltage. In various embodiments a parallel multi-level inverter modulator is provided including a multi-channel comparator to generate a multiplexed digitized ideal waveform for a parallel multi-level inverter and a finite state machine (FSM) module coupled to the parallel multi-channel comparator, the FSM module to receive the multiplexed digitized ideal waveform and to generate a pulse width modulated gate-drive signal for each switching device of the parallel multi-level inverter. The system and method provide for optimization of the output voltage spectrum without influencing the magnetic balancing.
Parallel Computation of the Jacobian Matrix for Nonlinear Equation Solvers Using MATLAB
NASA Technical Reports Server (NTRS)
Rose, Geoffrey K.; Nguyen, Duc T.; Newman, Brett A.
2017-01-01
Demonstrating speedup for parallel code on a multicore shared memory PC can be challenging in MATLAB due to underlying parallel operations that are often opaque to the user. This can limit potential for improvement of serial code even for the so-called embarrassingly parallel applications. One such application is the computation of the Jacobian matrix inherent to most nonlinear equation solvers. Computation of this matrix represents the primary bottleneck in nonlinear solver speed such that commercial finite element (FE) and multi-body-dynamic (MBD) codes attempt to minimize computations. A timing study using MATLAB's Parallel Computing Toolbox was performed for numerical computation of the Jacobian. Several approaches for implementing parallel code were investigated while only the single program multiple data (spmd) method using composite objects provided positive results. Parallel code speedup is demonstrated but the goal of linear speedup through the addition of processors was not achieved due to PC architecture.
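The reason the Jacobian counts as an embarrassingly parallel computation is that each column of a finite-difference Jacobian depends only on one perturbed evaluation of the residual, so columns can be computed independently. The sketch below shows this in Python rather than MATLAB and uses a thread pool purely to expose the column-level independence; it is not the spmd/composite-object approach studied in the paper, and the names `jacobian_column`, `jacobian`, and `residual` are hypothetical. In CPython, threads will not speed up pure-Python arithmetic, so this illustrates the decomposition rather than any timing result.

```python
from concurrent.futures import ThreadPoolExecutor


def jacobian_column(f, x, j, h=1e-6):
    """Forward-difference approximation of column j of the Jacobian of f at x."""
    base = f(x)
    perturbed = list(x)
    perturbed[j] += h
    shifted = f(perturbed)
    return [(s - b) / h for s, b in zip(shifted, base)]


def jacobian(f, x, h=1e-6, max_workers=4):
    """Assemble the Jacobian column by column; each column is an independent task."""
    n = len(x)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        cols = list(pool.map(lambda j: jacobian_column(f, x, j, h), range(n)))
    m = len(cols[0])
    # Transpose the list of columns into row-major form J[i][j].
    return [[cols[j][i] for j in range(n)] for i in range(m)]


def residual(x):
    """Hypothetical two-equation nonlinear residual used only as a demo."""
    return [x[0] ** 2 + x[1], 3.0 * x[0] * x[1]]


J = jacobian(residual, [1.0, 2.0])
```

For the demo residual, the analytic Jacobian at (1, 2) is [[2, 1], [6, 3]], and the forward-difference result agrees with it to within the truncation error of the step size h.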
Jeukens, Julie; Bittner, David; Knudsen, Rune; Bernatchez, Louis
2009-01-01
In the past 40 years, there has been increasing acceptance that variation in levels of gene expression represents a major source of evolutionary novelty. Gene expression divergence is therefore likely to be involved in the emergence of incipient species, namely, in a context of adaptive radiation. In the lake whitefish species complex (Coregonus clupeaformis), previous microarray experiments have led to the identification of candidate genes potentially implicated in the parallel evolution of the limnetic dwarf lake whitefish, which is highly distinct from the benthic normal lake whitefish in life history, morphology, metabolism, and behavior, and yet diverged from it only approximately 15,000 years before present. The aim of the present study was to address transcriptional divergence for six candidate genes among lake whitefish and European whitefish (Coregonus lavaretus) species pairs, as well as lake cisco (Coregonus artedi) and vendace (Coregonus albula). The main goal was to test the hypothesis that parallel phenotypic adaptation toward the use of the limnetic niche in coregonine fishes is accompanied by parallelism in candidate gene transcription as measured by quantitative real-time polymerase chain reaction. Results obtained for three candidate genes, whereby parallelism in expression was observed across all whitefish species pairs, provide strong support for the hypothesis that divergent natural selection plays an important role in the adaptive radiation of whitefish species. However, this parallelism in expression did not extend to cisco and vendace, thereby infirming transcriptional convergence between limnetic whitefish species and their limnetic congeners for these genes. As recently proposed (Lynch 2007a. The evolution of genetic networks by non-adaptive processes. Nat Rev Genet. 8:803-813), these results may suggest that convergent phenotypic evolution can result from nonadaptive shaping of genome architecture in independently evolved coregonine lineages.
Unstructured Adaptive (UA) NAS Parallel Benchmark. Version 1.0
NASA Technical Reports Server (NTRS)
Feng, Huiyu; VanderWijngaart, Rob; Biswas, Rupak; Mavriplis, Catherine
2004-01-01
We present a complete specification of a new benchmark for measuring the performance of modern computer systems when solving scientific problems featuring irregular, dynamic memory accesses. It complements the existing NAS Parallel Benchmark suite. The benchmark involves the solution of a stylized heat transfer problem in a cubic domain, discretized on an adaptively refined, unstructured mesh.
Wavelet Transforms in Parallel Image Processing
1994-01-27
Object Segmentation, Texture Segmentation, Image Compression, Image Halftoning, Neural Network, Parallel Algorithms, 2D and 3D... Vector Quantization of Wavelet Transform Coefficients... Adaptive Image Halftoning based on Wavelet... application has been directed to the adaptive image halftoning. The gray information at a pixel, including its gray value and gradient, is represented by
Sierra Structural Dynamics User's Notes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reese, Garth M.
2015-10-19
Sierra/SD provides a massively parallel implementation of structural dynamics finite element analysis, required for high fidelity, validated models used in modal, vibration, static and shock analysis of weapons systems. This document provides a users guide to the input for Sierra/SD. Details of input specifications for the different solution types, output options, element types and parameters are included. The appendices contain detailed examples, and instructions for running the software on parallel platforms.
Similarity solutions of time-dependent relativistic radiation-hydrodynamical plane-parallel flows
NASA Astrophysics Data System (ADS)
Fukue, Jun
2018-04-01
Similarity solutions are examined for the frequency-integrated relativistic radiation-hydrodynamical flows, which are described by the comoving quantities. The flows are vertical plane-parallel time-dependent ones with a gray opacity coefficient. For adequate boundary conditions, the flows are accelerated in a somewhat homologous manner, but terminate at some singular locus, which originates from the pathological behavior in relativistic radiation moment equations truncated in finite orders.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Munday, Lynn Brendon; Day, David M.; Bunting, Gregory
Sierra/SD provides a massively parallel implementation of structural dynamics finite element analysis, required for high fidelity, validated models used in modal, vibration, static and shock analysis of weapons systems. This document provides a users guide to the input for Sierra/SD. Details of input specifications for the different solution types, output options, element types and parameters are included. The appendices contain detailed examples, and instructions for running the software on parallel platforms.
Similarity solutions of time-dependent relativistic radiation-hydrodynamical plane-parallel flows
NASA Astrophysics Data System (ADS)
Fukue, Jun
2018-06-01
Similarity solutions are examined for the frequency-integrated relativistic radiation-hydrodynamical flows, which are described by the comoving quantities. The flows are vertical plane-parallel time-dependent ones with a gray opacity coefficient. For adequate boundary conditions, the flows are accelerated in a somewhat homologous manner, but terminate at some singular locus, which originates from the pathological behavior in relativistic radiation moment equations truncated in finite orders.
Rotary Motors Actuated by Traveling Ultrasonic Flexural Waves
NASA Technical Reports Server (NTRS)
Bar-Cohen, Yoseph; Bao, Xiaoqi; Grandia, Willem
1999-01-01
Efficient miniature actuators that are compact and consume low power are needed to drive space and planetary mechanisms in future NASA missions. Ultrasonic rotary motors have the potential to meet this NASA need and they are being developed as actuators for miniature telerobotic applications. These motors have emerged in commercial products but they need to be adapted for operation in harsh space environments that include cryogenic temperatures and vacuum, and they also require effective analytical tools for the design of efficient motors. A finite element analytical model was developed to examine the excitation of flexural plate waves traveling in a piezoelectrically actuated rotary motor. The model uses 3D finite element and equivalent circuit models that are applied to predict the excitation frequency and modal response of the stator. This model incorporates the details of the stator including the teeth, piezoelectric ceramic, geometry, bonding layer, etc. The theoretical predictions were corroborated experimentally for the stator. In parallel, efforts have been made to determine the thermal and vacuum performance of these motors. Experiments have shown that the motor can sustain at least 230 temperature cycles from 0 C to -90 C at 7 Torr pressure without significant performance change. Also, in an earlier study the motor lasted over 334 hours at -150 C and vacuum. To explore telerobotic applications for USMs a robotic arm was constructed with such motors.
NASA Astrophysics Data System (ADS)
Shen, Yanfeng; Cesnik, Carlos E. S.
2016-04-01
This paper presents a parallelized modeling technique for the efficient simulation of nonlinear ultrasonics introduced by the wave interaction with fatigue cracks. The elastodynamic wave equations with contact effects are formulated using an explicit Local Interaction Simulation Approach (LISA). The LISA formulation is extended to capture the contact-impact phenomena during the wave damage interaction based on the penalty method. A Coulomb friction model is integrated into the computation procedure to capture the stick-slip contact shear motion. The LISA procedure is coded using the Compute Unified Device Architecture (CUDA), which enables highly parallelized supercomputing on powerful graphics cards. Both the explicit contact formulation and the parallel feature facilitate LISA's superb computational efficiency over the conventional finite element method (FEM). The theoretical formulations based on the penalty method are introduced and a guideline for the proper choice of the contact stiffness is given. The convergence behavior of the solution under various contact stiffness values is examined. A numerical benchmark problem is used to investigate the new LISA formulation and results are compared with a conventional contact finite element solution. Various nonlinear ultrasonic phenomena are successfully captured using this contact LISA formulation, including the generation of nonlinear higher harmonic responses. Nonlinear mode conversion of guided waves at fatigue cracks is also studied.
dfnWorks: A discrete fracture network framework for modeling subsurface flow and transport
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hyman, Jeffrey D.; Karra, Satish; Makedonska, Nataliia
DFNWORKS is a parallelized computational suite to generate three-dimensional discrete fracture networks (DFN) and simulate flow and transport. Developed at Los Alamos National Laboratory over the past five years, it has been used to study flow and transport in fractured media at scales ranging from millimeters to kilometers. The networks are created and meshed using DFNGEN, which combines FRAM (the feature rejection algorithm for meshing) methodology to stochastically generate three-dimensional DFNs with the LaGriT meshing toolbox to create a high-quality computational mesh representation. The representation produces a conforming Delaunay triangulation suitable for high performance computing finite volume solvers in an intrinsically parallel fashion. Flow through the network is simulated in dfnFlow, which utilizes the massively parallel subsurface flow and reactive transport finite volume code PFLOTRAN. A Lagrangian approach to simulating transport through the DFN is adopted within DFNTRANS to determine pathlines and solute transport through the DFN. Example applications of this suite in the areas of nuclear waste repository science, hydraulic fracturing and CO2 sequestration are also included.
NASA Technical Reports Server (NTRS)
Datta, Anubhav; Johnson, Wayne R.
2009-01-01
This paper has two objectives. The first objective is to formulate a 3-dimensional Finite Element Model for the dynamic analysis of helicopter rotor blades. The second objective is to implement and analyze a dual-primal iterative substructuring based Krylov solver, that is parallel and scalable, for the solution of the 3-D FEM analysis. The numerical and parallel scalability of the solver is studied using two prototype problems - one for ideal hover (symmetric) and one for a transient forward flight (non-symmetric) - both carried out on up to 48 processors. In both hover and forward flight conditions, a perfect linear speed-up is observed, for a given problem size, up to the point of substructure optimality. Substructure optimality and the linear parallel speed-up range are both shown to depend on the problem size as well as on the selection of the coarse problem. With a larger problem size, linear speed-up is restored up to the new substructure optimality. The solver also scales with problem size - even though this conclusion is premature given the small prototype grids considered in this study.
3D brain tumor localization and parameter estimation using thermographic approach on GPU.
Bousselham, Abdelmajid; Bouattane, Omar; Youssfi, Mohamed; Raihani, Abdelhadi
2018-01-01
The aim of this paper is to present a GPU parallel algorithm for brain tumor detection to estimate its size and location from the surface temperature distribution obtained by thermography. The normal brain tissue is modeled as a rectangular cube including a spherical tumor. The temperature distribution is calculated using the forward three-dimensional Pennes bioheat transfer equation, which is solved using a massively parallel Finite Difference Method (FDM) implemented on a Graphics Processing Unit (GPU). A Genetic Algorithm (GA) was used to solve the inverse problem and estimate the tumor size and location by minimizing an objective function comparing measured temperatures on the surface to those obtained by numerical simulation. The parallel implementation of the Finite Difference Method significantly reduces the time of the bioheat transfer computation and greatly accelerates the inverse identification of brain tumor thermophysical and geometrical properties. Experimental results show significant gains in computational speed on the GPU, with a speedup of around 41 compared to the CPU. A performance analysis of the estimation based on tumor size inside brain tissue is also presented.
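To make the forward model concrete, the sketch below advances the Pennes bioheat equation, ρc ∂T/∂t = k ∇²T + ω_b ρ_b c_b (T_a − T) + Q_m, by one explicit finite-difference time step. It is a deliberately reduced illustration: one spatial dimension, plain Python on the CPU, and fixed-temperature end points, whereas the paper solves the three-dimensional problem on a GPU. The function name `pennes_step_1d` and all numerical values are hypothetical placeholders, not parameters from the paper.

```python
def pennes_step_1d(T, dx, dt, k, rho_c, perfusion, T_a, q_met):
    """One explicit time step of the 1D Pennes bioheat equation.

    T         : list of nodal temperatures (end nodes are held fixed)
    dx, dt    : grid spacing (m) and time step (s)
    k         : thermal conductivity (W/m/K)
    rho_c     : tissue density times specific heat (J/m^3/K)
    perfusion : omega_b * rho_b * c_b, the blood perfusion coefficient (W/m^3/K)
    T_a       : arterial blood temperature
    q_met     : metabolic heat generation (W/m^3)
    """
    new_T = list(T)
    for i in range(1, len(T) - 1):
        # Second-order central difference for the Laplacian.
        laplacian = (T[i - 1] - 2.0 * T[i] + T[i + 1]) / dx ** 2
        source = perfusion * (T_a - T[i]) + q_met
        new_T[i] = T[i] + dt / rho_c * (k * laplacian + source)
    return new_T


# Placeholder values chosen only so the call runs; not taken from the paper.
T0 = [37.0] * 5
T1 = pennes_step_1d(T0, dx=1e-3, dt=0.01, k=0.5, rho_c=3.6e6,
                    perfusion=2000.0, T_a=37.0, q_met=1e4)
```

Every interior node is updated from the previous time level only, which is what makes this kind of explicit scheme straightforward to map onto one GPU thread per node. The explicit update is only conditionally stable; for the diffusion term the time step should satisfy roughly dt ≤ ρc·dx²/(2k) in one dimension.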
NASA Technical Reports Server (NTRS)
Watson, Willie R.; Nark, Douglas M.; Nguyen, Duc T.; Tungkahotara, Siroj
2006-01-01
A finite element solution to the convected Helmholtz equation in a nonuniform flow is used to model the noise field within 3-D acoustically treated aero-engine nacelles. Options to select linear or cubic Hermite polynomial basis functions and isoparametric elements are included. However, the key feature of the method is a domain decomposition procedure that is based upon the inter-mixing of an iterative and a direct solve strategy for solving the discrete finite element equations. This procedure is optimized to take full advantage of sparsity and exploit the increased memory and parallel processing capability of modern computer architectures. Example computations are presented for the Langley Flow Impedance Test facility and a rectangular mapping of a full scale, generic aero-engine nacelle. The accuracy and parallel performance of this new solver are tested on both model problems using a supercomputer that contains hundreds of central processing units. Results show that the method gives extremely accurate attenuation predictions, achieves super-linear speedup over hundreds of CPUs, and solves upward of 25 million complex equations in a quarter of an hour.
dfnWorks: A discrete fracture network framework for modeling subsurface flow and transport
Hyman, Jeffrey D.; Karra, Satish; Makedonska, Nataliia; ...
2015-11-01
DFNWORKS is a parallelized computational suite to generate three-dimensional discrete fracture networks (DFN) and simulate flow and transport. Developed at Los Alamos National Laboratory over the past five years, it has been used to study flow and transport in fractured media at scales ranging from millimeters to kilometers. The networks are created and meshed using DFNGEN, which combines FRAM (the feature rejection algorithm for meshing) methodology to stochastically generate three-dimensional DFNs with the LaGriT meshing toolbox to create a high-quality computational mesh representation. The representation produces a conforming Delaunay triangulation suitable for high performance computing finite volume solvers in an intrinsically parallel fashion. Flow through the network is simulated in dfnFlow, which utilizes the massively parallel subsurface flow and reactive transport finite volume code PFLOTRAN. A Lagrangian approach to simulating transport through the DFN is adopted within DFNTRANS to determine pathlines and solute transport through the DFN. Example applications of this suite in the areas of nuclear waste repository science, hydraulic fracturing and CO2 sequestration are also included.
An adaptive approach to the physical annealing strategy for simulated annealing
NASA Astrophysics Data System (ADS)
Hasegawa, M.
2013-02-01
A new and reasonable method for adaptive implementation of simulated annealing (SA) is studied on two types of random traveling salesman problems. The idea is based on the previous finding on the search characteristics of the threshold algorithms, that is, the primary role of the relaxation dynamics in their finite-time optimization process. It is shown that the effective temperature for optimization can be predicted from the system's behavior analogous to the stabilization phenomenon occurring in the heating process starting from a quenched solution. The subsequent slow cooling near the predicted point draws out the inherent optimizing ability of finite-time SA in a more straightforward manner than the conventional adaptive approach.
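For readers unfamiliar with the baseline being adapted, the sketch below is a conventional simulated annealing loop for a small traveling salesman instance, using a fixed geometric cooling schedule and segment-reversal moves. It is not the adaptive scheme proposed in the paper: the abstract does not give enough detail to reproduce the temperature prediction step, so the schedule parameters and the names `tour_length` and `anneal_tsp` are assumptions.

```python
import math
import random


def tour_length(tour, pts):
    """Total length of the closed tour visiting pts in the given order."""
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))


def anneal_tsp(pts, t_start=1.0, t_end=1e-3, cooling=0.99,
               moves_per_temp=50, seed=0):
    """Plain simulated annealing with geometric cooling (illustrative baseline)."""
    rng = random.Random(seed)
    n = len(pts)
    tour = list(range(n))
    rng.shuffle(tour)
    current = tour_length(tour, pts)
    best, best_len = tour[:], current
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            # Propose reversing a randomly chosen segment of the tour.
            i, j = sorted(rng.sample(range(n), 2))
            candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
            cand_len = tour_length(candidate, pts)
            delta = cand_len - current
            # Metropolis rule: always accept improvements, sometimes accept worse tours.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                tour, current = candidate, cand_len
                if current < best_len:
                    best, best_len = tour[:], current
        t *= cooling
    return best, best_len


# Four corners of a unit square; the shortest closed tour has length 4.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
best_tour, best_length = anneal_tsp(square)
```

In this baseline the starting temperature and cooling rate are fixed in advance. The abstract's point is that a suitable temperature region can instead be predicted from the system's own behavior, after which slow cooling is applied near that point.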
NASA Astrophysics Data System (ADS)
Mieloszyk, M.; Krawczuk, M.; Zak, A.; Ostachowicz, W.
2010-08-01
In this paper a concept of an adaptive wing for small-aircraft applications with an array of fibre Bragg grating (FBG) sensors has been presented and discussed. In this concept the shape of the wing can be controlled and altered thanks to the wing design and the use of integrated shape memory alloy actuators. The concept has been tested numerically by the use of the finite element method. For numerical calculations the commercial finite element package ABAQUS® has been employed. A finite element model of the wing has been prepared in order to estimate the values of the wing twisting angles and distributions of the twist for various activation scenarios. Based on the results of numerical analysis the locations and numbers of the FBG sensors have also been determined. The results of numerical calculations obtained by the authors confirmed the usefulness of the assumed wing control strategy. Based on them and the concept developed of the adaptive wing, a wing demonstration stand has been designed and built. The stand has been used to verify experimentally the performance of the adaptive wing and the usefulness of the FBG sensors for evaluation of the wing condition.
H-P adaptive methods for finite element analysis of aerothermal loads in high-speed flows
NASA Technical Reports Server (NTRS)
Chang, H. J.; Bass, J. M.; Tworzydlo, W.; Oden, J. T.
1993-01-01
The commitment to develop the National Aerospace Plane and Maneuvering Reentry Vehicles has generated resurgent interest in the technology required to design structures for hypersonic flight. The principal objective of this research and development effort has been to formulate and implement a new class of computational methodologies for accurately predicting fine scale phenomena associated with this class of problems. The initial focus of this effort was to develop optimal h-refinement and p-enrichment adaptive finite element methods which utilize a-posteriori estimates of the local errors to drive the adaptive methodology. Over the past year this work has specifically focused on two issues which are related to overall performance of a flow solver. These issues include the formulation and implementation (in two dimensions) of an implicit/explicit flow solver compatible with the hp-adaptive methodology, and the design and implementation of a computational algorithm for automatically selecting optimal directions in which to enrich the mesh. These concepts and algorithms have been implemented in a two-dimensional finite element code and used to solve three hypersonic flow benchmark problems (Holden Mach 14.1, Edney shock on shock interaction Mach 8.03, and the viscous backstep Mach 4.08).
Adaptive multi-GPU Exchange Monte Carlo for the 3D Random Field Ising Model
NASA Astrophysics Data System (ADS)
Navarro, Cristóbal A.; Huang, Wei; Deng, Youjin
2016-08-01
This work presents an adaptive multi-GPU Exchange Monte Carlo approach for the simulation of the 3D Random Field Ising Model (RFIM). The design is based on a two-level parallelization. The first level, spin-level parallelism, maps the parallel computation as optimal 3D thread-blocks that simulate blocks of spins in shared memory with minimal halo surface, assuming a constant block volume. The second level, replica-level parallelism, uses multi-GPU computation to handle the simulation of an ensemble of replicas. CUDA's concurrent kernel execution feature is used in order to fill the occupancy of each GPU with many replicas, providing a performance boost that is more noticeable at the smallest values of L. In addition to the two-level parallel design, the work proposes an adaptive multi-GPU approach that dynamically builds a proper temperature set free of exchange bottlenecks. The strategy is based on mid-point insertions at the temperature gaps where the exchange rate is most compromised. The extra work generated by the insertions is balanced across the GPUs independently of where the mid-point insertions were performed. Performance results show that spin-level performance is approximately two orders of magnitude faster than a single-core CPU version and one order of magnitude faster than a parallel multi-core CPU version running on 16 cores. Multi-GPU performance scales well under a weak scaling setting, reaching up to 99% efficiency as long as the number of GPUs and L increase together. The combination of the adaptive approach with the parallel multi-GPU design has extended our possibilities of simulation to sizes of L = 32, 64 for a workstation with two GPUs. Sizes beyond L = 64 can eventually be studied using larger multi-GPU systems.
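The mid-point insertion strategy described in this abstract can be illustrated with a minimal sketch. The function name `insert_midpoint` and the plain-list representation of temperatures and measured exchange rates are assumptions made for illustration; this is not the authors' CUDA implementation, and it ignores the load balancing across GPUs.

```python
def insert_midpoint(temps, exchange_rates):
    """Insert one temperature at the mid-point of the worst-exchanging gap.

    temps          -- sorted list of replica temperatures (hypothetical layout)
    exchange_rates -- exchange_rates[i] is the measured exchange rate between
                      temps[i] and temps[i + 1]
    """
    if len(exchange_rates) != len(temps) - 1:
        raise ValueError("need exactly one exchange rate per temperature gap")
    # The gap with the lowest exchange rate is the bottleneck.
    worst = min(range(len(exchange_rates)), key=exchange_rates.__getitem__)
    mid = 0.5 * (temps[worst] + temps[worst + 1])
    # Return a new list; the input temperature set is left untouched.
    return temps[:worst + 1] + [mid] + temps[worst + 1:]
```

In an adaptive run, a call like this would presumably be repeated after re-measuring the exchange rates, until no gap falls below a chosen threshold.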
Parallel Evolution of Cold Tolerance within Drosophila melanogaster
Braun, Dylan T.; Lack, Justin B.
2017-01-01
Drosophila melanogaster originated in tropical Africa before expanding into strikingly different temperate climates in Eurasia and beyond. Here, we find elevated cold tolerance in three distinct geographic regions: beyond the well-studied non-African case, we show that populations from the highlands of Ethiopia and South Africa have significantly increased cold tolerance as well. We observe greater cold tolerance in outbred versus inbred flies, but only in populations with higher inversion frequencies. Each cold-adapted population shows lower inversion frequencies than a closely-related warm-adapted population, suggesting that inversion frequencies may decrease with altitude in addition to latitude. Using the FST-based “Population Branch Excess” statistic (PBE), we found only limited evidence for parallel genetic differentiation at the scale of ∼4 kb windows, specifically between Ethiopian and South African cold-adapted populations. And yet, when we looked for single nucleotide polymorphisms (SNPs) with codirectional frequency change in two or three cold-adapted populations, strong genomic enrichments were observed from all comparisons. These findings could reflect an important role for selection on standing genetic variation leading to “soft sweeps”. One SNP showed sufficient codirectional frequency change in all cold-adapted populations to achieve experiment-wide significance: an intronic variant in the synaptic gene Prosap. Another codirectional outlier SNP, at senseless-2, had a strong association with our cold trait measurements, but in the opposite direction as predicted. More generally, proteins involved in neurotransmission were enriched as potential targets of parallel adaptation. The ability to study cold tolerance evolution in a parallel framework will enhance this classic study system for climate adaptation. PMID:27777283
Complex-energy approach to sum rules within nuclear density functional theory
Hinohara, Nobuo; Kortelainen, Markus; Nazarewicz, Witold; ...
2015-04-27
The linear response of the nucleus to an external field contains unique information about the effective interaction, correlations governing the behavior of the many-body system, and properties of its excited states. To characterize the response, it is useful to use its energy-weighted moments, or sum rules. By comparing computed sum rules with experimental values, the information content of the response can be utilized in the optimization process of the nuclear Hamiltonian or nuclear energy density functional (EDF). But the additional information comes at a price: compared to the ground state, computation of excited states is more demanding. To establish an efficient framework to compute energy-weighted sum rules of the response that is adaptable to the optimization of the nuclear EDF and large-scale surveys of collective strength, we have developed a new technique within the complex-energy finite-amplitude method (FAM) based on the quasiparticle random-phase approximation. The proposed sum-rule technique based on the complex-energy FAM is a tool of choice when optimizing effective interactions or energy functionals. The method is very efficient and well-adaptable to parallel computing. As a result, the FAM formulation is especially useful when standard theorems based on commutation relations involving the nuclear Hamiltonian and external field cannot be used.
OpenMP performance for benchmark 2D shallow water equations using LBM
NASA Astrophysics Data System (ADS)
Sabri, Khairul; Rabbani, Hasbi; Gunawan, Putu Harry
2018-03-01
The shallow water equations, commonly referred to as the Saint-Venant equations, are used to model fluid phenomena. These equations can be solved numerically using several methods, such as the lattice Boltzmann method (LBM), SIMPLE-like methods, the finite difference method, Godunov-type methods, and the finite volume method. In this paper, the shallow water equations are approximated using LBM (an approach known as LABSWE), and the performance of a parallel implementation using OpenMP is evaluated. To compare the performance of the parallel algorithm with 2 and 4 threads, ten different grid sizes Lx × Ly are considered. The results show that, using OpenMP, the computational time for solving LABSWE can be decreased. For instance, for a grid size of 1000 × 500, the speedup observed with 2 and 4 threads is 93.54 s and 333.243 s, respectively.
Heat transfer model and finite element formulation for simulation of selective laser melting
NASA Astrophysics Data System (ADS)
Roy, Souvik; Juha, Mario; Shephard, Mark S.; Maniatty, Antoinette M.
2017-10-01
A novel approach and finite element formulation for modeling the melting, consolidation, and re-solidification process that occurs in selective laser melting additive manufacturing is presented. Two state variables are introduced to track the phase (melt/solid) and the degree of consolidation (powder/fully dense). The effect of the consolidation on the absorption of the laser energy into the material as it transforms from a porous powder to a dense melt is considered. A Lagrangian finite element formulation, which solves the governing equations on the unconsolidated reference configuration is derived, which naturally considers the effect of the changing geometry as the powder melts without needing to update the simulation domain. The finite element model is implemented into a general-purpose parallel finite element solver. Results are presented comparing to experimental results in the literature for a single laser track with good agreement. Predictions for a spiral laser pattern are also shown.
Static shape control for adaptive wings
NASA Astrophysics Data System (ADS)
Austin, Fred; Rossi, Michael J.; van Nostrand, William; Knowles, Gareth; Jameson, Antony
1994-09-01
A theoretical method was developed and experimentally validated, to control the static shape of flexible structures by employing internal translational actuators. A finite element model of the structure, without the actuators present, is employed to obtain the multiple-input, multiple-output control-system gain matrices for actuator-load control as well as actuator-displacement control. The method is applied to the quasistatic problem of maintaining an optimum-wing cross section during various transonic-cruise flight conditions to obtain significant reductions in the shock-induced drag. Only small, potentially achievable, adaptive modifications to the profile are required. The adaptive-wing concept employs actuators as truss elements of active ribs to reshape the wing cross section by deforming the structure. Finite element analyses of an adaptive-rib model verify the controlled-structure theory. Experiments on the model were conducted, and arbitrarily selected deformed shapes were accurately achieved.
Efficient Parallel Algorithm For Direct Numerical Simulation of Turbulent Flows
NASA Technical Reports Server (NTRS)
Moitra, Stuti; Gatski, Thomas B.
1997-01-01
A distributed algorithm for a high-order-accurate finite-difference approach to the direct numerical simulation (DNS) of transition and turbulence in compressible flows is described. This work has two major objectives. The first objective is to demonstrate that parallel and distributed-memory machines can be successfully and efficiently used to solve computationally intensive and input/output intensive algorithms of the DNS class. The second objective is to show that the computational complexity involved in solving the tridiagonal systems inherent in the DNS algorithm can be reduced by algorithm innovations that obviate the need to use a parallelized tridiagonal solver.
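For context, the tridiagonal systems mentioned in this abstract are commonly solved along each grid line with the serial Thomas algorithm. The sketch below shows that standard algorithm in plain Python; the function name and argument layout are assumptions for illustration, and it does not reproduce the authors' distributed scheme.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm.

    lower[i], diag[i], upper[i] are the sub-, main- and super-diagonal
    entries of row i; lower[0] and upper[-1] are ignored. No pivoting is
    done, so the matrix is assumed to be diagonally dominant.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination sweep.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution sweep.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

Both sweeps carry a dependency from one row to the next, which is why a single tridiagonal solve is hard to parallelize and why the abstract's claim of avoiding a parallelized tridiagonal solver is notable.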
A finite-element toolbox for the stationary Gross-Pitaevskii equation with rotation
NASA Astrophysics Data System (ADS)
Vergez, Guillaume; Danaila, Ionut; Auliac, Sylvain; Hecht, Frédéric
2016-12-01
We present a new numerical system using classical finite elements with mesh adaptivity for computing stationary solutions of the Gross-Pitaevskii equation. The programs are written as a toolbox for FreeFem++ (www.freefem.org), a free finite-element software available for all existing operating systems. This has the advantage of hiding all technical issues related to the implementation of the finite element method, making it easy to code various numerical algorithms. Two robust and optimized numerical methods were implemented to minimize the Gross-Pitaevskii energy: a steepest descent method based on Sobolev gradients and a minimization algorithm based on the state-of-the-art optimization library Ipopt. For both methods, mesh adaptivity strategies are used to reduce the computational time and increase the local spatial accuracy when vortices are present. Different run cases are made available for 2D and 3D configurations of Bose-Einstein condensates in rotation. An optional graphical user interface is also provided, making it easy to run predefined cases or cases with user-defined parameter files. We also provide several post-processing tools (like the identification of quantized vortices) that could help in extracting physical features from the simulations. The toolbox is extremely versatile and can be easily adapted to deal with different physical models.
The evolution of recombination rates in finite populations during ecological speciation.
Reeve, James; Ortiz-Barrientos, Daniel; Engelstädter, Jan
2016-10-26
Recombination can impede ecological speciation with gene flow by mixing locally adapted genotypes with maladapted migrant genotypes from a divergent population. In such a scenario, suppression of recombination can be selectively favoured. However, in finite populations evolving under the influence of random genetic drift, recombination can also facilitate adaptation by reducing Hill-Robertson interference between loci under selection. In this case, increased recombination rates can be favoured. Although these two major effects on recombination have been studied individually, their joint effect on ecological speciation with gene flow remains unexplored. Using a mathematical model, we investigated the evolution of recombination rates in two finite populations that exchange migrants while adapting to contrasting environments. Our results indicate a two-step dynamic where increased recombination is first favoured (in response to the Hill-Robertson effect), and then disfavoured, as the cost of recombining locally with maladapted migrant genotypes increases over time (the maladaptive gene flow effect). In larger populations, a stronger initial benefit for recombination was observed, whereas high migration rates intensify the long-term cost of recombination. These dynamics may have important implications for our understanding of the conditions that facilitate incipient speciation with gene flow and the evolution of recombination in finite populations. © 2016 The Author(s).
Reverse time migration by Krylov subspace reduced order modeling
NASA Astrophysics Data System (ADS)
Basir, Hadi Mahdavi; Javaherian, Abdolrahim; Shomali, Zaher Hossein; Firouz-Abadi, Roohollah Dehghani; Gholamy, Shaban Ali
2018-04-01
Imaging is a key step in seismic data processing. To date, a myriad of advanced pre-stack depth migration approaches have been developed; however, reverse time migration (RTM) is still considered the high-end imaging algorithm. The main limitations associated with the performance cost of reverse time migration are the intensive computation of the forward and backward simulations, time consumption, and memory allocation related to the imaging condition. Based on reduced order modeling, we propose an algorithm that addresses all of the aforementioned factors. Our proposed method uses the Krylov subspace method to compute certain mode shapes of the velocity model, which serve as an orthogonal basis for the reduced order modeling. Reverse time migration by reduced order modeling lends itself to highly parallel computation and strongly reduces the memory requirement of reverse time migration. The synthetic model results showed that the suggested method can decrease the computational costs of reverse time migration by several orders of magnitude, compared with reverse time migration by the finite element method.
Shock wave-free interface interaction
NASA Astrophysics Data System (ADS)
Frolov, Roman; Minev, Peter; Krechetnikov, Rouslan
2016-11-01
The problem of shock wave-free interface interaction has been widely studied in the context of compressible two-fluid flows using analytical, experimental, and numerical techniques. While various physical effects and possible interaction patterns for various geometries have been identified in the literature, the effects of viscosity and surface tension are usually neglected in such models. In our study, we apply a novel numerical algorithm for simulation of viscous compressible two-fluid flows with surface tension to investigate the influence of these effects on the shock-interface interaction. The method combines ideas from a finite volume adaptation of the invariant-domain-preserving algorithm for systems of hyperbolic conservation laws by Guermond and Popov and the ADI parallel solver for viscous incompressible NSEs by Guermond and Minev. This combination has been further extended to a two-fluid flow case, including surface tension effects. Here we report on a quantitative study of how surface tension and viscosity affect the structure of the shock wave-free interface interaction region.
NASA Technical Reports Server (NTRS)
Fahrenthold, Eric P.; Shivarama, Ravishankar
2004-01-01
The hybrid particle-finite element method of Fahrenthold and Horban, developed for the simulation of hypervelocity impact problems, has been extended to include new formulations of the particle-element kinematics, additional constitutive models, and an improved numerical implementation. The extended formulation has been validated in three dimensional simulations of published impact experiments. The test cases demonstrate good agreement with experiment, good parallel speedup, and numerical convergence of the simulation results.
Algorithms and software for solving finite element equations on serial and parallel architectures
NASA Technical Reports Server (NTRS)
George, Alan
1989-01-01
Over the past 15 years numerous new techniques have been developed for solving systems of equations and eigenvalue problems arising in finite element computations. A package called SPARSPAK has been developed by the author and his co-workers which exploits these new methods. The broad objective of this research project is to incorporate some of this software in the Computational Structural Mechanics (CSM) testbed, and to extend the techniques for use on multiprocessor architectures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seefeldt, Ben; Sondak, David; Hensinger, David M.
Drekar is an application code that solves partial differential equations for fluids that can be optionally coupled to electromagnetics. Drekar solves low-Mach compressible and incompressible computational fluid dynamics (CFD), compressible and incompressible resistive magnetohydrodynamics (MHD), and multiple species plasmas interacting with electromagnetic fields. Drekar discretization technology includes continuous and discontinuous finite element formulations, stabilized finite element formulations, mixed integration finite element bases (nodal, edge, face, volume) and an initial arbitrary Lagrangian Eulerian (ALE) capability. Drekar contains the implementation of the discretized physics and leverages the open source Trilinos project for both parallel solver capabilities and general finite element discretization tools. The code will be released open source under a BSD license. The code is used for fundamental research for simulation of fluids and plasmas on high performance computing environments.
A novel adaptive algorithm for 3D finite element analysis to model extracortical bone growth.
Cheong, Vee San; Blunn, Gordon W; Coathup, Melanie J; Fromme, Paul
2018-02-01
Extracortical bone growth with osseointegration of bone onto the shaft of massive bone tumour implants is an important clinical outcome for long-term implant survival. A new computational algorithm combining geometrical shape changes and bone adaptation in 3D Finite Element simulations has been developed, using a soft tissue envelope mesh, a novel concept of osteoconnectivity, and bone remodelling theory. The effects of varying the initial tissue density, spatial influence function and time step were investigated. The methodology demonstrated good correspondence to radiological results for a segmental prosthesis.
Interactive Finite Elements for General Engine Dynamics Analysis
NASA Technical Reports Server (NTRS)
Adams, M. L.; Padovan, J.; Fertis, D. G.
1984-01-01
General nonlinear finite element codes were adapted for the purpose of analyzing the dynamics of gas turbine engines. In particular, this adaptation required the development of a squeeze-film damper element software package and its implantation into a representative current generation code. The ADINA code was selected because of prior use of it and familiarity with its internal structure and logic. This objective was met and the results indicate that such use of general purpose codes is a viable alternative to specialized codes for general dynamics analysis of engines.
NASA Technical Reports Server (NTRS)
Nguyen, Duc T.; Storaasli, Olaf O.; Qin, Jiangning; Qamar, Ramzi
1994-01-01
An automatic differentiation tool (ADIFOR) is incorporated into a finite element based structural analysis program for shape and non-shape design sensitivity analysis of structural systems. The entire analysis and sensitivity procedures are parallelized and vectorized for high performance computation. Small scale examples to verify the accuracy of the proposed program and a medium scale example to demonstrate the parallel vector performance on multiple CRAY C90 processors are included.
NASA Astrophysics Data System (ADS)
Geddes, Earl Russell
The details of the low frequency sound field for a rectangular room can be studied by the use of an established analytic technique--separation of variables. The solution is straightforward and the results are well-known. A non-rectangular room has boundary conditions which are not separable and therefore other solution techniques must be used. This study shows that the finite element method can be adapted for use in the study of sound fields in arbitrary shaped enclosures. The finite element acoustics problem is formulated and the modification of a standard program, which is necessary for solving acoustic field problems, is examined. The solution of the semi-non-rectangular room problem (one where the floor and ceiling remain parallel) is carried out by a combined finite element/separation of variables approach. The solution results are used to construct the Green's function for the low frequency sound field in five rooms (or data cases): (1) a rectangular (Louden) room; (2) the smallest wall of the Louden room canted 20 degrees from normal; (3) the largest wall of the Louden room canted 20 degrees from normal; (4) both the largest and the smallest walls canted 20 degrees; and (5) a five-sided room variation of Case 4. Case 1, the rectangular room, was calculated using both the finite element method and the separation of variables technique. The results for the two methods are compared in order to assess the accuracy of the finite element method models. The modal damping coefficients are calculated and the results examined. The statistics of the source and receiver average normalized RMS P^2 responses in the 80 Hz, 100 Hz, and 125 Hz one-third octave bands are developed. The receiver averaged pressure response is developed to determine the effect of the source locations on the response. Twelve source locations are examined and the results tabulated for comparison. The effect of a finite sized source is looked at briefly.
Finally, the standard deviation of the spatial pressure response is studied. The results for this characteristic show that it is not significantly different in any of the rooms. The conclusions of the study are that only the frequency variations of the pressure response are affected by a room's shape. Further, in general, the simplest modification of a rectangular room (i.e., changing the angle of only one of the smallest walls) produces the most pronounced decrease of the pressure response variations in the low frequency region.
Multiple-copy state discrimination: Thinking globally, acting locally
NASA Astrophysics Data System (ADS)
Higgins, B. L.; Doherty, A. C.; Bartlett, S. D.; Pryde, G. J.; Wiseman, H. M.
2011-05-01
We theoretically investigate schemes to discriminate between two nonorthogonal quantum states given multiple copies. We consider a number of state discrimination schemes as applied to nonorthogonal, mixed states of a qubit. In particular, we examine the difference that local and global optimization of local measurements makes to the probability of obtaining an erroneous result, in the regime of finite numbers of copies N, and in the asymptotic limit as N→∞. Five schemes are considered: optimal collective measurements over all copies, locally optimal local measurements in a fixed single-qubit measurement basis, globally optimal fixed local measurements, locally optimal adaptive local measurements, and globally optimal adaptive local measurements. Here an adaptive measurement is one in which the measurement basis can depend on prior measurement results. For each of these measurement schemes we determine the probability of error (for finite N) and the scaling of this error in the asymptotic limit. In the asymptotic limit, it is known analytically (and we verify numerically) that adaptive schemes have no advantage over the optimal fixed local scheme. Here we show moreover that, in this limit, the most naive scheme (locally optimal fixed local measurements) is as good as any noncollective scheme except for states with less than 2% mixture. For finite N, however, the most sophisticated local scheme (globally optimal adaptive local measurements) is better than any other noncollective scheme for any degree of mixture.
Multiple-copy state discrimination: Thinking globally, acting locally
DOE Office of Scientific and Technical Information (OSTI.GOV)
Higgins, B. L.; Pryde, G. J.; Wiseman, H. M.
2011-05-15
We theoretically investigate schemes to discriminate between two nonorthogonal quantum states given multiple copies. We consider a number of state discrimination schemes as applied to nonorthogonal, mixed states of a qubit. In particular, we examine the difference that local and global optimization of local measurements makes to the probability of obtaining an erroneous result, in the regime of finite numbers of copies N, and in the asymptotic limit as N→∞. Five schemes are considered: optimal collective measurements over all copies, locally optimal local measurements in a fixed single-qubit measurement basis, globally optimal fixed local measurements, locally optimal adaptive local measurements, and globally optimal adaptive local measurements. Here an adaptive measurement is one in which the measurement basis can depend on prior measurement results. For each of these measurement schemes we determine the probability of error (for finite N) and the scaling of this error in the asymptotic limit. In the asymptotic limit, it is known analytically (and we verify numerically) that adaptive schemes have no advantage over the optimal fixed local scheme. Here we show moreover that, in this limit, the most naive scheme (locally optimal fixed local measurements) is as good as any noncollective scheme except for states with less than 2% mixture. For finite N, however, the most sophisticated local scheme (globally optimal adaptive local measurements) is better than any other noncollective scheme for any degree of mixture.
A parallel variable metric optimization algorithm
NASA Technical Reports Server (NTRS)
Straeter, T. A.
1973-01-01
An algorithm designed to exploit the parallel computing or vector streaming (pipeline) capabilities of computers is presented. When p is the degree of parallelism, one cycle of the parallel variable metric algorithm is defined as follows: first, the function and its gradient are computed in parallel at p different values of the independent variable; then the metric is modified by p rank-one corrections; and finally, a single univariant minimization is carried out in the Newton-like direction. Several properties of this algorithm are established. The convergence of the iterates to the solution is proved for a quadratic functional on a real separable Hilbert space. For a finite-dimensional space the convergence is in one cycle when p equals the dimension of the space. Results of numerical experiments indicate that the new algorithm will exploit parallel or pipeline computing capabilities to effect faster convergence than serial techniques.
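The three-step cycle described in this abstract can be sketched for a small quadratic problem. Several choices below are assumptions, not taken from the report: the probe points are unit-vector offsets from the current iterate, the rank-one correction is the symmetric rank-one (SR1) update of an inverse-Hessian estimate, and the line minimization uses a formula that is exact only for quadratics. The gradient evaluations in step 1 are independent and could be run in parallel; here they are evaluated in a plain loop.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pvm_cycle(grad, x, H):
    """One cycle with p = len(x); returns the new iterate and updated metric."""
    n = len(x)
    g0 = grad(x)
    # Step 1: gradients at p probe points (independent, hence parallelizable).
    pairs = []
    for i in range(n):
        s = [1.0 if j == i else 0.0 for j in range(n)]  # assumed probe offset
        gi = grad([a + b for a, b in zip(x, s)])
        pairs.append((s, [a - b for a, b in zip(gi, g0)]))
    # Step 2: p rank-one (SR1) corrections to the inverse-Hessian estimate H.
    for s, y in pairs:
        r = [a - b for a, b in zip(s, matvec(H, y))]
        denom = dot(r, y)
        if abs(denom) > 1e-12:  # skip a correction that would be ill-defined
            H = [[H[i][j] + r[i] * r[j] / denom for j in range(n)]
                 for i in range(n)]
    # Step 3: one line minimization along the Newton-like direction d = -H g0.
    d = [-v for v in matvec(H, g0)]
    gd = grad([a + b for a, b in zip(x, d)])
    curvature = dot(d, [a - b for a, b in zip(gd, g0)])
    alpha = -dot(g0, d) / curvature if curvature != 0.0 else 0.0
    return [a + alpha * b for a, b in zip(x, d)], H
```

For a two-dimensional quadratic with p = 2, a single call starting from the identity metric reaches the minimizer, which is consistent with the one-cycle convergence property stated in the abstract.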
Automotive applications of supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ginsberg, M.
1987-01-01
These proceedings compile papers on supercomputers in the automobile industry. Titles include: An automotive engineer's guide to the effective use of scalar, vector, and parallel computers; fluid mechanics, finite elements, and supercomputers; and Automotive crashworthiness performance on a supercomputer.
Parallel processors and nonlinear structural dynamics algorithms and software
NASA Technical Reports Server (NTRS)
Belytschko, Ted
1990-01-01
Techniques are discussed for the implementation and improvement of vectorization and concurrency in nonlinear explicit structural finite element codes. In explicit integration methods, the computation of the element internal force vector consumes the bulk of the computer time. The program can be efficiently vectorized by subdividing the elements into blocks and executing all computations in vector mode. The structuring of elements into blocks also provides a convenient way to implement concurrency by creating tasks which can be assigned to available processors for evaluation. The techniques were implemented in a 3-D nonlinear program with one-point quadrature shell elements. Concurrency and vectorization were first implemented in a single time step version of the program. Techniques were developed to minimize processor idle time and to select the optimal vector length. A comparison of run times between the program executed in scalar, serial mode and the fully vectorized code executed concurrently using eight processors shows speed-ups of over 25. Conjugate gradient methods for solving nonlinear algebraic equations are also readily adapted to a parallel environment. A new technique for improving convergence properties of conjugate gradients in nonlinear problems is developed in conjunction with other techniques such as diagonal scaling. A significant reduction in the number of iterations required for convergence is shown for a statically loaded rigid bar suspended by three equally spaced springs.
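The element-blocking idea in this abstract can be illustrated with a small sketch: elements are grouped into fixed-size blocks, each block is processed as one unit of work, and blocks are handed to a pool of workers. The per-element "internal force" below is a placeholder product, and the function names, block size, and use of a thread pool are assumptions for illustration rather than the structure of the original program.

```python
from concurrent.futures import ThreadPoolExecutor

def make_blocks(n_elements, block_size):
    """Split element indices 0..n_elements-1 into contiguous blocks."""
    return [range(start, min(start + block_size, n_elements))
            for start in range(0, n_elements, block_size)]

def block_internal_force(block, stiffness, strain):
    # Stand-in for the vector-mode computation over one block of elements.
    return [stiffness[e] * strain[e] for e in block]

def internal_forces(stiffness, strain, block_size=4, workers=2):
    """Evaluate all element forces, one task per block."""
    blocks = make_blocks(len(stiffness), block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in block order, so the output stays aligned
        # with the element numbering.
        per_block = list(pool.map(
            lambda blk: block_internal_force(blk, stiffness, strain), blocks))
    return [f for forces in per_block for f in forces]
```

The block size plays the role of the vector length discussed in the abstract: larger blocks amortize per-task overhead, while more, smaller blocks give the scheduler more freedom to keep processors busy.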
Finite-volume scheme for anisotropic diffusion
DOE Office of Scientific and Technical Information (OSTI.GOV)
Es, Bram van; Koren, Barry
In this paper, we apply a special finite-volume scheme, limited to smooth temperature distributions and Cartesian grids, to test the importance of connectivity of the finite volumes. The area of application is nuclear fusion plasma with field line aligned temperature gradients and extreme anisotropy. We apply the scheme to the anisotropic heat-conduction equation, and compare its results with those of existing finite-volume schemes for anisotropic diffusion. Also, we introduce a general model adaptation of the steady diffusion equation for extremely anisotropic diffusion problems with closed field lines.
Modeling dam-break flows using finite volume method on unstructured grid
USDA-ARS's Scientific Manuscript database
Two-dimensional shallow water models based on unstructured finite volume method and approximate Riemann solvers for computing the intercell fluxes have drawn growing attention because of their robustness, high adaptivity to complicated geometry and ability to simulate flows with mixed regimes and di...
NASA Astrophysics Data System (ADS)
Bhakta, S.; Prajapati, R. P.
2018-02-01
The effects of Hall current and finite electrical resistivity are studied on the stability of uniformly rotating and self-gravitating anisotropic quantum plasma. The generalized Ohm's law modified by Hall current and electrical resistivity is used along with the quantum magnetohydrodynamic fluid equations. The general dispersion relation is derived using normal mode analysis and discussed in the parallel and perpendicular propagations. In the parallel propagation, the Jeans instability criterion, expression of critical Jeans wavenumber, and Jeans length are found to be independent of non-ideal effects and uniform rotation but in perpendicular propagation only rotation affects the Jeans instability criterion. The unstable gravitating mode modified by Bohm potential and the stable Alfven mode modified by non-ideal effects are obtained separately. The criterion of firehose instability remains unaffected due to the presence of non-ideal effects. In the perpendicular propagation, finite electrical resistivity and quantum pressure anisotropy modify the dispersion relation, whereas no effect of Hall current was observed in the dispersion characteristics. The Hall current, finite electrical resistivity, rotation, and quantum corrections stabilize the growth rate. The stability of the dynamical system is analyzed using the Routh-Hurwitz criterion.
NASA Technical Reports Server (NTRS)
Onsager, T. G.; Winske, D.; Thomsen, M. F.
1991-01-01
The coupling of a finite-length, field-aligned, ion beam with a uniform background plasma is investigated using one-dimensional hybrid computer simulations. The finite-length beam is used to study the interaction between the incident solar wind and ions reflected from the earth's quasi-parallel bow shock, where the reflection process may vary with time. The coupling between the reflected ions and the solar wind is relevant to ion heating at the bow shock and possibly to the formation of hot flow anomalies and re-formation of the shock itself. Consistent with linear theory, the waves which dominate the interaction are the electromagnetic right-hand polarized resonant and nonresonant modes. However, in addition to the instability growth rates, the length of time that the waves are in contact with the beam is also an important factor in determining which wave mode will dominate the interaction. It is found that interaction will result in strong coupling, where a significant fraction of the available free energy is converted into thermal energy in a short time, provided the beam is sufficiently dense or sufficiently long.
A comparative study of serial and parallel aeroelastic computations of wings
NASA Technical Reports Server (NTRS)
Byun, Chansup; Guruswamy, Guru P.
1994-01-01
A procedure for computing the aeroelasticity of wings on parallel multiple-instruction, multiple-data (MIMD) computers is presented. In this procedure, fluids are modeled using Euler equations, and structures are modeled using modal or finite element equations. The procedure is designed in such a way that each discipline can be developed and maintained independently by using a domain decomposition approach. In the present parallel procedure, each computational domain is scalable. A parallel integration scheme is used to compute aeroelastic responses by solving fluid and structural equations concurrently. The computational efficiency issues of parallel integration of both fluid and structural equations are investigated in detail. This approach, which reduces the total computational time by a factor of almost 2, is demonstrated for a typical aeroelastic wing by using various numbers of processors on the Intel iPSC/860.
Adaptive eigenspace method for inverse scattering problems in the frequency domain
NASA Astrophysics Data System (ADS)
Grote, Marcus J.; Kray, Marie; Nahum, Uri
2017-02-01
A nonlinear optimization method is proposed for the solution of inverse scattering problems in the frequency domain, when the scattered field is governed by the Helmholtz equation. The time-harmonic inverse medium problem is formulated as a PDE-constrained optimization problem and solved by an inexact truncated Newton-type iteration. Instead of a grid-based discrete representation, the unknown wave speed is projected to a particular finite-dimensional basis of eigenfunctions, which is iteratively adapted during the optimization. Truncating the adaptive eigenspace (AE) basis at a (small and slowly increasing) finite number of eigenfunctions effectively introduces regularization into the inversion and thus avoids the need for standard Tikhonov-type regularization. Both analytical and numerical evidence underpins the accuracy of the AE representation. Numerical experiments demonstrate the efficiency and robustness to missing or noisy data of the resulting adaptive eigenspace inversion method.
Gong, Chunye; Bao, Weimin; Tang, Guojian; Jiang, Yuewen; Liu, Jie
2014-01-01
Solving fractional differential equations is very time consuming. The computational complexity of the two-dimensional fractional differential equation (2D-TFDE) with an iterative implicit finite difference method is O(M_x M_y N^2). In this paper, we present a parallel algorithm for the 2D-TFDE and give an in-depth discussion of this algorithm. A task distribution model and a data layout with a virtual boundary are designed for this parallel algorithm. The experimental results show that the parallel algorithm compares well with the exact solution. The parallel algorithm on a single Intel Xeon X5540 CPU runs 3.16-4.17 times faster than the serial algorithm on a single CPU core. The parallel efficiency of 81 processes is up to 88.24% compared with 9 processes on a distributed memory cluster system. We expect that parallel computing technology will become a basic method for computationally intensive fractional applications in the near future.
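The efficiency of 81 processes "compared with 9 processes" is a relative efficiency: the measured speedup over the 9-process run divided by the ideal speedup of 81/9. A minimal sketch of that calculation follows; the function name and the timings are hypothetical and are not taken from the paper.

```python
def relative_efficiency(t_base, p_base, t_scaled, p_scaled):
    """Parallel efficiency of a run on p_scaled processes relative to a
    baseline run on p_base processes (1.0 means ideal scaling)."""
    if min(t_base, t_scaled) <= 0 or min(p_base, p_scaled) <= 0:
        raise ValueError("times and process counts must be positive")
    speedup = t_base / t_scaled   # measured speedup over the baseline run
    ideal = p_scaled / p_base     # speedup expected under ideal scaling
    return speedup / ideal


# Hypothetical wall-clock times in seconds (illustrative only, NOT from the paper).
eff = relative_efficiency(t_base=900.0, p_base=9, t_scaled=125.0, p_scaled=81)
print(f"relative efficiency: {eff:.2%}")
```

Measuring against a multi-process baseline rather than a serial run is a common choice when the problem does not fit, or runs impractically long, on one process.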
Moving and adaptive grid methods for compressible flows
NASA Technical Reports Server (NTRS)
Trepanier, Jean-Yves; Camarero, Ricardo
1995-01-01
This paper describes adaptive grid methods developed specifically for compressible flow computations. The basic flow solver is a finite-volume implementation of Roe's flux difference splitting scheme on arbitrarily moving unstructured triangular meshes. The grid adaptation is performed according to geometric and flow requirements. Some results are included to illustrate the potential of the methodology.
OpenGl Visualization Tool and Library Version: 1.0
DOE Office of Scientific and Technical Information (OSTI.GOV)
2010-06-22
GLVis is an OpenGL tool for visualization of finite element meshes and functions. When started without any options, GLVis starts a server, which waits for socket connections and visualizes any received data. This way the results of simulations on a remote (parallel) machine can be visualized on the local user desktop. GLVis can also be used to visualize a mesh with or without a finite element function (solution). It can run a batch sequence of commands (GLVis scripts), or display previously saved socket streams.
Finite lateral compression of an elastic-plastic fibre-reinforced tube: loading solutions
NASA Astrophysics Data System (ADS)
England, A. H.; Gregory, P. W.
1999-02-01
This paper considers the finite plane-strain deformations of an elastic-plastic tube compressed between two rigid smooth parallel plates. The tube is composed of an elastic-plastic fibre-reinforced material in which the fibres lie in planes perpendicular to the axis of the tube and reinforce the tube in the circumferential direction. The composite is assumed to be an ideal material which is inextensible in the fibre-direction and is incompressible. The unloading of the elastic-plastic tube will be considered in a subsequent paper.
Parallel computation in a three-dimensional elastic-plastic finite-element analysis
NASA Technical Reports Server (NTRS)
Shivakumar, K. N.; Bigelow, C. A.; Newman, J. C., Jr.
1992-01-01
A CRAY parallel processing technique called autotasking was implemented in a three-dimensional elasto-plastic finite-element code. The technique was evaluated on two CRAY supercomputers, a CRAY 2 and a CRAY Y-MP. Autotasking was implemented in all major portions of the code, except the matrix equations solver. Compiler directives alone were not able to properly multitask the code; user-inserted directives were required to achieve better performance. It was noted that the connect time, rather than wall-clock time, was more appropriate to determine speedup in multiuser environments. For a typical example problem, a speedup of 2.1 (1.8 when the solution time was included) was achieved in a dedicated environment and 1.7 (1.6 with solution time) in a multiuser environment on a four-processor CRAY 2 supercomputer. The speedup on a three-processor CRAY Y-MP was about 2.4 (2.0 with solution time) in a multiuser environment.
1982-08-01
of sensitivity with background luminance, and the finite capacity of visual short term memory are discussed in terms of a small set of ...binocular rivalry, reflectance rivalry, Fechner’s paradox, decrease of threshold contrast with increased number of cycles in a grating pattern, hysteresis...adaptation level tuning, Weber law modulation, shift of sensitivity with background luminance, and the finite capacity of visual
3D streamers simulation in a pin to plane configuration using massively parallel computing
NASA Astrophysics Data System (ADS)
Plewa, J.-M.; Eichwald, O.; Ducasse, O.; Dessante, P.; Jacobs, C.; Renon, N.; Yousfi, M.
2018-03-01
This paper concerns the 3D simulation of corona discharge using high performance computing (HPC) managed with the message passing interface (MPI) library. In the field of finite volume methods applied on non-adaptive mesh grids and in the case of a specific 3D dynamic benchmark test devoted to streamer studies, the great efficiency of the iterative R&B SOR and BiCGSTAB methods versus the direct MUMPS method was clearly demonstrated in solving the Poisson equation using HPC resources. The optimization of the parallelization and the resulting scalability was undertaken as a function of the HPC architecture for a number of mesh cells ranging from 8 to 512 million and a number of cores ranging from 20 to 1600. The R&B SOR method remains at least about four times faster than the BiCGSTAB method and requires significantly less memory for all tested situations. The R&B SOR method was then implemented in a 3D MPI parallelized code that solves the classical first order model of an atmospheric pressure corona discharge in air. The 3D code capabilities were tested by following the development of one, two and four coplanar streamers generated by initial plasma spots for 6 ns. The preliminary results obtained allowed us to follow in detail the formation of the tree structure of a corona discharge and the effects of the mutual interactions between the streamers in terms of streamer velocity, trajectory and diameter. The computing time for 64 million mesh cells distributed over 1000 cores using the MPI procedures is about 30 min ns⁻¹, regardless of the number of streamers.
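Assuming "R&B SOR" denotes successive over-relaxation with red-black (checkerboard) ordering, the idea can be sketched on a small 2D Poisson problem. This is a serial, pure-Python illustration under stated assumptions, not the paper's 3D MPI code: the grid size, the manufactured solution, and the function name `red_black_sor` are all chosen here for demonstration.

```python
import math


def red_black_sor(f, h, omega, n_sweeps):
    """Solve -laplace(u) = f on a square grid with u = 0 on the boundary,
    using SOR with red-black (checkerboard) ordering."""
    n = len(f)
    u = [[0.0] * n for _ in range(n)]
    for _ in range(n_sweeps):
        for colour in (0, 1):  # 0 = "red" points, 1 = "black" points
            for i in range(1, n - 1):
                # First interior column j with (i + j) % 2 == colour.
                j_start = 1 + (i + 1 + colour) % 2
                for j in range(j_start, n - 1, 2):
                    # Gauss-Seidel value from the four neighbours, which
                    # all carry the opposite colour.
                    gs = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                 + u[i][j - 1] + u[i][j + 1]
                                 + h * h * f[i][j])
                    u[i][j] += omega * (gs - u[i][j])
    return u


# Manufactured test problem: u = sin(pi x) sin(pi y), so f = 2 pi^2 u.
n = 33
h = 1.0 / (n - 1)
exact = [[math.sin(math.pi * i * h) * math.sin(math.pi * j * h)
          for j in range(n)] for i in range(n)]
f = [[2.0 * math.pi ** 2 * val for val in row] for row in exact]
omega = 2.0 / (1.0 + math.sin(math.pi * h))  # classical optimum for this model problem
u = red_black_sor(f, h, omega, n_sweeps=200)
max_err = max(abs(u[i][j] - exact[i][j]) for i in range(n) for j in range(n))
print(f"max error vs. manufactured solution: {max_err:.2e}")
```

Because every point of one colour depends only on points of the other colour, each half-sweep can be updated concurrently, which is what makes the ordering attractive for distributed-memory parallelization.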
3-D Electromagnetic field analysis of wireless power transfer system using K computer
NASA Astrophysics Data System (ADS)
Kawase, Yoshihiro; Yamaguchi, Tadashi; Murashita, Masaya; Tsukada, Shota; Ota, Tomohiro; Yamamoto, Takeshi
2018-05-01
We analyze the electromagnetic field of a wireless power transfer system using the 3-D parallel finite element method on the K computer, a supercomputer in Japan. It is clarified that the electromagnetic field of the wireless power transfer system can be analyzed in a practical time using parallel computation on the K computer. Moreover, the accuracy of the loss calculation improves as the mesh division of the shield becomes finer.
NASA Astrophysics Data System (ADS)
Cai, Hongzhu; Hu, Xiangyun; Xiong, Bin; Zhdanov, Michael S.
2017-12-01
The induced polarization (IP) method has been widely used in geophysical exploration to identify chargeable targets such as mineral deposits. The inversion of the IP data requires modeling the IP response of 3D dispersive conductive structures. We have developed an edge-based finite-element time-domain (FETD) modeling method to simulate the electromagnetic (EM) fields in a 3D dispersive medium. We solve the vector Helmholtz equation for the total electric field using the edge-based finite-element method with an unstructured tetrahedral mesh. We adopt the backward propagation Euler method, which is unconditionally stable, with semi-adaptive time stepping for the time domain discretization. We use a direct solver based on a sparse LU decomposition to solve the system of equations. We consider the Cole-Cole model in order to take into account the frequency-dependent conductivity dispersion. The Cole-Cole conductivity model in the frequency domain is expanded using a truncated Padé series with adaptive selection of the center frequency of the series for early and late time. This approach can significantly increase the accuracy of FETD modeling.
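The unconditional stability of a backward (implicit) Euler step can be illustrated on the scalar decay problem du/dt = -lam * u. This is a minimal sketch on a toy problem chosen here for illustration, not the FETD system solved in the paper; the function names and parameter values are hypothetical.

```python
def backward_euler_decay(lam, dt, n_steps, u0=1.0):
    """Integrate du/dt = -lam * u with the implicit (backward) Euler rule."""
    u = u0
    for _ in range(n_steps):
        # Solves u_new = u - dt * lam * u_new for u_new.
        u = u / (1.0 + lam * dt)
    return u


def forward_euler_decay(lam, dt, n_steps, u0=1.0):
    """Same problem with the explicit (forward) Euler rule, for comparison."""
    u = u0
    for _ in range(n_steps):
        u = u + dt * (-lam * u)
    return u


# lam * dt = 10, far beyond the explicit stability limit lam * dt <= 2.
lam, dt, n_steps = 1000.0, 0.01, 50
implicit = backward_euler_decay(lam, dt, n_steps)
explicit = forward_euler_decay(lam, dt, n_steps)
print(f"backward Euler: {implicit:.3e}   forward Euler: {explicit:.3e}")
```

The implicit result decays monotonically for any positive step, while the explicit one grows without bound at this step size. That freedom in choosing the step is what allows time steps to be enlarged at late times.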
Array-based, parallel hierarchical mesh refinement algorithms for unstructured meshes
Ray, Navamita; Grindeanu, Iulian; Zhao, Xinglin; ...
2016-08-18
In this paper, we describe an array-based hierarchical mesh refinement capability through uniform refinement of unstructured meshes for efficient solution of PDEs using finite element methods and multigrid solvers. A multi-degree, multi-dimensional and multi-level framework is designed to generate the nested hierarchies from an initial coarse mesh that can be used for a variety of purposes such as in multigrid solvers/preconditioners, to do solution convergence and verification studies and to improve overall parallel efficiency by decreasing I/O bandwidth requirements (by loading smaller meshes and in memory refinement). We also describe a high-order boundary reconstruction capability that can be used to project the new points after refinement using high-order approximations instead of linear projection in order to minimize and provide more control on geometrical errors introduced by curved boundaries. The capability is developed under the parallel unstructured mesh framework "Mesh Oriented dAtaBase" (MOAB Tautges et al. (2004)). We describe the underlying data structures and algorithms to generate such hierarchies in parallel and present numerical results for computational efficiency and effect on mesh quality. Furthermore, we also present results to demonstrate the applicability of the developed capability to study convergence properties of different point projection schemes for various mesh hierarchies and to a multigrid finite-element solver for elliptic problems.
Load Balancing Unstructured Adaptive Grids for CFD Problems
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Oliker, Leonid
1996-01-01
Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. A dynamic load balancing method is presented that balances the workload across all processors with a global view. After each parallel tetrahedral mesh adaption, the method first determines if the new mesh is sufficiently unbalanced to warrant a repartitioning. If so, the adapted mesh is repartitioned, with new partitions assigned to processors so that the redistribution cost is minimized. The new partitions are accepted only if the remapping cost is compensated by the improved load balance. Results indicate that this strategy is effective for large-scale scientific computations on distributed-memory multiprocessors.
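The accept/reject logic described above can be sketched as follows. The cost model is a deliberately simple assumption made here (per-iteration time set by the most loaded processor, a single scalar remapping cost); the function names, the threshold value, and the example loads are hypothetical and do not reproduce the paper's actual metrics.

```python
def imbalance(loads):
    """Ratio of the heaviest processor load to the average load (1.0 = perfect)."""
    return max(loads) / (sum(loads) / len(loads))


def should_accept_remap(old_loads, new_loads, remap_cost,
                        iters_until_next_adaption, imbalance_threshold=1.1):
    """Return True if repartitioning is warranted and pays for itself."""
    # Step 1: skip repartitioning when the adapted mesh is balanced enough.
    if imbalance(old_loads) <= imbalance_threshold:
        return False
    # Step 2: per-iteration time is assumed to be set by the most loaded processor.
    gain = (max(old_loads) - max(new_loads)) * iters_until_next_adaption
    # Step 3: accept only if the saved compute time exceeds the remapping cost.
    return gain > remap_cost


old = [10.0, 10.0, 10.0, 30.0]   # hypothetical per-processor work after adaption
new = [15.0, 15.0, 15.0, 15.0]   # hypothetical work after repartitioning
print(should_accept_remap(old, new, remap_cost=100.0, iters_until_next_adaption=10))
print(should_accept_remap(old, new, remap_cost=200.0, iters_until_next_adaption=10))
```

With these numbers the first call accepts the new partitions (150 units saved against a cost of 100) and the second rejects them (150 saved against a cost of 200).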
NASA Astrophysics Data System (ADS)
Plattner, A.; Maurer, H. R.; Vorloeper, J.; Dahmen, W.
2010-08-01
Despite the ever-increasing power of modern computers, realistic modelling of complex 3-D earth models is still a challenging task and requires substantial computing resources. The overwhelming majority of current geophysical modelling approaches includes either finite difference or non-adaptive finite element algorithms and variants thereof. These numerical methods usually require the subsurface to be discretized with a fine mesh to accurately capture the behaviour of the physical fields. However, this may result in excessive memory consumption and computing times. A common feature of most of these algorithms is that the modelled data discretizations are independent of the model complexity, which may be wasteful when there are only minor to moderate spatial variations in the subsurface parameters. Recent developments in the theory of adaptive numerical solvers have the potential to overcome this problem. Here, we consider an adaptive wavelet-based approach that is applicable to a large range of problems, also including nonlinear problems. In comparison with earlier applications of adaptive solvers to geophysical problems we employ here a new adaptive scheme whose core ingredients arose from a rigorous analysis of the overall asymptotically optimal computational complexity, including in particular, an optimal work/accuracy rate. Our adaptive wavelet algorithm offers several attractive features: (i) for a given subsurface model, it allows the forward modelling domain to be discretized with a quasi minimal number of degrees of freedom, (ii) sparsity of the associated system matrices is guaranteed, which makes the algorithm memory efficient and (iii) the modelling accuracy scales linearly with computing time. We have implemented the adaptive wavelet algorithm for solving 3-D geoelectric problems. To test its performance, numerical experiments were conducted with a series of conductivity models exhibiting varying degrees of structural complexity. 
Results were compared with a non-adaptive finite element algorithm, which incorporates an unstructured mesh to best fit subsurface boundaries. Such algorithms represent the current state-of-the-art in geoelectric modelling. An analysis of the numerical accuracy as a function of the number of degrees of freedom revealed that the adaptive wavelet algorithm outperforms the finite element solver for simple and moderately complex models, whereas the results become comparable for models with high spatial variability of electrical conductivities. The linear dependence of the modelling error and the computing time proved to be model-independent. This feature will allow very efficient computations using large-scale models as soon as our experimental code is optimized in terms of its implementation.
Development of an Unstructured Mesh Code for Flows About Complete Vehicles
NASA Technical Reports Server (NTRS)
Peraire, Jaime; Gupta, K. K. (Technical Monitor)
2001-01-01
This report describes the research work undertaken at the Massachusetts Institute of Technology, under NASA Research Grant NAG4-157. The aim of this research is to identify effective algorithms and methodologies for the efficient and routine solution of flow simulations about complete vehicle configurations. For over ten years we have received support from NASA to develop unstructured mesh methods for Computational Fluid Dynamics. As a result of this effort a methodology based on the use of unstructured adapted meshes of tetrahedra and finite volume flow solvers has been developed. A number of gridding algorithms, flow solvers, and adaptive strategies have been proposed. The most successful algorithms developed form the basis of the unstructured mesh system FELISA. The FELISA system has been used extensively for the analysis of transonic and hypersonic flows about complete vehicle configurations. The system is highly automatic and allows for the routine aerodynamic analysis of complex configurations starting from CAD data. The code has been parallelized and utilizes efficient solution algorithms. For hypersonic flows, a version of the code that incorporates real gas effects has been produced. The FELISA system is also a component of the STARS aeroservoelastic system developed at NASA Dryden. One of the latest developments before the start of this grant was to extend the system to include viscous effects. This required the development of viscous generators, capable of generating the anisotropic grids required to represent boundary layers, and viscous flow solvers. We show some sample hypersonic viscous computations using the developed viscous generators and solvers. Although these initial results were encouraging, it became apparent that in order to develop a fully functional capability for viscous flows, several advances in solution accuracy, robustness and efficiency were required.
In this grant we set out to investigate some novel methodologies that could lead to the required improvements. In particular we focused on two fronts: (1) finite element methods and (2) iterative algebraic multigrid solution techniques.
Zhang, Yao; Tang, Shengjing; Guo, Jie
2017-11-01
In this paper, a novel adaptive-gain fast super-twisting (AGFST) sliding mode attitude control synthesis is carried out for a reusable launch vehicle (RLV) subject to actuator faults and unknown disturbances. Based on the fast nonsingular terminal sliding mode surface (FNTSMS) and the adaptive-gain fast super-twisting algorithm, an adaptive fault-tolerant control law for attitude stabilization is derived to protect against actuator faults and unknown uncertainties. First, a second-order nonlinear control-oriented model for the RLV is established by the feedback linearization method. On this basis, a fast nonsingular terminal sliding mode (FNTSM) manifold is designed, which provides fast finite-time global convergence and avoids the singularity problem as well as the chattering phenomenon. Based on the merits of the standard super-twisting (ST) algorithm and a fast reaching law with adaptation, a novel AGFST algorithm is proposed for the finite-time fault-tolerant attitude control problem of the RLV without any knowledge of the bounds of the uncertainties and actuator faults. Important features of the AGFST algorithm include avoiding overestimation of the control gains and a faster convergence speed than the standard ST algorithm. A formal proof of the finite-time stability of the closed-loop system is derived using the Lyapunov function technique. An estimate of the convergence time and an accurate expression for the convergence region are also provided. Finally, simulations are presented to illustrate the effectiveness and superiority of the proposed control scheme. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
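For orientation, the standard fixed-gain super-twisting (ST) law that the abstract uses as its baseline can be simulated on a scalar sliding variable. This sketch is not the AGFST algorithm of the paper: the plant ds/dt = u + d(t), the disturbance, the fixed gains, and the explicit-Euler integration are all assumptions made here for illustration.

```python
import math


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def simulate_super_twisting(k1, k2, s0=1.0, dt=1e-3, t_end=10.0):
    """Explicit-Euler simulation of ds/dt = u + d(t) under the standard
    super-twisting law u = -k1*sqrt(|s|)*sign(s) + v, dv/dt = -k2*sign(s)."""
    s, v, t = s0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        d = 0.5 * math.sin(t)  # hypothetical disturbance, |dd/dt| <= 0.5
        u = -k1 * math.sqrt(abs(s)) * sign(s) + v
        # Right-hand sides are evaluated with the old s before assignment.
        s, v, t = s + dt * (u + d), v - dt * k2 * sign(s), t + dt
    return s


# Fixed gains chosen well above the disturbance-derivative bound of 0.5.
s_final = simulate_super_twisting(k1=3.0, k2=2.0)
print(f"sliding variable after 10 s: {s_final:.2e}")
```

The fixed gains must be picked from a known bound on the disturbance derivative. That requirement is what an adaptive-gain scheme aims to remove.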
NASA Astrophysics Data System (ADS)
Zhengyong, R.; Jingtian, T.; Changsheng, L.; Xiao, X.
2007-12-01
Although adaptive finite-element (AFE) analysis is attracting growing attention in scientific and engineering fields, its efficient implementation remains an open problem because of its complex procedures. In this paper, we propose a clear C++ framework implementation to show the strengths of the object-oriented philosophy (OOP) in designing such a complex adaptive procedure. Using the modular features of an OOP language, the whole adaptive system is divided into several separate parts, such as mesh generation or refinement, the a posteriori error estimator, the adaptive strategy, and the final post-processing. After each of these modules is designed locally, they are connected to form the complete framework of the adaptive procedure. Because the framework is based on the general elliptic differential equation, little effort is needed to adapt it to practical simulations. To show the advantages of the OOP adaptive design, two numerical examples are tested. The first is the 3D direct current resistivity problem, in which the framework is shown to work efficiently with only a few additions. In the second, an induced polarization (IP) exploration case, a new adaptive procedure is easily added, which demonstrates the extensibility and reusability offered by an OOP language. Finally, we believe that, based on this modular OOP implementation of the adaptive framework, more advanced adaptive analysis systems will become available in the future.
Finite-size polyelectrolyte bundles at thermodynamic equilibrium
NASA Astrophysics Data System (ADS)
Sayar, M.; Holm, C.
2007-01-01
We present the results of extensive computer simulations performed on solutions of monodisperse charged rod-like polyelectrolytes in the presence of trivalent counterions. To overcome energy barriers we used a combination of parallel tempering and hybrid Monte Carlo techniques. Our results show that for small values of the electrostatic interaction the solution mostly consists of dispersed single rods. The potential of mean force between the polyelectrolyte monomers yields an attractive interaction at short distances. For a range of larger values of the Bjerrum length, we find finite-size polyelectrolyte bundles at thermodynamic equilibrium. Further increase of the Bjerrum length eventually leads to phase separation and precipitation. We discuss the origin of the observed thermodynamic stability of the finite-size aggregates.
Reflection of solar radiation by a cylindrical cloud
NASA Technical Reports Server (NTRS)
Smith, G. L.
1989-01-01
Potential applications of an analytic method for computing the solar radiation reflected by a cylindrical cloud are discussed, including studies of radiative transfer within finite clouds and evaluations of these effects on other clouds and on remote sensing problems involving finite clouds. The pattern of reflected sunlight from a cylindrical cloud as seen at a large distance has been considered and described by the bidirectional function method for finite cloud analysis, as previously studied theoretically for plane-parallel atmospheres by McKee and Cox (1974); Schmetz and Raschke (1981); and Stuhlmann et al. (1985). However, the lack of three-dimensional radiative transfer solutions for anisotropic scattering media has hampered theoretical investigations of bidirectional functions for finite clouds. The present approach permits expression of the directional variation of the radiation field as a spherical harmonic series to any desired degree and order.
Adaptive methods, rolling contact, and nonclassical friction laws
NASA Technical Reports Server (NTRS)
Oden, J. T.
1989-01-01
Results and methods on three different areas of contemporary research are outlined. These include adaptive methods, the rolling contact problem for finite deformation of a hyperelastic or viscoelastic cylinder, and non-classical friction laws for modeling dynamic friction phenomena.
Parallel computing using a Lagrangian formulation
NASA Technical Reports Server (NTRS)
Liou, May-Fun; Loh, Ching Yuen
1991-01-01
A new Lagrangian formulation of the Euler equation is adopted for the calculation of 2-D supersonic steady flow. The Lagrangian formulation represents the inherent parallelism of the flow field better than the common Eulerian formulation and offers a competitive alternative on parallel computers. The implementation of the Lagrangian formulation on the Thinking Machines Corporation CM-2 Computer is described. The program uses a finite volume, first-order Godunov scheme and exhibits high accuracy in dealing with multidimensional discontinuities (slip-line and shock). By using this formulation, a better than six times speed-up was achieved on an 8192-processor CM-2 over a single processor of a CRAY-2.
Parallel computing using a Lagrangian formulation
NASA Technical Reports Server (NTRS)
Liou, May-Fun; Loh, Ching-Yuen
1992-01-01
This paper adopts a new Lagrangian formulation of the Euler equation for the calculation of two dimensional supersonic steady flow. The Lagrangian formulation represents the inherent parallelism of the flow field better than the common Eulerian formulation and offers a competitive alternative on parallel computers. The implementation of the Lagrangian formulation on the Thinking Machines Corporation CM-2 Computer is described. The program uses a finite volume, first-order Godunov scheme and exhibits high accuracy in dealing with multidimensional discontinuities (slip-line and shock). By using this formulation, we have achieved better than six times speed-up on an 8192-processor CM-2 over a single processor of a CRAY-2.
Progress on the Multiphysics Capabilities of the Parallel Electromagnetic ACE3P Simulation Suite
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kononenko, Oleksiy
2015-03-26
ACE3P is a 3D parallel simulation suite that is being developed at SLAC National Accelerator Laboratory. Effectively utilizing supercomputer resources, ACE3P has become a key tool for the coupled electromagnetic, thermal and mechanical research and design of particle accelerators. Based on the existing finite-element infrastructure, a massively parallel eigensolver is developed for modal analysis of mechanical structures. It complements a set of the multiphysics tools in ACE3P and, in particular, can be used for the comprehensive study of microphonics in accelerating cavities ensuring the operational reliability of a particle accelerator.
Adaptive Wavelet Modeling of Geophysical Data
NASA Astrophysics Data System (ADS)
Plattner, A.; Maurer, H.; Dahmen, W.; Vorloeper, J.
2009-12-01
Despite the ever-increasing power of modern computers, realistic modeling of complex three-dimensional Earth models is still a challenging task and requires substantial computing resources. The overwhelming majority of current geophysical modeling approaches includes either finite difference or non-adaptive finite element algorithms, and variants thereof. These numerical methods usually require the subsurface to be discretized with a fine mesh to accurately capture the behavior of the physical fields. However, this may result in excessive memory consumption and computing times. A common feature of most of these algorithms is that the modeled data discretizations are independent of the model complexity, which may be wasteful when there are only minor to moderate spatial variations in the subsurface parameters. Recent developments in the theory of adaptive numerical solvers have the potential to overcome this problem. Here, we consider an adaptive wavelet based approach that is applicable to a large scope of problems, also including nonlinear problems. To the best of our knowledge such algorithms have not yet been applied in geophysics. Adaptive wavelet algorithms offer several attractive features: (i) for a given subsurface model, they allow the forward modeling domain to be discretized with a quasi minimal number of degrees of freedom, (ii) sparsity of the associated system matrices is guaranteed, which makes the algorithm memory efficient, and (iii) the modeling accuracy scales linearly with computing time. We have implemented the adaptive wavelet algorithm for solving three-dimensional geoelectric problems. To test its performance, numerical experiments were conducted with a series of conductivity models exhibiting varying degrees of structural complexity. Results were compared with a non-adaptive finite element algorithm, which incorporates an unstructured mesh to best fit subsurface boundaries. 
Such algorithms represent the current state-of-the-art in geoelectrical modeling. An analysis of the numerical accuracy as a function of the number of degrees of freedom revealed that the adaptive wavelet algorithm outperforms the finite element solver for simple and moderately complex models, whereas the results become comparable for models with spatially highly variable electrical conductivities. The linear dependency of the modeling error and the computing time proved to be model-independent. This feature will allow very efficient computations using large-scale models as soon as our experimental code is optimized in terms of its implementation.
Extent of QTL Reuse During Repeated Phenotypic Divergence of Sympatric Threespine Stickleback.
Conte, Gina L; Arnegard, Matthew E; Best, Jacob; Chan, Yingguang Frank; Jones, Felicity C; Kingsley, David M; Schluter, Dolph; Peichel, Catherine L
2015-11-01
How predictable is the genetic basis of phenotypic adaptation? Answering this question begins by estimating the repeatability of adaptation at the genetic level. Here, we provide a comprehensive estimate of the repeatability of the genetic basis of adaptive phenotypic evolution in a natural system. We used quantitative trait locus (QTL) mapping to discover genomic regions controlling a large number of morphological traits that have diverged in parallel between pairs of threespine stickleback (Gasterosteus aculeatus species complex) in Paxton and Priest lakes, British Columbia. We found that nearly half of QTL affected the same traits in the same direction in both species pairs. Another 40% influenced a parallel phenotypic trait in one lake but not the other. The remaining 10% of QTL had phenotypic effects in opposite directions in the two species pairs. Similarity in the proportional contributions of all QTL to parallel trait differences was about 0.4. Surprisingly, QTL reuse was unrelated to phenotypic effect size. Our results indicate that repeated use of the same genomic regions is a pervasive feature of parallel phenotypic adaptation, at least in sticklebacks. Identifying the causes of this pattern would aid prediction of the genetic basis of phenotypic evolution. Copyright © 2015 by the Genetics Society of America.
A generalized plasma dispersion function for electron damping in tokamak plasmas
Berry, L. A.; Jaeger, E. F.; Phillips, C. K.; ...
2016-10-14
Radio frequency wave propagation in finite temperature, magnetized plasmas exhibits a wide range of physics phenomena. The plasma response is nonlocal in space and time, and numerous modes are possible with the potential for mode conversions and transformations. Additionally, diffraction effects are important due to finite wavelength and finite-size wave launchers. Multidimensional simulations are required to describe these phenomena, but even with this complexity, the fundamental plasma response is assumed to be the uniform plasma response with the assumption that the local plasma current for a Fourier mode can be described by the Stix conductivity. But, for plasmas with non-uniform magnetic fields, the wave vector itself is nonlocal. When resolved into components perpendicular (k⊥) and parallel (k∥) to the magnetic field, locality of the parallel component can easily be violated when the wavelength is large. The impact of this inconsistency is that estimates of the wave damping can be incorrect (typically low) due to unresolved resonances. For the case of ion cyclotron damping, this issue has already been addressed by including the effect of parallel magnetic field gradients. In this case, a modified plasma response (Z function) allows resonance broadening even when k∥ = 0, and this improves the convergence and accuracy of wave simulations. In our paper, we extend this formalism to include electron damping and find improved convergence and accuracy for parameters where electron damping is dominant, such as high harmonic fast wave heating in the NSTX-U tokamak, and helicon wave launch for off-axis current drive in the DIII-D tokamak.
Na, Okpin; Cai, Xiao-Chuan; Xi, Yunping
2017-01-01
Predicting chloride-induced corrosion is important for assessing the service life of concrete structures. Simulating the durability performance of concrete structures more realistically requires more sophisticated methods and more accurate material models. A large number of fine mesh elements is also needed to obtain robust predictions of the corrosion initiation time and to resolve the thin layer between the concrete surface and the reinforcement. The purpose of this study is to propose a more realistic physical model of coupled hygro-chemo transport and to implement it with a parallel finite element algorithm. A microclimate model with environmental humidity and seasonal temperature is also adopted. As a result, a prediction model of chloride diffusion under unsaturated conditions was developed with parallel algorithms and applied to an existing bridge to validate the model under multiple boundary conditions. As the number of processors increased, the computational time decreased until the number of processors reached an optimum; beyond that point, the computational time increased because the communication time between processors increased. The framework of the present model can be extended in future work to simulate the ingress of multi-species de-icing salts into non-saturated concrete structures. PMID:28772714
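The scaling trend reported in this abstract (run time falling with processor count up to an optimum, then rising as communication grows) can be illustrated with a toy cost model. This is only a sketch: the function names, the perfectly divisible workload `work`, and the communication cost that grows linearly with the processor count are all assumptions made for illustration, and none of the numbers come from the study.

```python
def run_time(p, work=1000.0, comm_per_proc=2.5):
    """Toy model (assumed, not from the paper): compute time work/p
    plus a communication cost that grows linearly with p."""
    return work / p + comm_per_proc * p


def best_processor_count(max_p, **kwargs):
    """Return the processor count in 1..max_p with the smallest modeled run time."""
    return min(range(1, max_p + 1), key=lambda p: run_time(p, **kwargs))


# With the default (hypothetical) parameters the modeled optimum is sqrt(work / comm_per_proc).
p_opt = best_processor_count(64)
```

In this model, adding processors beyond `p_opt` makes the run slower, which mirrors the qualitative behaviour described above without claiming anything about the actual code.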
Influence of Thermal Anisotropy on Equilibrium Stellarator Beta Limits
NASA Astrophysics Data System (ADS)
Bechtel, T. A.; Hegna, C. C.; Sovinec, C. R.
2017-10-01
The effect of anisotropic heat conduction on the upper beta limit of stellarator plasmas is studied using the nonlinear, extended MHD code NIMROD. The configuration under investigation is an l=2, M=10 torsatron with vacuum rotational transform near unity. Finite-beta plasmas are created using a volumetric heating source and temperature dependent resistivity; modeled with 22 stellarator symmetric (integer multiples of M) toroidal modes. Extended MHD simulations are then performed to generate steady state solutions that represent 3D equilibria. With increased heating, Shafranov shifts occur, and the associated break up of edge magnetic surfaces limits the achievable beta. Due to the presence of finite parallel heat conduction, pressure profiles can exist in regions of magnetic stochasticity. Here, we present results of independently varying the parallel and perpendicular thermal anisotropy. In particular, simulations show that the attained stored energy is a function of the magnitude of parallel and perpendicular thermal conduction for a given heat source, indicating that equilibrium beta limits are sensitive to anisotropic transport properties. Preliminary studies of MHD stability with non-stellarator symmetric modes, near the highest achievable beta, are also presented. Research supported by US DOE under Grant No. DE-FG02-99ER54546.
Using the Statecharts paradigm for simulation of patient flow in surgical care.
Sobolev, Boris; Harel, David; Vasilakis, Christos; Levy, Adrian
2008-03-01
Computer simulation of patient flow has been used extensively to assess the impacts of changes in the management of surgical care. However, little research is available on the utility of existing modeling techniques. The purpose of this paper is to examine the capacity of Statecharts, a system of graphical specification, for constructing a discrete-event simulation model of the perioperative process. The Statecharts specification paradigm was originally developed for representing reactive systems by extending the formalism of finite-state machines through notions of hierarchy, parallelism, and event broadcasting. Hierarchy permits subordination between states so that one state may contain other states. Parallelism permits more than one state to be active at any given time. Broadcasting of events allows one state to detect changes in another state. In the context of the peri-operative process, hierarchy provides the means to describe steps within activities and to cluster related activities, parallelism provides the means to specify concurrent activities, and event broadcasting provides the means to trigger a series of actions in one activity according to transitions that occur in another activity. Combined with hierarchy and parallelism, event broadcasting offers a convenient way to describe the interaction of concurrent activities. We applied the Statecharts formalism to describe the progress of individual patients through surgical care as a series of asynchronous updates in patient records generated in reaction to events produced by parallel finite-state machines representing concurrent clinical and managerial activities. We conclude that Statecharts capture successfully the behavioral aspects of surgical care delivery by specifying permissible chronology of events, conditions, and actions.
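The interaction of parallelism and event broadcasting described in this abstract can be sketched with a few lines of code. The example below is a minimal illustration, not the authors' model: the class `Machine`, the function `broadcast`, and the patient and theatre states and events are all hypothetical, and hierarchy (states containing other states) is deliberately left out.

```python
class Machine:
    """A finite-state machine. The transition table maps
    (state, event) -> (new_state, emitted_event or None)."""

    def __init__(self, name, initial, transitions):
        self.name = name
        self.state = initial
        self.transitions = transitions

    def handle(self, event):
        key = (self.state, event)
        if key not in self.transitions:
            return None  # this machine ignores the event
        self.state, emitted = self.transitions[key]
        return emitted


def broadcast(machines, event):
    """Deliver an event to every machine; any event emitted in reaction
    is broadcast in turn. Returns the ordered list of events seen."""
    queue = [event]
    log = []
    while queue:
        current = queue.pop(0)
        log.append(current)
        for machine in machines:
            emitted = machine.handle(current)
            if emitted is not None:
                queue.append(emitted)
    return log


# Two concurrent activities (hypothetical states and events).
patient = Machine("patient", "waiting", {
    ("waiting", "theatre_ready"): ("in_surgery", None),
    ("in_surgery", "surgery_done"): ("recovering", None),
})
theatre = Machine("theatre", "idle", {
    ("idle", "slot_booked"): ("prepared", "theatre_ready"),
    ("prepared", "surgery_done"): ("idle", None),
})

log = broadcast([patient, theatre], "slot_booked")
```

Both machines are active at the same time, and the transition in the theatre machine triggers the transition in the patient machine only through the broadcast event, which is the mechanism the abstract attributes to Statecharts.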
Zheng, Xiaoying; Li, Xiaomei; Tang, Zhen; Gong, Lulu; Wang, Dalin
2014-06-01
To study the effect of implant number and inclination on stress distribution in implant and its surrounding bone with three-dimensional finite element analysis. A special denture was made for an edentulous mandible cast to collect three-dimensional finite element data. Three three-dimensional finite element models were established as follows. Model 1: 6 parallel implants; model 2: 4 parallel implants; model 3: 4 implants, the two anterior implants were parallel, the two distal implants were tilted 30° distally. Among the three models, the maximum stress values found in anterior implants, posterior implants, and peri-implant bone were model 3
NASA Technical Reports Server (NTRS)
Balas, M. J.; Kaufman, H.; Wen, J.
1985-01-01
A command generator tracker approach to model following control of linear distributed parameter systems (DPS) whose dynamics are described on infinite dimensional Hilbert spaces is presented. This method generates finite dimensional controllers capable of exponentially stable tracking of the reference trajectories when certain ideal trajectories are known to exist for the open loop DPS; we present conditions for the existence of these ideal trajectories. An adaptive version of this type of controller is also presented and shown to achieve (in some cases, asymptotically) stable finite dimensional control of the infinite dimensional DPS.
A parallelized binary search tree
USDA-ARS's Scientific Manuscript database
PTTRNFNDR is an unsupervised statistical learning algorithm that detects patterns in DNA sequences, protein sequences, or any natural language texts that can be decomposed into letters of a finite alphabet. PTTRNFNDR performs complex mathematical computations and its processing time increases when i...
apGA: An adaptive parallel genetic algorithm
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liepins, G.E.; Baluja, S.
1991-01-01
We develop apGA, a parallel variant of the standard generational GA, that combines aggressive search with perpetual novelty, yet is able to preserve enough genetic structure to optimally solve variably scaled, non-uniform block deceptive and hierarchical deceptive problems. apGA combines elitism, adaptive mutation, adaptive exponential scaling, and temporal memory. We present empirical results for six classes of problems, including the DeJong test suite. Although we have not investigated hybrids, we note that apGA could be incorporated into other recent GA variants such as GENITOR, CHC, and the recombination stage of mGA. 12 refs., 2 figs., 2 tabs.
NASA Astrophysics Data System (ADS)
Delandmeter, Philippe; Lambrechts, Jonathan; Legat, Vincent; Vallaeys, Valentin; Naithani, Jaya; Thiery, Wim; Remacle, Jean-François; Deleersnijder, Eric
2018-03-01
The discontinuous Galerkin (DG) finite element method is well suited for the modelling, with a relatively small number of elements, of three-dimensional flows exhibiting strong velocity or density gradients. Its performance can be highly enhanced by having recourse to r-adaptivity. Here, a vertical adaptive mesh method is developed for DG finite elements. This method, originally designed for finite difference schemes, is based on the vertical diffusion of the mesh nodes, with the diffusivity controlled by the density jumps at the mesh element interfaces. The mesh vertical movement is determined by means of a conservative arbitrary Lagrangian-Eulerian (ALE) formulation. Though conservativity is naturally achieved, tracer consistency is obtained by a suitable construction of the mesh vertical velocity field, which is defined in such a way that it is fully compatible with the tracer and continuity equations at a discrete level. The vertically adaptive mesh approach is implemented in the three-dimensional version of the geophysical and environmental flow Second-generation Louvain-la-Neuve Ice-ocean Model (SLIM 3D; www.climate.be/slim). Idealised benchmarks, aimed at simulating the oscillations of a sharp thermocline, are dealt with. Then, the relevance of the vertical adaptivity technique is assessed by simulating thermocline oscillations of Lake Tanganyika. The results are compared to measured vertical profiles of temperature, showing similar stratification and outcropping events.
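The idea of moving mesh nodes by vertical diffusion, with a diffusivity driven by density differences, can be sketched in a finite-difference style for a single water column. This is an assumption-laden illustration and not the DG/ALE formulation of the paper: the function `adapt_column`, the per-layer diffusivity built from the density difference across each layer, the background value `k_background`, and the tanh density profile are all hypothetical.

```python
import math


def adapt_column(z, density, dt=0.2, k_background=0.05, steps=500):
    """Diffuse interior node depths of a 1D column (end nodes stay fixed).

    z       : increasing node depths
    density : callable giving density at a depth (assumed known analytically)
    Layers with a large density difference get a large diffusivity and
    therefore shrink, concentrating nodes where the density changes fast.
    """
    z = list(z)
    for _ in range(steps):
        rho = [density(zi) for zi in z]
        jumps = [abs(rho[i + 1] - rho[i]) for i in range(len(z) - 1)]
        scale = max(jumps) or 1.0  # normalise so that dt * k stays small (explicit scheme)
        k = [k_background + j / scale for j in jumps]
        new_z = z[:]
        for i in range(1, len(z) - 1):
            flux_below = k[i] * (z[i + 1] - z[i])
            flux_above = k[i - 1] * (z[i] - z[i - 1])
            new_z[i] = z[i] + dt * (flux_below - flux_above)
        z = new_z
    return z


# Hypothetical sharp thermocline centred at 50 m in a 100 m column, 10 layers.
nodes = adapt_column([10.0 * i for i in range(11)],
                     density=lambda depth: math.tanh((depth - 50.0) / 5.0))
thickness = [b - a for a, b in zip(nodes, nodes[1:])]
```

After the iterations the layers next to the thermocline are thinner than the initial uniform 10 m spacing, while the surface and bottom nodes have not moved.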
Genetic adaptations of the plateau zokor in high-elevation burrows.
Shao, Yong; Li, Jin-Xiu; Ge, Ri-Li; Zhong, Li; Irwin, David M; Murphy, Robert W; Zhang, Ya-Ping
2015-11-25
The plateau zokor (Myospalax baileyi) spends its entire life underground in sealed burrows. Confronting limited oxygen, high carbon dioxide concentrations, and complete darkness, it epitomizes a successful physiological adaptation. Here, we employ transcriptome sequencing to explore the genetic underpinnings of its adaptations to this unique habitat. Compared to Rattus norvegicus, genes belonging to GO categories related to energy metabolism (e.g. mitochondrion and fatty acid beta-oxidation) underwent accelerated evolution in the plateau zokor. Furthermore, the numbers of positively selected genes were significantly enriched in the gene categories involved in ATPase activity, blood vessel development and respiratory gaseous exchange, functional categories that are relevant to adaptation to high altitudes. Among the 787 genes with evidence of parallel evolution, and thus identified as candidate genes, several GO categories (e.g. response to hypoxia, oxygen homeostasis and erythrocyte homeostasis) are significantly enriched. Among these candidates are two genes, EPAS1 and AJUBA, involved in the response to hypoxia, where the parallel evolved sites are at positions that are highly conserved in sequence alignments from multiple species. Thus, accelerated evolution of GO categories, positive selection and parallel evolution at the molecular level provide evidence to parse the genetic adaptations of the plateau zokor for living in high-elevation burrows.
Payen, Celia; Di Rienzi, Sara C.; Ong, Giang T.; Pogachar, Jamie L.; Sanchez, Joseph C.; Sunshine, Anna B.; Raghuraman, M. K.; Brewer, Bonita J.; Dunham, Maitreya J.
2014-01-01
Population adaptation to strong selection can occur through the sequential or parallel accumulation of competing beneficial mutations. The dynamics, diversity, and rate of fixation of beneficial mutations within and between populations are still poorly understood. To study how the mutational landscape varies across populations during adaptation, we performed experimental evolution on seven parallel populations of Saccharomyces cerevisiae continuously cultured in limiting sulfate medium. By combining quantitative polymerase chain reaction, array comparative genomic hybridization, restriction digestion and contour-clamped homogeneous electric field gel electrophoresis, and whole-genome sequencing, we followed the trajectory of evolution to determine the identity and fate of beneficial mutations. During a period of 200 generations, the yeast populations displayed parallel evolutionary dynamics that were driven by the coexistence of independent beneficial mutations. Selective amplifications rapidly evolved under this selection pressure, in particular common inverted amplifications containing the sulfate transporter gene SUL1. Compared with single clones, detailed analysis of the populations uncovers a greater complexity whereby multiple subpopulations arise and compete despite a strong selection. The most common evolutionary adaptation to strong selection in these populations grown in sulfate limitation is determined by clonal interference, with adaptive variants both persisting and replacing one another. PMID:24368781
Tobler, Ray; Hermisson, Joachim; Schlötterer, Christian
2015-01-01
Thermal stress is a pervasive selective agent in natural populations that impacts organismal growth, survival, and reproduction. Drosophila melanogaster exhibits a variety of putatively adaptive phenotypic responses to thermal stress in natural and experimental settings; however, accompanying assessments of fitness are typically lacking. Here, we quantify changes in fitness and known thermal tolerance traits in replicated experimental D. melanogaster populations following more than 40 generations of evolution to either cyclic cold or hot temperatures. By evaluating fitness for both evolved populations alongside a reconstituted starting population, we show that the evolved populations were the best adapted within their respective thermal environments. More strikingly, the evolved populations exhibited increased fitness in both environments and improved resistance to both acute heat and cold stress. This unexpected parallel response appeared to be an adaptation to the rapid temperature changes that drove the cycling thermal regimes, as parallel fitness changes were not observed when tested in a constant thermal environment. Our results add to a small, but growing group of studies that demonstrate the importance of fluctuating temperature changes for thermal adaptation and highlight the need for additional work in this area. PMID:26080903
Architecture Adaptive Computing Environment
NASA Technical Reports Server (NTRS)
Dorband, John E.
2006-01-01
Architecture Adaptive Computing Environment (aCe) is a software system that includes a language, compiler, and run-time library for parallel computing. aCe was developed to enable programmers to write programs, more easily than was previously possible, for a variety of parallel computing architectures. Heretofore, it has been perceived to be difficult to write parallel programs for parallel computers and more difficult to port the programs to different parallel computing architectures. In contrast, aCe is supportable on all high-performance computing architectures. Currently, it is supported on LINUX clusters. aCe uses parallel programming constructs that facilitate writing of parallel programs. Such constructs were used in single-instruction/multiple-data (SIMD) programming languages of the 1980s, including Parallel Pascal, Parallel Forth, C*, *LISP, and MasPar MPL. In aCe, these constructs are extended and implemented for both SIMD and multiple-instruction/multiple-data (MIMD) architectures. Two new constructs incorporated in aCe are those of (1) scalar and virtual variables and (2) pre-computed paths. The scalar-and-virtual-variables construct increases flexibility in optimizing memory utilization in various architectures. The pre-computed-paths construct enables the compiler to pre-compute part of a communication operation once, rather than computing it every time the communication operation is performed.
Parallel architectures for iterative methods on adaptive, block structured grids
NASA Technical Reports Server (NTRS)
Gannon, D.; Vanrosendale, J.
1983-01-01
A parallel computer architecture well suited to the solution of partial differential equations in complicated geometries is proposed. Algorithms for partial differential equations contain a great deal of parallelism. But this parallelism can be difficult to exploit, particularly on complex problems. One approach to extraction of this parallelism is the use of special purpose architectures tuned to a given problem class. The architecture proposed here is tuned to boundary value problems on complex domains. An adaptive elliptic algorithm which maps effectively onto the proposed architecture is considered in detail. Two levels of parallelism are exploited by the proposed architecture. First, by making use of the freedom one has in grid generation, one can construct grids which are locally regular, permitting a one to one mapping of grids to systolic style processor arrays, at least over small regions. All local parallelism can be extracted by this approach. Second, though there may be no regular global structure to the grids constructed, there will be parallelism at this level. One approach to finding and exploiting this parallelism is to use an architecture having a number of processor clusters connected by a switching network. The use of such a network creates a highly flexible architecture which automatically configures to the problem being solved.
Guzik, Stephen M.; Gao, Xinfeng; Owen, Landon D.; ...
2015-12-20
We present a fourth-order accurate finite-volume method for solving time-dependent hyperbolic systems of conservation laws on mapped grids that are adaptively refined in space and time. Some novel considerations for formulating the semi-discrete system of equations in computational space are combined with detailed mechanisms for accommodating the adapting grids. Furthermore, these considerations ensure that conservation is maintained and that the divergence of a constant vector field is always zero (freestream-preservation property). The solution in time is advanced with a fourth-order Runge-Kutta method. A series of tests verifies that the expected accuracy is achieved in smooth flows and the solution of a Mach reflection problem demonstrates the effectiveness of the algorithm in resolving strong discontinuities.
Implementation of Implicit Adaptive Mesh Refinement in an Unstructured Finite-Volume Flow Solver
NASA Technical Reports Server (NTRS)
Schwing, Alan M.; Nompelis, Ioannis; Candler, Graham V.
2013-01-01
This paper explores the implementation of adaptive mesh refinement in an unstructured, finite-volume solver. Unsteady and steady problems are considered. The effect on the recovery of high-order numerics is explored and the results are favorable. Important to this work is the ability to provide a path for efficient, implicit time advancement. A method using a simple refinement sensor based on undivided differences is discussed and applied to a practical problem: a shock-shock interaction on a hypersonic, inviscid double-wedge. Cases are compared to uniform grids without the use of adapted meshes in order to assess error and computational expense. Discussion of difficulties, advances, and future work prepare this method for additional research. The potential for this method in more complicated flows is described.
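A refinement sensor based on undivided differences can be sketched as follows. This is a minimal illustration under stated assumptions, not the sensor used in the paper: the functions `undivided_difference_sensor` and `flag_cells`, the neighbor-list representation of the unstructured mesh, and the threshold value are all hypothetical. "Undivided" here means the difference between neighboring cell values is not divided by the distance between cell centers.

```python
def undivided_difference_sensor(values, neighbors):
    """Per-cell sensor: largest absolute difference between a cell's value
    and the value in any face neighbor (no division by cell spacing)."""
    return [
        max((abs(values[j] - values[i]) for j in neighbors[i]), default=0.0)
        for i in range(len(values))
    ]


def flag_cells(values, neighbors, threshold):
    """Return indices of cells whose sensor exceeds the threshold."""
    sensor = undivided_difference_sensor(values, neighbors)
    return [i for i, s in enumerate(sensor) if s > threshold]


# A chain of six cells with a jump between cells 2 and 3 (hypothetical data).
values = [1.0, 1.0, 1.0, 4.0, 4.0, 4.0]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
flagged = flag_cells(values, neighbors, threshold=0.5)
```

Only the two cells on either side of the jump are flagged, which is the behaviour one would want near a shock; because the differences are undivided, the sensor value falls as smooth regions are refined.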
Disturbance observer based active and adaptive synchronization of energy resource chaotic system.
Wei, Wei; Wang, Meng; Li, Donghai; Zuo, Min; Wang, Xiaoyi
2016-11-01
In this paper, synchronization of a three-dimensional energy resource chaotic system is considered. For the sake of achieving the synchronization between the drive and response systems, two different nonlinear control approaches, i.e. active control with known parameters and adaptive control with unknown parameters, have been designed. In order to guarantee the transient performance, finite-time boundedness (FTB) and finite-time stability (FTS) are introduced in the design of active control and adaptive control, respectively. Simultaneously, in view of the existence of disturbances, a new disturbance observer is proposed to estimate the disturbance. The conditions of the asymptotic stability for the closed-loop system are obtained. Numerical simulations are provided to illustrate the proposed approaches. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.
NASA Technical Reports Server (NTRS)
Aftosmis, M. J.; Berger, M. J.; Murman, S. M.; Kwak, Dochan (Technical Monitor)
2002-01-01
The proposed paper will present recent extensions in the development of an efficient Euler solver for adaptively-refined Cartesian meshes with embedded boundaries. The paper will focus on extensions of the basic method to include solution adaptation, time-dependent flow simulation, and arbitrary rigid domain motion. The parallel multilevel method makes use of on-the-fly parallel domain decomposition to achieve extremely good scalability on large numbers of processors, and is coupled with an automatic coarse mesh generation algorithm for efficient processing by a multigrid smoother. Numerical results are presented demonstrating parallel speed-ups of up to 435 on 512 processors. Solution-based adaptation may be keyed off truncation error estimates using tau-extrapolation or a variety of feature detection based refinement parameters. The multigrid method is extended to time-dependent flows through the use of a dual-time approach. The extension to rigid domain motion uses an Arbitrary Lagrangian-Eulerian (ALE) formulation, and results will be presented for a variety of two- and three-dimensional example problems with both simple and complex geometry.
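The quoted speed-up of 435 on 512 processors can be turned into a parallel efficiency with a one-line calculation. The sketch below also estimates an equivalent serial fraction, which assumes that Amdahl's law describes the run; that assumption, and the function names, are illustrative and are not stated in the abstract.

```python
def parallel_efficiency(speedup, processors):
    """Efficiency = speed-up divided by the number of processors."""
    return speedup / processors


def amdahl_serial_fraction(speedup, processors):
    """Serial fraction f implied by Amdahl's law, S = 1 / (f + (1 - f) / p).
    Assumes Amdahl's law applies, which the abstract does not claim."""
    return (processors / speedup - 1.0) / (processors - 1.0)


efficiency = parallel_efficiency(435, 512)          # roughly 0.85
serial_fraction = amdahl_serial_fraction(435, 512)  # roughly 3.5e-4
```

An efficiency of about 85% on 512 processors is consistent with the abstract's description of the scalability as extremely good.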
Parallel implementation of an adaptive and parameter-free N-body integrator
NASA Astrophysics Data System (ADS)
Pruett, C. David; Ingham, William H.; Herman, Ralph D.
2011-05-01
Previously, Pruett et al. (2003) [3] described an N-body integrator of arbitrarily high order M with an asymptotic operation count of O(MN). The algorithm's structure lends itself readily to data parallelization, which we document and demonstrate here in the integration of point-mass systems subject to Newtonian gravitation. High order is shown to benefit parallel efficiency. The resulting N-body integrator is robust, parameter-free, highly accurate, and adaptive in both time-step and order. Moreover, it exhibits linear speedup on distributed parallel processors, provided that each processor is assigned at least a handful of bodies.
Program summary
Program title: PNB.f90
Catalogue identifier: AEIK_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEIK_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 3052
No. of bytes in distributed program, including test data, etc.: 68 600
Distribution format: tar.gz
Programming language: Fortran 90 and OpenMPI
Computer: All shared or distributed memory parallel processors
Operating system: Unix/Linux
Has the code been vectorized or parallelized?: The code has been parallelized but has not been explicitly vectorized.
RAM: Dependent upon N
Classification: 4.3, 4.12, 6.5
Nature of problem: High accuracy numerical evaluation of trajectories of N point masses each subject to Newtonian gravitation.
Solution method: Parallel and adaptive extrapolation in time via power series of arbitrary degree.
Running time: 5.1 s for the demo program supplied with the package.
OpenSeesPy: Python library for the OpenSees finite element framework
NASA Astrophysics Data System (ADS)
Zhu, Minjie; McKenna, Frank; Scott, Michael H.
2018-01-01
OpenSees, an open source finite element software framework, has been used broadly in the earthquake engineering community for simulating the seismic response of structural and geotechnical systems. The framework allows users to perform finite element analysis with a scripting language and for developers to create both serial and parallel finite element computer applications as interpreters. For the last 15 years, Tcl has been the primary scripting language to which the model building and analysis modules of OpenSees are linked. To provide users with different scripting language options, particularly Python, the OpenSees interpreter interface was refactored to provide multi-interpreter capabilities. This refactoring, resulting in the creation of OpenSeesPy as a Python module, is accomplished through an abstract interface for interpreter calls with concrete implementations for different scripting languages. Through this approach, users are able to develop applications that utilize the unique features of several scripting languages while taking advantage of advanced finite element analysis models and algorithms.
Scalable Computing of the Mesh Size Effect on Modeling Damage Mechanics in Woven Armor Composites
2008-12-01
manner of a user defined material subroutine to provide overall stress increments to, the parallel LS-DYNA3D a Lagrangian explicit code used in...finite element code, as a user defined material subroutine. The ability of this subroutine to model the effect of the progressions of a select number...is added as a user defined material subroutine to parallel LS-DYNA3D. The computations of the global mesh are handled by LS-DYNA3D and are spread
Parallel-Vector Algorithm For Rapid Structural Analysis
NASA Technical Reports Server (NTRS)
Agarwal, Tarun R.; Nguyen, Duc T.; Storaasli, Olaf O.
1993-01-01
New algorithm developed to overcome deficiency of skyline storage scheme by use of variable-band storage scheme. Exploits both parallel and vector capabilities of modern high-performance computers. Gives engineers and designers opportunity to include more design variables and constraints during optimization of structures. Enables use of more refined finite-element meshes to obtain improved understanding of complex behaviors of aerospace structures leading to better, safer designs. Not only attractive for current supercomputers but also for next generation of shared-memory supercomputers.
2010-05-01
connections near the hub end, and containing up to 0.48 million degrees of freedom. The models are analyzed for scalability and timing for hover and...Parallel and Scalable Rotor Dynamic Analysis...will enable the modeling of critical couplings that occur in hingeless and bearingless hubs with advanced flex structures. Second, it will enable the
NASA Astrophysics Data System (ADS)
de Almeida, Valmor F.
2017-07-01
A phase-space discontinuous Galerkin (PSDG) method is presented for the solution of stellar radiative transfer problems. It allows for greater adaptivity than competing methods without sacrificing generality. The method is extensively tested on a spherically symmetric, static, inverse-power-law scattering atmosphere. Results for different sizes of atmospheres and intensities of scattering agreed with asymptotic values. The exponentially decaying behavior of the radiative field in the diffusive-transparent transition region, and the forward peaking behavior at the surface of extended atmospheres were accurately captured. The integrodifferential equation of radiation transfer is solved iteratively by alternating between the radiative pressure equation and the original equation with the integral term treated as an energy density source term. In each iteration, the equations are solved via an explicit, flux-conserving, discontinuous Galerkin method. Finite elements are ordered in wave fronts perpendicular to the characteristic curves so that elemental linear algebraic systems are solved quickly by sweeping the phase space element by element. Two implementations of a diffusive boundary condition at the origin are demonstrated wherein the finite discontinuity in the radiation intensity is accurately captured by the proposed method. This allows for a consistent mechanism to preserve photon luminosity. The method was proved to be robust and fast, and a case is made for the adequacy of parallel processing. In addition to classical two-dimensional plots, results of normalized radiation intensity were mapped onto a log-polar surface exhibiting all distinguishing features of the problem studied.
Investigation of Finite Sources through Time Reversal
NASA Astrophysics Data System (ADS)
Kremers, Simon; Brietzke, Gilbert; Igel, Heiner; Larmat, Carene; Fichtner, Andreas; Johnson, Paul A.; Huang, Lianjie
2010-05-01
Under certain conditions time reversal is a promising method to determine earthquake source characteristics without any a-priori information (except the earth model and the data). It consists of injecting flipped-in-time records from seismic stations within the model to create an approximate reverse movie of wave propagation from which the location of the hypocenter and other information might be inferred. In this study, the backward propagation is performed numerically using a parallel cartesian spectral element code. Initial tests using point source moment tensors serve as control for the adaptability of the used wave propagation algorithm. After that we investigated the potential of time reversal to recover finite source characteristics (e.g., size of ruptured area, rupture velocity etc.). We used synthetic data from the SPICE kinematic source inversion blind test initiated to investigate the performance of current kinematic source inversion approaches (http://www.spice-rtn.org/library/valid). The synthetic data set attempts to reproduce the 2000 Tottori earthquake with 33 records close to the fault. We discuss the influence of various assumptions made on the source (e.g., origin time, hypocenter, fault location, etc.), adjoint source weighting (e.g., correct for epicentral distance) and structure (uncertainty in the velocity model) on the results of the time reversal process. We give an overview about the quality of focussing of the different wavefield properties (i.e., displacements, strains, rotations, energies). Additionally, the potential to recover source properties of multiple point sources at the same time is discussed.
Puncture-and-Pull Biomechanics in the Teeth of Predatory Coelurosaurian Dinosaurs.
Torices, Angelica; Wilkinson, Ryan; Arbour, Victoria M; Ruiz-Omeñaca, Jose Ignacio; Currie, Philip J
2018-05-07
The teeth of putatively carnivorous dinosaurs are often blade-shaped with well-defined serrated cutting edges (Figure 1). These ziphodont teeth are often easily differentiated based on the morphology and density of the denticles [1, 2]. A tearing function has been proposed for theropod denticles in general [3], but the functional significance of denticle phenotypic variation has received less attention. In particular, the unusual hooked denticles found in troodontids suggest a different feeding strategy or diet compared to other small theropods. We used a two-pronged approach to investigate the function of denticle shape variation across theropods with both congruent body shapes and sizes (e.g., dromaeosaurids versus troodontids) and highly disparate body shapes and sizes (e.g., troodontids versus tyrannosaurids), using microwear and finite element analyses (Figure 1). We found that many toothed coelurosaurian theropods employed a puncture-and-pull feeding movement, in which parallel scratches form while biting down into prey and oblique scratches form as the head is pulled backward with the jaws closed. In finite element simulations, theropod teeth had the lowest stresses when bite forces were aligned with the oblique family of microwear scratches. Different denticle morphologies performed differently under a variety of simulated biting angles: Dromaeosaurus and Saurornitholestes were well-adapted for handling struggling prey, whereas troodontid teeth were more likely to fail at non-optimal bite angles. Troodontids may have favored softer, smaller, or immobile prey. Copyright © 2018 Elsevier Ltd. All rights reserved.
Crashworthiness simulations with DYNA3D
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schauer, D.A.; Hoover, C.G.; Kay, G.J.
1996-04-01
Current progress in parallel algorithm research and applications in vehicle crash simulation is described for the explicit, finite element algorithms in DYNA3D. Problem partitioning methods and parallel algorithms for contact at material interfaces are the two challenging algorithm research problems that are addressed. Two prototype parallel contact algorithms have been developed for treating the cases of local and arbitrary contact. Demonstration problems for local contact are crashworthiness simulations with 222 locally defined contact surfaces and a vehicle/barrier collision modeled with arbitrary contact. A simulation of crash tests conducted for a vehicle impacting a U-channel small sign post embedded in soil has been run on both the serial and parallel versions of DYNA3D. A significant reduction in computational time has been observed when running these problems on the parallel version. However, to achieve maximum efficiency, complex problems must be appropriately partitioned, especially when contact dominates the computation.
A parallel graded-mesh FDTD algorithm for human-antenna interaction problems.
Catarinucci, Luca; Tarricone, Luciano
2009-01-01
The finite difference time domain method (FDTD) is frequently used for the numerical solution of a wide variety of electromagnetic (EM) problems and, among them, those concerning human exposure to EM fields. In many practical cases related to the assessment of occupational EM exposure, large simulation domains are modeled and high space resolution adopted, so that strong memory and central processing unit power requirements have to be satisfied. To better afford the computational effort, the use of parallel computing is a winning approach; alternatively, subgridding techniques are often implemented. However, the simultaneous use of subgridding schemes and parallel algorithms is very new. In this paper, an easy-to-implement and highly-efficient parallel graded-mesh (GM) FDTD scheme is proposed and applied to human-antenna interaction problems, demonstrating its appropriateness in dealing with complex occupational tasks and showing its capability to guarantee the advantages of a traditional subgridding technique without affecting the parallel FDTD performance.
NASA Astrophysics Data System (ADS)
Ji, X.; Shen, C.
2017-12-01
Flood inundation presents substantial societal hazards and also changes biogeochemistry for systems like the Amazon. It is often expensive to simulate high-resolution flood inundation and propagation in a long-term watershed-scale model. Due to the Courant-Friedrichs-Lewy (CFL) restriction, high resolution and large local flow velocity both demand prohibitively small time steps even for parallel codes. Here we develop a parallel surface-subsurface process-based model enhanced by multi-resolution meshes that are adaptively switched on or off. The high-resolution overland flow meshes are enabled only when the flood wave invades the floodplains. This model applies a semi-implicit, semi-Lagrangian (SISL) scheme in solving dynamic wave equations, and with the assistance of the multi-mesh method, it also adaptively chooses the dynamic wave equation only in the area of deep inundation. Therefore, the model achieves a balance between accuracy and computational cost.
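The CFL restriction mentioned in this abstract can be illustrated with a minimal sketch. It assumes a one-dimensional explicit shallow-water solver, for which the stable step is commonly bounded by the cell size divided by the fastest wave speed, |u| + sqrt(g h). The function name, the Courant number of 1.0, and the sample values are illustrative assumptions and are not taken from the model described above.

```python
import math

def cfl_time_step(dx, velocity, depth, courant=1.0, g=9.81):
    """Upper bound on an explicit time step for 1D shallow-water flow.

    Assumes the fastest signal travels at |u| + sqrt(g * h).
    All argument names and defaults are illustrative.
    """
    wave_speed = abs(velocity) + math.sqrt(g * depth)
    return courant * dx / wave_speed

# Coarse mesh, fine mesh, and fine mesh with a faster local flow.
coarse_dt = cfl_time_step(dx=100.0, velocity=1.0, depth=2.0)
fine_dt = cfl_time_step(dx=10.0, velocity=1.0, depth=2.0)
fast_dt = cfl_time_step(dx=10.0, velocity=5.0, depth=2.0)
print(coarse_dt, fine_dt, fast_dt)
```

Refining the mesh tenfold shrinks the admissible step tenfold, and a larger local velocity shrinks it further, which is the cost that switching fine meshes on only where needed is meant to limit.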
NASA Astrophysics Data System (ADS)
Schoups, G.; Vrugt, J. A.; Fenicia, F.; van de Giesen, N. C.
2010-10-01
Conceptual rainfall-runoff models have traditionally been applied without paying much attention to numerical errors induced by temporal integration of water balance dynamics. Reliance on first-order, explicit, fixed-step integration methods leads to computationally cheap simulation models that are easy to implement. Computational speed is especially desirable for estimating parameter and predictive uncertainty using Markov chain Monte Carlo (MCMC) methods. Confirming earlier work of Kavetski et al. (2003), we show here that the computational speed of first-order, explicit, fixed-step integration methods comes at a cost: for a case study with a spatially lumped conceptual rainfall-runoff model, it introduces artificial bimodality in the marginal posterior parameter distributions, which is not present in numerically accurate implementations of the same model. The resulting effects on MCMC simulation include (1) inconsistent estimates of posterior parameter and predictive distributions, (2) poor performance and slow convergence of the MCMC algorithm, and (3) unreliable convergence diagnosis using the Gelman-Rubin statistic. We studied several alternative numerical implementations to remedy these problems, including various adaptive-step finite difference schemes and an operator splitting method. Our results show that adaptive-step, second-order methods, based on either explicit finite differencing or operator splitting with analytical integration, provide the best alternative for accurate and efficient MCMC simulation. Fixed-step or adaptive-step implicit methods may also be used for increased accuracy, but they cannot match the efficiency of adaptive-step explicit finite differencing or operator splitting. Of the latter two, explicit finite differencing is more generally applicable and is preferred if the individual hydrologic flux laws cannot be integrated analytically, as the splitting method then loses its advantage.
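The contrast between fixed-step and adaptive-step explicit integration can be sketched on a single linear reservoir, dS/dt = P - kS. This is a hypothetical toy model, not the rainfall-runoff model of the study: the parameter values, the tolerance, and the step-size controller are assumptions chosen only to make the numerical error visible. The adaptive scheme below is a second-order Heun step whose error is estimated from the embedded Euler step.

```python
import math

def flux(storage, rain, k):
    """Net flux of a linear reservoir: inflow minus outflow k * S."""
    return rain - k * storage

def euler_fixed(s0, rain, k, t_end, dt):
    """First-order, explicit, fixed-step integration."""
    s = s0
    for _ in range(round(t_end / dt)):
        s += dt * flux(s, rain, k)
    return s

def heun_adaptive(s0, rain, k, t_end, tol=1e-6):
    """Second-order explicit Heun steps with Euler/Heun error control."""
    s, t, dt = s0, 0.0, t_end
    while t_end - t > 1e-12:
        dt = min(dt, t_end - t)
        f0 = flux(s, rain, k)
        s_euler = s + dt * f0
        s_heun = s + 0.5 * dt * (f0 + flux(s_euler, rain, k))
        err = abs(s_heun - s_euler)
        if err <= tol:  # accept the step
            s, t = s_heun, t + dt
        # Shrink after a rejection, grow cautiously after an acceptance.
        dt *= min(2.0, max(0.1, 0.9 * math.sqrt(tol / max(err, 1e-300))))
    return s

s0, rain, k, t_end = 10.0, 2.0, 1.5, 5.0
exact = rain / k + (s0 - rain / k) * math.exp(-k * t_end)
fixed = euler_fixed(s0, rain, k, t_end, dt=1.0)
adaptive = heun_adaptive(s0, rain, k, t_end)
print(abs(fixed - exact), abs(adaptive - exact))
```

With a daily step and k = 1.5 per day, the fixed-step Euler solution oscillates around the equilibrium storage, while the error-controlled scheme tracks the analytical solution closely at the price of many more flux evaluations.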
Durham extremely large telescope adaptive optics simulation platform.
Basden, Alastair; Butterley, Timothy; Myers, Richard; Wilson, Richard
2007-03-01
Adaptive optics systems are essential on all large telescopes for which image quality is important. These are complex systems with many design parameters requiring optimization before good performance can be achieved. The simulation of adaptive optics systems is therefore necessary to categorize the expected performance. We describe an adaptive optics simulation platform, developed at Durham University, which can be used to simulate adaptive optics systems on the largest proposed future extremely large telescopes as well as on current systems. This platform is modular, object oriented, and has the benefit of hardware application acceleration that can be used to improve the simulation performance, essential for ensuring that the run time of a given simulation is acceptable. The simulation platform described here can be highly parallelized using parallelization techniques suited for adaptive optics simulation, while still offering the user complete control while the simulation is running. The results from the simulation of a ground layer adaptive optics system are provided as an example to demonstrate the flexibility of this simulation platform.
High-Performance High-Order Simulation of Wave and Plasma Phenomena
NASA Astrophysics Data System (ADS)
Klockner, Andreas
This thesis presents results aiming to enhance and broaden the applicability of the discontinuous Galerkin ("DG") method in a variety of ways. DG was chosen as a foundation for this work because it yields high-order finite element discretizations with very favorable numerical properties for the treatment of hyperbolic conservation laws. In a first part, I examine progress that can be made on implementation aspects of DG. In adapting the method to mass-market massively parallel computation hardware in the form of graphics processors ("GPUs"), I obtain an increase in computation performance per unit of cost by more than an order of magnitude over conventional processor architectures. Key to this advance is a recipe that adapts DG to a variety of hardware through automated self-tuning. I discuss new parallel programming tools supporting GPU run-time code generation which are instrumental in the DG self-tuning process and contribute to its reaching application floating point throughput greater than 200 GFlops/s on a single GPU and greater than 3 TFlops/s on a 16-GPU cluster in simulations of electromagnetics problems in three dimensions. I further briefly discuss the solver infrastructure that makes this possible. In the second part of the thesis, I introduce a number of new numerical methods whose motivation is partly rooted in the opportunity created by GPU-DG: First, I construct and examine a novel GPU-capable shock detector, which, when used to control an artificial viscosity, helps stabilize DG computations in gas dynamics and a number of other fields. Second, I describe my pursuit of a method that allows the simulation of rarefied plasmas using a DG discretization of the electromagnetic field. Finally, I introduce new explicit multi-rate time integrators for ordinary differential equations with multiple time scales, with a focus on applicability to DG discretizations of time-dependent problems.
NASA Astrophysics Data System (ADS)
Burtyka, Filipp
2018-01-01
The paper considers algorithms for finding diagonalizable and non-diagonalizable roots (so-called solvents) of a monic arbitrary unilateral second-order matrix polynomial over a prime finite field. These algorithms are based on polynomial matrices (lambda-matrices). This is an extension of existing general methods for computing solvents of matrix polynomials over the field of complex numbers. We analyze how techniques for complex numbers can be adapted to a finite field and estimate the asymptotic complexity of the resulting algorithms.
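To make the notion of a solvent concrete, the sketch below checks candidates X against X^2 + A1 X + A0 = 0 over GF(p) for 2x2 matrices. It is an exhaustive search, not the lambda-matrix algorithms of the paper, and it assumes the convention that coefficients multiply X from the left; the helper names and the sample matrices are hypothetical.

```python
from itertools import product

def mat_mul(a, b, p):
    """Product of two 2x2 matrices with entries reduced modulo p."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) % p
             for j in range(2)] for i in range(2)]

def residual(x, a1, a0, p):
    """Evaluate X^2 + A1 X + A0 modulo p."""
    x2 = mat_mul(x, x, p)
    a1x = mat_mul(a1, x, p)
    return [[(x2[i][j] + a1x[i][j] + a0[i][j]) % p
             for j in range(2)] for i in range(2)]

def brute_force_solvents(a1, a0, p):
    """Try all p**4 matrices; feasible only for very small p."""
    zero = [[0, 0], [0, 0]]
    found = []
    for a, b, c, d in product(range(p), repeat=4):
        x = [[a, b], [c, d]]
        if residual(x, a1, a0, p) == zero:
            found.append(x)
    return found

# Build A0 so that a chosen matrix is guaranteed to be a solvent.
p = 5
known = [[1, 2], [3, 4]]
a1 = [[0, 1], [1, 0]]
shift = residual(known, a1, [[0, 0], [0, 0]], p)
a0 = [[(-shift[i][j]) % p for j in range(2)] for i in range(2)]
solvents = brute_force_solvents(a1, a0, p)
print(len(solvents), known in solvents)
```

The search space grows as p to the power n squared, which is why structured algorithms with bounded asymptotic complexity are of interest.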
Free Mesh Method: fundamental conception, algorithms and accuracy study
YAGAWA, Genki
2011-01-01
The finite element method (FEM) has been commonly employed in a variety of fields as a computer simulation method to solve such problems as solid, fluid, electro-magnetic phenomena and so on. However, creation of a quality mesh for the problem domain is a prerequisite when using FEM, which becomes a major part of the cost of a simulation. It is natural that the concept of meshless method has evolved. The free mesh method (FMM) is among the typical meshless methods intended for particle-like finite element analysis of problems that are difficult to handle using global mesh generation, especially on parallel processors. FMM is an efficient node-based finite element method that employs a local mesh generation technique and a node-by-node algorithm for the finite element calculations. In this paper, FMM and its variation are reviewed focusing on their fundamental conception, algorithms and accuracy. PMID:21558752
Parallel evolutionary computation in bioinformatics applications.
Pinho, Jorge; Sobral, João Luis; Rocha, Miguel
2013-05-01
A large number of optimization problems within the field of Bioinformatics require methods able to handle their inherent complexity (e.g. NP-hard problems) and also demand increased computational efforts. In this context, the use of parallel architectures is a necessity. In this work, we propose ParJECoLi, a Java based library that offers a large set of metaheuristic methods (such as Evolutionary Algorithms) and also addresses the issue of their efficient execution on a wide range of parallel architectures. The proposed approach focuses on ease of use, making the adaptation to distinct parallel environments (multicore, cluster, grid) transparent to the user. Indeed, this work shows how the development of the optimization library can proceed independently of its adaptation for several architectures, making use of Aspect-Oriented Programming. The pluggable nature of parallelism related modules allows the user to easily configure their environment, adding parallelism modules to the base source code when needed. The performance of the platform is validated with two case studies within biological model optimization. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
NASA Technical Reports Server (NTRS)
Sargent, Jeff Scott
1988-01-01
A new row-based parallel algorithm for standard-cell placement targeted for execution on a hypercube multiprocessor is presented. Key features of this implementation include a dynamic simulated-annealing schedule, row-partitioning of the VLSI chip image, and two novel approaches to controlling error in parallel cell-placement algorithms: Heuristic Cell-Coloring and Adaptive (Parallel Move) Sequence Control. Heuristic Cell-Coloring identifies sets of noninteracting cells that can be moved repeatedly, and in parallel, with no buildup of error in the placement cost. Adaptive Sequence Control allows multiple parallel cell moves to take place between global cell-position updates. This feedback mechanism is based on an error bound derived analytically from the traditional annealing move-acceptance profile. Placement results are presented for real industry circuits, and the performance of an implementation on the Intel iPSC/2 Hypercube is summarized. This algorithm runs 5 to 16 times faster than a previous program developed for the Hypercube, while producing equivalent quality placement. An integrated place and route program for the Intel iPSC/2 Hypercube is currently being developed.
NASA Astrophysics Data System (ADS)
Re, B.; Dobrzynski, C.; Guardone, A.
2017-07-01
A novel strategy to solve the finite volume discretization of the unsteady Euler equations within the Arbitrary Lagrangian-Eulerian framework over tetrahedral adaptive grids is proposed. The volume changes due to local mesh adaptation are treated as continuous deformations of the finite volumes and they are taken into account by adding fictitious numerical fluxes to the governing equation. This interpretation makes it possible to avoid any explicit interpolation of the solution between different grids and to compute grid velocities so that the Geometric Conservation Law is automatically fulfilled for connectivity changes as well. The solution on the new grid is obtained through standard ALE techniques, thus preserving the underlying scheme properties, such as conservativeness, stability and monotonicity. The adaptation procedure includes node insertion, node deletion, edge swapping and point relocation and it is exploited both to enhance grid quality after the boundary movement and to modify the grid spacing to increase solution accuracy. The presented approach is assessed by three-dimensional simulations of steady and unsteady flow fields. The capability of dealing with large boundary displacements is demonstrated by computing the flow around the translating infinite- and finite-span NACA 0012 wing moving through the domain at the flight speed. The proposed adaptive scheme is also applied to the simulation of a pitching infinite-span wing, where the two-dimensional character of the flow is well reproduced despite the three-dimensional unstructured grid. Finally, the scheme is exploited in a piston-induced shock-tube problem to take into account simultaneously the large deformation of the domain and the shock wave. In all tests, mesh adaptation plays a crucial role.
Modeling Cooperative Threads to Project GPU Performance for Adaptive Parallelism
DOE Office of Scientific and Technical Information (OSTI.GOV)
Meng, Jiayuan; Uram, Thomas; Morozov, Vitali A.
Most accelerators, such as graphics processing units (GPUs) and vector processors, are particularly suitable for accelerating massively parallel workloads. On the other hand, conventional workloads are developed for multi-core parallelism, which often scale to only a few dozen OpenMP threads. When hardware threads significantly outnumber the degree of parallelism in the outer loop, programmers are challenged with efficient hardware utilization. A common solution is to further exploit the parallelism hidden deep in the code structure. Such parallelism is less structured: parallel and sequential loops may be imperfectly nested within each other, neighboring inner loops may exhibit different concurrency patterns (e.g. Reduction vs. Forall), yet have to be parallelized in the same parallel section. Many input-dependent transformations have to be explored. A programmer often employs a larger group of hardware threads to cooperatively walk through a smaller outer loop partition and adaptively exploit any encountered parallelism. This process is time-consuming and error-prone, yet the risk of gaining little or no performance remains high for such workloads. To reduce risk and guide implementation, we propose a technique to model workloads with limited parallelism that can automatically explore and evaluate transformations involving cooperative threads. Eventually, our framework projects the best achievable performance and the most promising transformations without implementing GPU code or using physical hardware. We envision our technique to be integrated into future compilers or optimization frameworks for autotuning.
NASA Astrophysics Data System (ADS)
Marras, Simone; Giraldo, Frank
2015-04-01
The prediction of extreme weather sufficiently ahead of its occurrence impacts society as a whole and coastal communities specifically (e.g. Hurricane Sandy that impacted the eastern seaboard of the U.S. in the fall of 2012). With the final goal of solving hurricanes at very high resolution and numerical accuracy, we have been developing the Non-hydrostatic Unified Model of the Atmosphere (NUMA) to solve the Euler and Navier-Stokes equations by arbitrary high-order element-based Galerkin methods on massively parallel computers. NUMA is a unified model with respect to the following criteria: (a) it is based on unified numerics in that element-based Galerkin methods allow the user to choose between continuous (spectral elements, CG) or discontinuous Galerkin (DG) methods and from a large spectrum of time integrators, (b) it is unified across scales in that it can solve flow in limited-area mode (flow in a box) or in global mode (flow on the sphere). NUMA is the dynamical core that powers the U.S. Naval Research Laboratory's next-generation global weather prediction system NEPTUNE (Navy's Environmental Prediction sysTem Utilizing the NUMA corE). Because the solution of the Euler equations by high order methods is prone to instabilities that must be damped in some way, we approach the problem of stabilization via an adaptive Large Eddy Simulation (LES) scheme meant to treat such instabilities by modeling the sub-grid scale features of the flow. The novelty of our effort lies in the extension to high order spectral elements for low Mach number stratified flows of a method that was originally designed for low order, adaptive finite elements in the high Mach number regime [1]. The Euler equations are regularized by means of a dynamically adaptive stress tensor that is proportional to the residual of the unperturbed equations. 
Its effect is close to none where the solution is sufficiently smooth, whereas it increases elsewhere, with a direct contribution to the stabilization of the otherwise oscillatory solution. As a first step toward the Large Eddy Simulation of a hurricane, we verify the model via a high-order and high resolution idealized simulation of deep convection on the sphere. References [1] M. Nazarov and J. Hoffman (2013) Residual-based artificial viscosity for simulation of turbulent compressible flow using adaptive finite element methods Int. J. Numer. Methods Fluids, 71:339-357
NASA Astrophysics Data System (ADS)
Li, Gen; Tang, Chun-An; Liang, Zheng-Zhao
2017-01-01
Multi-scale high-resolution modeling of rock failure process is a powerful means in modern rock mechanics studies to reveal the complex failure mechanism and to evaluate engineering risks. However, multi-scale continuous modeling of rock, from deformation, damage to failure, has raised high requirements on the design, implementation scheme and computation capacity of the numerical software system. This study is aimed at developing the parallel finite element procedure, a parallel rock failure process analysis (RFPA) simulator that is capable of modeling the whole trans-scale failure process of rock. Based on the statistical meso-damage mechanical method, the RFPA simulator is able to construct heterogeneous rock models with multiple mechanical properties, deal with and represent the trans-scale propagation of cracks, in which the stress and strain fields are solved for the damage evolution analysis of representative volume element by the parallel finite element method (FEM) solver. This paper describes the theoretical basis of the approach and provides the details of the parallel implementation on a Windows - Linux interactive platform. A numerical model is built to test the parallel performance of FEM solver. Numerical simulations are then carried out on a laboratory-scale uniaxial compression test, and field-scale net fracture spacing and engineering-scale rock slope examples, respectively. The simulation results indicate that relatively high speedup and computation efficiency can be achieved by the parallel FEM solver with a reasonable boot process. In laboratory-scale simulation, the well-known physical phenomena, such as the macroscopic fracture pattern and stress-strain responses, can be reproduced. In field-scale simulation, the formation process of net fracture spacing from initiation, propagation to saturation can be revealed completely. In engineering-scale simulation, the whole progressive failure process of the rock slope can be well modeled. 
It is shown that the parallel FE simulator developed in this study is an efficient tool for modeling the whole trans-scale failure process of rock from meso- to engineering-scale.
Explicit finite-difference simulation of optical integrated devices on massive parallel computers.
Sterkenburgh, T; Michels, R M; Dress, P; Franke, H
1997-02-20
An explicit method for the numerical simulation of optical integrated circuits by means of the finite-difference time-domain (FDTD) method is presented. This method, based on an explicit solution of Maxwell's equations, is well established in microwave technology. Although the simulation areas are small, we verified the behavior of three interesting problems, especially nonparaxial problems, with typical aspects of integrated optical devices. Because numerical losses are within acceptable limits, we suggest the use of the FDTD method to achieve promising quantitative simulation results.
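The explicit character of the FDTD update can be seen in a minimal one-dimensional sketch: each field sample is advanced from its neighbours at the previous half step, with no linear system to solve. The grid size, the Courant number of 0.5, the Gaussian soft source, and the perfectly conducting ends are illustrative assumptions in normalized units, not the setup used by the authors.

```python
import math

def run_fdtd_1d(n_cells=200, n_steps=60, courant=0.5, source_cell=100):
    """Leapfrog update of Ez and Hy on a staggered 1D grid (normalized units).

    The end cells of Ez are never updated, which acts as a perfectly
    conducting boundary. All parameters are illustrative.
    """
    ez = [0.0] * n_cells
    hy = [0.0] * (n_cells - 1)  # hy[k] sits between ez[k] and ez[k + 1]
    for step in range(n_steps):
        for k in range(n_cells - 1):
            hy[k] += courant * (ez[k + 1] - ez[k])
        for k in range(1, n_cells - 1):
            ez[k] += courant * (hy[k] - hy[k - 1])
        # Soft source: add a Gaussian pulse in time at one cell.
        ez[source_cell] += math.exp(-((step - 30.0) / 10.0) ** 2)
    return ez

fields = run_fdtd_1d()
print(max(abs(v) for v in fields))
```

Because every cell update is independent of the others within a half step, the two inner loops map naturally onto data-parallel hardware, which is what makes the method attractive on massively parallel machines.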
Parallelization of Unsteady Adaptive Mesh Refinement for Unstructured Navier-Stokes Solvers
NASA Technical Reports Server (NTRS)
Schwing, Alan M.; Nompelis, Ioannis; Candler, Graham V.
2014-01-01
This paper explores the implementation of the MPI parallelization in a Navier-Stokes solver using adaptive mesh refinement. Viscous and inviscid test problems are considered for the purpose of benchmarking, as are implicit and explicit time advancement methods. The main test problem for comparison includes effects from boundary layers and other viscous features and requires a large number of grid points for accurate computation. Experimental validation against double cone experiments in hypersonic flow is shown. The adaptive mesh refinement shows promise for a staple test problem in the hypersonic community. Extension to more advanced techniques for more complicated flows is described.
2012-05-22
tabulation of the reduced space is performed using the In Situ Adaptive Tabulation (ISAT) algorithm. In addition, we use x2f mpi – a Fortran library...for parallel vector-valued function evaluation (used with ISAT in this context) – to efficiently redistribute the chemistry workload among the...Constrained-Equilibrium (RCCE) method, and tabulation of the reduced space is performed using the In Situ Adaptive Tabulation (ISAT) algorithm. In addition
Massively parallel algorithms for real-time wavefront control of a dense adaptive optics system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fijany, A.; Milman, M.; Redding, D.
1994-12-31
In this paper massively parallel algorithms and architectures for real-time wavefront control of a dense adaptive optic system (SELENE) are presented. The authors have already shown that the computation of a near optimal control algorithm for SELENE can be reduced to the solution of a discrete Poisson equation on a regular domain. Although this represents an optimal computation, due to the large size of the system and the high sampling rate requirement, the implementation of this control algorithm poses a computationally challenging problem since it demands a sustained computational throughput of the order of 10 GFlops. They develop a novel algorithm, designated as Fast Invariant Imbedding algorithm, which offers a massive degree of parallelism with simple communication and synchronization requirements. Due to these features, this algorithm is significantly more efficient than other Fast Poisson Solvers for implementation on massively parallel architectures. The authors also discuss two massively parallel, algorithmically specialized, architectures for low-cost and optimal implementation of the Fast Invariant Imbedding algorithm.
A PARALLEL LEAST-SQUARES FINITE ELEMENT METHOD FOR INCOMPRESSIBLE FLOWS. (R825200)
Error analysis and correction of discrete solutions from finite element codes
NASA Technical Reports Server (NTRS)
Thurston, G. A.; Stein, P. A.; Knight, N. F., Jr.; Reissner, J. E.
1984-01-01
Many structures are an assembly of individual shell components. Therefore, results for stresses and deflections from finite element solutions for each shell component should agree with the equations of shell theory. This paper examines the problem of applying shell theory to the error analysis and the correction of finite element results. The general approach to error analysis and correction is discussed first. Relaxation methods are suggested as one approach to correcting finite element results for all or parts of shell structures. Next, the problem of error analysis of plate structures is examined in more detail. The method of successive approximations is adapted to take discrete finite element solutions and to generate continuous approximate solutions for postbuckled plates. Preliminary numerical results are included.
Construction and comparison of parallel implicit kinetic solvers in three spatial dimensions
NASA Astrophysics Data System (ADS)
Titarev, Vladimir; Dumbser, Michael; Utyuzhnikov, Sergey
2014-01-01
The paper is devoted to the further development and systematic performance evaluation of a recent deterministic framework Nesvetay-3D for modelling three-dimensional rarefied gas flows. Firstly, a review of the existing discretization and parallelization strategies for solving numerically the Boltzmann kinetic equation with various model collision integrals is carried out. Secondly, a new parallelization strategy for the implicit time evolution method is implemented which improves scaling on large CPU clusters. Accuracy and scalability of the methods are demonstrated on a pressure-driven rarefied gas flow through a finite-length circular pipe as well as an external supersonic flow over a three-dimensional re-entry geometry of complicated aerodynamic shape.
An O(log^2 N) parallel algorithm for computing the eigenvalues of a symmetric tridiagonal matrix
NASA Technical Reports Server (NTRS)
Swarztrauber, Paul N.
1989-01-01
An O(log^2 N) parallel algorithm is presented for computing the eigenvalues of a symmetric tridiagonal matrix using a parallel algorithm for computing the zeros of the characteristic polynomial. The method is based on a quadratic recurrence in which the characteristic polynomial is constructed on a binary tree from polynomials whose degree doubles at each level. Intervals that contain exactly one zero are determined by the zeros of polynomials at the previous level which ensures that different processors compute different zeros. The exact behavior of the polynomials at the interval endpoints is used to eliminate the usual problems induced by finite precision arithmetic.
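For orientation, the link between eigenvalues and zeros of the characteristic polynomial can be sketched with the classical serial approach: a Sturm-sequence count combined with bisection. This is a stand-in technique, not the quadratic binary-tree recurrence of the abstract; the function names and the test matrix are assumptions made for illustration.

```python
import math

def count_below(diag, off, x):
    """Number of eigenvalues of the symmetric tridiagonal matrix below x.

    Uses the ratio form of the Sturm sequence,
    q_i = (a_i - x) - b_{i-1}^2 / q_{i-1}, and counts negative q_i.
    """
    count = 0
    q = 1.0
    for i, a in enumerate(diag):
        coupling = off[i - 1] ** 2 / q if i > 0 else 0.0
        q = (a - x) - coupling
        if q == 0.0:
            q = 1e-300  # nudge off an exact zero to keep the recurrence defined
        if q < 0.0:
            count += 1
    return count

def kth_eigenvalue(diag, off, k, tol=1e-12):
    """Bisection for the k-th smallest eigenvalue (k starts at 0)."""
    n = len(diag)
    radius = [(abs(off[i - 1]) if i > 0 else 0.0)
              + (abs(off[i]) if i < n - 1 else 0.0) for i in range(n)]
    lo = min(a - r for a, r in zip(diag, radius))  # Gershgorin bounds
    hi = max(a + r for a, r in zip(diag, radius))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_below(diag, off, mid) > k:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Discrete Laplacian: eigenvalues are 2 - 2 cos(j pi / (n + 1)), j = 1..n.
diag, off = [2.0] * 5, [-1.0] * 4
computed = [kth_eigenvalue(diag, off, k) for k in range(5)]
expected = [2.0 - 2.0 * math.cos(j * math.pi / 6.0) for j in range(1, 6)]
print(computed)
```

Each eigenvalue is isolated independently here, so different values of k could be assigned to different processors; the abstract's method obtains its isolating intervals from the zeros at the previous tree level instead.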
Data Parallel Line Relaxation (DPLR) Code User Manual: Acadia - Version 4.01.1
NASA Technical Reports Server (NTRS)
Wright, Michael J.; White, Todd; Mangini, Nancy
2009-01-01
Data-Parallel Line Relaxation (DPLR) code is a computational fluid dynamic (CFD) solver that was developed at NASA Ames Research Center to help mission support teams generate high-value predictive solutions for hypersonic flow field problems. The DPLR Code Package is an MPI-based, parallel, full three-dimensional Navier-Stokes CFD solver with generalized models for finite-rate reaction kinetics, thermal and chemical non-equilibrium, accurate high-temperature transport coefficients, and ionized flow physics incorporated into the code. DPLR also includes a large selection of generalized realistic surface boundary conditions and links to enable loose coupling with external thermal protection system (TPS) material response and shock layer radiation codes.
Jueterbock, A; Franssen, S U; Bergmann, N; Gu, J; Coyer, J A; Reusch, T B H; Bornberg-Bauer, E; Olsen, J L
2016-11-01
Populations distributed across a broad thermal cline are instrumental in addressing adaptation to increasing temperatures under global warming. Using a space-for-time substitution design, we tested for parallel adaptation to warm temperatures along two independent thermal clines in Zostera marina, the most widely distributed seagrass in the temperate Northern Hemisphere. A North-South pair of populations was sampled along the European and North American coasts and exposed to a simulated heatwave in a common-garden mesocosm. Transcriptomic responses under control, heat stress and recovery were recorded in 99 RNAseq libraries with ~13 000 uniquely annotated, expressed genes. We corrected for phylogenetic differentiation among populations to discriminate neutral from adaptive differentiation. The two southern populations recovered faster from heat stress and showed parallel transcriptomic differentiation, as compared with northern populations. Among 2389 differentially expressed genes, 21 exceeded neutral expectations and were likely involved in parallel adaptation to warm temperatures. However, the strongest differentiation following phylogenetic correction was between the three Atlantic populations and the Mediterranean population with 128 of 4711 differentially expressed genes exceeding neutral expectations. Although adaptation to warm temperatures is expected to reduce sensitivity to heatwaves, the continued resistance of seagrass to further anthropogenic stresses may be impaired by heat-induced downregulation of genes related to photosynthesis, pathogen defence and stress tolerance. © 2016 John Wiley & Sons Ltd.
Distributed Finite Element Analysis Using a Transputer Network
NASA Technical Reports Server (NTRS)
Watson, James; Favenesi, James; Danial, Albert; Tombrello, Joseph; Yang, Dabby; Reynolds, Brian; Turrentine, Ronald; Shephard, Mark; Baehmann, Peggy
1989-01-01
The principal objective of this research effort was to demonstrate the extraordinarily cost effective acceleration of finite element structural analysis problems using a transputer-based parallel processing network. This objective was accomplished in the form of a commercially viable parallel processing workstation. The workstation is a desktop size, low-maintenance computing unit capable of supercomputer performance yet costs two orders of magnitude less. To achieve the principal research objective, a transputer based structural analysis workstation termed XPFEM was implemented with linear static structural analysis capabilities resembling commercially available NASTRAN. Finite element model files, generated using the on-line preprocessing module or external preprocessing packages, are downloaded to a network of 32 transputers for accelerated solution. The system currently executes at about one third Cray X-MP24 speed but additional acceleration appears likely. For the NASA selected demonstration problem of a Space Shuttle main engine turbine blade model with about 1500 nodes and 4500 independent degrees of freedom, the Cray X-MP24 required 23.9 seconds to obtain a solution while the transputer network, operated from an IBM PC-AT compatible host computer, required 71.7 seconds. Consequently, the $80,000 transputer network demonstrated a cost-performance ratio about 60 times better than the $15,000,000 Cray X-MP24 system.
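The cost-performance figure quoted in the abstract above can be checked with a few lines of arithmetic. The sketch below uses only the times and prices stated in the abstract; the function and variable names are illustrative, and "cost-performance" is assumed here to mean the price ratio divided by the slowdown.

```python
def cost_performance_gain(fast_time_s, slow_time_s, fast_cost_usd, slow_cost_usd):
    """Return (slowdown, cost_ratio, gain) for a cheaper but slower machine.

    Assumption: cost-performance gain = (price ratio) / (run-time ratio).
    """
    slowdown = slow_time_s / fast_time_s        # how much slower the cheap machine is
    cost_ratio = fast_cost_usd / slow_cost_usd  # how much cheaper it is
    return slowdown, cost_ratio, cost_ratio / slowdown


# Figures quoted in the abstract: Cray X-MP24 vs. the 32-transputer network.
slowdown, cost_ratio, gain = cost_performance_gain(
    fast_time_s=23.9,
    slow_time_s=71.7,
    fast_cost_usd=15_000_000,
    slow_cost_usd=80_000,
)
print(f"slowdown {slowdown:.2f}x, cost ratio {cost_ratio:.1f}x, gain {gain:.1f}x")
```

Under that assumption the network is about three times slower and roughly 190 times cheaper, which is consistent with the "about 60 times better" ratio and the "about one third Cray X-MP24 speed" statement in the abstract.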
Modeling of heterogeneous elastic materials by the multiscale hp-adaptive finite element method
NASA Astrophysics Data System (ADS)
Klimczak, Marek; Cecot, Witold
2018-01-01
We present an enhancement of the multiscale finite element method (MsFEM) by combining it with the hp-adaptive FEM. Such a discretization-based homogenization technique is a versatile tool for modeling heterogeneous materials with fast oscillating elasticity coefficients. No assumption on periodicity of the domain is required. In order to avoid direct, so-called overkill mesh computations, a coarse mesh with effective stiffness matrices is used and special shape functions are constructed to account for the local heterogeneities at the micro resolution. The automatic adaptivity (hp-type at the macro resolution and h-type at the micro resolution) increases efficiency of computation. In this paper details of the modified MsFEM are presented and a numerical test performed on a Fichera corner domain is presented in order to validate the proposed approach.
Finite Element Analysis of Adaptive-Stiffening and Shape-Control SMA Hybrid Composites
NASA Technical Reports Server (NTRS)
Gao, Xiu-Jie; Turner, Travis L.; Burton, Deborah; Brinson, L. Catherine
2005-01-01
The usage of shape memory materials has extended rapidly to many fields, including medical devices, actuators, composites, structures and MEMS devices. For these various applications, shape memory alloys (SMAs) are available in various forms: bulk, wire, ribbon, thin film, and porous. In this work, the focus is on SMA hybrid composites with adaptive-stiffening or morphing functions. These composites are created by using SMA ribbons or wires embedded in a polymeric based composite panel/beam. Adaptive stiffening or morphing is activated via selective resistance heating or uniform thermal loads. To simulate the thermomechanical behavior of these composites, an SMA model was implemented using the ABAQUS user element interface and finite element simulations of the systems were studied. Several examples are presented which show that the implemented model can be a very useful design and simulation tool for SMA hybrid composites.
Using Multithreading for the Automatic Load Balancing of 2D Adaptive Finite Element Meshes
NASA Technical Reports Server (NTRS)
Heber, Gerd; Biswas, Rupak; Thulasiraman, Parimala; Gao, Guang R.; Bailey, David H. (Technical Monitor)
1998-01-01
In this paper, we present a multi-threaded approach for the automatic load balancing of adaptive finite element (FE) meshes. The platform of our choice is the EARTH multi-threaded system which offers sufficient capabilities to tackle this problem. We implement the adaption phase of FE applications on triangular meshes, and exploit the EARTH token mechanism to automatically balance the resulting irregular and highly nonuniform workload. We discuss the results of our experiments on EARTH-SP2, an implementation of EARTH on the IBM SP2, with different load balancing strategies that are built into the runtime system.
NASA Technical Reports Server (NTRS)
Steger, J. L.; Dougherty, F. C.; Benek, J. A.
1983-01-01
A mesh system composed of multiple overset body-conforming grids is described for adapting finite-difference procedures to complex aircraft configurations. In this so-called 'chimera mesh,' a major grid is generated about a main component of the configuration and overset minor grids are used to resolve all other features. Methods for connecting overset multiple grids and modifications of flow-simulation algorithms are discussed. Computational tests in two dimensions indicate that the use of multiple overset grids can simplify the task of grid generation without an adverse effect on flow-field algorithms and computer code complexity.
NASA Technical Reports Server (NTRS)
Duque, Earl P. N.; Biswas, Rupak; Strawn, Roger C.
1995-01-01
This paper summarizes a method that solves both the three dimensional thin-layer Navier-Stokes equations and the Euler equations using overset structured and solution adaptive unstructured grids with applications to helicopter rotor flowfields. The overset structured grids use an implicit finite-difference method to solve the thin-layer Navier-Stokes/Euler equations while the unstructured grid uses an explicit finite-volume method to solve the Euler equations. Solutions on a helicopter rotor in hover show the ability to accurately convect the rotor wake. However, isotropic subdivision of the tetrahedral mesh rapidly increases the overall problem size.
A novel adaptive finite time controller for bilateral teleoperation system
NASA Astrophysics Data System (ADS)
Wang, Ziwei; Chen, Zhang; Liang, Bin; Zhang, Bo
2018-03-01
Most research on bilateral teleoperation focuses on system stability under time delays. However, practical teleoperation tasks require high performance beyond system stability, such as fast convergence and accuracy. This paper investigates bilateral teleoperation controller design with transient performance requirements. To ensure transient performance and system stability simultaneously, an adaptive non-singular fast terminal mode controller is proposed to achieve practical finite-time stability considering system uncertainties and time delays. In addition, a novel switching scheme is introduced, by which the singularity problem of the conventional terminal sliding manifold is avoided. Finally, numerical simulations demonstrate the effectiveness and validity of the proposed method.
Biology-Culture Co-evolution in Finite Populations.
de Boer, Bart; Thompson, Bill
2018-01-19
Language is the result of two concurrent evolutionary processes: biological and cultural inheritance. An influential evolutionary hypothesis known as the moving target problem implies inherent limitations on the interactions between our two inheritance streams that result from a difference in pace: the speed of cultural evolution is thought to rule out cognitive adaptation to culturally evolving aspects of language. We examine this hypothesis formally by casting it as a problem of adaptation in time-varying environments. We present a mathematical model of biology-culture co-evolution in finite populations: a generalisation of the Moran process, treating co-evolution as coupled non-independent Markov processes, providing a general formulation of the moving target hypothesis in precise probabilistic terms. Rapidly varying culture decreases the probability of biological adaptation. However, we show that this effect declines with population size and with stronger links between biology and culture: in realistically sized finite populations, stochastic effects can carry cognitive specialisations to fixation in the face of variable culture, especially if the effects of those specialisations are amplified through cultural evolution. These results support the view that language arises from interactions between our two major inheritance streams, rather than from one primary evolutionary process that dominates another.
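For readers unfamiliar with the Moran process that the abstract above generalises, the following sketch evaluates the textbook fixation probability of a single mutant in a fixed-size population. It is the classic single-trait, constant-fitness case only, not the coupled biology-culture model of the paper; the function name and the example values are illustrative assumptions.

```python
def moran_fixation_probability(r, n):
    """Fixation probability of one mutant with relative fitness r among n individuals.

    Classic Moran process with constant fitness (textbook result):
        rho = (1 - 1/r) / (1 - r**(-n)),  with the neutral limit rho = 1/n at r = 1.
    """
    if n < 1:
        raise ValueError("population size n must be at least 1")
    if r <= 0:
        raise ValueError("relative fitness r must be positive")
    if abs(r - 1.0) < 1e-12:
        return 1.0 / n  # neutral drift: every individual is equally likely to fix
    return (1.0 - 1.0 / r) / (1.0 - float(r) ** (-n))


# Illustrative values: a mildly advantageous mutant in populations of growing size.
for n in (10, 100, 1000):
    print(n, moran_fixation_probability(1.05, n))
```

Even in this simplest setting the outcome depends on population size as well as on selection strength, which is the kind of finite-population stochastic effect the abstract refers to.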
NASA Astrophysics Data System (ADS)
Ng, C. S.; Rosenberg, D.; Pouquet, A.; Germaschewski, K.; Bhattacharjee, A.
2009-04-01
A recently developed spectral-element adaptive refinement incompressible magnetohydrodynamic (MHD) code [Rosenberg, Fournier, Fischer, Pouquet, J. Comp. Phys. 215, 59-80 (2006)] is applied to simulate the problem of MHD island coalescence instability (CI) in two dimensions. CI is a fundamental MHD process that can produce sharp current layers and subsequent reconnection and heating in a high-Lundquist number plasma such as the solar corona [Ng and Bhattacharjee, Phys. Plasmas, 5, 4028 (1998)]. Due to the formation of thin current layers, it is highly desirable to use adaptively or statically refined grids to resolve them, and to maintain accuracy at the same time. The output of the spectral-element static adaptive refinement simulations is compared with simulations using a finite difference method on the same refinement grids, and both methods are compared to pseudo-spectral simulations with uniform grids as baselines. It is shown that with the statically refined grids roughly scaling linearly with effective resolution, spectral element runs can maintain accuracy significantly higher than that of the finite difference runs, in some cases achieving close to full spectral accuracy.
Heating and Large Scale Dynamics of the Solar Corona
NASA Technical Reports Server (NTRS)
Schnack, Dalton D.
2000-01-01
The effort was concentrated in the following areas: coronal heating mechanism; unstructured adaptive grid algorithms; numerical modeling of magnetic reconnection in the MRX experiment (effect of toroidal magnetic field and finite pressure, effect of ohmic heating and vertical magnetic field, effect of dynamic mesh adaption).
Language Model Combination and Adaptation Using Weighted Finite State Transducers
NASA Technical Reports Server (NTRS)
Liu, X.; Gales, M. J. F.; Hieronymus, J. L.; Woodland, P. C.
2010-01-01
In speech recognition systems, language models (LMs) are often constructed by training and combining multiple n-gram models. They can be either used to represent different genres or tasks found in diverse text sources, or capture stochastic properties of different linguistic symbol sequences, for example, syllables and words. Unsupervised LM adaption may also be used to further improve robustness to varying styles or tasks. When using these techniques, extensive software changes are often required. In this paper an alternative and more general approach based on weighted finite state transducers (WFSTs) is investigated for LM combination and adaptation. As it is entirely based on well-defined WFST operations, minimum change to decoding tools is needed. A wide range of LM combination configurations can be flexibly supported. An efficient on-the-fly WFST decoding algorithm is also proposed. Significant error rate gains of 7.3% relative were obtained on a state-of-the-art broadcast audio recognition task using a history dependently adapted multi-level LM modelling both syllable and word sequences.
NASA Astrophysics Data System (ADS)
Simoni, L.; Secchi, S.; Schrefler, B. A.
2008-12-01
This paper analyses the numerical difficulties commonly encountered in solving fully coupled numerical models and proposes a numerical strategy apt to overcome them. The proposed procedure is based on space refinement and time adaptivity. The latter, which is mainly studied here, is based on the use of a finite element approach in the space domain and a Discontinuous Galerkin approximation within each time span. Error measures are defined for the jump of the solution at each time station. These constitute the parameters allowing for the time adaptivity. Some care is, however, needed for a useful definition of the jump measures. Numerical tests are presented firstly to demonstrate the advantages and shortcomings of the method over the more traditional use of finite differences in time, then to assess the efficiency of the proposed procedure for adapting the time step. The proposed method reveals its efficiency and simplicity to adapt the time step in the solution of coupled field problems.
Finite elements: Theory and application
NASA Technical Reports Server (NTRS)
Dwoyer, D. L. (Editor); Hussaini, M. Y. (Editor); Voigt, R. G. (Editor)
1988-01-01
Recent advances in FEM techniques and applications are discussed in reviews and reports presented at the ICASE/LaRC workshop held in Hampton, VA in July 1986. Topics addressed include FEM approaches for partial differential equations, mixed FEMs, singular FEMs, FEMs for hyperbolic systems, iterative methods for elliptic finite-element equations on general meshes, mathematical aspects of FEMs for incompressible viscous flows, and gradient weighted moving finite elements in two dimensions. Consideration is given to adaptive flux-corrected FEM transport techniques for CFD, mixed and singular finite elements and the field BEM, p and h-p versions of the FEM, transient analysis methods in computational dynamics, and FEMs for integrated flow/thermal/structural analysis.
Ouellet, Jean A.; Richards, Corey; Sardar, Zeeshan M.; Giannitsios, Demetri; Noiseux, Nicholas; Strydom, Willem S.; Reindl, Rudy; Jarzem, Peter; Arlet, Vincent; Steffen, Thomas
2013-01-01
The ideal treatment for unstable thoracolumbar fractures remains controversial with posterior reduction and stabilization, anterior reduction and stabilization, combined posterior and anterior reduction and stabilization, and even nonoperative management advocated. Short segment posterior osteosynthesis of these fractures has fewer comorbidities than the other operative approaches but settles into kyphosis over time. Biomechanical comparison of the divergent bridge construct versus the parallel tension band construct was performed for anteriorly destabilized T11–L1 spine segments using three different models: (1) finite element analysis (FEA), (2) a synthetic model, and (3) a human cadaveric model. Outcomes measured were construct stiffness and ultimate failure load. Our objective was to determine if the divergent pedicle screw bridge construct would provide more resistance to kyphotic deforming forces. All three modalities showed greater stiffness with the divergent bridge construct. The FEA calculated a stiffness of 21.6 N/m for the tension band construct versus 34.1 N/m for the divergent bridge construct. The synthetic model resulted in a mean stiffness of 17.3 N/m for parallel tension band versus 20.6 N/m for the divergent bridge (p = 0.03), whereas the cadaveric model had an average stiffness of 15.2 N/m in the parallel tension band compared with 18.4 N/m for the divergent bridge (p = 0.02). Ultimate failure load with the cadaveric model was found to be 622 N for the divergent bridge construct versus 419 N (p = 0.15) for the parallel tension band construct. This study confirms our clinical experience that the short posterior divergent bridge construct provides greater stiffness for the management of unstable thoracolumbar fractures. PMID:24436856
Parallel Ellipsoidal Perfectly Matched Layers for Acoustic Helmholtz Problems on Exterior Domains
Bunting, Gregory; Prakash, Arun; Walsh, Timothy; ...
2018-01-26
Exterior acoustic problems occur in a wide range of applications, making the finite element analysis of such problems a common practice in the engineering community. Various methods for truncating infinite exterior domains have been developed, including absorbing boundary conditions, infinite elements, and more recently, perfectly matched layers (PML). PML are gaining popularity due to their generality, ease of implementation, and effectiveness as an absorbing boundary condition. PML formulations have been developed in Cartesian, cylindrical, and spherical geometries, but not ellipsoidal. In addition, the parallel solution of PML formulations with iterative solvers for the solution of the Helmholtz equation, and how this compares with more traditional strategies such as infinite elements, has not been adequately investigated. In this study, we present a parallel, ellipsoidal PML formulation for acoustic Helmholtz problems. To facilitate the meshing process, the ellipsoidal PML layer is generated with an on-the-fly mesh extrusion. Though the complex stretching is defined along ellipsoidal contours, we modify the Jacobian to include an additional mapping back to Cartesian coordinates in the weak formulation of the finite element equations. This allows the equations to be solved in Cartesian coordinates, which is more compatible with existing finite element software, but without the necessity of dealing with corners in the PML formulation. Herein we also compare the conditioning and performance of the PML Helmholtz problem with the infinite element approach that is based on high order basis functions. On a set of representative exterior acoustic examples, we show that high order infinite element basis functions lead to an increasing number of Helmholtz solver iterations, whereas for PML the number of iterations remains constant for the same level of accuracy. Finally, this provides an additional advantage of PML over the infinite element approach.
Parallel Ellipsoidal Perfectly Matched Layers for Acoustic Helmholtz Problems on Exterior Domains
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bunting, Gregory; Prakash, Arun; Walsh, Timothy
Exterior acoustic problems occur in a wide range of applications, making the finite element analysis of such problems a common practice in the engineering community. Various methods for truncating infinite exterior domains have been developed, including absorbing boundary conditions, infinite elements, and more recently, perfectly matched layers (PML). PML are gaining popularity due to their generality, ease of implementation, and effectiveness as an absorbing boundary condition. PML formulations have been developed in Cartesian, cylindrical, and spherical geometries, but not ellipsoidal. In addition, the parallel solution of PML formulations with iterative solvers for the solution of the Helmholtz equation, and how this compares with more traditional strategies such as infinite elements, has not been adequately investigated. In this study, we present a parallel, ellipsoidal PML formulation for acoustic Helmholtz problems. To facilitate the meshing process, the ellipsoidal PML layer is generated with an on-the-fly mesh extrusion. Though the complex stretching is defined along ellipsoidal contours, we modify the Jacobian to include an additional mapping back to Cartesian coordinates in the weak formulation of the finite element equations. This allows the equations to be solved in Cartesian coordinates, which is more compatible with existing finite element software, but without the necessity of dealing with corners in the PML formulation. Herein we also compare the conditioning and performance of the PML Helmholtz problem with the infinite element approach that is based on high order basis functions. On a set of representative exterior acoustic examples, we show that high order infinite element basis functions lead to an increasing number of Helmholtz solver iterations, whereas for PML the number of iterations remains constant for the same level of accuracy. Finally, this provides an additional advantage of PML over the infinite element approach.
Unstructured Adaptive Grid Computations on an Array of SMPs
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Pramanick, Ira; Sohn, Andrew; Simon, Horst D.
1996-01-01
Dynamic load balancing is necessary for parallel adaptive methods to solve unsteady CFD problems on unstructured grids. In this paper, we present such a dynamic load balancing framework, called JOVE. Results on a four-POWERnode POWER CHALLENGEarray demonstrated that load balancing gives significant performance improvements over no load balancing for such adaptive computations. The parallel speedup of JOVE, implemented using MPI on the POWER CHALLENGEarray, was significant, being as high as 31 for 32 processors. An implementation of JOVE that exploits 'an array of SMPs' architecture was also studied; this hybrid JOVE outperformed flat JOVE by up to 28% on the meshes and adaption models tested. With large, realistic meshes and actual flow-solver and adaption phases incorporated into JOVE, hybrid JOVE can be expected to yield significant advantage over flat JOVE, especially as the number of processors is increased, thus demonstrating the scalability of an array of SMPs architecture.
Parallel implementation of an adaptive scheme for 3D unstructured grids on the SP2
NASA Technical Reports Server (NTRS)
Strawn, Roger C.; Oliker, Leonid; Biswas, Rupak
1996-01-01
Dynamic mesh adaption on unstructured grids is a powerful tool for computing unsteady flows that require local grid modifications to efficiently resolve solution features. For this work, we consider an edge-based adaption scheme that has shown good single-processor performance on the C90. We report on our experience parallelizing this code for the SP2. Results show a 47.0X speedup on 64 processors when 10 percent of the mesh is randomly refined. Performance deteriorates to 7.7X when the same number of edges are refined in a highly-localized region. This is because almost all the mesh adaption is confined to a single processor. However, this problem can be remedied by repartitioning the mesh immediately after targeting edges for refinement but before the actual adaption takes place. With this change, the speedup improves dramatically to 43.6X.
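The speedups reported in the abstract above are easier to compare as parallel efficiencies (speedup divided by processor count). The short sketch below does that conversion for the three quoted cases; the labels and the function name are illustrative, and only the numbers come from the abstract.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup achieved on the given processor count."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return speedup / processors


# Speedups on 64 SP2 processors as quoted in the abstract.
reported = {
    "random refinement": 47.0,
    "localized refinement": 7.7,
    "localized refinement, repartitioned before adaption": 43.6,
}
for label, speedup in reported.items():
    print(f"{label}: {parallel_efficiency(speedup, 64):.1%} efficiency")
```

Seen this way, repartitioning before adaption brings the localized case from roughly an eighth of ideal scaling back to within a few points of the randomly refined case.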
Parallel Implementation of an Adaptive Scheme for 3D Unstructured Grids on the SP2
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak; Strawn, Roger C.
1996-01-01
Dynamic mesh adaption on unstructured grids is a powerful tool for computing unsteady flows that require local grid modifications to efficiently resolve solution features. For this work, we consider an edge-based adaption scheme that has shown good single-processor performance on the C90. We report on our experience parallelizing this code for the SP2. Results show a 47.0X speedup on 64 processors when 10% of the mesh is randomly refined. Performance deteriorates to 7.7X when the same number of edges are refined in a highly-localized region. This is because almost all mesh adaption is confined to a single processor. However, this problem can be remedied by repartitioning the mesh immediately after targeting edges for refinement but before the actual adaption takes place. With this change, the speedup improves dramatically to 43.6X.
Tobler, Ray; Hermisson, Joachim; Schlötterer, Christian
2015-07-01
Thermal stress is a pervasive selective agent in natural populations that impacts organismal growth, survival, and reproduction. Drosophila melanogaster exhibits a variety of putatively adaptive phenotypic responses to thermal stress in natural and experimental settings; however, accompanying assessments of fitness are typically lacking. Here, we quantify changes in fitness and known thermal tolerance traits in replicated experimental D. melanogaster populations following more than 40 generations of evolution to either cyclic cold or hot temperatures. By evaluating fitness for both evolved populations alongside a reconstituted starting population, we show that the evolved populations were the best adapted within their respective thermal environments. More strikingly, the evolved populations exhibited increased fitness in both environments and improved resistance to both acute heat and cold stress. This unexpected parallel response appeared to be an adaptation to the rapid temperature changes that drove the cycling thermal regimes, as parallel fitness changes were not observed when tested in a constant thermal environment. Our results add to a small, but growing group of studies that demonstrate the importance of fluctuating temperature changes for thermal adaptation and highlight the need for additional work in this area. © 2015 The Author(s). Evolution published by Wiley Periodicals, Inc. on behalf of The Society for the Study of Evolution.
Mechanisms mediating parallel action monitoring in fronto-striatal circuits.
Beste, Christian; Ness, Vanessa; Lukas, Carsten; Hoffmann, Rainer; Stüwe, Sven; Falkenstein, Michael; Saft, Carsten
2012-08-01
Flexible response adaptation and the control of conflicting information play a pivotal role in daily life. Yet, little is known about the neuronal mechanisms mediating parallel control of these processes. We examined these mechanisms using a multi-methodological approach that integrated data from event-related potentials (ERPs) with structural MRI data and source localisation using sLORETA. Moreover, we calculated evoked wavelet oscillations. We applied this multi-methodological approach in healthy subjects and patients in a prodromal phase of a major basal ganglia disorder (i.e., Huntington's disease), to directly focus on fronto-striatal networks. Behavioural data indicated that especially the parallel execution of conflict monitoring and flexible response adaptation was modulated across the examined cohorts. When the two processes do not coincide, a high integrity of fronto-striatal loops seems to be dispensable. The neurophysiological data suggest that conflict monitoring (reflected by the N2 ERP) and working memory processes (reflected by the P3 ERP) differentially contribute to this pattern of results. Flexible response adaptation under the constraint of high conflict processing affected the N2 and P3 ERP, as well as their delta frequency band oscillations. Yet, modulatory effects were strongest for the N2 ERP and evoked wavelet oscillations in this time range. The N2 ERPs were localized in the anterior cingulate cortex (BA32, BA24). Modulations of the P3 ERP were localized in parietal areas (BA7). In addition, MRI-determined caudate head volume predicted modulations in conflict monitoring, but not working memory processes. The results show how parallel conflict monitoring and flexible adaptation of action is mediated via fronto-striatal networks. While both response monitoring and working memory processes seem to play a role, especially response selection processes and ACC-basal ganglia networks seem to be the driving force in mediating parallel conflict monitoring and flexible adaptation of actions. Copyright © 2012 Elsevier Inc. All rights reserved.
Alvioli, M.; Baum, R.L.
2016-01-01
We describe a parallel implementation of TRIGRS, the Transient Rainfall Infiltration and Grid-Based Regional Slope-Stability Model for the timing and distribution of rainfall-induced shallow landslides. We have parallelized the four time-demanding execution modes of TRIGRS, namely both the saturated and unsaturated model with finite and infinite soil depth options, within the Message Passing Interface framework. In addition to new features of the code, we outline details of the parallel implementation and show the performance gain with respect to the serial code. Results are obtained both on commercial hardware and on a high-performance multi-node machine, showing the different limits of applicability of the new code. We also discuss the implications for the application of the model on large-scale areas and as a tool for real-time landslide hazard monitoring.
Efficient parallelization of analytic bond-order potentials for large-scale atomistic simulations
NASA Astrophysics Data System (ADS)
Teijeiro, C.; Hammerschmidt, T.; Drautz, R.; Sutmann, G.
2016-07-01
Analytic bond-order potentials (BOPs) provide a way to compute atomistic properties with controllable accuracy. For large-scale computations of heterogeneous compounds at the atomistic level, both the computational efficiency and memory demand of BOP implementations have to be optimized. Since the evaluation of BOPs is a local operation within a finite environment, the parallelization concepts known from short-range interacting particle simulations can be applied to improve the performance of these simulations. In this work, several efficient parallelization methods for BOPs that use three-dimensional domain decomposition schemes are described. The schemes are implemented into the bond-order potential code BOPfox, and their performance is measured in a series of benchmarks. Systems of up to several millions of atoms are simulated on a high performance computing system, and parallel scaling is demonstrated for up to thousands of processors.
NASA Astrophysics Data System (ADS)
McGovern, S.; Kollet, S. J.; Buerger, C. M.; Schwede, R. L.; Podlaha, O. G.
2017-12-01
In the context of sedimentary basins, we present a model for the simulation of the movement of a geological formation (layers) during the evolution of the basin through sedimentation and compaction processes. Assuming a single phase saturated porous medium for the sedimentary layers, the model focuses on the tracking of the layer interfaces, through the use of the level set method, as sedimentation drives fluid-flow and reduction of pore space by compaction. On the assumption of Terzaghi's effective stress concept, the coupling of the pore fluid pressure to the motion of interfaces in 1-D is presented in McGovern et al. (2017) [1]. The current work extends the spatial domain to 3-D, though we maintain the assumption of vertical effective stress to drive the compaction. The idealized geological evolution is conceptualized as the motion of interfaces between rock layers, whose paths are determined by the magnitude of a speed function in the direction normal to the evolving layer interface. The speeds normal to the interface are dependent on the change in porosity, determined through an effective stress-based compaction law, such as the exponential Athy's law. Provided with the speeds normal to the interface, the level set method uses an advection equation to evolve a potential function, whose zero level set defines the interface. Thus, the moving layer geometry influences the pore pressure distribution which couples back to the interface speeds. The flexible construction of the speed function allows extension, in the future, to other terms to represent different physical processes, analogous to how the compaction rule represents material deformation. The 3-D model is implemented using the generic finite element method framework Deal II, which provides tools, building on p4est and interfacing to PETSc, for the massively parallel distributed solution to the model equations [2]. Experiments are being run on the Juelich Supercomputing Center's Jureca cluster.
[1] McGovern et al. (2017). Novel basin modelling concept for simulating deformation from mechanical compaction using level sets. Computational Geosciences, SI:ECMOR XV, 1-14. [2] Bangerth et al. (2011). Algorithms and data structures for massively parallel generic adaptive finite element codes. ACM Transactions on Mathematical Software (TOMS), 38(2):14.
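The level set idea described in the abstract above (an advection equation evolves a potential function whose zero level set marks the interface) can be illustrated in one dimension. The sketch below is a minimal finite-difference toy, assuming a constant, non-negative normal speed; it is not the authors' 3-D finite element implementation, the speed is not derived from any compaction law, and all names and values are hypothetical.

```python
import numpy as np


def advect_level_set_1d(phi, speed, dx, dt, steps):
    """Evolve phi_t + speed * |phi_x| = 0 with first-order upwinding.

    Assumes speed >= 0 (interface moves along its normal) and dt * speed <= dx.
    """
    phi = np.array(phi, dtype=float)
    for _ in range(steps):
        d_minus = np.empty_like(phi)  # backward differences
        d_plus = np.empty_like(phi)   # forward differences
        d_minus[1:] = (phi[1:] - phi[:-1]) / dx
        d_minus[0] = d_minus[1]       # simple extrapolation at the left boundary
        d_plus[:-1] = (phi[1:] - phi[:-1]) / dx
        d_plus[-1] = d_plus[-2]       # simple extrapolation at the right boundary
        # Upwind gradient magnitude for a non-negative speed.
        grad = np.sqrt(np.maximum(d_minus, 0.0) ** 2 + np.minimum(d_plus, 0.0) ** 2)
        phi -= dt * speed * grad
    return phi


def interface_position(x, phi):
    """Locate the zero level set by linear interpolation (phi assumed increasing in x)."""
    return float(np.interp(0.0, phi, x))


# Hypothetical setup: interface starts at x = 0.3 and moves with speed 0.5.
x = np.linspace(0.0, 1.0, 101)
phi0 = x - 0.3                        # signed distance to the initial interface
phi = advect_level_set_1d(phi0, speed=0.5, dx=x[1] - x[0], dt=0.005, steps=40)
print(interface_position(x, phi))     # interface location after t = 0.2
```

In the model described in the abstract, the speed would instead be computed from the porosity change given by the compaction law, which is where the coupling to pore pressure enters.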
Finite Element Analysis of Magnetoelastic Plate Problems.
1981-08-01
deformation and in the incremental large deformation analysis, respectively. The classical Kirchhoff assumption of the undeformable normal to the midsurface is...current density, is constant across the thickness of the plate and is parallel to the midsurface of the plate; (2) the normal component of the
NASA Technical Reports Server (NTRS)
Heflinger, L. O.
1970-01-01
In holographic interferometry a small movement of apparatus between exposures causes the background of the reconstructed scene to be covered with interference fringes approximately parallel to each other. The three-dimensional quality of the holographic image is allowable since a mathematical model will give the location of the fringes.
Craciun, Stefan; Brockmeier, Austin J; George, Alan D; Lam, Herman; Príncipe, José C
2011-01-01
Methods for decoding movements from neural spike counts using adaptive filters often rely on minimizing the mean-squared error. However, for non-Gaussian distribution of errors, this approach is not optimal for performance. Therefore, rather than using probabilistic modeling, we propose an alternate non-parametric approach. In order to extract more structure from the input signal (neuronal spike counts) we propose using minimum error entropy (MEE), an information-theoretic approach that minimizes the error entropy as part of an iterative cost function. However, the disadvantage of using MEE as the cost function for adaptive filters is the increase in computational complexity. In this paper we present a comparison between the decoding performance of the analytic Wiener filter and a linear filter trained with MEE, which is then mapped to a parallel architecture in reconfigurable hardware tailored to the computational needs of the MEE filter. We observe considerable speedup from the hardware design. The adaptation of filter weights for the multiple-input, multiple-output linear filters, necessary in motor decoding, is a highly parallelizable algorithm. It can be decomposed into many independent computational blocks with a parallel architecture readily mapped to a field-programmable gate array (FPGA) and scales to large numbers of neurons. By pipelining and parallelizing independent computations in the algorithm, the proposed parallel architecture has sublinear increases in execution time with respect to both window size and filter order.
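To make the minimum error entropy (MEE) cost concrete, the following is a minimal sketch of one common way to estimate it: Renyi's quadratic entropy of the errors obtained from a Gaussian Parzen window. The kernel width sigma, the sample values, and the function names are assumptions for illustration and are not taken from the paper, whose FPGA design is not reproduced here.

```python
import math

def information_potential(errors, sigma=1.0):
    """Parzen estimate V = (1/N^2) * sum_ij G(e_i - e_j) with a Gaussian kernel."""
    n = len(errors)
    s2 = 2.0 * sigma ** 2                      # variance of the kernel convolved with itself
    norm = 1.0 / math.sqrt(2.0 * math.pi * s2)
    total = 0.0
    for ei in errors:                          # O(N^2) pairwise terms, all independent
        for ej in errors:
            total += norm * math.exp(-((ei - ej) ** 2) / (2.0 * s2))
    return total / (n * n)

def quadratic_error_entropy(errors, sigma=1.0):
    """Renyi quadratic entropy estimate; minimizing it maximizes the information potential."""
    return -math.log(information_potential(errors, sigma))

# Hypothetical error samples: concentrated, spread out, and a shifted copy.
tight = [0.1, -0.1, 0.05, -0.05, 0.0]
spread = [2.0, -1.5, 0.7, -2.2, 1.0]
shifted = [e + 3.0 for e in tight]

h_tight = quadratic_error_entropy(tight)
h_spread = quadratic_error_entropy(spread)
h_shifted = quadratic_error_entropy(shifted)
print(h_tight < h_spread)
```

The double loop over error pairs is the extra computational cost relative to mean-squared error, and because each pairwise term is independent it is the kind of work that maps naturally onto parallel hardware. Note that the estimate depends only on error differences, so a constant offset in the errors leaves it unchanged.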
Hypercube matrix computation task
NASA Technical Reports Server (NTRS)
Calalo, Ruel H.; Imbriale, William A.; Jacobi, Nathan; Liewer, Paulett C.; Lockhart, Thomas G.; Lyzenga, Gregory A.; Lyons, James R.; Manshadi, Farzin; Patterson, Jean E.
1988-01-01
A major objective of the Hypercube Matrix Computation effort at the Jet Propulsion Laboratory (JPL) is to investigate the applicability of a parallel computing architecture to the solution of large-scale electromagnetic scattering problems. Three scattering analysis codes are being implemented and assessed on a JPL/California Institute of Technology (Caltech) Mark 3 Hypercube. The codes, which utilize different underlying algorithms, give a means of evaluating the general applicability of this parallel architecture. The three analysis codes being implemented are a frequency domain method of moments code, a time domain finite difference code, and a frequency domain finite elements code. These analysis capabilities are being integrated into an electromagnetics interactive analysis workstation which can serve as a design tool for the construction of antennas and other radiating or scattering structures. The first two years of work on the Hypercube Matrix Computation effort is summarized. It includes both new developments and results as well as work previously reported in the Hypercube Matrix Computation Task: Final Report for 1986 to 1987 (JPL Publication 87-18).
OWL: A scalable Monte Carlo simulation suite for finite-temperature study of materials
NASA Astrophysics Data System (ADS)
Li, Ying Wai; Yuk, Simuck F.; Cooper, Valentino R.; Eisenbach, Markus; Odbadrakh, Khorgolkhuu
The OWL suite is a simulation package for performing large-scale Monte Carlo simulations. Its object-oriented, modular design enables it to interface with various external packages for energy evaluations. It is therefore applicable to study the finite-temperature properties for a wide range of systems: from simple classical spin models to materials where the energy is evaluated by ab initio methods. This scheme not only allows for the study of thermodynamic properties based on first-principles statistical mechanics, it also provides a means for massive, multi-level parallelism to fully exploit the capacity of modern heterogeneous computer architectures. We will demonstrate how improved strong and weak scaling is achieved by employing novel, parallel and scalable Monte Carlo algorithms, as well as the applications of OWL to a few selected frontier materials research problems. This research was supported by the Office of Science of the Department of Energy under contract DE-AC05-00OR22725.
A solution to neural field equations by a recurrent neural network method
NASA Astrophysics Data System (ADS)
Alharbi, Abir
2012-09-01
Neural field equations (NFE) are used to model the activity of neurons in the brain; they are introduced from a single-neuron 'integrate-and-fire model' starting point. The neural continuum is spatially discretized for numerical studies, and the governing equations are modeled as a system of ordinary differential equations. In this article the recurrent neural network approach is used to solve this system of ODEs. This consists of a technique developed by combining the standard numerical method of finite-differences with the Hopfield neural network. The architecture of the net, energy function, updating equations, and algorithms are developed for the NFE model. A Hopfield Neural Network is then designed to minimize the energy function modeling the NFE. Results obtained from the Hopfield-finite-differences net show excellent performance in terms of accuracy and speed. The parallel nature of the Hopfield approaches may make them easier to implement on fast parallel computers and give them a speed advantage over the traditional methods.
NASA Astrophysics Data System (ADS)
Doulgerakis, Matthaios; Eggebrecht, Adam; Wojtkiewicz, Stanislaw; Culver, Joseph; Dehghani, Hamid
2017-12-01
Parameter recovery in diffuse optical tomography is a computationally expensive algorithm, especially when used for large and complex volumes, as in the case of human brain functional imaging. The modeling of light propagation, also known as the forward problem, is the computational bottleneck of the recovery algorithm, whereby the lack of a real-time solution is impeding practical and clinical applications. The objective of this work is the acceleration of the forward model, within a diffusion approximation-based finite-element modeling framework, employing parallelization to expedite the calculation of light propagation in realistic adult head models. The proposed methodology is applicable for modeling both continuous wave and frequency-domain systems with the results demonstrating a 10-fold speed increase when GPU architectures are available, while maintaining high accuracy. It is shown that, for a very high-resolution finite-element model of the adult human head with ~600,000 nodes, consisting of heterogeneous layers, light propagation can be calculated at ~0.25 s/excitation source.
A Parallel Finite Set Statistical Simulator for Multi-Target Detection and Tracking
NASA Astrophysics Data System (ADS)
Hussein, I.; MacMillan, R.
2014-09-01
Finite Set Statistics (FISST) is a powerful Bayesian inference tool for the joint detection, classification and tracking of multi-target environments. FISST is capable of handling phenomena such as clutter, misdetections, and target birth and decay. Implicit within the approach are solutions to the data association and target label-tracking problems. Finally, FISST provides generalized information measures that can be used for sensor allocation across different types of tasks such as: searching for new targets, and classification and tracking of known targets. These FISST capabilities have been demonstrated on several small-scale illustrative examples. However, for implementation in a large-scale system as in the Space Situational Awareness problem, these capabilities require a lot of computational power. In this paper, we implement FISST in a parallel environment for the joint detection and tracking of multi-target systems. In this implementation, false alarms and misdetections will be modeled. Target birth and decay will not be modeled in the present paper. We will demonstrate the success of the method for as many targets as we possibly can in a desktop parallel environment. Performance measures will include: number of targets in the simulation, certainty of detected target tracks, computational time as a function of clutter returns and number of targets, among other factors.
Reconnection in Three Dimensions
NASA Technical Reports Server (NTRS)
Hesse, Michael
1999-01-01
Analyzing the qualitative three-dimensional magnetic structure of a plasmoid, we were led to reconsider the concept of magnetic reconnection from a general point of view. The properties of relatively simple magnetic field models provide a strong preference for one of two definitions of magnetic reconnection that exist in the literature. Any concept of magnetic reconnection defined in terms of magnetic topology seems naturally restricted to cases where the magnetic field vanishes somewhere in the nonideal (diffusion) region. The main part of this paper is concerned with magnetic reconnection in nonvanishing magnetic fields (finite-B reconnection), which has attracted less attention in the past. We show that the electric field component parallel to the magnetic field plays a crucial physical role in finite-B reconnection, and we present two theorems involving the former. The first states a necessary and sufficient condition on the parallel electric field for global reconnection to occur. Here the term "global" means the generic case where the breakdown of magnetic connection occurs for plasma elements that stay outside the nonideal region. The second theorem relates the change of magnetic helicity to the parallel electric field for cases where the electric field vanishes at large distances. That these results provide new insight into three-dimensional reconnection processes is illustrated in terms of the plasmoid configuration, which was our starting point.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Paul T.; Shadid, John N.; Sala, Marzio
In this study results are presented for the large-scale parallel performance of an algebraic multilevel preconditioner for solution of the drift-diffusion model for semiconductor devices. The preconditioner is the key numerical procedure determining the robustness, efficiency and scalability of the fully-coupled Newton-Krylov based, nonlinear solution method that is employed for this system of equations. The coupled system is comprised of a source term dominated Poisson equation for the electric potential, and two convection-diffusion-reaction type equations for the electron and hole concentration. The governing PDEs are discretized in space by a stabilized finite element method. Solution of the discrete system is obtained through a fully-implicit time integrator, a fully-coupled Newton-based nonlinear solver, and a restarted GMRES Krylov linear system solver. The algebraic multilevel preconditioner is based on an aggressive coarsening graph partitioning of the nonzero block structure of the Jacobian matrix. Representative performance results are presented for various choices of multigrid V-cycles and W-cycles and parameter variations for smoothers based on incomplete factorizations. Parallel scalability results are presented for solution of up to 10^8 unknowns on 4096 processors of a Cray XT3/4 and an IBM POWER eServer system.
Auto-adaptive finite element meshes
NASA Technical Reports Server (NTRS)
Richter, Roland; Leyland, Penelope
1995-01-01
Accurate capturing of discontinuities within compressible flow computations is achieved by coupling a suitable solver with an automatic adaptive mesh algorithm for unstructured triangular meshes. The mesh adaptation procedures developed rely on non-hierarchical dynamical local refinement/derefinement techniques, which hence enable structural optimization as well as geometrical optimization. The methods described are applied to a number of the ICASE test cases and are particularly interesting for unsteady flow simulations.
NASA Astrophysics Data System (ADS)
Popov, Igor; Sukov, Sergey
2018-02-01
A modification of the adaptive artificial viscosity (AAV) method is considered. This modification is based on a one-stage time approximation and is adapted to the calculation of gas dynamics problems on unstructured grids with an arbitrary type of grid elements. The proposed numerical method has simplified logic, better performance and parallel efficiency compared to the implementation of the original AAV method. Computer experiments demonstrate the robustness of the method and its convergence to the difference solution.
Numerical Modelling of Foundation Slabs with use of Schur Complement Method
NASA Astrophysics Data System (ADS)
Koktan, Jiří; Brožovský, Jiří
2017-10-01
The paper discusses numerical modelling of foundation slabs with use of advanced numerical approaches, which are suitable for parallel processing. The solution is based on the Finite Element Method with the slab-type elements. The subsoil is modelled with use of a Winkler-type contact model (as an alternative, a multi-parameter model can be used). The proposed modelling approach uses the Schur Complement method to speed up the computations of the problem. The method is based on a special division of the analyzed model into several substructures. It adds some complexity to the numerical procedures, especially when subsoil models are used inside the finite element method solution. On the other hand, this method makes possible a fast solution of large models, but it introduces further problems to the process. Thus, the main aim of this paper is to verify that such a method can be successfully used for this type of problem. The most suitable finite elements are discussed, together with the finite element mesh and the limitations of its construction for such a problem. The core approaches of the implementation of the Schur Complement Method for this type of problem are also presented. The proposed approach was implemented in the form of a computer program, which is also briefly introduced. Results of example computations are presented, which demonstrate the speed-up of the solution: a significant speed-up is shown even in the case of non-parallel processing, along with the ability to bypass size limitations of numerical models with use of the discussed approach.
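The substructuring idea behind the Schur complement method can be sketched on a toy problem. The example below is an illustration only: it uses a 1-D five-node chain (a tridiagonal stiffness matrix) split into two substructures that share one interface node, not the slab elements or Winkler subsoil of the paper, and it assumes symmetric coupling blocks. The helper names solve and schur_solve are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def schur_solve(subs, a_ii, f_i):
    """subs: list of (A_kk, A_ki, f_k) per substructure. Returns (x_interface, [x_k, ...])."""
    ni = len(f_i)
    s = [row[:] for row in a_ii]               # Schur complement, starts as A_ii
    g = f_i[:]                                 # condensed interface load
    for a_kk, a_ki, f_k in subs:               # each pass is independent of the others
        nk = len(f_k)
        cols = [solve(a_kk, [a_ki[r][j] for r in range(nk)]) for j in range(ni)]
        y = solve(a_kk, f_k)
        for p in range(ni):
            for q in range(ni):
                s[p][q] -= sum(a_ki[r][p] * cols[q][r] for r in range(nk))
            g[p] -= sum(a_ki[r][p] * y[r] for r in range(nk))
    x_i = solve(s, g)                          # small interface problem
    x_subs = []
    for a_kk, a_ki, f_k in subs:               # back-substitution per substructure
        rhs = [f_k[r] - sum(a_ki[r][j] * x_i[j] for j in range(ni)) for r in range(len(f_k))]
        x_subs.append(solve(a_kk, rhs))
    return x_i, x_subs

# Chain with nodes 0,1 | 2 (interface) | 3,4 and unit loads on every node.
k_block = [[2.0, -1.0], [-1.0, 2.0]]
subs = [
    (k_block, [[0.0], [-1.0]], [1.0, 1.0]),    # node 1 couples to the interface
    (k_block, [[-1.0], [0.0]], [1.0, 1.0]),    # node 3 couples to the interface
]
x_i, x_subs = schur_solve(subs, [[2.0]], [1.0])
print(x_i, x_subs)
```

The per-substructure loops touch only that substructure's own blocks, which is what makes the approach suitable for parallel processing; only the small interface system couples the substructures.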
Overcoming Challenges in Kinetic Modeling of Magnetized Plasmas and Vacuum Electronic Devices
NASA Astrophysics Data System (ADS)
Omelchenko, Yuri; Na, Dong-Yeop; Teixeira, Fernando
2017-10-01
We transform the state of the art of plasma modeling by taking advantage of novel computational techniques for fast and robust integration of multiscale hybrid (full particle ions, fluid electrons, no displacement current) and full-PIC models. These models are implemented in 3D HYPERS and axisymmetric full-PIC CONPIC codes. HYPERS is a massively parallel, asynchronous code. The HYPERS solver does not step fields and particles synchronously in time but instead executes local variable updates (events) at their self-adaptive rates while preserving fundamental conservation laws. The charge-conserving CONPIC code has a matrix-free explicit finite-element (FE) solver based on a sparse-approximate inverse (SPAI) algorithm. This explicit solver approximates the inverse FE system matrix ("mass" matrix) using successive sparsity pattern orders of the original matrix. It does not reduce the set of Maxwell's equations to a vector-wave (curl-curl) equation of second order but instead utilizes the standard coupled first-order Maxwell's system. We discuss the ability of our codes to accurately and efficiently account for multiscale physical phenomena in 3D magnetized space and laboratory plasmas and axisymmetric vacuum electronic devices.
The Development of a Finite Volume Method for Modeling Sound in Coastal Ocean Environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Long, Wen; Yang, Zhaoqing; Copping, Andrea E.
With the rapid growth of marine renewable energy and offshore wind energy, there have been concerns that the noise generated from construction and operation of the devices may interfere with marine animals' communication. In this research, an underwater sound model is developed to simulate sound propagation generated by marine-hydrokinetic energy (MHK) devices or offshore wind (OSW) energy platforms. Finite volume and finite difference methods are developed to solve the 3D Helmholtz equation of sound propagation in the coastal environment. For the finite volume method, the grid system consists of triangular grids in the horizontal plane and sigma-layers in the vertical dimension. A 3D sparse matrix solver with complex coefficients is formed for solving the resulting acoustic pressure field. The Complex Shifted Laplacian Preconditioner (CSLP) method is applied to efficiently solve the matrix system iteratively with MPI parallelization using a high performance cluster. The sound model is then coupled with the Finite Volume Community Ocean Model (FVCOM) for simulating sound propagation generated by human activities in a range-dependent setting, such as offshore wind energy platform constructions and tidal stream turbines. As a proof of concept, initial validation of the finite difference solver is presented for two coastal wedge problems. Validation of the finite volume method will be reported separately.
Investigation of Liner Characteristics in the NASA Langley Curved Duct Test Rig
NASA Technical Reports Server (NTRS)
Gerhold, Carl H.; Brown, Martha C.; Watson, Willie R.; Jones, Michael G.
2007-01-01
The Curved Duct Test Rig (CDTR), which is designed to investigate propagation of sound in a duct with flow, has been developed at NASA Langley Research Center. The duct incorporates an adaptive control system to generate a tone in the duct at a specific frequency with a target Sound Pressure Level and a target mode shape. The size of the duct, the ability to isolate higher order modes, and the ability to modify the duct configuration make this rig unique among experimental duct acoustics facilities. An experiment is described in which the facility performance is evaluated by measuring the sound attenuation by a sample duct liner. The liner sample comprises one wall of the liner test section. Sound in tones from 500 to 2400 Hz, with modes that are parallel to the liner surface of order 0 to 5, and that are normal to the liner surface of order 0 to 2, can be generated incident on the liner test section. Tests are performed in which sound is generated without axial flow in the duct and with flow at a Mach number of 0.275. The attenuation of the liner is determined by comparing the sound power in a hard wall section downstream of the liner test section to the sound power in a hard wall section upstream of the liner test section. These experimentally determined attenuations are compared to numerically determined attenuations calculated by means of a finite element analysis code. The code incorporates liner impedance values educed from measured data from the NASA Langley Grazing Incidence Tube, a test rig that is used for investigating liner performance with flow and with (0,0) mode incident grazing. The analytical and experimental results compare favorably, indicating the validity of the finite element method and demonstrating that finite element prediction tools can be used together with experiment to characterize the liner attenuation.
Simulation of quasi-static hydraulic fracture propagation in porous media with XFEM
NASA Astrophysics Data System (ADS)
Juan-Lien Ramirez, Alina; Neuweiler, Insa; Löhnert, Stefan
2015-04-01
Hydraulic fracturing is the injection of a fracking fluid at high pressures into the underground. Its goal is to create and expand fracture networks to increase the rock permeability. It is a technique used, for example, for oil and gas recovery and for geothermal energy extraction, since higher rock permeability improves production. Many physical processes take place when it comes to fracking: rock deformation, and fluid flow within the fractures as well as into and through the porous rock. All these processes are strongly coupled, which makes their numerical simulation rather challenging. We present a 2D numerical model that simulates the hydraulic propagation of an embedded fracture quasi-statically in a poroelastic, fully saturated material. Fluid flow within the porous rock is described by Darcy's law and the flow within the fracture is approximated by a parallel plate model. Additionally, the effect of leak-off is taken into consideration. The solid component of the porous medium is assumed to be linear elastic and the propagation criteria are given by the energy release rate and the stress intensity factors [1]. The numerical method used for the spatial discretization is the eXtended Finite Element Method (XFEM) [2]. It is based on the standard Finite Element Method, but introduces additional degrees of freedom and enrichment functions to describe discontinuities locally in a system. Through them the geometry of the discontinuity (e.g. a fracture) becomes independent of the mesh, allowing it to move freely through the domain without a mesh-adapting step. With this numerical model we are able to simulate hydraulic fracture propagation with different initial fracture geometries and material parameters. Results from these simulations will also be presented.
References: [1] D. Gross and T. Seelig. Fracture Mechanics with an Introduction to Micromechanics. Springer, 2nd edition, (2011). [2] T. Belytschko and T. Black. Elastic crack growth in finite elements with minimal remeshing. Int. J. Numer. Meth. Engng. 45, 601-620, (1999).
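The parallel plate approximation for flow inside the fracture is commonly written as the "cubic law", in which the flow rate per unit fracture width scales with the cube of the aperture. The sketch below shows that relation in isolation; the aperture, viscosity, and pressure-gradient values are assumed for illustration, the function names are hypothetical, and the abstract does not state the exact form used in the authors' model.

```python
def fracture_permeability(aperture):
    """Equivalent permeability of a smooth parallel-plate channel, k = w^2 / 12 (m^2)."""
    return aperture ** 2 / 12.0

def fracture_flux(aperture, viscosity, pressure_gradient):
    """Cubic law: flow rate per unit fracture width, q = -(w^3 / (12 mu)) * dp/dx (m^2/s)."""
    return -(aperture ** 3) / (12.0 * viscosity) * pressure_gradient

# Assumed values: 1 mm aperture, water-like viscosity, pressure dropping along x.
w = 1.0e-3          # aperture in m
mu = 1.0e-3         # dynamic viscosity in Pa s
dpdx = -1.0e4       # pressure gradient in Pa/m

q = fracture_flux(w, mu, dpdx)
q_double = fracture_flux(2.0 * w, mu, dpdx)
print(q, q_double / q)
```

Doubling the aperture increases the flow rate eightfold, which illustrates why fracture opening and fluid pressure are so strongly coupled in hydraulic fracturing simulations.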
Adaptive control of a manipulator with a flexible link
NASA Technical Reports Server (NTRS)
Yang, Y. P.; Gibson, J. S.
1988-01-01
An adaptive controller for a manipulator with one rigid link and one flexible link is presented. The performance and robustness of the controller are demonstrated by numerical simulation results. In the simulations, the manipulator moves in a gravitational field and a finite element model represents the flexible link.
An adaptive finite element method for the inequality-constrained Reynolds equation
NASA Astrophysics Data System (ADS)
Gustafsson, Tom; Rajagopal, Kumbakonam R.; Stenberg, Rolf; Videman, Juha
2018-07-01
We present a stabilized finite element method for the numerical solution of cavitation in lubrication, modeled as an inequality-constrained Reynolds equation. The cavitation model is written as a variable coefficient saddle-point problem and approximated by a residual-based stabilized method. Based on our recent results on the classical obstacle problem, we present optimal a priori estimates and derive novel a posteriori error estimators. The method is implemented as a Nitsche-type finite element technique and shown in numerical computations to be superior to the usually applied penalty methods.
NASA Astrophysics Data System (ADS)
Childers, J. T.; Uram, T. D.; LeCompte, T. J.; Papka, M. E.; Benjamin, D. P.
2017-01-01
As the LHC moves to higher energies and luminosity, the demand for computing resources increases accordingly and will soon outpace the growth of the Worldwide LHC Computing Grid. To meet this greater demand, event generation Monte Carlo was targeted for adaptation to run on Mira, the supercomputer at the Argonne Leadership Computing Facility. Alpgen is a Monte Carlo event generation application that is used by LHC experiments in the simulation of collisions that take place in the Large Hadron Collider. This paper details the process by which Alpgen was adapted from a single-processor serial-application to a large-scale parallel-application and the performance that was achieved.
Electrostatic Estimation of Intercalant Jump-Diffusion Barriers Using Finite-Size Ion Models.
Zimmermann, Nils E R; Hannah, Daniel C; Rong, Ziqin; Liu, Miao; Ceder, Gerbrand; Haranczyk, Maciej; Persson, Kristin A
2018-02-01
We report on a scheme for estimating intercalant jump-diffusion barriers that are typically obtained from demanding density functional theory-nudged elastic band calculations. The key idea is to relax a chain of states in the field of the electrostatic potential that is averaged over a spherical volume using different finite-size ion models. For magnesium migrating in typical intercalation materials such as transition-metal oxides, we find that the optimal model is a relatively large shell. This data-driven result parallels typical assumptions made in models based on Onsager's reaction field theory to quantitatively estimate electrostatic solvent effects. Because of its efficiency, our potential of electrostatics-finite ion size (PfEFIS) barrier estimation scheme will enable rapid identification of materials with good ionic mobility.
Recent advances in PDF modeling of turbulent reacting flows
NASA Technical Reports Server (NTRS)
Leonard, Andrew D.; Dai, F.
1995-01-01
This viewgraph presentation concludes that a Monte Carlo probability density function (PDF) solution successfully couples with an existing finite volume code; PDF solution method applied to turbulent reacting flows shows good agreement with data; and PDF methods must be run on parallel machines for practical use.
ZZ-Type a posteriori error estimators for adaptive boundary element methods on a curve☆
Feischl, Michael; Führer, Thomas; Karkulik, Michael; Praetorius, Dirk
2014-01-01
In the context of the adaptive finite element method (FEM), ZZ-error estimators named after Zienkiewicz and Zhu (1987) [52] are mathematically well-established and widely used in practice. In this work, we propose and analyze ZZ-type error estimators for the adaptive boundary element method (BEM). We consider weakly singular and hyper-singular integral equations and prove, in particular, convergence of the related adaptive mesh-refining algorithms. Throughout, the theoretical findings are underlined by numerical experiments. PMID:24748725
Magnitude of parallel pseudo potential in a magnetosonic shock wave
NASA Astrophysics Data System (ADS)
Ohsawa, Yukiharu
2018-05-01
The parallel pseudo potential F, which is the integral of the parallel electric field along the magnetic field, in a large-amplitude magnetosonic pulse (shock wave) is theoretically studied. Particle simulations revealed in the late 1990's that the product of the elementary charge and F can be much larger than the electron temperature in shock waves, i.e., the parallel electric field can be quite strong. However, no theory was presented for this unexpected result. This paper first revisits the small-amplitude theory for F and then investigates the parallel pseudo potential F in large-amplitude pulses based on the two-fluid model with finite thermal pressures. It is found that the magnitude of F in a shock wave is determined by the wave amplitude, the electron temperature, and the kinetic energy of an ion moving with the Alfvén speed. This theoretically obtained expression for F is nearly identical to the empirical relation for F discovered in the previous simulation work.
Jones, Ryan J. R.; Shinde, Aniketa; Guevarra, Dan; ...
2015-01-05
Many energy technologies require electrochemical stability or preactivation of functional materials. Due to the long experiment duration required for either electrochemical preactivation or evaluation of operational stability, parallel screening is required to enable high throughput experimentation. We found that imposing operational electrochemical conditions on a library of materials in parallel creates several opportunities for experimental artifacts. We discuss the electrochemical engineering principles and operational parameters that mitigate artifacts in the parallel electrochemical treatment system. We also demonstrate the effects of resistive losses within the planar working electrode through a combination of finite element modeling and illustrative experiments. Operation of the parallel-plate, membrane-separated electrochemical treatment system is demonstrated by exposing a composition library of mixed metal oxides to oxygen evolution conditions in 1M sulfuric acid for 2h. This application is particularly important because the electrolysis and photoelectrolysis of water are promising future energy technologies inhibited by the lack of highly active, acid-stable catalysts containing only earth abundant elements.
Gust Acoustics Computation with a Space-Time CE/SE Parallel 3D Solver
NASA Technical Reports Server (NTRS)
Wang, X. Y.; Himansu, A.; Chang, S. C.; Jorgenson, P. C. E.; Reddy, D. R. (Technical Monitor)
2002-01-01
The benchmark Problem 2 in Category 3 of the Third Computational Aero-Acoustics (CAA) Workshop is solved using the space-time conservation element and solution element (CE/SE) method. This problem concerns the unsteady response of an isolated finite-span swept flat-plate airfoil bounded by two parallel walls to an incident gust. The acoustic field generated by the interaction of the gust with the flat-plate airfoil is computed by solving the 3D (three-dimensional) Euler equations in the time domain using a parallel version of a 3D CE/SE solver. The effect of the gust orientation on the far-field directivity is studied. Numerical solutions are presented and compared with analytical solutions, showing a reasonable agreement.
NASA Technical Reports Server (NTRS)
Sharma, Naveen
1992-01-01
In this paper we briefly describe a combined symbolic and numeric approach for solving mathematical models on parallel computers. An experimental software system, PIER, is being developed in Common Lisp to synthesize computationally intensive and domain formulation dependent phases of finite element analysis (FEA) solution methods. Quantities for domain formulation like shape functions, element stiffness matrices, etc., are automatically derived using symbolic mathematical computations. The problem specific information and derived formulae are then used to generate (parallel) numerical code for FEA solution steps. A constructive approach to specify a numerical program design is taken. The code generator compiles application oriented input specifications into (parallel) FORTRAN77 routines with the help of built-in knowledge of the particular problem, numerical solution methods and the target computer.
A massively parallel adaptive scheme for melt migration in geodynamics computations
NASA Astrophysics Data System (ADS)
Dannberg, Juliane; Heister, Timo; Grove, Ryan
2016-04-01
Melt generation and migration are important processes for the evolution of the Earth's interior and impact the global convection of the mantle. While they have been the subject of numerous investigations, the typical time and length-scales of melt transport are vastly different from global mantle convection, which determines where melt is generated. This makes it difficult to study mantle convection and melt migration in a unified framework. In addition, modelling magma dynamics poses the challenge of highly non-linear and spatially variable material properties, in particular the viscosity. We describe our extension of the community mantle convection code ASPECT that adds equations describing the behaviour of silicate melt percolating through and interacting with a viscously deforming host rock. We use the original compressible formulation of the McKenzie equations, augmented by an equation for the conservation of energy. This approach includes both melt migration and melt generation with the accompanying latent heat effects, and it incorporates the individual compressibilities of the solid and the fluid phase. For this, we derive an accurate and stable Finite Element scheme that can be combined with adaptive mesh refinement. This is particularly advantageous for this type of problem, as the resolution can be increased in mesh cells where melt is present and viscosity gradients are high, whereas a lower resolution is sufficient in regions without melt. Together with a high-performance, massively parallel implementation, this allows for high resolution, 3d, compressible, global mantle convection simulations coupled with melt migration. Furthermore, scalable iterative linear solvers are required to solve the large linear systems arising from the discretized system. 
Finally, we present benchmarks and scaling tests of our solver up to tens of thousands of cores, show the effectiveness of adaptive mesh refinement when applied to melt migration and compare the compressible and incompressible formulation. We then apply our software to large-scale 3d simulations of melting and melt transport in mantle plumes interacting with the lithosphere. Our model of magma dynamics provides a framework for modelling processes on different scales and investigating links between processes occurring in the deep mantle and melt generation and migration. The presented implementation is available online under an Open Source license together with an extensive documentation.
Parallel software support for computational structural mechanics
NASA Technical Reports Server (NTRS)
Jordan, Harry F.
1987-01-01
The parallel programming methodology known as the Force was applied, and two application issues were addressed. The first involves the efficiency of the implementation and its completeness in terms of satisfying the needs of other researchers implementing parallel algorithms. Support for, and interaction with, other Computational Structural Mechanics (CSM) researchers using the Force was the main issue, but some independent investigation of the Barrier construct, which is extremely important to overall performance, was also undertaken. Another efficiency issue addressed was that of relaxing the strong synchronization condition imposed on the self-scheduled parallel DO loop. The Force was extended by the addition of logical conditions to the cases of a parallel case construct and by the inclusion of a self-scheduled version of this construct. The second issue involved applying the Force to the parallelization of finite element codes such as those found in the NICE/SPAR testbed system. One of the more difficult problems encountered is determining what information in COMMON blocks is actually used outside of a subroutine and when a subroutine uses a COMMON block merely as scratch storage for internal temporary results.
Transmission Index Research of Parallel Manipulators Based on Matrix Orthogonal Degree
NASA Astrophysics Data System (ADS)
Shao, Zhu-Feng; Mo, Jiao; Tang, Xiao-Qiang; Wang, Li-Ping
2017-11-01
A performance index is the standard for performance evaluation and is the foundation of both performance analysis and optimal design of parallel manipulators. Finding suitable kinematic indices has always been an important and challenging issue for parallel manipulators. There have been extensive studies in this field, but few existing indices meet all the requirements of being simple, intuitive, and universal. To solve this problem, the matrix orthogonal degree is adopted, and generalized transmission indices that can evaluate the motion/force transmissibility of fully parallel manipulators are proposed. Transmission performance analysis of typical branches, end effectors, and parallel manipulators is given to illustrate the proposed indices and analysis methodology. Simulation and analysis results reveal that the proposed transmission indices possess significant advantages: they are normalized and finite (ranging from 0 to 1), dimensionally homogeneous, frame-free, intuitive, and easy to calculate. In addition, the proposed indices indicate the good transmission region and the relation to singularity with better resolution than the traditional local conditioning index, and provide a novel tool for kinematic analysis and optimal design of fully parallel manipulators.
NASA Astrophysics Data System (ADS)
Teng, Y. C.; Kelly, D.; Li, Y.; Zhang, K.
2016-02-01
A new state-of-the-art model (the Fully Adaptive Storm Tide model, FAST) for the prediction of storm surges over complex landscapes is presented. The FAST model is based on the conservation form of the full non-linear depth-averaged long wave equations. The equations are solved via an explicit finite volume scheme with interfacial fluxes being computed via Osher's approximate Riemann solver. Geometric source terms are treated in a high order manner that is well-balanced. The numerical solution technique has been chosen to enable the accurate simulation of wetting and drying over complex topography. Another important feature of the FAST model is the use of a simple underlying Cartesian mesh with tree-based static and dynamic adaptive mesh refinement (AMR). This permits the simulation of unsteady flows over varying landscapes (including localized features such as canals) by locally increasing (or relaxing) grid resolution in a dynamic fashion. The use of (dynamic) AMR lowers the computational cost of the storm surge model whilst retaining high resolution (and thus accuracy) where and when it is required. In addition, the FAST model has been designed to execute in a parallel computational environment with localized time-stepping. The FAST model has already been carefully verified against a series of benchmark type problems (Kelly et al. 2015). Here we present two simulations of the storm tide due to Hurricane Ike (2008) and Hurricane Sandy (2012). The model incorporates high resolution LIDAR data for the major portion of New York City. Results compare favorably with water elevations measured by NOAA tidal gauges and by deployed mobile sensors, and with high water marks collected by the USGS.
Compressible magma/mantle dynamics: 3-D, adaptive simulations in ASPECT
NASA Astrophysics Data System (ADS)
Dannberg, Juliane; Heister, Timo
2016-12-01
Melt generation and migration are an important link between surface processes and the thermal and chemical evolution of the Earth's interior. However, their vastly different timescales make it difficult to study mantle convection and melt migration in a unified framework, especially for 3-D global models. And although experiments suggest an increase in melt volume of up to 20 per cent from the depth of melt generation to the surface, previous computations have neglected the individual compressibilities of the solid and the fluid phase. Here, we describe our extension of the finite element mantle convection code ASPECT that adds melt generation and migration. We use the original compressible formulation of the McKenzie equations, augmented by an equation for the conservation of energy. Applying adaptive mesh refinement to this type of problem is particularly advantageous, as the resolution can be increased in areas where melt is present and viscosity gradients are high, whereas a lower resolution is sufficient in regions without melt. Together with a high-performance, massively parallel implementation, this allows for high-resolution, 3-D, compressible, global mantle convection simulations coupled with melt migration. We evaluate the functionality and potential of this method using a series of benchmarks and model setups, compare results of the compressible and incompressible formulation, and show the effectiveness of adaptive mesh refinement when applied to melt migration. Our model of magma dynamics provides a framework for modelling processes on different scales and investigating links between processes occurring in the deep mantle and melt generation and migration. This approach could prove particularly useful when applied to modelling the generation of komatiites or other melts originating at greater depths. The implementation is available in the Open Source ASPECT repository.
NASA Technical Reports Server (NTRS)
Hsu, Andrew T.; Lytle, John K.
1989-01-01
An algebraic adaptive grid scheme based on the concept of arc equidistribution is presented. The scheme locally adjusts the grid density based on gradients of selected flow variables from either finite difference or finite volume calculations. A user-prescribed grid stretching can be specified such that control of the grid spacing can be maintained in areas of known flowfield behavior. For example, the grid can be clustered near a wall for boundary layer resolution and made coarse near the outer boundary of an external flow. A grid smoothing technique is incorporated into the adaptive grid routine, which is found to be more robust and efficient than the weight function filtering technique employed by other researchers. Since the present algebraic scheme requires no iteration or solution of differential equations, the computer time needed for grid adaptation is trivial, making the scheme useful for three-dimensional flow problems. Applications to two- and three-dimensional flow problems show that a considerable improvement in flowfield resolution can be achieved by using the proposed adaptive grid scheme. Although the scheme was developed with steady flow in mind, it is a good candidate for unsteady flow computations because of its efficiency.
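The arc-equidistribution idea in the abstract above can be illustrated in one dimension. The following is a minimal sketch under stated assumptions, not the authors' scheme: the weighted arc-length measure sqrt(dx^2 + alpha*du^2), the parameter `alpha`, and the function name `equidistribute` are all hypothetical choices made for illustration, and the grid smoothing and user-prescribed stretching mentioned in the abstract are omitted. Like the scheme described, it is purely algebraic: no iteration and no differential equation is solved.

```python
import math

def equidistribute(x, u, alpha=1.0):
    """Move interior grid points so every cell holds an equal share of
    the weighted arc length of the curve (x, u). x must be increasing."""
    n = len(x)
    # Cumulative weighted arc length at each node.
    s = [0.0]
    for i in range(n - 1):
        dx = x[i + 1] - x[i]
        du = u[i + 1] - u[i]
        s.append(s[-1] + math.sqrt(dx * dx + alpha * du * du))
    new_x = [x[0]]
    j = 0
    for k in range(1, n - 1):
        # Arc-length position that the k-th new node should occupy.
        target = s[-1] * k / (n - 1)
        while s[j + 1] < target:
            j += 1
        # Linear interpolation inside old cell j.
        t = (target - s[j]) / (s[j + 1] - s[j])
        new_x.append(x[j] + t * (x[j + 1] - x[j]))
    new_x.append(x[-1])
    return new_x

# Illustrative data: a steep tanh front centred at x = 0.5.
x = [i / 10 for i in range(11)]
u = [math.tanh(20 * (xi - 0.5)) for xi in x]
adapted = equidistribute(x, u)
```

In this example the adapted grid keeps the end points fixed and clusters nodes around the front at x = 0.5, where the gradient of `u` is largest; setting `alpha=0` recovers a uniform grid.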
Application of adaptive gridding to magnetohydrodynamic flows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schnack, D.D.; Lotatti, I.; Satyanarayana, P.
1996-12-31
The numerical simulation of the primitive, three-dimensional, time-dependent, resistive MHD equations on an unstructured, adaptive poloidal mesh using the TRIM code has been reported previously. The toroidal coordinate is approximated pseudo-spectrally with finite Fourier series and Fast-Fourier Transforms. The finite-volume algorithm preserves the magnetic field as solenoidal to round-off error, and also conserves mass, energy, and magnetic flux exactly. A semi-implicit method is used to allow for large time steps on the unstructured mesh. This is important for tokamak calculations where the relevant time scale is determined by the poloidal Alfven time. This also allows the viscosity to be treated implicitly. A conjugate-gradient method with pre-conditioning is used for matrix inversion. Applications to the growth and saturation of ideal instabilities in several toroidal fusion systems have been demonstrated. Recently we have concentrated on the details of the mesh adaption algorithm used in TRIM. We present several two-dimensional results relating to the use of grid adaptivity to track the evolution of hydrodynamic and MHD structures. Examples of plasma guns, opening switches, and supersonic flow over a magnetized sphere are presented. Issues relating to mesh adaption criteria are discussed.
1+1 dimensional compactifications of string theory.
Goheer, Naureen; Kleban, Matthew; Susskind, Leonard
2004-05-14
We argue that stable, maximally symmetric compactifications of string theory to 1+1 dimensions are in conflict with holography. In particular, the finite horizon entropies of the Rindler wedge in 1+1 dimensional Minkowski and anti-de Sitter space, and of the de Sitter horizon in any dimension, are inconsistent with the symmetries of these spaces. The argument parallels one made recently by the same authors, in which we demonstrated the incompatibility of the finiteness of the entropy and the symmetries of de Sitter space in any dimension. If the horizon entropy is either infinite or zero, the conflict is resolved.
Finite-element analysis and modal testing of a rotating wind turbine
NASA Astrophysics Data System (ADS)
Carne, T. G.; Lobitz, D. W.; Nord, A. R.; Watson, R. A.
1982-10-01
A finite element procedure, which includes geometric stiffening, and centrifugal and Coriolis terms resulting from the use of a rotating coordinate system, was developed to compute the mode shapes and frequencies of rotating structures. Special application of this capability was made to Darrieus vertical axis wind turbines. In a parallel development effort, a technique for the modal testing of a rotating vertical axis wind turbine was established to measure modal parameters directly. Results from the predictive and experimental techniques for the modal frequencies and mode shapes are compared over a wide range of rotational speeds.
Finite element analysis and modal testing of a rotating wind turbine
NASA Astrophysics Data System (ADS)
Carne, T. G.; Lobitz, D. W.; Nord, A. R.; Watson, R. A.
A finite element procedure, which includes geometric stiffening, and centrifugal and Coriolis terms resulting from the use of a rotating coordinate system, has been developed to compute the mode shapes and frequencies of rotating structures. Special application of this capability has been made to Darrieus, vertical axis wind turbines. In a parallel development effort, a technique for the modal testing of a rotating vertical axis wind turbine has been established to measure modal parameters directly. Results from the predictive and experimental techniques for the modal frequencies and mode shapes are compared over a wide range of rotational speeds.
Features of sound propagation through and stability of a finite shear layer
NASA Technical Reports Server (NTRS)
Koutsoyannis, S. P.
1976-01-01
The plane wave propagation, the stability and the rectangular duct mode problems of a compressible inviscid linearly sheared parallel, but otherwise homogeneous flow, are shown to be governed by Whittaker's equation. The exact solutions for the perturbation quantities are essentially Whittaker M-functions. A number of known results are obtained as limiting cases of exact solutions. For the compressible finite thickness shear layer it is shown that no resonances and no critical angles exist for all Mach numbers, frequencies and shear layer velocity profile slopes except in the singular case of the vortex sheet.
NASA Astrophysics Data System (ADS)
Rodrigues, Manuel J.; Fernandes, David E.; Silveirinha, Mário G.; Falcão, Gabriel
2018-01-01
This work introduces a parallel computing framework to characterize the propagation of electron waves in graphene-based nanostructures. The electron wave dynamics is modeled using both "microscopic" and effective medium formalisms and the numerical solution of the two-dimensional massless Dirac equation is determined using a Finite-Difference Time-Domain scheme. The propagation of electron waves in graphene superlattices with localized scattering centers is studied, and the role of the symmetry of the microscopic potential in the electron velocity is discussed. The computational methodologies target the parallel capabilities of heterogeneous multi-core CPU and multi-GPU environments and are built with the OpenCL parallel programming framework which provides a portable, vendor agnostic and high throughput-performance solution. The proposed heterogeneous multi-GPU implementation achieves speedup ratios up to 75x when compared to multi-thread and multi-core CPU execution, reducing simulation times from several hours to a couple of minutes.
Self-sustained radial oscillating flows between parallel disks
NASA Astrophysics Data System (ADS)
Mochizuki, S.; Yang, W.-J.
1985-05-01
It is pointed out that radial flow between parallel circular disks is of interest in a number of physical systems such as hydrostatic air bearings, radial diffusers, and VTOL aircraft with centrally located downward-positioned jets. The present investigation is concerned with the problem of instability in radial flow between parallel disks. A time-dependent numerical study and experiments are conducted. Both approaches reveal the nucleation, growth, migration, and decay of annular separation bubbles (i.e. vortex or recirculation zones) in the laminar-flow region. A finite-difference technique is utilized to solve the full unsteady vorticity transport equation in the theoretical procedure, while the flow patterns in the experiments are visualized with the aid of dye-injection, hydrogen-bubble, and paraffin-mist methods. It is found that the separation and reattachment of shear layers in the radial flow through parallel disks are unsteady phenomena. The sequence of nucleation, growth, migration, and decay of the vortices is self-sustained.
Steffen, Michael; Curtis, Sean; Kirby, Robert M; Ryan, Jennifer K
2008-01-01
Streamline integration of fields produced by computational fluid mechanics simulations is a commonly used tool for the investigation and analysis of fluid flow phenomena. Integration is often accomplished through the application of ordinary differential equation (ODE) integrators--integrators whose error characteristics are predicated on the smoothness of the field through which the streamline is being integrated--smoothness which is not available at the inter-element level of finite volume and finite element data. Adaptive error control techniques are often used to ameliorate the challenge posed by inter-element discontinuities. As the root of the difficulties is the discontinuous nature of the data, we present a complementary approach of applying smoothness-enhancing accuracy-conserving filters to the data prior to streamline integration. We investigate whether such an approach applied to uniform quadrilateral discontinuous Galerkin (high-order finite volume) data can be used to augment current adaptive error control approaches. We discuss and demonstrate through numerical example the computational trade-offs exhibited when one applies such a strategy.
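To make the role of the ODE integrator in the abstract above concrete, here is a minimal sketch of fixed-step streamline integration with the classical fourth-order Runge-Kutta method. It is an illustration only: the function name `rk4_streamline`, the analytic rigid-rotation field, and the step size are assumptions, and neither the adaptive error control nor the smoothness-enhancing filters discussed by the authors are implemented. The field used here is smooth everywhere, which is exactly the condition that fails across element boundaries in discontinuous data.

```python
import math

def rk4_streamline(velocity, p0, dt, n_steps):
    """Trace a streamline from p0 through a steady 2D velocity field.
    velocity(x, y) must return a tuple (vx, vy)."""
    x, y = p0
    path = [(x, y)]
    for _ in range(n_steps):
        k1 = velocity(x, y)
        k2 = velocity(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1])
        k3 = velocity(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1])
        k4 = velocity(x + dt * k3[0], y + dt * k3[1])
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        path.append((x, y))
    return path

def rotation(x, y):
    """Rigid rotation about the origin; streamlines are circles."""
    return (-y, x)

# One full revolution starting from (1, 0).
path = rk4_streamline(rotation, (1.0, 0.0), 2 * math.pi / 100, 100)
```

For this smooth field the streamline closes on itself to high accuracy after one revolution. Replacing `rotation` with a piecewise-defined field that jumps between cells would degrade that accuracy, which is the difficulty the abstract addresses.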
A Modeling Approach for Burn Scar Assessment Using Natural Features and Elastic Property
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tsap, L V; Zhang, Y; Goldgof, D B
2004-04-02
A modeling approach is presented for quantitative burn scar assessment. Emphasis is given to: (1) constructing a finite element model from natural image features with an adaptive mesh, and (2) quantifying the Young's modulus of scars using the finite element model and the regularization method. A set of natural point features is extracted from the images of burn patients. A Delaunay triangle mesh is then generated that adapts to the point features. A 3D finite element model is built on top of the mesh with the aid of range images providing the depth information. The Young's modulus of scars is quantified with a simplified regularization functional, assuming that knowledge of the scar's geometry is available. The consistency between the Relative Elasticity Index and the physician's rating based on the Vancouver Scale (a relative scale used to rate burn scars) indicates that the proposed modeling approach has high potential for image-based quantitative burn scar assessment.
Newmark local time stepping on high-performance computing architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rietmann, Max, E-mail: max.rietmann@erdw.ethz.ch; Institute of Geophysics, ETH Zurich; Grote, Marcus, E-mail: marcus.grote@unibas.ch
In multi-scale complex media, finite element meshes often require areas of local refinement, creating small elements that can dramatically reduce the global time-step for wave-propagation problems due to the CFL condition. Local time stepping (LTS) algorithms allow an explicit time-stepping scheme to adapt the time-step to the element size, allowing near-optimal time-steps everywhere in the mesh. We develop an efficient multilevel LTS-Newmark scheme and implement it in a widely used continuous finite element seismic wave-propagation package. In particular, we extend the standard LTS formulation with adaptations to continuous finite element methods that can be implemented very efficiently with very strong element-size contrasts (more than 100x). The implementation is capable of running on large CPU and GPU clusters; we present both synthetic validation examples and large-scale, realistic application examples to demonstrate the performance and applicability of the method and implementation on thousands of CPU cores and hundreds of GPUs.
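The interaction between the CFL condition and local time stepping described above can be sketched with a small calculation. This is a hedged illustration, not the LTS-Newmark scheme of the abstract: it assumes the common CFL form dt <= C * h / c per element and a power-of-two level hierarchy, and the names `lts_levels` and `courant` are hypothetical.

```python
import math

def lts_levels(h, c, courant=0.5):
    """Assign each element an LTS level k so that it advances with
    coarse_dt / 2**k, which never exceeds its own CFL limit.
    h: element sizes, c: wave speeds (same length)."""
    dt_limit = [courant * hi / ci for hi, ci in zip(h, c)]
    coarse_dt = max(dt_limit)
    levels = []
    for dt in dt_limit:
        k = max(0, math.ceil(math.log2(coarse_dt / dt)))
        # Guard against rounding in log2: never exceed the local limit.
        while coarse_dt / 2 ** k > dt:
            k += 1
        levels.append(k)
    return coarse_dt, levels

# Four elements, one of them 100x smaller than the largest.
h = [100.0, 100.0, 50.0, 1.0]
c = [1.0, 1.0, 1.0, 1.0]
coarse_dt, levels = lts_levels(h, c)

# Element updates needed to advance one coarse step.
lts_updates = sum(2 ** k for k in levels)
global_updates = len(h) * 2 ** max(levels)
```

With these illustrative numbers the levels are `[0, 0, 1, 7]`, so local time stepping needs 132 element updates per coarse step, whereas a single global step dictated by the smallest element needs 512. Real meshes would also require handling of the interfaces between levels, which this sketch ignores.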
Development of an adaptive hp-version finite element method for computational optimal control
NASA Technical Reports Server (NTRS)
Hodges, Dewey H.; Warner, Michael S.
1994-01-01
In this research effort, the usefulness of hp-version finite elements and adaptive solution-refinement techniques in generating numerical solutions to optimal control problems has been investigated. Under NAG-939, a general FORTRAN code was developed which approximated solutions to optimal control problems with control constraints and state constraints. Within that methodology, to get high-order accuracy in solutions, the finite element mesh would have to be refined repeatedly through bisection of the entire mesh in a given phase. In the current research effort, the order of the shape functions in each element has been made a variable, giving more flexibility in error reduction and smoothing. Similarly, individual elements can each be subdivided into many pieces, depending on the local error indicator, while other parts of the mesh remain coarsely discretized. The problem remains to reduce and smooth the error while still keeping computational effort reasonable enough to calculate time histories in a short enough time for on-board applications.
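The contrast drawn above between bisecting the entire mesh and subdividing only the elements flagged by a local error indicator can be sketched on a one-dimensional time mesh. This is an assumption-laden illustration rather than the authors' hp-version code: the function name `refine_locally`, the midpoint interpolation-error indicator, and the test function sqrt(t) are all hypothetical, and only h-refinement is shown (the polynomial order is not varied).

```python
import math

def refine_locally(nodes, indicator, tol, max_passes=20):
    """Repeatedly bisect only those intervals whose error indicator
    exceeds tol; intervals that already pass are left coarse."""
    for _ in range(max_passes):
        new_nodes = [nodes[0]]
        changed = False
        for a, b in zip(nodes, nodes[1:]):
            if indicator(a, b) > tol:
                new_nodes.append(0.5 * (a + b))
                changed = True
            new_nodes.append(b)
        nodes = new_nodes
        if not changed:
            break
    return nodes

def midpoint_error(a, b, f=math.sqrt):
    """Error of linear interpolation of f at the interval midpoint."""
    return abs(f(0.5 * (a + b)) - 0.5 * (f(a) + f(b)))

# sqrt(t) is steep near t = 0, so refinement should concentrate there.
mesh = refine_locally([0.0, 0.5, 1.0], midpoint_error, tol=1e-2)
```

The resulting mesh is fine near t = 0 and coarse elsewhere, so it uses far fewer intervals than a uniform mesh with the same smallest interval width.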
NASA Astrophysics Data System (ADS)
Delandmeter, Philippe; Lambrechts, Jonathan; Vallaeys, Valentin; Naithani, Jaya; Remacle, Jean-François; Legat, Vincent; Deleersnijder, Eric
2017-04-01
Vertical discretisation is crucial in the modelling of lake thermocline oscillations. For finite element methods, a simple way to increase the resolution close to the oscillating thermocline is to use vertical adaptive coordinates. With an Arbitrary Lagrangian-Eulerian (ALE) formulation, the mesh can be adapted to increase the resolution in regions with strong shear or stratification. In such an application, consistency and conservativity must be strictly enforced. SLIM 3D, a discontinuous-Galerkin finite element model for shallow-water flows (www.climate.be/slim, e.g. Kärnä et al., 2013, Delandmeter et al., 2015), was designed to be strictly consistent and conservative in its discrete formulation. In this context, special care must be paid to the coupling of the external and internal modes of the model and the moving mesh algorithm. In this framework, the mesh can be adapted arbitrarily in the vertical direction. Two moving mesh algorithms were implemented: the first one computes an a-priori optimal mesh; the second one diffuses the mesh vertically (Burchard et al., 2004, Hofmeister et al., 2010). The criteria used to define the optimal mesh and the diffusion function are related to a suitable measure of shear and stratification. We will present in detail the design of the model and how consistency and conservativity are obtained. Then we will apply it to both idealised benchmarks and the wind-forced thermocline oscillations in Lake Tanganyika (Naithani et al. 2002). References Tuomas Kärnä, Vincent Legat and Eric Deleersnijder. A baroclinic discontinuous Galerkin finite element model for coastal flows, Ocean Modelling, 61:1-20, 2013. Philippe Delandmeter, Stephen E Lewis, Jonathan Lambrechts, Eric Deleersnijder, Vincent Legat and Eric Wolanski. The transport and fate of riverine fine sediment exported to a semi-open system. Estuarine, Coastal and Shelf Science, 167:336-346, 2015. Hans Burchard and Jean-Marie Beckers.
Non-uniform adaptive vertical grids in one-dimensional numerical ocean models. Ocean Modelling, 6:51-81, 2004. Richard Hofmeister, Hans Burchard and Jean-Marie Beckers. Non-uniform adaptive vertical grids for 3d numerical ocean models. Ocean Modelling, 33:70-86, 2010. Jaya Naithani, Eric Deleersnijder and Pierre-Denis Plisnier. Origin of intraseasonal variability in Lake Tanganyika. Geophysical Research Letters, 29(23), doi:10.1029/2002GL015843, 2002.
NASA Astrophysics Data System (ADS)
Wu, Zhangming; Li, Hao
2017-11-01
This paper proposes a novel adaptive sun tracker constructed from hybrid unsymmetric composite laminates. The adaptive sun tracker could be applied on spacecraft solar panels to increase their energy efficiency by decreasing the inclined angle between the sunlight and the solar panel normal. The sun tracker possesses a large rotation freedom and its rotation angle depends on the laminate temperature, which is affected by the light condition in the orbit. Both an analytical model and a finite element model (FEM) are developed for the sun tracker to predict its rotation angle in different light conditions. In this work, the light condition of the geosynchronous orbit on the winter solstice is considered in the numerical prediction of the temperatures of the hybrid laminates. The final inclined angle between the sunlight and the solar panel normal during a solar day is computed using the finite element model. A parametric study of the adaptive sun tracker is conducted to improve its capacity and effectiveness of sun tracking. The improved adaptive sun tracker is lightweight and has a state-of-the-art design. In addition, the adaptive sun tracker does not consume any power of the solar panel, since it has no electrical driving devices. The proposed adaptive sun tracker provides a potential alternative to the traditional sophisticated electrical driving mechanisms for spacecraft solar panels.
Fluid/Structure Interaction Studies of Aircraft Using High Fidelity Equations on Parallel Computers
NASA Technical Reports Server (NTRS)
Guruswamy, Guru; VanDalsem, William (Technical Monitor)
1994-01-01
Aeroelasticity, which involves strong coupling of fluids, structures and controls, is an important element in designing an aircraft. Computational aeroelasticity using low fidelity methods, such as the linear aerodynamic flow equations coupled with the modal structural equations, is well advanced. Though these low fidelity approaches are computationally less intensive, they are not adequate for the analysis of modern aircraft such as High Speed Civil Transport (HSCT) and Advanced Subsonic Transport (AST), which can experience complex flow/structure interactions. HSCT can experience vortex induced aeroelastic oscillations whereas AST can experience transonic buffet associated structural oscillations. Both aircraft may experience a dip in the flutter speed in the transonic regime. For accurate aeroelastic computations in these complex fluid/structure interaction situations, high fidelity equations such as the Navier-Stokes equations for fluids and finite elements for structures are needed. Computations using these high fidelity equations require large computational resources both in memory and speed. Current conventional supercomputers have reached their limitations both in memory and speed. As a result, parallel computers have evolved to overcome the limitations of conventional computers. This paper will address the transition that is taking place in computational aeroelasticity from conventional computers to parallel computers. The paper will address special techniques needed to take advantage of the architecture of new parallel computers. Results will be illustrated from computations made on iPSC/860 and IBM SP2 computers by using the ENSAERO code, which directly couples the Euler/Navier-Stokes flow equations with high resolution finite-element structural equations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
de Almeida, Valmor F.
In this work, a phase-space discontinuous Galerkin (PSDG) method is presented for the solution of stellar radiative transfer problems. It allows for greater adaptivity than competing methods without sacrificing generality. The method is extensively tested on a spherically symmetric, static, inverse-power-law scattering atmosphere. Results for different sizes of atmospheres and intensities of scattering agreed with asymptotic values. The exponentially decaying behavior of the radiative field in the diffusive-transparent transition region, and the forward peaking behavior at the surface of extended atmospheres were accurately captured. The integrodifferential equation of radiation transfer is solved iteratively by alternating between the radiative pressure equation and the original equation with the integral term treated as an energy density source term. In each iteration, the equations are solved via an explicit, flux-conserving, discontinuous Galerkin method. Finite elements are ordered in wave fronts perpendicular to the characteristic curves so that elemental linear algebraic systems are solved quickly by sweeping the phase space element by element. Two implementations of a diffusive boundary condition at the origin are demonstrated wherein the finite discontinuity in the radiation intensity is accurately captured by the proposed method. This allows for a consistent mechanism to preserve photon luminosity. The method was proved to be robust and fast, and a case is made for the adequacy of parallel processing. In addition to classical two-dimensional plots, results of normalized radiation intensity were mapped onto a log-polar surface exhibiting all distinguishing features of the problem studied.
ATHENA 3D: A finite element code for ultrasonic wave propagation
NASA Astrophysics Data System (ADS)
Rose, C.; Rupin, F.; Fouquet, T.; Chassignole, B.
2014-04-01
The understanding of wave propagation phenomena requires use of robust numerical models. 3D finite element (FE) models are generally prohibitively time consuming. However, advances in computing processor speed and memory allow them to be more and more competitive. In this context, EDF R&D developed the 3D version of the well-validated FE code ATHENA2D. The code is dedicated to the simulation of wave propagation in all kinds of elastic media and in particular, heterogeneous and anisotropic materials like welds. It is based on solving elastodynamic equations in the calculation zone expressed in terms of stress and particle velocities. The particularity of the code relies on the fact that the discretization of the calculation domain uses a Cartesian regular 3D mesh while the defect of complex geometry can be described using a separate (2D) mesh using the fictitious domains method. This allows combining the rapidity of regular meshes computation with the capability of modelling arbitrary shaped defects. Furthermore, the calculation domain is discretized with a quasi-explicit time evolution scheme. Thereby only local linear systems of small size have to be solved. The final step to reduce the computation time relies on the fact that ATHENA3D has been parallelized and adapted to the use of HPC resources. In this paper, the validation of the 3D FE model is discussed. A cross-validation of ATHENA 3D and CIVA is proposed for several inspection configurations. The performances in terms of calculation time are also presented in the cases of both local computer and computation cluster use.
ELEFANT: a user-friendly multipurpose geodynamics code
NASA Astrophysics Data System (ADS)
Thieulot, C.
2014-07-01
A new finite element code for the solution of the Stokes and heat transport equations is presented. It has purposely been designed to address geological flow problems in two and three dimensions at crustal and lithospheric scales. The code relies on the Marker-in-Cell technique and Lagrangian markers are used to track materials in the simulation domain which allows recording of the integrated history of deformation; their (number) density is variable and dynamically adapted. A variety of rheologies has been implemented including nonlinear thermally activated dislocation and diffusion creep and brittle (or plastic) frictional models. The code is built on the Arbitrary Lagrangian Eulerian kinematic description: the computational grid deforms vertically and allows for a true free surface while the computational domain remains of constant width in the horizontal direction. The solution to the large system of algebraic equations resulting from the finite element discretisation and linearisation of the set of coupled partial differential equations to be solved is obtained by means of the efficient parallel direct solver MUMPS whose performance is thoroughly tested, or by means of the WISMP and AGMG iterative solvers. The code accuracy is assessed by means of many geodynamically relevant benchmark experiments which highlight specific features or algorithms, e.g., the implementation of the free surface stabilisation algorithm, the (visco-)plastic rheology implementation, the temperature advection, the capacity of the code to handle large viscosity contrasts. A two-dimensional application to salt tectonics presented as case study illustrates the potential of the code to model large scale high resolution thermo-mechanically coupled free surface flows.
de Almeida, Valmor F.
2017-04-19
In this work, a phase-space discontinuous Galerkin (PSDG) method is presented for the solution of stellar radiative transfer problems. It allows for greater adaptivity than competing methods without sacrificing generality. The method is extensively tested on a spherically symmetric, static, inverse-power-law scattering atmosphere. Results for different sizes of atmospheres and intensities of scattering agreed with asymptotic values. The exponentially decaying behavior of the radiative field in the diffusive-transparent transition region, and the forward peaking behavior at the surface of extended atmospheres were accurately captured. The integrodifferential equation of radiation transfer is solved iteratively by alternating between the radiative pressure equation and the original equation with the integral term treated as an energy density source term. In each iteration, the equations are solved via an explicit, flux-conserving, discontinuous Galerkin method. Finite elements are ordered in wave fronts perpendicular to the characteristic curves so that elemental linear algebraic systems are solved quickly by sweeping the phase space element by element. Two implementations of a diffusive boundary condition at the origin are demonstrated wherein the finite discontinuity in the radiation intensity is accurately captured by the proposed method. This allows for a consistent mechanism to preserve photon luminosity. The method was proved to be robust and fast, and a case is made for the adequacy of parallel processing. In addition to classical two-dimensional plots, results of normalized radiation intensity were mapped onto a log-polar surface exhibiting all distinguishing features of the problem studied.
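The element-by-element sweep mentioned in the abstract above can be illustrated on a much simpler problem. The sketch below is not the PSDG method: it assumes a one-dimensional slab, the transfer equation dI/dx + sigma*I = q along a single characteristic, and lowest-order (piecewise-constant) discontinuous Galerkin elements with an upwind flux. The function name `sweep` and all parameters are hypothetical.

```python
import math


def sweep(n_cells, length, sigma, source, inflow):
    """Solve dI/dx + sigma*I = source on [0, length] by one upwind sweep.

    Cells are visited in the direction of the characteristic, so each
    cell's unknown depends only on the already computed upstream value and
    the local system is a single scalar equation.
    """
    h = length / n_cells
    intensity = []
    upstream = inflow
    for _ in range(n_cells):
        # Cell balance: (I - upstream)/h + sigma*I = source.
        value = (upstream / h + source) / (1.0 / h + sigma)
        intensity.append(value)
        upstream = value
    return intensity
```

With no source, the swept solution approximates the exponential attenuation exp(-sigma*x) to first order in the cell size, which is the expected accuracy for piecewise-constant elements.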
Quark structure of static correlators in high temperature QCD
NASA Astrophysics Data System (ADS)
Bernard, Claude; DeGrand, Thomas A.; DeTar, Carleton; Gottlieb, Steven; Krasnitz, A.; Ogilvie, Michael C.; Sugar, R. L.; Toussaint, D.
1992-07-01
We present results of numerical simulations of quantum chromodynamics at finite temperature with two flavors of Kogut-Susskind quarks on the Intel iPSC/860 parallel processor. We investigate the properties of the objects whose exchange gives static screening lengths by reconstructing their correlated quark-antiquark structure.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Couch, R.; Ziegler, D. P.
This project was a multi-partner CRADA. This was a partnership between Alcoa and LLNL. Alcoa developed a system of numerical simulation modules that provided accurate and efficient three-dimensional modeling of combined fluid dynamics and structural response.
Partition of unity finite element method for quantum mechanical materials calculations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pask, J. E.; Sukumar, N.
2016-11-09
The current state of the art for large-scale quantum-mechanical simulations is the planewave (PW) pseudopotential method, as implemented in codes such as VASP, ABINIT, and many others. However, since the PW method uses a global Fourier basis, with strictly uniform resolution at all points in space, it suffers from substantial inefficiencies in calculations involving atoms with localized states, such as first-row and transition-metal atoms, and requires significant nonlocal communications, which limit parallel efficiency. Real-space methods such as finite-differences (FD) and finite-elements (FE) have partially addressed both resolution and parallel-communications issues but have been plagued by one key disadvantage relative to PW: excessive number of degrees of freedom (basis functions) needed to achieve the required accuracies. In this paper, we present a real-space partition of unity finite element (PUFE) method to solve the Kohn–Sham equations of density functional theory. In the PUFE method, we build the known atomic physics into the solution process using partition-of-unity enrichment techniques in finite element analysis. The method developed herein is completely general, applicable to metals and insulators alike, and particularly efficient for deep, localized potentials, as occur in calculations at extreme conditions of pressure and temperature. Full self-consistent Kohn–Sham calculations are presented for LiH, involving light atoms, and CeAl, involving heavy atoms with large numbers of atomic-orbital enrichments. We find that the new PUFE approach attains the required accuracies with substantially fewer degrees of freedom, typically by an order of magnitude or more, than the PW method. As a result, we compute the equation of state of LiH and show that the computed lattice constant and bulk modulus are in excellent agreement with reference PW results, while requiring an order of magnitude fewer degrees of freedom to obtain.
NASA Astrophysics Data System (ADS)
Bause, Markus
2008-02-01
In this work we study mixed finite element approximations of Richards' equation for simulating variably saturated subsurface flow and simultaneous reactive solute transport. Whereas higher order schemes have proved their ability to approximate reliably reactive solute transport (cf., e.g. [Bause M, Knabner P. Numerical simulation of contaminant biodegradation by higher order methods and adaptive time stepping. Comput Visual Sci 7;2004:61-78]), the Raviart-Thomas mixed finite element method (RT0) with a first order accurate flux approximation is popular for computing the underlying water flow field (cf. [Bause M, Knabner P. Computation of variably saturated subsurface flow by adaptive mixed hybrid finite element methods. Adv Water Resour 27;2004:565-581, Farthing MW, Kees CE, Miller CT. Mixed finite element methods and higher order temporal approximations for variably saturated groundwater flow. Adv Water Resour 26;2003:373-394, Starke G. Least-squares mixed finite element solution of variably saturated subsurface flow problems. SIAM J Sci Comput 21;2000:1869-1885, Younes A, Mosé R, Ackerer P, Chavent G. A new formulation of the mixed finite element method for solving elliptic and parabolic PDE with triangular elements. J Comp Phys 149;1999:148-167, Woodward CS, Dawson CN. Analysis of expanded mixed finite element methods for a nonlinear parabolic equation modeling flow into variably saturated porous media. SIAM J Numer Anal 37;2000:701-724]). This combination might be non-optimal. Higher order techniques could increase the accuracy of the flow field calculation and thereby improve the prediction of the solute transport. Here, we analyse the application of the Brezzi-Douglas-Marini element (BDM1) with a second order accurate flux approximation to elliptic, parabolic and degenerate problems whose solutions lack the regularity that is assumed in optimal order error analyses. For the flow field calculation, the BDM1 approach is observed to be superior to the RT0 approach, although the advantage is less significant for the accompanying solute transport.
Modeling and control of flexible structures
NASA Technical Reports Server (NTRS)
Gibson, J. S.; Mingori, D. L.
1988-01-01
This monograph presents integrated modeling and controller design methods for flexible structures. The controllers, or compensators, developed are optimal in the linear-quadratic-Gaussian sense. The performance objectives, sensor and actuator locations and external disturbances influence both the construction of the model and the design of the finite dimensional compensator. The modeling and controller design procedures are carried out in parallel to ensure compatibility of these two aspects of the design problem. Model reduction techniques are introduced to keep both the model order and the controller order as small as possible. A linear distributed, or infinite dimensional, model is the theoretical basis for most of the text, but finite dimensional models arising from both lumped-mass and finite element approximations also play an important role. A central purpose of the approach here is to approximate an optimal infinite dimensional controller with an implementable finite dimensional compensator. Both convergence theory and numerical approximation methods are given. Simple examples are used to illustrate the theory.
Experimental and Numerical Study on the Tensile Behaviour of UACS/Al Fibre Metal Laminate
NASA Astrophysics Data System (ADS)
Xue, Jia; Wang, Wen-Xue; Zhang, Jia-Zhen; Wu, Su-Jun; Li, Hang
2015-10-01
A new fibre metal laminate fabricated with aluminium sheets and unidirectionally arrayed chopped strand (UACS) plies is proposed. The UACS ply is made by cutting parallel slits into a unidirectional carbon fibre prepreg. The UACS/Al laminate may be viewed as an aluminium laminate reinforced by highly aligned, discontinuous carbon fibres. The tensile behaviour of the UACS/Al laminate, including thermal residual stress and failure progression, is investigated through experiments and numerical simulation. Finite element analysis was used to simulate the onset and propagation of intra-laminar fractures occurring within slits of the UACS plies and delamination along the interfaces. The finite element models feature intra-laminar cohesive elements inserted into the slits and inter-laminar cohesive elements inserted at the interfaces. Good agreement is obtained between experimental results and finite element analysis, and certain limitations of the finite element models are observed and discussed. The combined experimental and numerical studies provide a detailed understanding of the tensile behaviour of UACS/Al laminates.
NASA Astrophysics Data System (ADS)
Alfonso, Lester; Zamora, Jose; Cruz, Pedro
2015-04-01
The stochastic approach to coagulation considers the coalescence process taking place in a system of a finite number of particles enclosed in a finite volume. Within this approach, the full description of the system can be obtained from the solution of the multivariate master equation, which models the evolution of the probability distribution of the state vector for the number of particles of a given mass. Unfortunately, due to its complexity, only limited results have been obtained for certain types of kernels and monodisperse initial conditions. In this work, a novel numerical algorithm for the solution of the multivariate master equation for stochastic coalescence that works for any type of kernel and initial conditions is introduced. The performance of the method was checked by comparing the numerically calculated particle mass spectrum with analytical solutions obtained for the constant and sum kernels, with an excellent correspondence between the analytical and numerical solutions. In order to increase the speedup of the algorithm, software parallelization techniques with the OpenMP standard were used, along with an implementation designed to take advantage of new accelerator technologies. Simulation results show an important speedup of the parallelized algorithms. This study was funded by a grant from Consejo Nacional de Ciencia y Tecnologia de Mexico SEP-CONACYT CB-131879. The authors also thank LUFAC® Computacion SA de CV for CPU time and all the support provided.
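To make the multivariate master equation concrete, the following sketch enumerates the states of a very small system and integrates the probability of each state in time. It is not the authors' algorithm, which the abstract does not describe: it assumes plain explicit Euler time stepping, represents a state as a sorted tuple of particle masses, and absorbs the volume factor into the kernel. The names `transitions`, `solve_master_equation`, and `expected_count` are hypothetical.

```python
from collections import defaultdict
import math


def transitions(state, kernel):
    """Map each state reachable by one coalescence to its total rate.

    Every unordered pair of particles (a, b) merges at rate
    kernel(mass_a, mass_b); pairs leading to the same state are summed.
    """
    rates = defaultdict(float)
    for a in range(len(state)):
        for b in range(a + 1, len(state)):
            rest = state[:a] + state[a + 1:b] + state[b + 1:]
            merged = tuple(sorted(rest + (state[a] + state[b],)))
            rates[merged] += kernel(state[a], state[b])
    return rates


def solve_master_equation(n_monomers, kernel, t_end, dt):
    """Integrate dP/dt for all states, starting from n_monomers unit masses."""
    prob = {tuple([1] * n_monomers): 1.0}
    for _ in range(int(round(t_end / dt))):
        new_prob = defaultdict(float)
        for state, p in prob.items():
            rates = transitions(state, kernel)
            # Probability stays in the state unless some pair coalesces.
            new_prob[state] += p * (1.0 - dt * sum(rates.values()))
            for target, rate in rates.items():
                new_prob[target] += p * dt * rate
        prob = dict(new_prob)
    return prob


def expected_count(prob, mass):
    """Expected number of particles of the given mass."""
    return sum(p * state.count(mass) for state, p in prob.items())
```

For a constant kernel the probability that no coalescence has yet occurred decays as exp(-K*N*(N-1)*t/2), which gives a simple check on the sketch. The number of states grows with the number of integer partitions of the total mass, so this brute-force enumeration is only practical for small systems.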
ERIC Educational Resources Information Center
Nazari, Mohammad Ali; Perrier, Pascal; Payan, Yohan
2013-01-01
Purpose: The authors aimed to design a distributed lambda model (DLM), which is well adapted to implement three-dimensional (3-D), finite-element descriptions of muscles. Method: A muscle element model was designed. Its stress-strain relationships included the active force-length characteristics of the λ model along the muscle fibers, together…
Sensitivity Analysis for Multidisciplinary Systems (SAMS)
2016-12-01
support both mode-based structural representations and time-dependent, nonlinear finite element structural dynamics. This interim report describes ... Adaptation, & Sensitivity Toolkit: elasticity, heat transfer, and compressible flow; adjoint solver for sensitivity analysis; high-order finite elements ... Author: Richard D. Snyder. Program element number: 62201F. Project number: 2401. Task number: N/A. Work unit number: Q1FS.
Automated three-component synthesis of a library of γ-lactams
Fenster, Erik; Hill, David; Reiser, Oliver
2012-01-01
Summary: A three-component method for the synthesis of γ-lactams from commercially available maleimides, aldehydes, and amines was adapted to parallel library synthesis. Improvements to the chemistry over previous efforts include the optimization of the method to a one-pot process, the management of by-products and excess reagents, the development of an automated parallel sequence, and the adaptation of the method to permit the preparation of enantiomerically enriched products. These efforts culminated in the preparation of a library of 169 γ-lactams. PMID:23209515