parallel stencil operations: Topics by Science.gov

Sample records for parallel stencil operations

Optimizing transformations of stencil operations for parallel cache-based architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bassetti, F.; Davis, K.

This paper describes a new technique for optimizing serial and parallel stencil- and stencil-like operations for cache-based architectures. This technique takes advantage of the semantic knowledge implicit in stencil-like computations. The technique is implemented as a source-to-source program transformation; because of its specificity it could not be expected of a conventional compiler. Empirical results demonstrate a uniform factor of two speedup. The experiments clearly show the benefits of this technique to be a consequence, as intended, of the reduction in cache misses. The test codes are based on a 5-point stencil obtained by the discretization of the Poisson equation and applied to a two-dimensional uniform grid using the Jacobi method as an iterative solver. Results are presented for a 1-D tiling for a single processor, and in parallel using 1-D data partition. For the parallel case both blocking and non-blocking communication are tested. The same scheme of experiments has been performed for the 2-D tiling case. However, for the parallel case the 2-D partitioning is not discussed here, so the parallel case handled for 2-D is 2-D tiling with 1-D data partitioning.
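
As a rough illustration of the kind of test code this abstract describes, the sketch below applies Jacobi sweeps of the 5-point stencil for the Poisson equation on a uniform 2-D grid, with the column loop cut into 1-D tiles. It is not the authors' source-to-source transformation; the names `jacobi_sweep` and `solve`, the `tile` parameter, and the grid size are assumptions chosen for illustration.

```python
def jacobi_sweep(u, f, h, tile=4):
    """One Jacobi sweep of the 5-point stencil for -laplace(u) = f.

    u and f are square lists of lists; boundary values of u are kept fixed.
    The column range is cut into strips of width `tile` (1-D tiling). The
    tiling changes only the traversal order, never the computed values.
    """
    n = len(u)
    new = [row[:] for row in u]          # Jacobi writes into a separate array
    for j0 in range(1, n - 1, tile):     # loop over 1-D tiles (column strips)
        j1 = min(j0 + tile, n - 1)
        for i in range(1, n - 1):
            for j in range(j0, j1):
                new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                    + u[i][j - 1] + u[i][j + 1]
                                    + h * h * f[i][j])
    return new


def solve(n=9, sweeps=200, tile=4):
    """Model problem (assumed): f = 0 and u = 1 on the boundary, so u = 1."""
    h = 1.0 / (n - 1)
    u = [[1.0 if i in (0, n - 1) or j in (0, n - 1) else 0.0
          for j in range(n)] for i in range(n)]
    f = [[0.0] * n for _ in range(n)]
    for _ in range(sweeps):
        u = jacobi_sweep(u, f, h, tile)
    return u
```

Because every point is computed from the previous iterate only, any tile width gives the same result; in a compiled language the tile width would be tuned to the cache size, which is where the reported reduction in cache misses would come from.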
Stencils and problem partitionings: Their influence on the performance of multiple processor systems

NASA Technical Reports Server (NTRS)

Reed, D. A.; Adams, L. M.; Patrick, M. L.

1986-01-01

Given a discretization stencil, partitioning the problem domain is an important first step for the efficient solution of partial differential equations on multiple processor systems. Partitions are derived that minimize interprocessor communication when the number of processors is known a priori and each domain partition is assigned to a different processor. This partitioning technique uses the stencil structure to select appropriate partition shapes. For square problem domains, it is shown that non-standard partitions (e.g., hexagons) are frequently preferable to the standard square partitions for a variety of commonly used stencils. This investigation is concluded with a formalization of the relationship between partition shape, stencil structure, and architecture, allowing selection of optimal partitions for a variety of parallel systems.
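
To make the link between partition shape and communication concrete, the following sketch counts the grid points that one interior partition must exchange per iteration for a 5-point stencil on an n x n grid. It covers only strips and squares; the hexagonal partitions discussed in the abstract are not modeled, and the function names are hypothetical.

```python
import math


def strip_halo(n, p):
    """Grid points an interior strip exchanges per iteration (5-point stencil).

    An n x n grid cut into p horizontal strips: each interior strip has two
    neighbours and sends one row of n points to each, independent of p.
    """
    return 2 * n


def square_halo(n, p):
    """Grid points an interior square block exchanges per iteration.

    Assumes p is a perfect square and sqrt(p) divides n; each block has four
    neighbours and sends one edge of n / sqrt(p) points to each.
    """
    q = math.isqrt(p)
    if q * q != p or n % q != 0:
        raise ValueError("p must be a perfect square whose root divides n")
    return 4 * (n // q)
```

For n = 1024 and p = 16, an interior strip exchanges 2048 points and an interior square block 1024. Under this simple count, squares exchange less data than strips once p exceeds 4.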
Quasi-disjoint pentadiagonal matrix systems for the parallelization of compact finite-difference schemes and filters

NASA Astrophysics Data System (ADS)

Kim, Jae Wook

2013-05-01

This paper proposes a novel systematic approach for the parallelization of pentadiagonal compact finite-difference schemes and filters based on domain decomposition. The proposed approach allows a pentadiagonal banded matrix system to be split into quasi-disjoint subsystems by using a linear-algebraic transformation technique. As a result the inversion of pentadiagonal matrices can be implemented within each subdomain in an independent manner subject to a conventional halo-exchange process. The proposed matrix transformation leads to new subdomain boundary (SB) compact schemes and filters that require three halo terms to exchange with neighboring subdomains. The internode communication overhead in the present approach is equivalent to that of standard explicit schemes and filters based on seven-point discretization stencils. The new SB compact schemes and filters demand additional arithmetic operations compared to the original serial ones. However, it is shown that the additional cost becomes sufficiently low by choosing optimal sizes of their discretization stencils. Compared to earlier published results, the proposed SB compact schemes and filters successfully reduce parallelization artifacts arising from subdomain boundaries to a level sufficiently negligible for sophisticated aeroacoustic simulations without degrading parallel efficiency. The overall performance and parallel efficiency of the proposed approach are demonstrated by stringent benchmark tests.
Parallel fabrication of sub-50-nm uniformly sized nanoparticles by deposition through a patterned silicon nitride nanostencil.

PubMed

Yan, X-M; Contreras, A M; Koebel, M M; Liddle, J A; Somorjai, G A

2005-06-01

Using low-pressure chemical vapor deposition of silicon dioxide, we have reduced the size of 56-nm features in a silicon nitride membrane, called a stencil, down to 36 nm. Sub-50-nm uniformly sized nanoparticles are fabricated by electron-beam deposition of Pt through the stencil mask. A self-assembled monolayer (SAM) of tridecafluoro-1,1,2,2-tetrahydrooctyl-1-trichlorosilane was used to reduce Pt clogging of the nanosize holes during deposition as well as to protect the stencil during the postdeposition Pt removal. X-ray photoelectron spectroscopy shows that the SAM protects the stencil efficiently during this postdeposition removal of Pt.
Ion projection lithography: November 2000 status and sub-70-nm prospects

NASA Astrophysics Data System (ADS)

Kaesmaier, Rainer; Wolter, Andreas; Loeschner, Hans; Schunck, Stefan

2000-10-01

Among all next generation lithography (NGL) options Ion Projection Lithography (IPL) offers the smallest (particle) wavelength of 5×10⁻⁵ nm (100 keV helium ions). Thus, 4× reduction ion-optics has diffraction limits <3 nm even when using a numerical aperture as low as NA = 10⁻⁵. As part of the European MEDEA IPL project headed by Infineon Technologies, wide field ion-optics have been designed by IMS-Vienna with predicted resolution of 50 nm within a 12.5 mm exposure field. The ion-optics part of the PDT tool (PDT-IOS) has been realized and assembled. In parallel to the PDT-IOS effort, at Leica Jena a test bench for a vertical vacuum 300 mm wafer stage has been realized. Operation of magnetic bearing supported stage movement has already been demonstrated. An ASML vacuum-compatible optical wafer alignment system, with 3 nm (3σ) precision demonstrated in air, has been integrated to this wafer test bench system recently. Parallel to the IPL tool development, Infineon Technologies Mask House and the Institute for Microelectronics Stuttgart are intensively working on the development of IPL stencil masks with success in producing 150 mm and 200 mm stencil masks as reported elsewhere. This paper is focused on information about the status of the PDT-IOS tool.
Operator induced multigrid algorithms using semirefinement

NASA Technical Reports Server (NTRS)

Decker, Naomi; Vanrosendale, John

1989-01-01

A variant of multigrid, based on zebra relaxation, and a new family of restriction/prolongation operators is described. Using zebra relaxation in combination with an operator-induced prolongation leads to fast convergence, since the coarse grid can correct all error components. The resulting algorithms are not only fast, but are also robust, in the sense that the convergence rate is insensitive to the mesh aspect ratio. This is true even though line relaxation is performed in only one direction. Multigrid becomes a direct method if an operator-induced prolongation is used, together with the induced coarse grid operators. Unfortunately, this approach leads to stencils which double in size on each coarser grid. An implicit three-point restriction can be used to factor these large stencils, in order to retain the usual five- or nine-point stencils, while still achieving fast convergence. This algorithm achieves a V-cycle convergence rate of 0.03 on Poisson's equation, using 1.5 zebra sweeps per level, while the convergence rate improves to 0.003 if optimal nine-point stencils are used. Numerical results for two- and three-dimensional model problems are presented, together with a two-level analysis explaining these results.
Stencil computations for PDE-based applications with examples from DUNE and hypre

DOE Office of Scientific and Technical Information (OSTI.GOV)

Engwer, C.; Falgout, R. D.; Yang, U. M.

Here, stencils are commonly used to implement efficient on–the–fly computations of linear operators arising from partial differential equations. At the same time the term “stencil” is not fully defined and can be interpreted differently depending on the application domain and the background of the software developers. Common features in stencil codes are the preservation of the structure given by the discretization of the partial differential equation and the benefit of minimal data storage. We discuss stencil concepts of different complexity, show how they are used in modern software packages like hypre and DUNE, and discuss recent efforts to extend the software to enable stencil computations of more complex problems and methods such as inf–sup–stable Stokes discretizations and mixed finite element discretizations.
Stencil computations for PDE-based applications with examples from DUNE and hypre

DOE PAGES

Engwer, C.; Falgout, R. D.; Yang, U. M.

2017-02-24

Here, stencils are commonly used to implement efficient on–the–fly computations of linear operators arising from partial differential equations. At the same time the term “stencil” is not fully defined and can be interpreted differently depending on the application domain and the background of the software developers. Common features in stencil codes are the preservation of the structure given by the discretization of the partial differential equation and the benefit of minimal data storage. We discuss stencil concepts of different complexity, show how they are used in modern software packages like hypre and DUNE, and discuss recent efforts to extend the software to enable stencil computations of more complex problems and methods such as inf–sup–stable Stokes discretizations and mixed finite element discretizations.
Locality Aware Concurrent Start for Stencil Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shrestha, Sunil; Gao, Guang R.; Manzano Franco, Joseph B.

Stencil computations are at the heart of many physical simulations used in scientific codes. Thus, there exists a plethora of optimization efforts for this family of computations. Among these techniques, tiling techniques that allow concurrent start have proven to be very efficient in providing better performance for these critical kernels. Nevertheless, with many-core designs being the norm, these optimization techniques might not be able to fully exploit locality (both spatial and temporal) on multiple levels of the memory hierarchy without compromising parallelism. It is no longer true that the machine can be seen as a homogeneous collection of nodes with caches, main memory and an interconnect network. New architectural designs exhibit complex grouping of nodes, cores, threads, caches and memory connected by an ever evolving network-on-chip design. These new designs may benefit greatly from carefully crafted schedules and groupings that encourage parallel actors (i.e. threads, cores or nodes) to be aware of the computational history of other actors in close proximity. In this paper, we provide an efficient tiling technique that allows hierarchical concurrent start for memory hierarchy aware tile groups. Each execution schedule and tile shape exploit the available parallelism, load balance and locality present in the given applications. We demonstrate our technique on the Intel Xeon Phi architecture with selected and representative stencil kernels. We show improvement ranging from 5.58% to 31.17% over existing state-of-the-art techniques.
Snowflake: A Lightweight Portable Stencil DSL

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Nathan; Driscoll, Michael; Markley, Charles

Stencil computations are not well optimized by general-purpose production compilers and the increased use of multicore, manycore, and accelerator-based systems makes the optimization problem even more challenging. In this paper we present Snowflake, a Domain Specific Language (DSL) for stencils that uses a 'micro-compiler' approach, i.e., small, focused, domain-specific code generators. The approach is similar to that used in image processing stencils, but Snowflake handles the much more complex stencils that arise in scientific computing, including complex boundary conditions, higher-order operators (larger stencils), higher dimensions, variable coefficients, non-unit-stride iteration spaces, and multiple input or output meshes. Snowflake is embedded in the Python language, allowing it to interoperate with popular scientific tools like SciPy and iPython; it also takes advantage of built-in Python libraries for powerful dependence analysis as part of a just-in-time compiler. We demonstrate the power of the Snowflake language and the micro-compiler approach with a complex scientific benchmark, HPGMG, that exercises the generality of stencil support in Snowflake. By generating OpenMP comparable to, and OpenCL within a factor of 2x of hand-optimized HPGMG, Snowflake demonstrates that a micro-compiler can support diverse processor architectures and is performance-competitive whilst preserving a high-level Python implementation.
Snowflake: A Lightweight Portable Stencil DSL

DOE PAGES

Zhang, Nathan; Driscoll, Michael; Markley, Charles; ...

2017-05-01

Stencil computations are not well optimized by general-purpose production compilers and the increased use of multicore, manycore, and accelerator-based systems makes the optimization problem even more challenging. In this paper we present Snowflake, a Domain Specific Language (DSL) for stencils that uses a 'micro-compiler' approach, i.e., small, focused, domain-specific code generators. The approach is similar to that used in image processing stencils, but Snowflake handles the much more complex stencils that arise in scientific computing, including complex boundary conditions, higher-order operators (larger stencils), higher dimensions, variable coefficients, non-unit-stride iteration spaces, and multiple input or output meshes. Snowflake is embedded in the Python language, allowing it to interoperate with popular scientific tools like SciPy and iPython; it also takes advantage of built-in Python libraries for powerful dependence analysis as part of a just-in-time compiler. We demonstrate the power of the Snowflake language and the micro-compiler approach with a complex scientific benchmark, HPGMG, that exercises the generality of stencil support in Snowflake. By generating OpenMP comparable to, and OpenCL within a factor of 2x of hand-optimized HPGMG, Snowflake demonstrates that a micro-compiler can support diverse processor architectures and is performance-competitive whilst preserving a high-level Python implementation.
Efficient Cache use for Stencil Operations on Structured Discretization Grids

NASA Technical Reports Server (NTRS)

Frumkin, Michael; VanderWijngaart, Rob F.

2001-01-01

We derive tight bounds on the cache misses for evaluation of explicit stencil operators on structured grids. Our lower bound is based on the isoperimetrical property of the discrete octahedron. Our upper bound is based on a good surface to volume ratio of a parallelepiped spanned by a reduced basis of the interference lattice of a grid. Measurements show that our algorithm typically reduces the number of cache misses by a factor of three, relative to a compiler optimized code. We show that stencil calculations on grids whose interference lattices have a short vector feature abnormally high numbers of cache misses. We call such grids unfavorable and suggest avoiding them in computations by appropriate padding. By direct measurements on a MIPS R10000 processor we show a good correlation between abnormally high numbers of cache misses and unfavorable three-dimensional grids.
Reaction of photochemical resists used in screen printing under the influence of digitally modulated ultra violet light

NASA Astrophysics Data System (ADS)

Gmuender, T.

2017-02-01

Different chemical photo-reactive emulsions are used in screen printing for stencil production. Depending on the bandwidth, optical power and depth of field of the optical system, the reaction (exposure) speed varies. In this paper, the emulsions are first categorized and validated. A mathematical model is then developed, and adapted on the basis of heuristic experience, to estimate the exposure speed under the influence of digitally modulated ultraviolet (UV) light. The main intention is to use the technical specifications (intended wavelength, exposure time, distance to the stencil, electrical power, stencil configuration) in the emulsion data sheet, which are primarily written down with an uncertainty factor for end users operating with large projector arc lamps and photo films. These five parameters are the inputs to a mathematical formula whose output is the exposure speed for the Computer to Screen (CTS) machine, calculated for each emulsion / stencil setup. The importance of this work lies in the possibility of rating, with just a few boundaries, the performance and capacity of an exposure system used in screen printing instead of processing a long test series for each emulsion / stencil configuration.
Modeling and Simulating Multiple Failure Masking enabled by Local Recovery for Stencil-based Applications at Extreme Scales

DOE PAGES

Gamell, Marc; Teranishi, Keita; Mayo, Jackson; ...

2017-04-24

Obtaining multi-process hard failure resilience at the application level is a key challenge that must be overcome before the promise of exascale can be fully realized. Some previous work has shown that online global recovery can dramatically reduce the overhead of failures when compared to the more traditional approach of terminating the job and restarting it from the last stored checkpoint. If online recovery is performed in a local manner, further scalability is enabled, not only due to the intrinsic lower costs of recovering locally, but also due to derived effects when using some application types. In this paper we model one such effect, namely multiple failure masking, that manifests when running Stencil parallel computations in an environment where failures are recovered locally. First, the delay propagation shape of one or multiple failures recovered locally is modeled to enable several analyses of the probability of different levels of failure masking under certain Stencil application behaviors. These results indicate that failure masking is an extremely desirable effect at scale whose manifestation is more evident and beneficial as the machine size or the failure rate increases.
Modeling and Simulating Multiple Failure Masking enabled by Local Recovery for Stencil-based Applications at Extreme Scales

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gamell, Marc; Teranishi, Keita; Mayo, Jackson

Obtaining multi-process hard failure resilience at the application level is a key challenge that must be overcome before the promise of exascale can be fully realized. Some previous work has shown that online global recovery can dramatically reduce the overhead of failures when compared to the more traditional approach of terminating the job and restarting it from the last stored checkpoint. If online recovery is performed in a local manner, further scalability is enabled, not only due to the intrinsic lower costs of recovering locally, but also due to derived effects when using some application types. In this paper we model one such effect, namely multiple failure masking, that manifests when running Stencil parallel computations in an environment where failures are recovered locally. First, the delay propagation shape of one or multiple failures recovered locally is modeled to enable several analyses of the probability of different levels of failure masking under certain Stencil application behaviors. These results indicate that failure masking is an extremely desirable effect at scale whose manifestation is more evident and beneficial as the machine size or the failure rate increases.
Evaluating Multi-core Architectures through Accelerating the Three-Dimensional Lax–Wendroff Correction

DOE Office of Scientific and Technical Information (OSTI.GOV)

You, Yang; Fu, Haohuan; Song, Shuaiwen

2014-07-18

Wave propagation forward modeling is a widely used computational method in oil and gas exploration. The iterative stencil loops in such problems have broad applications in scientific computing. However, executing such loops can be highly time-consuming, which greatly limits the application’s performance and power efficiency. In this paper, we accelerate the forward modeling technique on the latest multi-core and many-core architectures such as Intel Sandy Bridge CPUs, NVIDIA Fermi C2070 GPU, NVIDIA Kepler K20x GPU, and the Intel Xeon Phi Co-processor. For the GPU platforms, we propose two parallel strategies to explore the performance optimization opportunities for our stencil kernels. For Sandy Bridge CPUs and MIC, we also employ various optimization techniques in order to achieve the best.
Jagged Tiling for Intra-tile Parallelism and Fine-Grain Multithreading

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shrestha, Sunil; Manzano Franco, Joseph B.; Marquez, Andres

In this paper, we have developed a novel methodology that takes into consideration multithreaded many-core designs to better utilize memory/processing resources and improve memory residence on tileable applications. It takes advantage of polyhedral analysis and transformation in the form of PLUTO, combined with a highly optimized fine-grain tile runtime to exploit parallelism at all levels. The main contributions of this paper include the introduction of multi-hierarchical tiling techniques that increase intra-tile parallelism; and a data-flow inspired runtime library that allows the expression of parallel tiles with an efficient synchronization registry. Our current implementation shows performance improvements on an Intel Xeon Phi board of up to 32.25% against instances produced by state-of-the-art compiler frameworks for selected stencil applications.
A Parallel Fast Sweeping Method for the Eikonal Equation

NASA Astrophysics Data System (ADS)

Baker, B.

2017-12-01

Recently, there has been an exciting emergence of probabilistic methods for travel time tomography. Unlike gradient-based optimization strategies, probabilistic tomographic methods are resistant to becoming trapped in a local minimum and provide a much better quantification of parameter resolution than, say, appealing to ray density or performing checkerboard reconstruction tests. The benefits associated with random sampling methods, however, are only realized by successive computation of predicted travel times in, potentially, strongly heterogeneous media. To this end, this abstract is concerned with expediting the solution of the Eikonal equation. While many Eikonal solvers use a fast marching method, the proposed solver will use the iterative fast sweeping method because the eight fixed sweep orderings in each iteration are natural targets for parallelization. To reduce the number of iterations and grid points required, the high-accuracy finite-difference stencil of Nobel et al., 2014 is implemented. A directed acyclic graph (DAG) is created with a priori knowledge of the sweep ordering and finite-difference stencil. By performing a topological sort of the DAG, sets of independent nodes are identified as candidates for concurrent updating. Additionally, the proposed solver will also address scalability during earthquake relocation, a necessary step in local and regional earthquake tomography and a barrier to extending probabilistic methods from active source to passive source applications, by introducing an asynchronous parallel forward solve phase for all receivers in the network. Synthetic examples using the SEG over-thrust model will be presented.
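
The fast sweeping iteration mentioned in this abstract can be sketched in a few lines. The version below is a serial 2-D solver (four sweep orderings instead of the eight used in 3-D) and swaps in the standard first-order upwind update rather than the high-accuracy stencil cited in the abstract; the function name `fast_sweep_2d` and its parameters are assumptions for illustration only.

```python
import math


def fast_sweep_2d(slowness, src, h=1.0, iterations=2):
    """First-order fast sweeping solver for |grad T| = slowness on a 2-D grid."""
    ny, nx = len(slowness), len(slowness[0])
    big = float("inf")
    T = [[big] * nx for _ in range(ny)]
    T[src[0]][src[1]] = 0.0
    # Four sweep orderings in 2-D: every combination of row/column direction.
    orders = [(range(ny), range(nx)),
              (range(ny), range(nx - 1, -1, -1)),
              (range(ny - 1, -1, -1), range(nx)),
              (range(ny - 1, -1, -1), range(nx - 1, -1, -1))]
    for _ in range(iterations):
        for rows, cols in orders:
            for i in rows:
                for j in cols:
                    if (i, j) == tuple(src):
                        continue
                    # Smallest neighbouring travel time along each axis.
                    a = min(T[i - 1][j] if i > 0 else big,
                            T[i + 1][j] if i < ny - 1 else big)
                    b = min(T[i][j - 1] if j > 0 else big,
                            T[i][j + 1] if j < nx - 1 else big)
                    if a == big and b == big:
                        continue  # no neighbour has been reached yet
                    fh = slowness[i][j] * h
                    if abs(a - b) >= fh:
                        t = min(a, b) + fh   # wave arrives along one axis
                    else:                    # wave arrives from both axes
                        t = 0.5 * (a + b + math.sqrt(2.0 * fh * fh - (a - b) ** 2))
                    T[i][j] = min(T[i][j], t)
    return T
```

With this particular stencil, each update reads only the four axis neighbours, so grid points on the same anti-diagonal (constant i + j) do not depend on one another within a sweep. That is one simple instance of the independent node sets that a topological sort of the dependency graph would expose for concurrent updating.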
75 FR 69157 - Petition for Waiver of Compliance

Federal Register 2010, 2011, 2012, 2013, 2014

2010-11-10

... educational railroad operating between Virginia City and Carson City via Gold Hill, Nevada. The railroad was... used for photographic subjects in an educational setting to depict the type of freight trains that would have operated in the era during mining operations. Therefore, stenciling the required information...
49 CFR 215.303 - Stenciling of restricted cars.

Code of Federal Regulations, 2012 CFR

2012-10-01

... 49 Transportation 4 2012-10-01 2012-10-01 false Stenciling of restricted cars. 215.303 Section 215... ADMINISTRATION, DEPARTMENT OF TRANSPORTATION RAILROAD FREIGHT CAR SAFETY STANDARDS Stenciling § 215.303 Stenciling of restricted cars. (a) Each restricted railroad freight car that is described in § 215.205(a) of...

49 CFR 215.303 - Stenciling of restricted cars.

Code of Federal Regulations, 2011 CFR

2011-10-01

... 49 Transportation 4 2011-10-01 2011-10-01 false Stenciling of restricted cars. 215.303 Section 215... ADMINISTRATION, DEPARTMENT OF TRANSPORTATION RAILROAD FREIGHT CAR SAFETY STANDARDS Stenciling § 215.303 Stenciling of restricted cars. (a) Each restricted railroad freight car that is described in § 215.205(a) of...
49 CFR 215.303 - Stenciling of restricted cars.

Code of Federal Regulations, 2010 CFR

2010-10-01

... 49 Transportation 4 2010-10-01 2010-10-01 false Stenciling of restricted cars. 215.303 Section 215... ADMINISTRATION, DEPARTMENT OF TRANSPORTATION RAILROAD FREIGHT CAR SAFETY STANDARDS Stenciling § 215.303 Stenciling of restricted cars. (a) Each restricted railroad freight car that is described in § 215.205(a) of...
49 CFR 215.303 - Stenciling of restricted cars.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 49 Transportation 4 2014-10-01 2014-10-01 false Stenciling of restricted cars. 215.303 Section 215... ADMINISTRATION, DEPARTMENT OF TRANSPORTATION RAILROAD FREIGHT CAR SAFETY STANDARDS Stenciling § 215.303 Stenciling of restricted cars. (a) Each restricted railroad freight car that is described in § 215.205(a) of...
49 CFR 215.303 - Stenciling of restricted cars.

Code of Federal Regulations, 2013 CFR

2013-10-01

... 49 Transportation 4 2013-10-01 2013-10-01 false Stenciling of restricted cars. 215.303 Section 215... ADMINISTRATION, DEPARTMENT OF TRANSPORTATION RAILROAD FREIGHT CAR SAFETY STANDARDS Stenciling § 215.303 Stenciling of restricted cars. (a) Each restricted railroad freight car that is described in § 215.205(a) of...
A staggered-grid finite-difference scheme optimized in the time–space domain for modeling scalar-wave propagation in geophysical problems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tan, Sirui, E-mail: siruitan@hotmail.com; Huang, Lianjie, E-mail: ljh@lanl.gov

For modeling scalar-wave propagation in geophysical problems using finite-difference schemes, optimizing the coefficients of the finite-difference operators can reduce numerical dispersion. Most optimized finite-difference schemes for modeling seismic-wave propagation suppress only spatial but not temporal dispersion errors. We develop a novel optimized finite-difference scheme for numerical scalar-wave modeling to control dispersion errors not only in space but also in time. Our optimized scheme is based on a new stencil that contains a few more grid points than the standard stencil. We design an objective function for minimizing relative errors of phase velocities of waves propagating in all directions within a given range of wavenumbers. Dispersion analysis and numerical examples demonstrate that our optimized finite-difference scheme is computationally up to 2.5 times faster than the optimized schemes using the standard stencil to achieve similar modeling accuracy for a given 2D or 3D problem. Compared with the high-order finite-difference scheme using the same new stencil, our optimized scheme reduces the computational cost by 50 percent to achieve similar modeling accuracy. This new optimized finite-difference scheme is particularly useful for large-scale 3D scalar-wave modeling and inversion.
Methods for compressible fluid simulation on GPUs using high-order finite differences

NASA Astrophysics Data System (ADS)

Pekkilä, Johannes; Väisälä, Miikka S.; Käpylä, Maarit J.; Käpylä, Petri J.; Anjum, Omer

2017-08-01

We focus on implementing and optimizing a sixth-order finite-difference solver for simulating compressible fluids on a GPU using third-order Runge-Kutta integration. Since graphics processing units perform well in data-parallel tasks, this makes them an attractive platform for fluid simulation. However, high-order stencil computation is memory-intensive with respect to both main memory and the caches of the GPU. We present two approaches for simulating compressible fluids using 55-point and 19-point stencils. We seek to reduce the requirements for memory bandwidth and cache size in our methods by using cache blocking and decomposing a latency-bound kernel into several bandwidth-bound kernels. Our fastest implementation is bandwidth-bound and integrates 343 million grid points per second on a Tesla K40t GPU, achieving a 3.6× speedup over a comparable hydrodynamics solver benchmarked on two Intel Xeon E5-2690v3 processors. Our alternative GPU implementation is latency-bound and achieves the rate of 168 million updates per second.
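
For readers unfamiliar with sixth-order finite differences, the sketch below shows the standard 7-point central stencil for a first derivative in one dimension, applied to a periodic sequence. It is a plain CPU illustration, not the GPU solver or the 55-point and 19-point 3-D stencils of the paper; the names `ddx_sixth_order` and `max_error` are hypothetical.

```python
import math

# Standard sixth-order central-difference weights for the first derivative,
# for offsets -3..3 (the weighted sum is divided by the grid spacing h).
WEIGHTS = (-1 / 60, 3 / 20, -3 / 4, 0.0, 3 / 4, -3 / 20, 1 / 60)


def ddx_sixth_order(values, h):
    """First derivative of a periodic 1-D sequence using a 7-point stencil."""
    n = len(values)
    return [sum(w * values[(i + k) % n] for k, w in zip(range(-3, 4), WEIGHTS)) / h
            for i in range(n)]


def max_error(n=64):
    """Largest error when differentiating sin(x) on [0, 2*pi) with n points."""
    h = 2 * math.pi / n
    xs = [i * h for i in range(n)]
    d = ddx_sixth_order([math.sin(x) for x in xs], h)
    return max(abs(di - math.cos(x)) for di, x in zip(d, xs))
```

Halving the grid spacing should shrink the error by roughly a factor of 2^6 = 64, which is what "sixth-order" means here. Each output point reads seven inputs, which hints at why such stencils put pressure on memory bandwidth and cache capacity in three dimensions.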
The application of electrolytic photoetching and photopolishing to AISI 304 stainless steel and the electrolytic photoetching of amorphous cobalt alloy

NASA Astrophysics Data System (ADS)

Thomaz, Marita Duarte Canhao da Silva Pereira Fernandes

The results presented cover broad aspects of a quantitative investigation into the electrolytic etching and polishing of metals and alloys through photographically produced dielectric stencils (Photoresists). A study of the potential field generated between a cathode and relatively smaller anode sites as those defined by a dielectric stencil was carried out. Numerical, analytical and graphical methods yielded answers to the factors determining lateral dissolution (undercut) at the anode/stencil interface. A quasi steady state numerical model simulating the transient behavior of the partially masked electrodes undergoing dissolution was obtained. AISI 304 stainless steel was electrolytically photoetched in 10% w/w HCl electrolyte. The optimised process parameters were utilised for quantifying the effects of galvanostatic etching of the anode as that defined by a relatively narrow adherent resist stencil. Stainless steel was also utilised in investigating electrolytic photopolishing. A polishing electrolyte (orthophosphoric acid-glycerol) was modified by the addition of a surfactant which yielded surface texture values of 70 nm (Ra) and high levels of specular reflectance. These results were used in the production of features upon the metal surface through photographically produced precision stencils. The process was applied to the production of edge filters requiring high quality surface textures in precision recesses. Some of the new amorphous material exhibited high resistance to dissolution in commercially used spray etching formulations. One of these materials is a cobalt based alloy produced by chill block spinning. This material was also investigated and electro etched in 10% w/w HCl solution. Although passivity was not overcome, by selecting suitable operating parameters the successful electro photoetching of precision magnetic recording head laminations was achieved. Similarly, a polycrystalline nickel based alloy also exhibiting passivity in commercially used etchants was successfully etched in the above electrolyte.
Micropatterning of neural stem cells and Purkinje neurons using a polydimethylsiloxane (PDMS) stencil.

PubMed

Choi, Jin Ho; Lee, Hyun; Jin, Hee Kyung; Bae, Jae-sung; Kim, Gyu Man

2012-12-07

A new fabrication method of a polydimethylsiloxane (PDMS) stencil embedded microwell plate is proposed and applied to a localized culture of Purkinje neurons (PNs) and neural stem cells (NSCs). A microwell plate combines a PDMS stencil and well plate. The PDMS stencil was fabricated by spin casting from an SU-8 master mold. Gas blowing using nitrogen was adopted to perforate the stencil membrane. An acrylic well plate compartment mold was fabricated using computer numerical control (CNC) machining. By PDMS casting using a stencil placed on an acrylic mold, microwell plates were fabricated without punching or the use of a plasma bonding process. By using the stencil as a physical mask for the cell culture, PNs and NSCs were successfully cultured into micropatterns. The microwell plate could be applied to the localizing and culturing of a cell. The micropatterned NSCs were differentiated into neurons, astrocytes, and oligodendrocytes. The results showed that cells could be cultured and differentiated into micropatterns in a precisely controlled manner in any shape and in specific sizes for bioscience study and bioengineering applications.
A Review of High-Order and Optimized Finite-Difference Methods for Simulating Linear Wave Phenomena

NASA Technical Reports Server (NTRS)

Zingg, David W.

1996-01-01

This paper presents a review of high-order and optimized finite-difference methods for numerically simulating the propagation and scattering of linear waves, such as electromagnetic, acoustic, or elastic waves. The spatial operators reviewed include compact schemes, non-compact schemes, schemes on staggered grids, and schemes which are optimized to produce specific characteristics. The time-marching methods discussed include Runge-Kutta methods, Adams-Bashforth methods, and the leapfrog method. In addition, the following fourth-order fully-discrete finite-difference methods are considered: a one-step implicit scheme with a three-point spatial stencil, a one-step explicit scheme with a five-point spatial stencil, and a two-step explicit scheme with a five-point spatial stencil. For each method studied, the number of grid points per wavelength required for accurate simulation of wave propagation over large distances is presented. Recommendations are made with respect to the suitability of the methods for specific problems and practical aspects of their use, such as appropriate Courant numbers and grid densities. Avenues for future research are suggested.
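The points-per-wavelength criterion mentioned in this abstract can be illustrated with a small sketch. It compares the modified wavenumber of the standard second-order (three-point) and fourth-order (five-point) central differences for the first derivative. This covers the spatial operator only and ignores time-marching error, so it is a simplification of the fully discrete analysis the review describes; the function names are chosen here for illustration.

```python
from math import pi, sin


def modified_wavenumber_2nd(kh):
    """Scaled modified wavenumber of (u[j+1] - u[j-1]) / (2h)."""
    return sin(kh)


def modified_wavenumber_4th(kh):
    """Scaled modified wavenumber of
    (-u[j+2] + 8 u[j+1] - 8 u[j-1] + u[j-2]) / (12h)."""
    return (8.0 * sin(kh) - sin(2.0 * kh)) / 6.0


def relative_phase_error(points_per_wavelength, scheme):
    """Relative error of the numerical wavenumber for a wave resolved
    with the given number of grid points per wavelength."""
    kh = 2.0 * pi / points_per_wavelength
    return abs(scheme(kh) - kh) / kh


if __name__ == "__main__":
    for ppw in (5, 10, 20, 40):
        e2 = relative_phase_error(ppw, modified_wavenumber_2nd)
        e4 = relative_phase_error(ppw, modified_wavenumber_4th)
        print(f"{ppw:3d} points/wavelength: 2nd-order {e2:.2e}, 4th-order {e4:.2e}")
```

Because the phase error accumulates with distance travelled, the resolution needed for a fixed error budget grows with the propagation distance, and grows more slowly for the wider stencil.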
A Discontinuous Galerkin Finite Element Method for Hamilton-Jacobi Equations

NASA Technical Reports Server (NTRS)

Hu, Changqing; Shu, Chi-Wang

1998-01-01

In this paper, we present a discontinuous Galerkin finite element method for solving the nonlinear Hamilton-Jacobi equations. This method is based on the Runge-Kutta discontinuous Galerkin finite element method for solving conservation laws. The method has the flexibility of treating complicated geometry by using arbitrary triangulation, can achieve high order accuracy with a local, compact stencil, and is suited for efficient parallel implementation. One- and two-dimensional numerical examples are given to illustrate the capability of the method.
49 CFR 232.15 - Movement of defective equipment.

Code of Federal Regulations, 2010 CFR

2010-10-01

... the safe repair of the car. (d) Computation of percent operative power brakes. (1) The percentage of operative power brakes in a train shall be based on the number of control valves in the train. The... contained on the stencil, sticker, or badge plate required by § 232.103(g) for considering the power brakes...
High-order asynchrony-tolerant finite difference schemes for partial differential equations

NASA Astrophysics Data System (ADS)

Aditya, Konduri; Donzis, Diego A.

2017-12-01

Synchronizations of processing elements (PEs) in massively parallel simulations, which arise due to communication or load imbalances between PEs, significantly affect the scalability of scientific applications. We have recently proposed a method based on finite-difference schemes to solve partial differential equations in an asynchronous fashion - synchronization between PEs is relaxed at a mathematical level. While standard schemes can maintain their stability in the presence of asynchrony, their accuracy is drastically affected. In this work, we present a general methodology to derive asynchrony-tolerant (AT) finite difference schemes of arbitrary order of accuracy, which can maintain their accuracy when synchronizations are relaxed. We show that there are several choices available in selecting a stencil to derive these schemes and discuss their effect on numerical and computational performance. We provide a simple classification of schemes based on the stencil and derive schemes that are representative of different classes. Their numerical error is rigorously analyzed within a statistical framework to obtain the overall accuracy of the solution. Results from numerical experiments are used to validate the performance of the schemes.
A New Approach for Constructing Highly Stable High Order CESE Schemes

NASA Technical Reports Server (NTRS)

Chang, Sin-Chung

2010-01-01

A new approach is devised to construct high order CESE schemes which would avoid the common shortcomings of traditional high order schemes including: (a) susceptibility to computational instabilities; (b) computational inefficiency due to their local implicit nature (i.e., at each mesh point, one needs to solve a system of linear/nonlinear equations involving all the mesh variables associated with this mesh point); (c) use of large and elaborate stencils which complicate boundary treatments and also make efficient parallel computing much harder; (d) difficulties in applications involving complex geometries; and (e) use of problem-specific techniques which are needed to overcome stability problems but often cause undesirable side effects. In fact it will be shown that, with the aid of a conceptual leap, one can build from a given 2nd-order CESE scheme its 4th-, 6th-, 8th-,... order versions which have the same stencil and the same stability conditions as the 2nd-order scheme, and also retain all other advantages of the latter scheme. A sketch of multidimensional extensions will also be provided.
Printability Optimization For Fine Pitch Solder Bonding

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kwon, Sang-Hyun; Lee, Chang-Woo; Yoo, Sehoon

2011-01-17

The effect of metal mask and pad design on solder printability was evaluated by design of experiments (DOE) in this study. The process parameters were stencil thickness, squeegee angle, squeegee speed, mask separating speed, and pad angle of the PCB. The main process parameters for printability were stencil thickness and squeegee angle. The response surface showed that maximum printability of the 1005 chip was achieved at a stencil thickness of 0.12 mm, while the maximum printability of the 0603 and 0402 chips was obtained at a stencil thickness of 0.05 mm. The bonding strength of the MLCC chips was also directly related to the printability.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Jalas, S.; Dornmair, I.; Lehe, R.

Particle in Cell (PIC) simulations are a widely used tool for the investigation of both laser- and beam-driven plasma acceleration. It is a known issue that the beam quality can be artificially degraded by numerical Cherenkov radiation (NCR) resulting primarily from an incorrectly modeled dispersion relation. Pseudo-spectral solvers featuring infinite order stencils can strongly reduce NCR - or even suppress it - and are therefore well suited to correctly model the beam properties. For efficient parallelization of the PIC algorithm, however, localized solvers are inevitable. Arbitrary order pseudo-spectral methods provide this needed locality. Yet, these methods can again be prone to NCR. In this paper, we show that acceptably low solver orders are sufficient to correctly model the physics of interest, while allowing for parallel computation by domain decomposition.
76 FR 18294 - Proposed Agency Information Collection Activities; Comment Request

Federal Register 2010, 2011, 2012, 2013, 2014

2011-04-01

... learns the condition of operating rules and practices with respect to trains and instructions provided by... Existing On-Track Roadway Maintenance Machines Conforming with Paragraph (a) of This Section. 214.507... stencils. Maintenance Machine (RMM). 214.511--Clearly Identifiable 644 railroads...... 3,700 identified 5...
A compositional reservoir simulator on distributed memory parallel computers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rame, M.; Delshad, M.

1995-12-31

This paper presents the application of distributed memory parallel computers to field scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general purpose highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/960 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. A portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes porting to new parallel platforms straightforward. Results of the distributed memory computing performance of the parallel simulator are presented for field scale applications such as tracer flood and polymer flood. A comparison of the wall-clock times for the same problems on a vector supercomputer is also presented.
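The idea of extending each subdomain so that neighbouring processors can share data for the stencil can be sketched in one dimension. The code below is not taken from UTCHEM; it is a minimal in-process illustration with a hypothetical three-point averaging stencil, an even block split, and one ghost cell per interior side. It checks that the decomposed sweep reproduces the global one.

```python
def smooth_global(u):
    """One sweep of a 3-point averaging stencil; end points are held fixed."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return new


def split_with_halos(u, parts):
    """Split u into contiguous blocks that are as even as possible, and
    extend each block by one ghost cell on every interior side."""
    n = len(u)
    bounds = [n * p // parts for p in range(parts + 1)]
    subdomains = []
    for p in range(parts):
        lo, hi = bounds[p], bounds[p + 1]
        halo_lo = max(lo - 1, 0)   # ghost cell copied from the left neighbour
        halo_hi = min(hi + 1, n)   # ghost cell copied from the right neighbour
        subdomains.append((lo, hi, halo_lo, u[halo_lo:halo_hi]))
    return subdomains


def smooth_decomposed(u, parts):
    """Apply the same stencil block by block, using only local data."""
    n = len(u)
    result = [0.0] * n
    for lo, hi, halo_lo, local in split_with_halos(u, parts):
        for i in range(lo, hi):
            j = i - halo_lo  # index of global point i inside the local array
            if i == 0 or i == n - 1:
                result[i] = local[j]
            else:
                result[i] = (local[j - 1] + local[j] + local[j + 1]) / 3.0
    return result


if __name__ == "__main__":
    field = [float(i * i % 7) for i in range(20)]
    print(smooth_decomposed(field, 4) == smooth_global(field))
```

On a real distributed memory machine the slice that fills the ghost cells would be replaced by a message exchange between adjacent processors before each sweep.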
Fabrication of poly (lactic-co-glycolic acid) microcontainers using solvent evaporation with polydimethylsiloxane stencil

NASA Astrophysics Data System (ADS)

Kim, Chul Min; Byul Lee, Han; Kim, Jong Uk; Kim, Gyu Man

2017-12-01

We present a fabrication method using polydimethylsiloxane (PDMS) stencils and solvent evaporation to prepare microcontainers with a desired shape made from a biodegradable polymer. Poly(lactic-co-glycolic acid) (PLGA) was used for preparing microcontainers, but most polymers are applicable in the proposed method in which solvent evaporation is used to construct microstructures in confined spaces in the stencil. Microcontainers with various shapes were fabricated by controlling the stencil geometry. Furthermore, a porous structure could be prepared in a micromembrane using water porogen. The porous structure was observed using a field emission scanning electron microscope and mass transfer across the porous membrane was examined using a fluorescent dye. The flexibility of the PDMS stencil allowed the fabrication of microcontainers on a curved surface. Finally, it was demonstrated that microcontainers can be used to contain a localized cell culture. The viability and morphology of cultured cells were observed using confocal microscopy over a period of 3 weeks.
A High-Resolution Capability for Large-Eddy Simulation of Jet Flows

NASA Technical Reports Server (NTRS)

DeBonis, James R.

2011-01-01

A large-eddy simulation (LES) code that utilizes high-resolution numerical schemes is described and applied to a compressible jet flow. The code is written in a general manner such that the accuracy/resolution of the simulation can be selected by the user. Time discretization is performed using a family of low-dispersion Runge-Kutta schemes, selectable from first- to fourth-order. Spatial discretization is performed using central differencing schemes. Both standard schemes, second- to twelfth-order (3 to 13 point stencils), and Dispersion Relation Preserving schemes from 7 to 13 point stencils are available. The code is written in Fortran 90 and uses hybrid MPI/OpenMP parallelization. The code is applied to the simulation of a Mach 0.9 jet flow. Four-stage third-order Runge-Kutta time stepping and the 13 point DRP spatial discretization scheme of Bogey and Bailly are used. The high-resolution numerics allow for the use of relatively sparse grids. Three levels of grid resolution are examined: 3.5, 6.5, and 9.2 million points. Mean flow, first-order turbulent statistics and turbulent spectra are reported. Good agreement with experimental data for mean flow and first-order turbulent statistics is shown.
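The relation between stencil width and formal order for the standard central schemes (3 points for second order up to 13 points for twelfth order) can be sketched as follows. The closed-form weights used here are the textbook coefficients for the first derivative on a uniform grid; they are not taken from the code described in the abstract, the Dispersion Relation Preserving coefficients are not reproduced, and the function names are illustrative.

```python
from fractions import Fraction
from math import cos, factorial, sin


def central_first_derivative_weights(half_width):
    """Weights c[-N..N] of the standard central first-derivative stencil.

    A stencil with 2N + 1 points has formal order of accuracy 2N.
    """
    n = half_width
    weights = {0: Fraction(0)}
    for k in range(1, n + 1):
        c = Fraction((-1) ** (k + 1) * factorial(n) ** 2,
                     k * factorial(n - k) * factorial(n + k))
        weights[k] = c
        weights[-k] = -c  # antisymmetric about the centre point
    return weights


def apply_stencil(f, x, h, weights):
    """Approximate f'(x) on a uniform grid with spacing h."""
    return sum(float(c) * f(x + k * h) for k, c in weights.items()) / h


if __name__ == "__main__":
    for n in (1, 2, 6):
        w = central_first_derivative_weights(n)
        err = abs(apply_stencil(sin, 0.3, 0.1, w) - cos(0.3))
        print(f"{2 * n + 1:2d}-point stencil, order {2 * n:2d}: error {err:.2e}")
```

Exact rational weights are used so that the familiar values (for example -1/2, 0, 1/2 for the 3-point stencil) can be checked directly.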
76 FR 46892 - Petition for Waiver of Compliance

Federal Register 2010, 2011, 2012, 2013, 2014

2011-08-03

... the Railroad Freight Car Safety Standards, specifically 49 CFR 215.303, which requires stenciling to indicate a restricted car. WTLC states that Caboose WTLC X-40 is operated as a shove platform on freight... requirements of its safety standards. The individual petition is described below, including the party seeking...

49 CFR 179.102-4 - Vinyl fluoride, stabilized.

Code of Federal Regulations, 2010 CFR

2010-10-01

...) Include impact specimens of weld metal and heat affected zone prepared and tested in accordance with AAR.... (b) Insulation must be of approved material. (c) Excess flow valves must be installed under all... capacity stencil, MINIMUM OPERATING TEMPERATURE _ °F. (i) The tank car and insulation must be designed to...
49 CFR 179.400-25 - Stenciling.

Code of Federal Regulations, 2010 CFR

2010-10-01

... Specification for Cryogenic Liquid Tank Car Tanks and Seamless Steel Tanks (Classes DOT-113 and 107A) § 179.400... design service temperature and maximum lading weight, in letters and figures at least 1 1/2 inches high... at its coldest operating temperature, after deduction for the volume above the inlet to the pressure...
STENCIL: Science Teaching European Network for Creativity and Innovation in Learning

NASA Astrophysics Data System (ADS)

Cattadori, M.; Magrefi, F.

2013-12-01

STENCIL is a European educational project funded with the support of the European Commission within the framework of LLP7 (Lifelong Learning Programme) for a period of 3 years (2011 - 2013). STENCIL includes 21 members from 9 European countries (Bulgaria, Germany, Greece, France, Italy, Malta, Portugal, Slovenia, Turkey) working together to contribute to the general objective of improving science teaching by promoting innovative methodologies and creative solutions. Among the innovative methods adopted, of particular interest is a joint partnership between a wide spectrum of institutions such as schools, school authorities, research centres, universities, science museums, and other organizations, representing differing perspectives on science education. STENCIL offers practitioners in science education from all over Europe a platform, the web portal www.stencil-science.eu, that provides high visibility to schools and institutions involved in Comenius and other similar European funded projects in science education. STENCIL takes advantage of the positive results achieved by the former European projects STELLA - Science Teaching in a Lifelong Learning Approach (2007 - 2009) and GRID - Growing interest in the development of teaching science (2004-2006). The specific objectives of the project are: 1) to identify and promote innovative practices in science teaching through the publication of Annual Reports on Science Education; 2) to bring together science education practitioners to share different experiences and learn from each other through the organisation of periodical study visits and workshops; 3) to disseminate materials and outcomes coming from previous EU funded projects and from isolated science education initiatives through the STENCIL web portal, as well as through international conferences and national events. 
This contribution aims at explaining the main features of the project together with the results achieved during the project's 3-year lifetime.
Accurate modeling of plasma acceleration with arbitrary order pseudo-spectral particle-in-cell methods

DOE PAGES

Jalas, S.; Dornmair, I.; Lehe, R.; ...

2017-03-20

Particle in Cell (PIC) simulations are a widely used tool for the investigation of both laser- and beam-driven plasma acceleration. It is a known issue that the beam quality can be artificially degraded by numerical Cherenkov radiation (NCR) resulting primarily from an incorrectly modeled dispersion relation. Pseudo-spectral solvers featuring infinite order stencils can strongly reduce NCR - or even suppress it - and are therefore well suited to correctly model the beam properties. For efficient parallelization of the PIC algorithm, however, localized solvers are inevitable. Arbitrary order pseudo-spectral methods provide this needed locality. Yet, these methods can again be prone to NCR. In this paper, we show that acceptably low solver orders are sufficient to correctly model the physics of interest, while allowing for parallel computation by domain decomposition.
Local search to improve coordinate-based task mapping

DOE PAGES

Balzuweit, Evan; Bunde, David P.; Leung, Vitus J.; ...

2015-10-31

We present a local search strategy to improve the coordinate-based mapping of a parallel job’s tasks to the MPI ranks of its parallel allocation in order to reduce network congestion and the job’s communication time. The goal is to reduce the number of network hops between communicating pairs of ranks. Our target is applications with a nearest-neighbor stencil communication pattern running on mesh systems with non-contiguous processor allocation, such as Cray XE and XK Systems. Utilizing the miniGhost mini-app, which models the shock physics application CTH, we demonstrate that our strategy reduces application running time while also reducing the runtime variability. Furthermore, we show that mapping quality can vary based on the selected allocation algorithm, even between allocation algorithms of similar apparent quality.
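The objective stated in this abstract, fewer network hops between communicating ranks, can be illustrated with a generic swap-based hill climb. This is not the authors' algorithm, whose details the abstract does not give. The sketch assumes a 2-D mesh where the hop count between two nodes is their Manhattan distance, a 2-D nearest-neighbor task grid, and a hypothetical non-contiguous allocation of node coordinates.

```python
def stencil_pairs(nx, ny):
    """Communicating task pairs of a 2-D nearest-neighbor stencil
    on an nx-by-ny task grid (tasks numbered row by row)."""
    pairs = []
    for x in range(nx):
        for y in range(ny):
            t = x * ny + y
            if x + 1 < nx:
                pairs.append((t, (x + 1) * ny + y))
            if y + 1 < ny:
                pairs.append((t, x * ny + y + 1))
    return pairs


def total_hops(mapping, pairs):
    """Sum of Manhattan distances between the nodes of communicating tasks.
    mapping[t] is the (row, col) mesh coordinate assigned to task t."""
    return sum(abs(mapping[a][0] - mapping[b][0]) +
               abs(mapping[a][1] - mapping[b][1]) for a, b in pairs)


def local_search(mapping, pairs):
    """Swap the nodes of two tasks whenever that lowers the total hop
    count; stop when no single swap gives an improvement."""
    mapping = list(mapping)
    best = total_hops(mapping, pairs)
    improved = True
    while improved:
        improved = False
        for i in range(len(mapping)):
            for j in range(i + 1, len(mapping)):
                mapping[i], mapping[j] = mapping[j], mapping[i]
                cost = total_hops(mapping, pairs)
                if cost < best:
                    best = cost
                    improved = True
                else:
                    # Undo the swap: it did not help.
                    mapping[i], mapping[j] = mapping[j], mapping[i]
    return mapping, best


if __name__ == "__main__":
    # Hypothetical scattered allocation of nine nodes for a 3x3 task grid.
    nodes = [(0, 0), (5, 1), (0, 1), (3, 3), (1, 0), (4, 1), (1, 1), (3, 2), (2, 2)]
    pairs = stencil_pairs(3, 3)
    print("initial hops:", total_hops(nodes, pairs))
    print("after search:", local_search(nodes, pairs)[1])
```

The search terminates because every accepted swap strictly lowers a non-negative integer cost. It finds a local optimum only, and hop count is a proxy for congestion rather than a direct measure of running time.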
Effects of Stencil Width on Surface Ocean Geostrophic Velocity and Vorticity Estimation from Gridded Satellite Altimeter Data

DTIC Science & Technology

2012-03-17

Texas at Austin, Austin, Texas, USA. Departement de Physique and LPO, Universite de Bretagne Occidental, Brest ...grid points are used in the calculation, so that the grid spacing is 8 times larger than on the original grid. The 3-point stencil differences are sig...that the difference between narrow and wide stencil estimates increases over that found on the original higher resolution grid. Interpolation of the
14. FACILITY IDENTIFICATION STENCILED ON ROOF BEAM, 'RIGGING LOFT' PORTION ...

Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

14. FACILITY IDENTIFICATION STENCILED ON ROOF BEAM, 'RIGGING LOFT' PORTION OF BUILDING 4. - Chollas Heights Naval Radio Transmitting Facility, Public Works Shop, 6410 Zero Road, San Diego, San Diego County, CA
Boundary Closures for Fourth-order Energy Stable Weighted Essentially Non-Oscillatory Finite Difference Schemes

NASA Technical Reports Server (NTRS)

Fisher, Travis C.; Carpenter, Mark H.; Yamaleev, Nail K.; Frankel, Steven H.

2009-01-01

A general strategy exists for constructing Energy Stable Weighted Essentially Non Oscillatory (ESWENO) finite difference schemes up to eighth-order on periodic domains. These ESWENO schemes satisfy an energy norm stability proof for both continuous and discontinuous solutions of systems of linear hyperbolic equations. Herein, boundary closures are developed for the fourth-order ESWENO scheme that maintain wherever possible the WENO stencil biasing properties, while satisfying the summation-by-parts (SBP) operator convention, thereby ensuring stability in an L2 norm. Second-order, and third-order boundary closures are developed that achieve stability in diagonal and block norms, respectively. The global accuracy for the second-order closures is three, and for the third-order closures is four. A novel set of non-uniform flux interpolation points is necessary near the boundaries to simultaneously achieve 1) accuracy, 2) the SBP convention, and 3) WENO stencil biasing mechanics.
49 CFR 179.220-26 - Stenciling.

Code of Federal Regulations, 2010 CFR

2010-10-01

... Specifications for Non-Pressure Tank Car Tanks (Classes DOT-111AW and 115AW) § 179.220-26 Stenciling. (a) The... high to indicate the safe upper temperature limit, if applicable, for the inner tank, insulation, and...
High density circuit technology, part 1

NASA Technical Reports Server (NTRS)

Wade, T. E.

1982-01-01

The metal (or dielectric) lift-off processes used in the semiconductor industry to fabricate high density very large scale integration (VLSI) systems were reviewed. The lift-off process consists of depositing the light-sensitive material onto the wafer and patterning first in such a manner as to form a stencil for the interconnection material. Then the interconnection layer is deposited and unwanted areas are lifted off by removing the underlying stencil. Several of these lift-off techniques were examined experimentally. The use of an auxiliary layer of polyimide to form a lift-off stencil offers considerable promise.
49 CFR 215.305 - Stenciling of maintenance-of-way equipment.

Code of Federal Regulations, 2011 CFR

2011-10-01

... RAILROAD ADMINISTRATION, DEPARTMENT OF TRANSPORTATION RAILROAD FREIGHT CAR SAFETY STANDARDS Stenciling...” must be— (1) At least 2 inches high; and (2) Placed on each side of the car. [44 FR 77340, Dec. 31...
49 CFR 215.305 - Stenciling of maintenance-of-way equipment.

Code of Federal Regulations, 2014 CFR

2014-10-01

... RAILROAD ADMINISTRATION, DEPARTMENT OF TRANSPORTATION RAILROAD FREIGHT CAR SAFETY STANDARDS Stenciling...” must be— (1) At least 2 inches high; and (2) Placed on each side of the car. [44 FR 77340, Dec. 31...
49 CFR 215.305 - Stenciling of maintenance-of-way equipment.

Code of Federal Regulations, 2010 CFR

2010-10-01

... RAILROAD ADMINISTRATION, DEPARTMENT OF TRANSPORTATION RAILROAD FREIGHT CAR SAFETY STANDARDS Stenciling...” must be— (1) At least 2 inches high; and (2) Placed on each side of the car. [44 FR 77340, Dec. 31...
49 CFR 215.305 - Stenciling of maintenance-of-way equipment.

Code of Federal Regulations, 2013 CFR

2013-10-01

... RAILROAD ADMINISTRATION, DEPARTMENT OF TRANSPORTATION RAILROAD FREIGHT CAR SAFETY STANDARDS Stenciling...” must be— (1) At least 2 inches high; and (2) Placed on each side of the car. [44 FR 77340, Dec. 31...
49 CFR 215.305 - Stenciling of maintenance-of-way equipment.

Code of Federal Regulations, 2012 CFR

2012-10-01

... RAILROAD ADMINISTRATION, DEPARTMENT OF TRANSPORTATION RAILROAD FREIGHT CAR SAFETY STANDARDS Stenciling...” must be— (1) At least 2 inches high; and (2) Placed on each side of the car. [44 FR 77340, Dec. 31...
Simple and fast polydimethylsiloxane (PDMS) patterning using a cutting plotter and vinyl adhesives to achieve etching results.

PubMed

Hyun Kim; Sun-Young Yoo; Ji Sung Kim; Zihuan Wang; Woon Hee Lee; Kyo-In Koo; Jong-Mo Seo; Dong-Il Cho

2017-07-01

Inhibition of polydimethylsiloxane (PDMS) polymerization could be observed when spin-coated over vinyl substrates. The degree of polymerization, partial or full curing, depended on the PDMS thickness coated over the vinyl substrate. This characteristic was exploited to achieve a simple and fast PDMS patterning method using a vinyl adhesive layer patterned with a cutting plotter. The proposed patterning method showed results resembling PDMS etching. Therefore, patterning of PDMS over PDMS, glass, silicon, and gold substrates was tested to compare the results with conventional etching methods. Vinyl stencils with widths ranging from 200μm to 1500μm were used for the procedure. To evaluate the accuracy of the cutting plotter, the stencil widths designed in the AutoCAD software and the actual stencil widths were compared. Furthermore, this method's accuracy was also evaluated by comparing the widths of the actual stencils and the etched PDMS results.
Cache Locality Optimization for Recursive Programs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lifflander, Jonathan; Krishnamoorthy, Sriram

We present an approach to optimize the cache locality for recursive programs by dynamically splicing--recursively interleaving--the execution of distinct function invocations. By utilizing data effect annotations, we identify concurrency and data reuse opportunities across function invocations and interleave them to reduce reuse distance. We present algorithms that efficiently track effects in recursive programs, detect interference and dependencies, and interleave execution of function invocations using user-level (non-kernel) lightweight threads. To enable multi-core execution, a program is parallelized using a nested fork/join programming model. Our cache optimization strategy is designed to work in the context of a random work stealing scheduler. We present an implementation using the MIT Cilk framework that demonstrates significant improvements in sequential and parallel performance, competitive with a state-of-the-art compile-time optimizer for loop programs and a domain-specific optimizer for stencil programs.
A Beautiful End to a Day.

ERIC Educational Resources Information Center

Johns, Pat

2000-01-01

Focuses on a second grade art lesson (two 40-minute class periods) in which students use stencils, oil pastels, and watercolors to create an impressionistic landscape. Discusses how to create the stencils, using oaktags, and how to create the picture. (CMK)
CCC Stencil on center of east wall, interior of carpenter/blacksmith ...

Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

CCC Stencil on center of east wall, interior of carpenter/blacksmith shop, facing east. - Camp Tulelake, Shop-Storage Building, West Side of Hill Road, 2 miles South of State Highway 161, Tulelake, Siskiyou County, CA
Piezoelectric Sol-Gel Composite Film Fabrication by Stencil Printing.

PubMed

Kaneko, Tsukasa; Iwata, Kazuki; Kobayashi, Makiko

2015-09-01

Piezoelectric films using sol-gel composites could be useful as ultrasonic transducers in various industrial fields. For sol-gel composite film fabrication, the spray coating technique has been used often because of its adaptability for various substrates. However, the spray technique requires multiple spray coating processes and heating processes and this is an issue of concern, especially for on-site fabrication in controlled areas. Stencil printing has been developed to solve this issue because this method can be used to fabricate thick sol-gel composite films with one coating process. In this study, PbTiO3 (PT)/Pb(Zr,Ti)O3 (PZT) films, PZT/PZT films, and Bi4Ti3O12 (BiT)/PZT films were fabricated by stencil printing, and PT/PZT films were also fabricated using the spray technique. After fabrication, a thermal cycle test was performed for the samples to compare their ultrasonic performance. The sensitivity and signal-to-noise ratio (SNR) of the ultrasonic response of PT/PZT fabricated by stencil printing were equivalent to those of PT/PZT fabricated by the spray technique, and better than those of other samples between room temperature and 300°C. Therefore, PT/PZT films fabricated by stencil printing could be a good candidate for nondestructive testing (NDT) ultrasonic transducers from room temperature to 300°C.

49 CFR 179.220-26 - Stenciling.

Code of Federal Regulations, 2012 CFR

2012-10-01

... Transportation Other Regulations Relating to Transportation (Continued) PIPELINE AND HAZARDOUS MATERIALS SAFETY ADMINISTRATION, DEPARTMENT OF TRANSPORTATION (CONTINUED) SPECIFICATIONS FOR TANK CARS Specifications for Non-Pressure Tank Car Tanks (Classes DOT-111AW and 115AW) § 179.220-26 Stenciling. (a) The outer shell, or the...
76 FR 818 - Petition for Waiver of Compliance

Federal Register 2010, 2011, 2012, 2013, 2014

2011-01-06

... 215, Railroad Freight Car Safety Standards, specifically, 49 CFR 215.303 (Stenciling of Restricted Cars), which requires that restricted railroad freight cars shall be stenciled or marked in clearly... requirements of its safety standards. The individual petition is described below, including the party seeking...
49 CFR 179.220-26 - Stenciling.

Code of Federal Regulations, 2011 CFR

2011-10-01

... Transportation Other Regulations Relating to Transportation (Continued) PIPELINE AND HAZARDOUS MATERIALS SAFETY ADMINISTRATION, DEPARTMENT OF TRANSPORTATION (CONTINUED) SPECIFICATIONS FOR TANK CARS Specifications for Non-Pressure Tank Car Tanks (Classes DOT-111AW and 115AW) § 179.220-26 Stenciling. (a) The outer shell, or the...
The chronology of hand stencils in European Palaeolithic rock art: implications of new U-series results from El Castillo Cave (Cantabria, Spain).

PubMed

García-Diez, Marcos; Garrido, Daniel; Hoffmann, Dirk; Pettitt, Paul; Pike, Alistair; Zilhão, Joao

2015-07-20

The hand stencils of European Paleolithic art tend to be considered of pre-Magdalenian age and scholars have generally assigned them to the Gravettian period. At El Castillo Cave, application of U-series dating to calcite accretions has established a minimum age of 37,290 years for underlying red hand stencils, implying execution in the earlier part of the Aurignacian if not beforehand. Together with the series of red disks, one of which has a minimum age of 40,800 years, these motifs lie at the base of the El Castillo parietal stratigraphy. The similarity in technique and colour supports the notion that both kinds of artistic manifestations are synchronic and define an initial, non-figurative phase of European cave art. However, available data indicate that hand stencils continued to be painted subsequently. Currently, the youngest, reliably dated examples fall in the Late Gravettian, approximately 27,000 years ago.
Interference Lattice-based Loop Nest Tilings for Stencil Computations

NASA Technical Reports Server (NTRS)

VanderWijngaart, Rob F.; Frumkin, Michael

2000-01-01

A common method for improving performance of stencil operations on structured multi-dimensional discretization grids is loop tiling. Tile shapes and sizes are usually determined heuristically, based on the size of the primary data cache. We provide a lower bound on the number of cache misses that must be incurred by any tiling, and a close achievable bound using a particular tiling based on the grid interference lattice. The latter tiling is used to derive highly efficient loop orderings. The total number of cache misses of a code is the sum of (necessary) cold misses and misses caused by elements being dropped from the cache between successive loads (replacement misses). Maximizing temporal locality is equivalent to minimizing replacement misses. Temporal locality of loop nests implementing stencil operations is optimized by tilings that avoid data conflicts. We divide the loop nest iteration space into conflict-free tiles, derived from the cache miss equation. The tiling involves the definition of the grid interference lattice (an equivalence class of grid points whose images in main memory map to the same location in the cache) and the construction of a special basis for the lattice. Conflicts only occur on the boundaries of the tiles, unless the tiles are too thin. We show that the surface area of the tiles is bounded for grids of any dimensionality, and for caches of any associativity, provided the eccentricity of the fundamental parallelepiped (the tile spanned by the basis) of the lattice is bounded. Eccentricity is determined by two factors, aspect ratio and skewness. The aspect ratio of the parallelepiped can be bounded by appropriate array padding. The skewness can be bounded by the choice of a proper basis. Combining these two strategies ensures that pathologically thin tiles are avoided. They do not, however, minimize replacement misses per se. The reason is that tile visitation order influences the number of data conflicts on the tile boundaries. 
If two adjacent tiles are visited successively, there will be no replacement misses on the shared boundary. The iteration space may be covered with pencils larger than the size of the cache while avoiding data conflicts if the pencils are traversed by a scanning-face method. Replacement misses are incurred only on the boundaries of the pencils, and the number of misses is minimized by maximizing the volume of the scanning face, not the volume of the tile. We present an algorithm for constructing the most efficient scanning face for a given grid and stencil operator. In two dimensions it is based on a continued fraction algorithm. In three dimensions it follows Voronoi's successive minima algorithm. We show experimental results of using the scanning face, and compare with canonical loop orderings.
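The interference lattice and its basis can be illustrated with a small sketch. It assumes a direct-mapped cache holding `cache_words` array elements and a two-dimensional array with leading dimension `n1`, so that two grid points whose offsets differ by (i, j) collide in cache when i + n1*j is a multiple of `cache_words`. The function names are hypothetical, and the Lagrange-Gauss reduction is used here as a standard way to obtain a short basis; it is not necessarily the construction used by the authors.

```python
def interference_lattice_basis(n1, cache_words):
    """Return one (unreduced) basis of {(i, j): i + n1*j = 0 mod cache_words}.

    Assumption: direct-mapped cache, one array element per cache word.
    """
    return (cache_words, 0), (-(n1 % cache_words), 1)


def gauss_reduce(u, v):
    """Lagrange-Gauss reduction of a 2-D integer lattice basis (u, v)."""
    def norm2(w):
        return w[0] * w[0] + w[1] * w[1]

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1]

    if norm2(u) > norm2(v):
        u, v = v, u
    while True:
        n = norm2(u)
        # Nearest integer to dot(u, v) / n, computed in exact integer arithmetic.
        m = (2 * dot(u, v) + n) // (2 * n)
        v = (v[0] - m * u[0], v[1] - m * u[1])
        if norm2(v) >= n:
            return u, v
        u, v = v, u


def in_lattice(point, n1, cache_words):
    """True if the offset `point` maps back to the same cache location."""
    return (point[0] + n1 * point[1]) % cache_words == 0


if __name__ == "__main__":
    for n1 in (64, 67):  # unpadded and padded leading dimension (illustrative)
        basis = gauss_reduce(*interference_lattice_basis(n1, 1024))
        print(n1, basis)
```

With these assumed sizes, padding the leading dimension from 64 to 67 changes the reduced basis from (0, 16), (-64, 1) to (19, 15), (-29, 31), which is a less elongated fundamental parallelogram of the same area.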
77 FR 68883 - Petition for Waiver of Compliance

Federal Register 2010, 2011, 2012, 2013, 2014

2012-11-16

... compliance from the Railroad Freight Car Safety Standards contained in 49 CFR 215.303-Stenciling of restricted cars, which requires stenciling on restricted freight cars with a clearly legible letter ``R... waiver of compliance from certain provisions of the Federal railroad safety regulations contained at 49...
49 CFR 215.301 - General.

Code of Federal Regulations, 2014 CFR

2014-10-01

... TRANSPORTATION RAILROAD FREIGHT CAR SAFETY STANDARDS Stenciling § 215.301 General. The railroad or private car owner reporting mark, the car number, and built date shall be stenciled, or otherwise displayed, in... shall not be less than one inch high: (a) On each side of each railroad freight car body; and (b) In the...
49 CFR 215.301 - General.

Code of Federal Regulations, 2012 CFR

2012-10-01

... TRANSPORTATION RAILROAD FREIGHT CAR SAFETY STANDARDS Stenciling § 215.301 General. The railroad or private car owner reporting mark, the car number, and built date shall be stenciled, or otherwise displayed, in... shall not be less than one inch high: (a) On each side of each railroad freight car body; and (b) In the...
49 CFR 215.301 - General.

Code of Federal Regulations, 2011 CFR

2011-10-01

... TRANSPORTATION RAILROAD FREIGHT CAR SAFETY STANDARDS Stenciling § 215.301 General. The railroad or private car owner reporting mark, the car number, and built date shall be stenciled, or otherwise displayed, in... shall not be less than one inch high: (a) On each side of each railroad freight car body; and (b) In the...
49 CFR 215.301 - General.

Code of Federal Regulations, 2010 CFR

2010-10-01

... TRANSPORTATION RAILROAD FREIGHT CAR SAFETY STANDARDS Stenciling § 215.301 General. The railroad or private car owner reporting mark, the car number, and built date shall be stenciled, or otherwise displayed, in... shall not be less than one inch high: (a) On each side of each railroad freight car body; and (b) In the...
49 CFR 215.301 - General.

Code of Federal Regulations, 2013 CFR

2013-10-01

... TRANSPORTATION RAILROAD FREIGHT CAR SAFETY STANDARDS Stenciling § 215.301 General. The railroad or private car owner reporting mark, the car number, and built date shall be stenciled, or otherwise displayed, in... shall not be less than one inch high: (a) On each side of each railroad freight car body; and (b) In the...
The numerical modelling of MHD astrophysical flows with chemistry

NASA Astrophysics Data System (ADS)

Kulikov, I.; Chernykh, I.; Protasov, V.

2017-10-01

A new code for numerical simulation of magnetohydrodynamic astrophysical flows with chemical reactions is presented in the paper. At the heart of the code is a new, original low-dissipation numerical method based on a combination of the operator splitting approach and the piecewise-parabolic method on the local stencil. The chemodynamics of hydrogen during the turbulent formation of molecular clouds is modeled.
Implementing Connected Component Labeling as a User Defined Operator for SciDB

NASA Technical Reports Server (NTRS)

Oloso, Amidu; Kuo, Kwo-Sen; Clune, Thomas; Brown, Paul; Poliakov, Alex; Yu, Hongfeng

2016-01-01

We have implemented a flexible User Defined Operator (UDO) for labeling connected components of a binary mask expressed as an array in SciDB, a parallel distributed database management system based on the array data model. This UDO is able to process very large multidimensional arrays by exploiting SciDB's memory management mechanism that efficiently manipulates arrays whose memory requirements far exceed available physical memory. The UDO takes as primary inputs a binary mask array and a binary stencil array that specifies the connectivity of a given cell to its neighbors. The UDO returns an array of the same shape as the input mask array with each foreground cell containing the label of the component it belongs to. By default, dimensions are treated as non-periodic, but the UDO also accepts optional input parameters to specify periodicity in any of the array dimensions. The UDO requires four stages to completely label connected components. In the first stage, labels are computed for each subarray or chunk of the mask array in parallel across SciDB instances using the weighted quick union (WQU) with half-path compression algorithm. In the second stage, labels around chunk boundaries from the first stage are stored in a temporary SciDB array that is then replicated across all SciDB instances. Equivalences are resolved by again applying the WQU algorithm to these boundary labels. In the third stage, relabeling is done for each chunk using the resolved equivalences. In the fourth stage, the resolved labels, which so far are "flattened" coordinates of the original binary mask array, are renamed with sequential integers for legibility. The UDO is demonstrated on a 3-D mask of O(10¹¹) elements, with O(10⁸) foreground cells and O(10⁶) connected components. The operator completes in 19 minutes using 84 SciDB instances.
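The per-chunk labeling stage can be sketched for a single in-memory 2-D mask as follows. This is a minimal illustration, not the SciDB operator: the function name `label_components` and the `offsets` parameter (standing in for the binary stencil array) are hypothetical, dimensions are treated as non-periodic, and only the "forward" half of a symmetric stencil is listed because a union is symmetric.

```python
def label_components(mask, offsets=((0, 1), (1, 0))):
    """Label connected foreground cells of a 2-D binary mask.

    Uses weighted quick union with path halving. `offsets` lists the forward
    neighbours of a cell; the default gives 4-connectivity.
    """
    rows, cols = len(mask), len(mask[0])
    parent = list(range(rows * cols))  # flattened coordinates as provisional labels
    size = [1] * (rows * cols)

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if size[ra] < size[rb]:  # attach the smaller tree under the larger
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]

    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and mask[rr][cc]:
                    union(r * cols + c, rr * cols + cc)

    # Rename roots with sequential integers (0 is background).
    labels = [[0] * cols for _ in range(rows)]
    sequential = {}
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                root = find(r * cols + c)
                labels[r][c] = sequential.setdefault(root, len(sequential) + 1)
    return labels


if __name__ == "__main__":
    demo = [[1, 1, 0, 0],
            [0, 1, 0, 1],
            [0, 0, 0, 1],
            [1, 0, 0, 0]]
    for row in label_components(demo):
        print(row)
```

In a distributed setting the same union-find structure would be applied a second time to the labels along chunk boundaries, as the abstract describes; that step is omitted here.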
Rapid Stencil Mask Fabrication Enabled One-Step Polymer-Free Graphene Patterning and Direct Transfer for Flexible Graphene Devices

PubMed Central

Yong, Keong; Ashraf, Ali; Kang, Pilgyu; Nam, SungWoo

2016-01-01

We report a one-step polymer-free approach to patterning graphene using a stencil mask and oxygen plasma reactive-ion etching, with a subsequent polymer-free direct transfer for flexible graphene devices. Our stencil mask is fabricated via a subtractive, laser cutting manufacturing technique, followed by lamination of stencil mask onto graphene grown on Cu foil for patterning. Subsequently, micro-sized graphene features of various shapes are patterned via reactive-ion etching. The integrity of our graphene after patterning is confirmed by Raman spectroscopy. We further demonstrate the rapid prototyping capability of a stretchable, crumpled graphene strain sensor and patterned graphene condensation channels for potential applications in sensing and heat transfer, respectively. We further demonstrate that the polymer-free approach for both patterning and transfer to flexible substrates allows the realization of cleaner graphene features as confirmed by water contact angle measurements. We believe that our new method promotes rapid, facile fabrication of cleaner graphene devices, and can be extended to other two dimensional materials in the future. PMID:27118249
Fabrication of Quench Condensed Thin Films Using an Integrated MEMS Fab on a Chip

NASA Astrophysics Data System (ADS)

Lally, Richard; Reeves, Jeremy; Stark, Thomas; Barrett, Lawrence; Bishop, David

Atomic calligraphy is a microelectromechanical systems (MEMS)-based dynamic stencil nanolithography technique. Integrating MEMS devices into a bonded stacked array of three die provides a unique platform for conducting quench condensed thin film mesoscopic experiments. The atomic calligraphy Fab on a Chip process incorporates metal film sources, electrostatic comb driven stencil plate, mass sensor, temperature sensor, and target surface into one multi-die assembly. Three separate die are created using the PolyMUMPs process and are flip-chip bonded together. A die containing joule heated sources must be prepared with metal for evaporation prior to assembly. A backside etch of the middle/central die exposes the moveable stencil plate allowing the flux to pass through the stencil from the source die to the target die. The chip assembly is mounted in a cryogenic system at ultra-high vacuum for depositing extremely thin films down to single layers of atoms across targeted electrodes. Experiments such as the effect of thin film alloys or added impurities on their superconductivity can be measured in situ with this process.
Using Flux Site Observations to Calibrate Root System Architecture Stencils for Water Uptake of Plant Functional Types in Land Surface Models.

NASA Astrophysics Data System (ADS)

Bouda, M.

2017-12-01

Root system architecture (RSA) can significantly affect plant access to water, total transpiration, as well as its partitioning by soil depth, with implications for surface heat, water, and carbon budgets. Despite recent advances in land surface model (LSM) descriptions of plant hydraulics, RSA has not been included because of its three-dimensional complexity, which makes RSA modelling generally too computationally costly. This work builds upon the recently introduced "RSA stencil," a process-based 1D layered model that captures the dynamic shifts in water potential gradients of 3D RSA in response to heterogeneous soil moisture profiles. In validations using root systems calibrated to the rooting profiles of four plant functional types (PFT) of the Community Land Model, the RSA stencil predicts plant water potentials within 2% of the outputs of full 3D models, despite its trivial computational cost. In transient simulations, the RSA stencil yields improved predictions of water uptake and soil moisture profiles compared to a 1D model based on root fraction alone. Here I show how the RSA stencil can be calibrated to time-series observations of soil moisture and transpiration to yield a water uptake PFT definition for use in terrestrial models. This model-data integration exercise aims to improve LSM predictions of soil moisture dynamics and, under water-limiting conditions, surface fluxes. These improvements can be expected to significantly impact predictions of downstream variables, including surface fluxes, climate-vegetation feedbacks and soil nutrient cycling.
Reduction of the discretization stencil of direct forcing immersed boundary methods on rectangular cells: The ghost node shifting method

NASA Astrophysics Data System (ADS)

Picot, Joris; Glockner, Stéphane

2018-07-01

We present an analytical study of discretization stencils for the Poisson problem and the incompressible Navier-Stokes problem when used with some direct forcing immersed boundary methods. This study uses, but is not limited to, second-order discretization and Ghost-Cell Finite-Difference methods. We show that the stencil size increases with the aspect ratio of rectangular cells, which is undesirable as it breaks assumptions of some linear system solvers. To circumvent this drawback, a modification of the Ghost-Cell Finite-Difference methods is proposed to reduce the size of the discretization stencil to the one observed for square cells, i.e. with an aspect ratio equal to one. Numerical results validate this proposed method in terms of accuracy and convergence, for the Poisson problem and both Dirichlet and Neumann boundary conditions. An improvement on error levels is also observed. In addition, we show that the application of the chosen Ghost-Cell Finite-Difference methods to the Navier-Stokes problem, discretized by a pressure-correction method, requires an additional interpolation step. This extra step is implemented and validated through well known test cases of the Navier-Stokes equations.
CD-measurement technique for hole patterns on stencil mask

NASA Astrophysics Data System (ADS)

Ishikawa, Mikio; Yusa, Satoshi; Takikawa, Tadahiko; Fujita, Hiroshi; Sano, Hisatake; Hoga, Morihisa; Hayashi, Naoya

2004-12-01

EB lithography has the potential to successfully form hole patterns as small as 80 nm with a stencil mask. In a previous paper we proposed a technique using a HOLON dual-mode critical dimension (CD) SEM ESPA-75S in the transmission mode for CD measurement of line-and-space patterns on a stencil mask. In this paper we extend our effort of developing a CD measurement technique to contact hole features and establish it by comparing measured values between features on the mask and those printed on the wafer. We have evaluated the width method and the area methods using designed 80-500 nm wide contact hole patterns on a large area membrane mask and their resist images on wafer printed by a LEEPL3000. We find that 1) the width method and the area methods show an excellent mask-wafer correlation for holes over 110 nm, and 2) the area methods show a better mask-wafer correlation than the width method does for holes below 110 nm. We conclude that the area calculated from the transmission SEM image is more suitable than the width for defining the hole dimensions of contact holes on a stencil mask.
Separation of distinct adhesion complexes and associated cytoskeleton by a micro-stencil-printing method.

PubMed

Caballero, David; Osmani, Naël; Georges-Labouesse, Elisabeth; Labouesse, Michel; Riveline, Daniel

2012-01-01

Adhesion between cells and the extracellular matrix is mediated by different types of transmembrane proteins. Their associations with specific partners lead to the assembly of contacts such as focal adhesions and hemidesmosomes. The spatial overlap between both contacts within cells has, however, limited the study of each type of contact. Here we show that with "stampcils" focal contacts and hemidesmosomes can be spatially separated: cells are plated within the cavities of a stencil and the grids of the stencil serve as stamps for grafting an extracellular matrix protein, fibronectin. Cells engage new contacts on stamped zones leading to the segregation of adhesions and their associated cytoskeletons, i.e., actin and intermediate filaments of keratins. This new method should provide new insights into the composition and dynamics of cell contacts.
Practical proof of CP element based design for 14nm node and beyond

NASA Astrophysics Data System (ADS)

Maruyama, Takashi; Takita, Hiroshi; Ikeno, Rimon; Osawa, Morimi; Kojima, Yoshinori; Sugatani, Shinji; Hoshino, Hiromi; Hino, Toshio; Ito, Masaru; Iizuka, Tetsuya; Komatsu, Satoshi; Ikeda, Makoto; Asada, Kunihiro

2013-03-01

To realize HVM (High Volume Manufacturing) with CP (Character Projection) based EBDW, shot count reduction is the essential key. All device circuits should be composed of predefined character parts; we call this methodology "CP element based design". In our previous work, we presented the following three concepts [2]. 1) Memory: We reported the prospects of affordability for the CP-stencil resource. 2) Logic cell: We adopted a multi-cell clustering approach in the physical synthesis. 3) Random interconnect: We proposed an ultra-regular layout scheme using fixed size wiring tiles containing repeated tracks and cutting points at the tile edges. In this paper, we report experimental proofs of these methodologies. In full chip layout, CP stencil resource management is a critical key. From the MCC-POC (Proof of Concept) result [1], we assumed the total available CP stencil resource to be 9000 μm². All circuit macros must be laid out within this restriction. The assignment of CP-stencil resource to the memory macros is the most important issue, as they consume a considerable share of the resource because of the various line-ups, such as 1RW- and 2RW-SRAMs, Register Files and ROM, which require several varieties of large peripheral circuits. Furthermore, the memory macros typically occupy more than 40% of the die area in leading-edge logic LSI products, so the impact of the shot count increase is serious. To save CP-stencil resource, we constructed an automatic CP analyzing system. We developed two extraction modes: simple division by block and layout repeatability recognition. By properly controlling these modes based upon the characteristics of each peripheral circuit, we could minimize the consumption of CP stencil resources. The estimation for the 14nm technology node was performed based on the analysis of a practical memory compiler. The resource required for the memory macros proved to be an affordable 60% of the full CP stencil resource, and the wafer-level converted shot count proved to be at a level that meets a throughput of 100 WPH. In logic cell design, the circuit performance after cell clustering was evaluated. Cell clustering based on physical distance proved to incur a large penalty, mainly in wiring length. To reduce this design penalty, we proposed CP cell clustering based on logical distance. For shot-count reduction in random interconnect area design, we proposed a more structured routing architecture consisting of track exchange and via position arrangement. Putting these design approaches together, we can design CP stencils that hit the target throughput within the area constraint. The analysis of other macros such as analog, I/O, and DUMMY showed that no special CP design approach is needed beyond legacy pattern matching CP extraction. From all these experimental results we see good prospects for realizing full CP element based layout.

Modern multicore and manycore architectures: Modelling, optimisation and benchmarking a multiblock CFD code

NASA Astrophysics Data System (ADS)

Hadade, Ioan; di Mare, Luca

2016-08-01

Modern multicore and manycore processors exhibit multiple levels of parallelism through a wide range of architectural features such as SIMD for data parallel execution or threads for core parallelism. The exploitation of multi-level parallelism is therefore crucial for achieving superior performance on current and future processors. This paper presents the performance tuning of a multiblock CFD solver on Intel SandyBridge and Haswell multicore CPUs and the Intel Xeon Phi Knights Corner coprocessor. Code optimisations have been applied on two computational kernels exhibiting different computational patterns: the update of flow variables and the evaluation of the Roe numerical fluxes. We discuss at great length the code transformations required for achieving efficient SIMD computations for both kernels across the selected devices including SIMD shuffles and transpositions for flux stencil computations and global memory transformations. Core parallelism is expressed through threading based on a number of domain decomposition techniques together with optimisations pertaining to alleviating NUMA effects found in multi-socket compute nodes. Results are correlated with the Roofline performance model in order to assert their efficiency for each distinct architecture. We report significant speedups for single thread execution across both kernels: 2-5X on the multicore CPUs and 14-23X on the Xeon Phi coprocessor. Computations at full node and chip concurrency deliver a factor of three speedup on the multicore processors and up to 24X on the Xeon Phi manycore coprocessor.
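The Roofline model mentioned above bounds attainable performance by the smaller of peak compute throughput and memory bandwidth times arithmetic intensity. The sketch below shows that bound only; the function names and the numbers in the usage lines are illustrative assumptions and are not measurements from the paper.

```python
def roofline_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Attainable GFLOP/s for a kernel with the given arithmetic intensity."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (FLOP/byte) above which a kernel is compute bound."""
    return peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    # Hypothetical machine: 500 GFLOP/s peak, 50 GB/s memory bandwidth.
    for intensity in (0.25, 2.0, 20.0):
        print(intensity, roofline_gflops(500.0, 50.0, intensity))
```

A low-intensity stencil kernel sits on the sloped, bandwidth-limited part of the roof, which is why measured performance is usually compared against the bandwidth bound and not against peak FLOP/s.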
Easy Fabrication of Thin Membranes with Through Holes. Application to Protein Patterning

PubMed Central

Arasi, Bakya; Gauthier, Nils; Viasnoff, Virgile

2012-01-01

Since protein patterning on 2D surfaces has emerged as an important tool in cell biology, the development of easy patterning methods has gained importance in biology labs. In this paper we present a simple, rapid and reliable technique to fabricate thin layers of UV curable polymer with through holes. These membranes are as easy to fabricate as microcontact printing stamps and can be readily used for stencil patterning. We show how this microfabrication scheme allows highly reproducible and highly homogeneous protein patterning with micron sized resolution on surfaces as large as 10 cm². Using these stencils, fragile proteins were patterned without loss of function in a fully hydrated state. We further demonstrate how intricate patterns of multiple proteins can be achieved by stacking the stencil membranes. We termed this approach microserigraphy. PMID:22952944
Maskless micro-ion-beam reduction lithography system

DOEpatents

Leung, Ka-Ngo; Barletta, William A.; Patterson, David O.; Gough, Richard A.

2005-05-03

A maskless micro-ion-beam reduction lithography system is a system for projecting patterns onto a resist layer on a wafer with feature size down to below 100 nm. The MMRL system operates without a stencil mask. The patterns are generated by switching beamlets on and off from a two electrode blanking system or pattern generator. The pattern generator controllably extracts the beamlet pattern from an ion source and is followed by a beam reduction and acceleration column.
Evaluating print performance of Sn-Ag-Cu lead-free solder pastes used in electronics assembly process

NASA Astrophysics Data System (ADS)

Mallik, S.; Bauer, R.; Hübner, F.; Ekere, N. N.

2011-01-01

Solder paste is the most widely used interconnection material in the electronic assembly process for attaching electronic components/devices directly onto the surface of printed circuit boards, using the stencil printing process. This paper evaluates the performance of three different commercially available Sn-Ag-Cu solder pastes formulated with different particle size distributions (PSD), metal content and alloy composition. A series of stencil printing tests were carried out using a specially designed stencil of 75 μm thickness with apertures of 300×300 μm² and a pitch of 500 μm. Solder paste printing behaviors were found to be related to attributes such as slumping and surface tension, and printing performance was correlated with metal content and PSD. The results of the study should help paste manufacturers and SMT assemblers to improve their products and practices.
Realistic Fireteam Movement in Urban Environments

DTIC Science & Technology

2010-10-01

... is largely consumed by the data transfer from the GPU to the CPU of the color and stencil buffers. Since this operation would only need to be ... cost is given in Table 4. Waypoints / Mean / Std Dev: 1112 / 1.25 ms / 0.09 ms; 3785 / 4.07 ms / 0.20 ms. Table 4: Threat Probability Model update cost (Intel Q6600 ...
Assessing the manufacturing tolerances and uniformity of CMOS compatible metamaterial fabrication

DOE Office of Scientific and Technical Information (OSTI.GOV)

Musick, Katherine M.; Wendt, Joel R.; Resnick, Paul J.

Here, the manufacturing tolerances of a stencil-lithography variant, membrane projection lithography, were investigated. In the first part of this work, electron beam lithography was used to create stencils with a range of linewidths. These patterns were transferred into the stencil membrane and used to pattern metallic lines on vertical silicon faces. Only the largest lines, with a nominal width of 84 nm, were resolved, resulting in 45 ± 10 nm (average ± standard deviation) as deposited with 135-nm spacing. Although written in the e-beam write software file as 84-nm in width, the lines exhibited linewidth bias. This can largely be attributed to nonvertical sidewalls inherent to dry etching techniques that cause proportionally larger impact with decreasing feature size. The line edge roughness can be significantly attributed to the grain structure of the aluminum nitride stencil membrane. In the second part of this work, the spatial uniformity of optically defined (as opposed to e-beam written) metamaterial structures over large areas was assessed. A Fourier transform infrared spectrometer microscope was used to collect the reflection spectra of samples with optically defined vertical split ring from 25 spatially resolved 300 × 300 μm regions in a 1-cm² area. The technique is shown to provide a qualitative measure of the uniformity of the inclusions.
Assessing the manufacturing tolerances and uniformity of CMOS compatible metamaterial fabrication

DOE PAGES

Musick, Katherine M.; Wendt, Joel R.; Resnick, Paul J.; ...

2018-01-18

Here, the manufacturing tolerances of a stencil-lithography variant, membrane projection lithography, were investigated. In the first part of this work, electron beam lithography was used to create stencils with a range of linewidths. These patterns were transferred into the stencil membrane and used to pattern metallic lines on vertical silicon faces. Only the largest lines, with a nominal width of 84 nm, were resolved, resulting in 45 ± 10 nm (average ± standard deviation) as deposited with 135-nm spacing. Although written in the e-beam write software file as 84-nm in width, the lines exhibited linewidth bias. This can largely be attributed to nonvertical sidewalls inherent to dry etching techniques that cause proportionally larger impact with decreasing feature size. The line edge roughness can be significantly attributed to the grain structure of the aluminum nitride stencil membrane. In the second part of this work, the spatial uniformity of optically defined (as opposed to e-beam written) metamaterial structures over large areas was assessed. A Fourier transform infrared spectrometer microscope was used to collect the reflection spectra of samples with optically defined vertical split ring from 25 spatially resolved 300 × 300 μm regions in a 1-cm² area. The technique is shown to provide a qualitative measure of the uniformity of the inclusions.
A fourth-order Cartesian grid embedded boundary method for Poisson’s equation

DOE PAGES

Devendran, Dharshi; Graves, Daniel; Johansen, Hans; ...

2017-05-08

In this paper, we present a fourth-order algorithm to solve Poisson's equation in two and three dimensions. We use a Cartesian grid, embedded boundary method to resolve complex boundaries. We use a weighted least squares algorithm to solve for our stencils. We use convergence tests to demonstrate accuracy and we show the eigenvalues of the operator to demonstrate stability. We compare accuracy and performance with an established second-order algorithm. We also discuss in depth strategies for retaining higher-order accuracy in the presence of nonsmooth geometries.
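A convergence test of the kind mentioned above estimates the observed order of accuracy from the errors on two grid spacings, p = log(e_h / e_{h/2}) / log 2. The sketch below applies that formula to the standard fourth-order central difference for a second derivative in one dimension; it is a generic illustration with hypothetical function names, not the embedded boundary stencils of the paper.

```python
import math


def d2_fourth_order(f, x, h):
    """Fourth-order central approximation of f''(x) on a uniform grid."""
    return (-f(x - 2 * h) + 16 * f(x - h) - 30 * f(x)
            + 16 * f(x + h) - f(x + 2 * h)) / (12 * h * h)


def observed_order(err_coarse, err_fine, ratio=2.0):
    """Observed convergence order from errors on grids refined by `ratio`."""
    return math.log(err_coarse / err_fine) / math.log(ratio)


def convergence_study(x=1.0, h=0.1):
    """Observed order for f = sin, whose exact second derivative is -sin."""
    exact = -math.sin(x)
    err_h = abs(d2_fourth_order(math.sin, x, h) - exact)
    err_h2 = abs(d2_fourth_order(math.sin, x, h / 2) - exact)
    return observed_order(err_h, err_h2)


if __name__ == "__main__":
    print(convergence_study())  # expected to be close to 4
```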
A fourth-order Cartesian grid embedded boundary method for Poisson’s equation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Devendran, Dharshi; Graves, Daniel; Johansen, Hans

In this paper, we present a fourth-order algorithm to solve Poisson's equation in two and three dimensions. We use a Cartesian grid, embedded boundary method to resolve complex boundaries. We use a weighted least squares algorithm to solve for our stencils. We use convergence tests to demonstrate accuracy and we show the eigenvalues of the operator to demonstrate stability. We compare accuracy and performance with an established second-order algorithm. We also discuss in depth strategies for retaining higher-order accuracy in the presence of nonsmooth geometries.
High-order central ENO finite-volume scheme for hyperbolic conservation laws on three-dimensional cubed-sphere grids

NASA Astrophysics Data System (ADS)

Ivan, L.; De Sterck, H.; Susanto, A.; Groth, C. P. T.

2015-02-01

A fourth-order accurate finite-volume scheme for hyperbolic conservation laws on three-dimensional (3D) cubed-sphere grids is described. The approach is based on a central essentially non-oscillatory (CENO) finite-volume method that was recently introduced for two-dimensional compressible flows and is extended to 3D geometries with structured hexahedral grids. Cubed-sphere grids feature hexahedral cells with nonplanar cell surfaces, which are handled with high-order accuracy using trilinear geometry representations in the proposed approach. Varying stencil sizes and slope discontinuities in grid lines occur at the boundaries and corners of the six sectors of the cubed-sphere grid where the grid topology is unstructured, and these difficulties are handled naturally with high-order accuracy by the multidimensional least-squares based 3D CENO reconstruction with overdetermined stencils. A rotation-based mechanism is introduced to automatically select appropriate smaller stencils at degenerate block boundaries, where fewer ghost cells are available and the grid topology changes, requiring stencils to be modified. Combining these building blocks results in a finite-volume discretization for conservation laws on 3D cubed-sphere grids that is uniformly high-order accurate in all three grid directions. While solution-adaptivity is natural in the multi-block setting of our code, high-order accurate adaptive refinement on cubed-sphere grids is not pursued in this paper. The 3D CENO scheme is an accurate and robust solution method for hyperbolic conservation laws on general hexahedral grids that is attractive because it is inherently multidimensional by employing a K-exact overdetermined reconstruction scheme, and it avoids the complexity of considering multiple non-central stencil configurations that characterizes traditional ENO schemes. 
Extensive numerical tests demonstrate fourth-order convergence for stationary and time-dependent Euler and magnetohydrodynamic flows on cubed-sphere grids, and robustness against spurious oscillations at 3D shocks. Performance tests illustrate efficiency gains that can be potentially achieved using fourth-order schemes as compared to second-order methods for the same error level. Applications on extended cubed-sphere grids incorporating a seventh root block that discretizes the interior of the inner sphere demonstrate the versatility of the spatial discretization method.
Raman microscopy of hand stencils rock art from the Yabrai Mountain, Inner Mongolia Autonomous Region, China

NASA Astrophysics Data System (ADS)

Hernanz, Antonio; Chang, Jinlong; Iriarte, Mercedes; Gavira-Vallejo, Jose M.; de Balbín-Behrmann, Rodrigo; Bueno-Ramírez, Primitiva; Maroto-Valiente, Angel

2016-07-01

A series of rock art pictographs in the form of hand stencils discovered in two sites of the Yabrai Mountain, Inner Mongolia Autonomous Region (China) has been studied by micro-Raman spectroscopy, X-ray photoelectron spectroscopy and scanning electron microscopy combined with energy dispersive X-ray spectroscopy for the first time. These studies have made it possible to characterise the materials present. The minerals α-quartz, phlogopite, albite and microcline have been identified in the granitic rocks supporting the paintings. Calcite and dolomite micro-particles detected on the rock surface have been attributed to desert dust. Accretions of gypsum, anhydrite and whewellite have also been identified on the rock surface. Haematite is the pigment used in the red pictographs, whereas well-crystallised graphite has been used in the black ones. The use of crystalline graphite instead of amorphous carbon (charcoal, soot or bone black) as a black pigment in rock art is an interesting novelty. Overlapped hands are proposed as a new type of hand stencils to make an unusual pictorial symbol in rock art that has been found in these sites.
Discretely Conservative Finite-Difference Formulations for Nonlinear Conservation Laws in Split Form: Theory and Boundary Conditions

NASA Technical Reports Server (NTRS)

Fisher, Travis C.; Carpenter, Mark H.; Nordstroem, Jan; Yamaleev, Nail K.; Swanson, R. Charles

2011-01-01

Simulations of nonlinear conservation laws that admit discontinuous solutions are typically restricted to discretizations of equations that are explicitly written in divergence form. This restriction is, however, unnecessary. Herein, linear combinations of divergence and product rule forms that have been discretized using diagonal-norm skew-symmetric summation-by-parts (SBP) operators, are shown to satisfy the sufficient conditions of the Lax-Wendroff theorem and thus are appropriate for simulations of discontinuous physical phenomena. Furthermore, special treatments are not required at the points that are near physical boundaries (i.e., discrete conservation is achieved throughout the entire computational domain, including the boundaries). Examples are presented of a fourth-order, SBP finite-difference operator with second-order boundary closures. Sixth- and eighth-order constructions are derived, and included in E. Narrow-stencil difference operators for linear viscous terms are also derived; these guarantee the conservative form of the combined operator.
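The summation-by-parts property that these operators rely on can be checked numerically. The sketch below uses the classical second-order diagonal-norm SBP first-derivative operator, chosen for brevity (the abstract's examples are fourth order and higher); the function names are hypothetical. With D = H⁻¹Q and Q + Qᵀ = diag(-1, 0, ..., 0, 1), the discrete inner product satisfies (u, Dv)_H + (Du, v)_H = u_N v_N - u_0 v_0, mimicking integration by parts.

```python
import math


def sbp_second_order(n, h):
    """Return (H_diag, D): diagonal norm and second-order SBP derivative matrix."""
    H = [h] * n
    H[0] = H[-1] = h / 2
    D = [[0.0] * n for _ in range(n)]
    D[0][0], D[0][1] = -1 / h, 1 / h          # one-sided boundary closure
    D[-1][-2], D[-1][-1] = -1 / h, 1 / h
    for i in range(1, n - 1):                 # central difference in the interior
        D[i][i - 1], D[i][i + 1] = -0.5 / h, 0.5 / h
    return H, D


def apply_op(D, u):
    """Matrix-vector product D u."""
    return [sum(a * b for a, b in zip(row, u)) for row in D]


def inner(H, u, v):
    """Discrete inner product u^T H v with diagonal H."""
    return sum(w * a * b for w, a, b in zip(H, u, v))


def sbp_residual(u, v, h):
    """(u, Dv)_H + (Du, v)_H - (u_N v_N - u_0 v_0); zero up to round-off."""
    H, D = sbp_second_order(len(u), h)
    lhs = inner(H, u, apply_op(D, v)) + inner(H, apply_op(D, u), v)
    return lhs - (u[-1] * v[-1] - u[0] * v[0])


if __name__ == "__main__":
    h = 0.1
    x = [i * h for i in range(11)]
    u = [xi * xi + 1 for xi in x]
    v = [math.sin(xi) for xi in x]
    print(sbp_residual(u, v, h))
```

The identity holds for arbitrary grid functions u and v, including the boundary points, which is the discrete counterpart of the statement that no special treatment is needed near physical boundaries.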
Self-organized broadband light trapping in thin film amorphous silicon solar cells.

PubMed

Martella, C; Chiappe, D; Delli Veneri, P; Mercaldo, L V; Usatii, I; Buatier de Mongeot, F

2013-06-07

Nanostructured glass substrates endowed with high aspect ratio one-dimensional corrugations are prepared by defocused ion beam erosion through a self-organized gold (Au) stencil mask. The shielding action of the stencil mask is amplified by co-deposition of gold atoms during ion bombardment. The resulting glass nanostructures enable broadband anti-reflection functionality and at the same time ensure a high efficiency for diffuse light scattering (Haze). It is demonstrated that the patterned glass substrates exhibit a better photon harvesting than the flat glass substrate in p-i-n type thin film a-Si:H solar cells.
Helium: lifting high-performance stencil kernels from stripped x86 binaries to halide DSL code

DOE PAGES

Mendis, Charith; Bosboom, Jeffrey; Wu, Kevin; ...

2015-06-03

Highly optimized programs are prone to bit rot, where performance quickly becomes suboptimal in the face of new hardware and compiler techniques. In this paper we show how to automatically lift performance-critical stencil kernels from a stripped x86 binary and generate the corresponding code in the high-level domain-specific language Halide. Using Halide's state-of-the-art optimizations targeting current hardware, we show that new optimized versions of these kernels can replace the originals to rejuvenate the application for newer hardware. The original optimized code for kernels in stripped binaries is nearly impossible to analyze statically. Instead, we rely on dynamic traces to regenerate the kernels. We perform buffer structure reconstruction to identify input, intermediate and output buffer shapes. Here, we abstract from a forest of concrete dependency trees which contain absolute memory addresses to symbolic trees suitable for high-level code generation. This is done by canonicalizing trees, clustering them based on structure, inferring higher-dimensional buffer accesses and finally by solving a set of linear equations based on buffer accesses to lift them up to simple, high-level expressions. Helium can handle highly optimized, complex stencil kernels with input-dependent conditionals. We lift seven kernels from Adobe Photoshop giving a 75% performance improvement, four kernels from IrfanView, leading to 4.97x performance, and one stencil from the miniGMG multigrid benchmark netting a 4.25x improvement in performance. We manually rejuvenated Photoshop by replacing eleven of Photoshop's filters with our lifted implementations, giving 1.12x speedup without affecting the user experience.
Hybrid multicore/vectorisation technique applied to the elastic wave equation on a staggered grid

NASA Astrophysics Data System (ADS)

Titarenko, Sofya; Hildyard, Mark

2017-07-01

In modern physics it has become common to find the solution of a problem by solving numerically a set of PDEs. Whether solving them on a finite difference grid or by a finite element approach, the main calculations are often applied to a stencil structure. In the last decade it has become usual to work with so-called big data problems where calculations are very heavy and accelerators and modern architectures are widely used. Although CPU and GPU clusters are often used to solve such problems, parallelisation of any calculation ideally starts from a single processor optimisation. Unfortunately, it is impossible to vectorise a stencil-structured loop with high-level instructions. In this paper we suggest a new approach to rearranging the data structure which makes it possible to apply high-level vectorisation instructions to a stencil loop and which results in significant acceleration. The suggested method allows further acceleration if shared memory APIs are used. We show the effectiveness of the method by applying it to an elastic wave propagation problem on a finite difference grid. We have chosen Intel architecture for the test problem and OpenMP (Open Multi-Processing) since they are extensively used in many applications.
Growth Of Organic Semiconductor Thin Films with Multi-Micron Domain Size and Fabrication of Organic Transistors Using a Stencil Nanosieve.

PubMed

Fesenko, Pavlo; Flauraud, Valentin; Xie, Shenqi; Kang, Enpu; Uemura, Takafumi; Brugger, Jürgen; Genoe, Jan; Heremans, Paul; Rolin, Cédric

2017-07-19

To grow small molecule semiconductor thin films with domain size larger than modern-day device sizes, we evaporate the material through a dense array of small apertures, called a stencil nanosieve. The aperture size of 0.5 μm results in low nucleation density, whereas the aperture-to-aperture distance of 0.5 μm provides sufficient crosstalk between neighboring apertures through the diffusion of adsorbed molecules. By integrating the nanosieve in the channel area of a thin-film transistor mask, we show a route for patterning both the organic semiconductor and the metal contacts of thin-film transistors using one mask only and without mask realignment.
A new third order finite volume weighted essentially non-oscillatory scheme on tetrahedral meshes

NASA Astrophysics Data System (ADS)

Zhu, Jun; Qiu, Jianxian

2017-11-01

In this paper a third order finite volume weighted essentially non-oscillatory scheme is designed for solving hyperbolic conservation laws on tetrahedral meshes. Compared with other finite volume WENO schemes designed on tetrahedral meshes, the crucial advantages of the new WENO scheme are its simplicity and compactness with the application of only six unequal size spatial stencils for reconstructing unequal degree polynomials in the WENO type spatial procedures, and easy choice of the positive linear weights without considering the topology of the meshes. The original innovation of the scheme is to use a quadratic polynomial defined on a big central spatial stencil for obtaining third order numerical approximation at any points inside the target tetrahedral cell in smooth regions and to switch to at least one of five linear polynomials defined on small biased/central spatial stencils for sustaining sharp shock transitions and keeping the essentially non-oscillatory property simultaneously. By performing such new procedures in spatial reconstructions and adopting a third order TVD Runge-Kutta time discretization method for solving the ordinary differential equation (ODE), the new scheme's memory occupancy is decreased and the computing efficiency is increased. So it is suitable for large scale engineering requirements on tetrahedral meshes. Some numerical results are provided to illustrate the good performance of the scheme.
Proximity Effect Correction by Pattern Modified Stencil Mask in Large-Field Projection Electron-Beam Lithography

NASA Astrophysics Data System (ADS)

Kobinata, Hideo; Yamashita, Hiroshi; Nomura, Eiichi; Nakajima, Ken; Kuroki, Yukinori

1998-12-01

A new method for proximity effect correction, suitable for large-field electron-beam (EB) projection lithography with high accelerating voltage, such as SCALPEL and PREVAIL in the case where a stencil mask is used, is discussed. In this lithography, a large-field is exposed by the same dose, and thus, the dose modification method, which is used in the variable-shaped beam and the cell projection methods, cannot be used in this case. In this study, we report on development of a new proximity effect correction method which uses a pattern modified stencil mask suitable for high accelerating voltage and large-field EB projection lithography. In order to obtain the mask bias value, we have investigated linewidth reduction, due to the proximity effect, in the peripheral memory cell area, and found that it could be expressed by a simple function and all the correction parameters were easily determined from only the mask pattern data. The proximity effect for the peripheral array pattern could also be corrected by considering the pattern density. Calculated linewidth deviation was 3% or less for a 0.07-µm-L/S memory cell pattern and 5% or less for a 0.14-µm-line and 0.42-µm-space peripheral array pattern, simultaneously.
Geometrically Flexible and Efficient Flow Analysis of High Speed Vehicles Via Domain Decomposition, Part 1: Unstructured-Grid Solver for High Speed Flows

NASA Technical Reports Server (NTRS)

White, Jeffery A.; Baurle, Robert A.; Passe, Bradley J.; Spiegel, Seth C.; Nishikawa, Hiroaki

2017-01-01

The ability to solve the equations governing the hypersonic turbulent flow of a real gas on unstructured grids using a spatially-elliptic, 2nd-order accurate, cell-centered, finite-volume method has been recently implemented in the VULCAN-CFD code. This paper describes the key numerical methods and techniques that were found to be required to robustly obtain accurate solutions to hypersonic flows on non-hex-dominant unstructured grids. The methods and techniques described include: an augmented stencil, weighted linear least squares, cell-average gradient method, a robust multidimensional cell-average gradient-limiter process that is consistent with the augmented stencil of the cell-average gradient method and a cell-face gradient method that contains a cell skewness sensitive damping term derived using hyperbolic diffusion based concepts. A data-parallel matrix-based symmetric Gauss-Seidel point-implicit scheme, used to solve the governing equations, is described and shown to be more robust and efficient than a matrix-free alternative. In addition, a y+ adaptive turbulent wall boundary condition methodology is presented. This boundary condition methodology is designed to automatically switch between a solve-to-the-wall and a wall-matching-function boundary condition based on the local y+ of the 1st cell center off the wall. The aforementioned methods and techniques are then applied to a series of hypersonic and supersonic turbulent flat plate unit tests to examine the efficiency, robustness and convergence behavior of the implicit scheme and to determine the ability of the solve-to-the-wall and y+ adaptive turbulent wall boundary conditions to reproduce the turbulent law-of-the-wall. 
Finally, the thermally perfect, chemically frozen, Mach 7.8 turbulent flow of air through a scramjet flow-path is computed and compared with experimental data to demonstrate the robustness, accuracy and convergence behavior of the unstructured-grid solver for a realistic 3-D geometry on a non-hex-dominant grid.
Gradient Calculation Methods on Arbitrary Polyhedral Unstructured Meshes for Cell-Centered CFD Solvers

NASA Technical Reports Server (NTRS)

Sozer, Emre; Brehm, Christoph; Kiris, Cetin C.

2014-01-01

A survey of gradient reconstruction methods for cell-centered data on unstructured meshes is conducted within the scope of accuracy assessment. Formal order of accuracy, as well as error magnitudes for each of the studied methods, are evaluated on a complex mesh of various cell types through consecutive local scaling of an analytical test function. The tests highlighted several gradient operator choices that can consistently achieve 1st order accuracy regardless of cell type and shape. The tests further offered error comparisons for given cell types, leading to the observation that the "ideal" gradient operator choice is not universal. Practical implications of the results are explored via CFD solutions of a 2D inviscid standing vortex, portraying the discretization error properties. A relatively naive, yet largely unexplored, approach of local curvilinear stencil transformation exhibited surprisingly favorable properties.
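One common family of gradient operators for cell-centered data is least-squares reconstruction. The sketch below is a minimal 2D, unweighted version, offered as an illustration and not necessarily one of the variants assessed in the survey. It fits a linear function through the target cell value and its neighbor values, which makes the recovered gradient exact for linear fields on any non-degenerate stencil. The function name and data layout are assumptions made for the example.

```python
def least_squares_gradient(center, value, neighbors):
    """Unweighted least-squares gradient of cell-centered data in 2D.

    center    -- (x, y) of the target cell centroid
    value     -- solution value at the target cell
    neighbors -- iterable of ((x, y), u) pairs for the stencil cells

    Minimises sum_k (value + g . dx_k - u_k)^2 over g = (gx, gy) by solving
    the 2 x 2 normal equations with Cramer's rule.
    """
    x0, y0 = center
    sxx = sxy = syy = sxu = syu = 0.0
    for (x, y), u in neighbors:
        dx, dy, du = x - x0, y - y0, u - value
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
        sxu += dx * du
        syu += dy * du
    det = sxx * syy - sxy * sxy
    # det vanishes when all neighbors are collinear with the target cell.
    if abs(det) <= 1e-12 * max(sxx * syy, 1e-300):
        raise ValueError("degenerate stencil: neighbors are collinear")
    gx = (syy * sxu - sxy * syu) / det
    gy = (sxx * syu - sxy * sxu) / det
    return gx, gy
```

A typical call passes the centroids and values of the face neighbors of a cell. Distance-based weighting, which several practical solvers use, would only change how each term enters the five sums.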

Inhomogeneous Radiation Boundary Conditions Simulating Incoming Acoustic Waves for Computational Aeroacoustics

NASA Technical Reports Server (NTRS)

Tam, Christopher K. W.; Fang, Jun; Kurbatskii, Konstantin A.

1996-01-01

A set of nonhomogeneous radiation and outflow conditions which automatically generate prescribed incoming acoustic or vorticity waves and, at the same time, are transparent to outgoing sound waves produced internally in a finite computation domain is proposed. This type of boundary condition is needed for the numerical solution of many exterior aeroacoustics problems. In computational aeroacoustics, the computation scheme must be as nondispersive and nondissipative as possible. It must also support waves with wave speeds which are nearly the same as those of the original linearized Euler equations. To meet these requirements, a high-order/large-stencil scheme is necessary. The proposed nonhomogeneous radiation and outflow boundary conditions are designed primarily for use in conjunction with such high-order/large-stencil finite difference schemes.
Compact cell-centered discretization stencils at fine-coarse block structured grid interfaces

NASA Astrophysics Data System (ADS)

Pletzer, Alexander; Jamroz, Ben; Crockett, Robert; Sides, Scott

2014-03-01

Different strategies for coupling fine-coarse grid patches are explored in the context of the adaptive mesh refinement (AMR) method. We show that applying linear interpolation to fill in the fine grid ghost values can produce a finite volume stencil of comparable accuracy to quadratic interpolation provided the cell volumes are adjusted. The volume of fine cells expands whereas the volume of neighboring coarse cells contracts. The amount by which the cells contract/expand depends on whether the interface is a face, an edge, or a corner. It is shown that quadratic or better interpolation is required when the conductivity is spatially varying, anisotropic, the refinement ratio is other than two, or when the fine-coarse interface is concave.
STENCIL - Strategies and Tools for Environment-friendly Shore Nourishments as Climate Change Impact Low-Regret Measures

NASA Astrophysics Data System (ADS)

Schimmels, Stefan; Cofalla, Catrina; Deutschmann, Björn; Ganal, Caroline; Gijsman, Rik; Hass, H. Christian; Hollert, Henner; Mielck, Finn; Schlurmann, Thorsten; Schüttrumpf, Holger; Shiravani, Gholamreza; Staudt, Franziska; Strusinska, Agnieszka; Visscher, Jan; Wiltshire, Karen; Wolbring, Johanna

2017-04-01

Shore nourishments are regarded as an almost routine coastal protection measure and have been carried out worldwide for several decades. Recent studies generally conclude that "soft" coastal protection measures are an effective option for a sustainable coastal management. However, more research on economic sustainability, species-specific habitat demands and availability of sand deposits is required. The paradigm has recently shifted to concepts like the Integrated Coastal Zone Management (ICZM) and the Ecosystem Approach to Management (EAM). For the German Wadden Sea these management objectives are an important issue of the "Wattenmeerstrategie 2100" (MELLUR-SH, 2015), a political strategy report that demands an adaptation to global change and the expected sea-level rise up to the year 2100. Hence, new concepts and tools for the implementation of more sustainable, effective and environment-friendly shore nourishments are also needed. The research project STENCIL joins the expertise of coastal engineers, geologists, biologists and toxicologists in order to make a first step towards the long-term goal of establishing an ICZM and EAM for shore nourishments in the German Wadden Sea. The project focuses on providing improved tools, models and methods for the prediction of coastal hydro- and morphodynamics. Furthermore, the impact of dredging and dumping activities on benthic habitats and their natural regeneration potentials will be evaluated. Since these impacts are still largely uninvestigated, monitoring of dredging areas and the surrounding sites using hydroacoustic devices, aerial photos and sediment samples for grain-size and benthos analysis remains of high importance. In order to develop standardized operative observation methods, analysis and decision-supporting tools, an implementation of field measurements, laboratory experiments as well as conceptual and numerical models is planned. 
These combined approaches will result in valuable data sets for habitat evaluation, improved prediction methods as well as process and work-flow studies. Finally, a strategy for future planning and monitoring of shore nourishment projects will be established in close cooperation with the coastal authorities. STENCIL has recently started in October 2016 within the research program "Research for sustainable development" (FONA) of the German Federal Ministry of Education and Research (BMBF). The poster will give an overview of the project structure and present the research objectives and methods in more detail.
Electric and hybrid vehicle site operators program: Thinking of the future

NASA Astrophysics Data System (ADS)

Kansas State University, with support from federal, state, public, and private companies, is participating in the Department of Energy's Electric Vehicle Site Operator Program. Through participation in this program, Kansas State is displaying, testing, and evaluating electric or hybrid vehicle technology. This participation will provide organizations the opportunity to examine the latest EHV prototypes under actual operating conditions. KSU proposes to purchase one electric or hybrid van and two electric cars during the first two years of this five-year program. KSU has purchased one G-Van built by Conceptor Industries, Toronto, Canada and has initiated a procurement order to purchase two Soleq 1993 Ford EVcort station wagons. The G-Van has been signed in order for the public to be aware that this is an electric drive vehicle. Financial participants' names have been stenciled on the back door of the van. This vehicle is available for short term loan to interested utilities and companies. When other vehicles are obtained, the G-Van will be maintained on K-State's campus.
Representation of multiaquifer well effects in three-dimensional ground-water flow simulation

USGS Publications Warehouse

Bennett, Gordon D.; Kontis, Angelo L.; Larson, Steven P.

1982-01-01

The presence of multiaquifer or multilayer wells changes the nature of the equations which must be solved in a three-dimensional ground-water flow simulation and, in effect, alters the stencil of computation. A method has been devised which takes this change into consideration by allowing simulation of the hydraulic effects of a multiaquifer well on the aquifer system. It also allows for calculation of the water level and individual aquifer discharges in such a well. The method is valid for the case of a single well located at the center of a square node block. Where more than one well per node is involved, the effects of the stencil alteration still must be considered, although difficulties arise in estimating and justifying the parameters to be utilized.
27 CFR 26.206 - Marking packages and cases.

Code of Federal Regulations, 2010 CFR

2010-04-01

..., rectifier, or bottler shall serially number each case, barrel, cask, or similar container of distilled... distiller, rectifier, or bottler shall plainly print, stamp, or stencil with durable coloring material, in...
7 CFR 51.1216 - Size requirements.

Code of Federal Regulations, 2013 CFR

2013-01-01

...) The numerical count or a count-size based on equivalent tray pack size designations or the minimum... numerical count is not shown the minimum diameter shall be plainly stamped, stenciled, or otherwise marked...
7 CFR 51.1216 - Size requirements.

Code of Federal Regulations, 2014 CFR

2014-01-01

...) The numerical count or a count-size based on equivalent tray pack size designations or the minimum... numerical count is not shown the minimum diameter shall be plainly stamped, stenciled, or otherwise marked...
7 CFR 51.2927 - Marking and packing requirements.

Code of Federal Regulations, 2012 CFR

2012-01-01

... and packing requirements. The minimum size or numerical count of the apricots in any package shall be plainly labeled, stenciled, or otherwise marked on the package. (a) Numerical count. When the numerical...
7 CFR 51.2927 - Marking and packing requirements.

Code of Federal Regulations, 2010 CFR

2010-01-01

... and packing requirements. The minimum size or numerical count of the apricots in any package shall be plainly labeled, stenciled, or otherwise marked on the package. (a) Numerical count. When the numerical...
7 CFR 51.2927 - Marking and packing requirements.

Code of Federal Regulations, 2011 CFR

2011-01-01

... and packing requirements. The minimum size or numerical count of the apricots in any package shall be plainly labeled, stenciled, or otherwise marked on the package. (a) Numerical count. When the numerical...
3D frequency-domain finite-difference modeling of acoustic wave propagation

NASA Astrophysics Data System (ADS)

Operto, S.; Virieux, J.

2006-12-01

We present a 3D frequency-domain finite-difference method for acoustic wave propagation modeling. This method is developed as a tool to perform 3D frequency-domain full-waveform inversion of wide-angle seismic data. For wide-angle data, frequency-domain full-waveform inversion can be applied to only a few discrete frequencies to develop a reliable velocity model. Frequency-domain finite-difference (FD) modeling of wave propagation requires the solution of a huge sparse system of linear equations. If this system can be solved with a direct method, solutions for multiple sources can be computed efficiently once the underlying matrix has been factorized. The drawback of the direct method is the memory requirement resulting from the fill-in of the matrix during factorization. We assess in this study whether representative problems can be addressed in 3D geometry with such an approach. We start from the velocity-stress formulation of the 3D acoustic wave equation. The spatial derivatives are discretized with a second-order accurate staggered-grid stencil on different coordinate systems such that the axes span as many directions as possible. Once the discrete equations are developed on each coordinate system, the particle velocity fields are eliminated from the first-order hyperbolic system (following the so-called parsimonious staggered-grid method) leading to second-order elliptic wave equations in pressure. The second-order wave equations discretized on each coordinate system are combined linearly to mitigate the numerical anisotropy. Secondly, grid dispersion is minimized by replacing the mass term at the collocation point by its weighted average over all the grid points of the stencil. Use of a second-order accurate staggered-grid stencil reduces the bandwidth of the matrix to be factorized. The final stencil incorporates 27 points. Absorbing conditions are PML. 
The system is solved using the parallel direct solver MUMPS developed for distributed-memory computers. The MUMPS solver is based on a multifrontal method for LU factorization. We used the METIS algorithm to perform re-ordering of the matrix coefficients before factorization. Four grid points per minimum wavelength are used for discretization. We applied our algorithm to the 3D SEG/EAGE synthetic onshore OVERTHRUST model of dimensions 20 x 20 x 4.65 km. The velocities range between 2 and 6 km/s. We performed the simulations using 192 processors with 2 Gbytes of RAM per processor. We performed simulations for the 5 Hz, 7 Hz and 10 Hz frequencies in some fractions of the OVERTHRUST model. The grid interval was 100 m, 75 m and 50 m respectively. The grid dimensions were 207x207x53, 275x218x71 and 409x109x102 respectively, corresponding to 100, 80 and 25 percent of the model respectively. The time for factorization is 20 min, 108 min and 163 min respectively. The solution time was 3.8, 9.3 and 10.3 s per source respectively. The total memory used during factorization is 143, 384 and 449 Gbytes respectively. One can note the huge memory requirement for factorization and the efficiency of the direct method to compute solutions for a large number of sources. This highlights the respective drawback and merit of the frequency-domain approach with respect to the time-domain counterpart. These results show that 3D acoustic frequency-domain wave propagation modeling can be performed at low frequencies using a direct solver on large clusters of PCs. This forward modeling algorithm may be used in the future as a tool to image the first kilometers of the crust by frequency-domain full-waveform inversion. For larger problems, we will use the out-of-core memory during factorization that has been implemented by the authors of MUMPS.
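The grid figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes the usual rule of thumb h = v_min / (f * N), where N is the number of grid points per minimum wavelength. The abstract states N = 4 and a minimum velocity of 2 km/s, but it does not spell out this formula, so treat it as an assumption. The function names are illustrative.

```python
def grid_interval(v_min, frequency, points_per_wavelength=4):
    """Grid spacing (m) giving the requested sampling of the shortest wavelength.

    v_min is the minimum velocity in m/s and frequency is in Hz.
    """
    return v_min / frequency / points_per_wavelength


def grid_unknowns(nx, ny, nz):
    """Number of pressure unknowns, one per grid point."""
    return nx * ny * nz


if __name__ == "__main__":
    v_min = 2000.0  # m/s, lower bound of the velocity range in the abstract
    grids = {5.0: (207, 207, 53), 7.0: (275, 218, 71), 10.0: (409, 109, 102)}
    for freq, dims in grids.items():
        h = grid_interval(v_min, freq)
        print(f"{freq:g} Hz: h = {h:.1f} m, unknowns = {grid_unknowns(*dims)}")
```

Under this assumption the rule reproduces the quoted 100 m and 50 m intervals at 5 Hz and 10 Hz. At 7 Hz it gives roughly 71 m, so the quoted 75 m interval corresponds to slightly fewer than four points per minimum wavelength.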
Detailed analysis of the effects of stencil spatial variations with arbitrary high-order finite-difference Maxwell solver

DOE PAGES

Vincenti, H.; Vay, J. -L.

2015-11-22

Due to discretization effects and truncation to finite domains, many electromagnetic simulations present non-physical modifications of Maxwell's equations in space that may generate spurious signals affecting the overall accuracy of the result. Such modifications for instance occur when Perfectly Matched Layers (PMLs) are used at simulation domain boundaries to simulate open media. Another example is the use of arbitrary order Maxwell solver with domain decomposition technique that may under some condition involve stencil truncations at subdomain boundaries, resulting in small spurious errors that do eventually build up. In each case, a careful evaluation of the characteristics and magnitude of the errors resulting from these approximations, and their impact at any frequency and angle, requires detailed analytical and numerical studies. To this end, we present a general analytical approach that enables the evaluation of numerical discretization errors of fully three-dimensional arbitrary order finite-difference Maxwell solver, with arbitrary modification of the local stencil in the simulation domain. The analytical model is validated against simulations of domain decomposition technique and PMLs, when these are used with very high-order Maxwell solver, as well as in the infinite order limit of pseudo-spectral solvers. Results confirm that the new analytical approach enables exact predictions in each case. It also confirms that the domain decomposition technique can be used with very high-order Maxwell solver and a reasonably low number of guard cells with negligible effects on the whole accuracy of the simulation.
Prediction of the moments in advection-diffusion lattice Boltzmann method. II. Attenuation of the boundary layers via double-Λ bounce-back flux scheme.

PubMed

Ginzburg, Irina

2017-01-01

Impact of the unphysical tangential advective-diffusion constraint of the bounce-back (BB) reflection on the impermeable solid surface is examined for the first four moments of concentration. Despite the number of recent improvements for the Neumann condition in the lattice Boltzmann method-advection-diffusion equation, the BB rule remains the only known local mass-conserving no-flux condition suitable for staircase porous geometry. We examine the closure relation of the BB rule in straight channel and cylindrical capillary analytically, and show that it excites the Knudsen-type boundary layers in the nonequilibrium solution for full-weight equilibrium stencil. Although the d2Q5 and d3Q7 coordinate schemes are sufficient for the modeling of isotropic diffusion, the full-weight stencils are appealing for their advanced stability, isotropy, anisotropy and anti-numerical-diffusion ability. The boundary layers are not covered by the Chapman-Enskog expansion around the expected equilibrium, but they accommodate the Chapman-Enskog expansion in the bulk with the closure relation of the bounce-back rule. We show that the induced boundary layers introduce first-order errors in two primary transport properties, namely, mean velocity (first moment) and molecular diffusion coefficient (second moment). As a side effect, the Taylor-dispersion coefficient (second moment), skewness (third moment), and kurtosis (fourth moment) deviate from their physical values and predictions of the fourth-order Chapman-Enskog analysis, even though the kurtosis error in pure diffusion does not depend on grid resolution. In two- and three-dimensional grid-aligned channels and open-tubular conduits, the errors of velocity and diffusion are proportional to the diagonal weight values of the corresponding equilibrium terms. The d2Q5 and d3Q7 schemes do not suffer from this deficiency in grid-aligned geometries but they cannot avoid it if the boundaries are not parallel to the coordinate lines. 
In order to vanish or attenuate the disparity of the modeled transport coefficients with the equilibrium weights without any modification of the BB rule, we propose to use the two-relaxation-times collision operator with free-tunable product of two eigenfunctions Λ. Two different values Λ_{v} and Λ_{b} are assigned for bulk and boundary nodes, respectively. The rationale behind this is that Λ_{v} is adjustable for stability, accuracy, or other purposes, while the corresponding Λ_{b}(Λ_{v}) controls the primary accommodation effects. Two distinguished but similar functional relations Λ_{b}(Λ_{v}) are constructed analytically: they preserve advection velocity in parabolic profile, exactly in the two-dimensional channel and very accurately in a three-dimensional cylindrical capillary. For any velocity-weight stencil, the (local) double-Λ BB scheme produces quasi-identical solutions with the (nonlocal) specular-forward reflection for first four moments in a channel. In a capillary, this strategy allows for the accurate modeling of the Taylor-dispersion and non-Gaussian effects. As illustrative example, it is shown that in the flow around a circular obstacle, the double-Λ scheme may also vanish the dependency of mean velocity on the velocity weight; the required value for Λ_{b}(Λ_{v}) can be identified in a few bisection iterations in given geometry. A positive solution for Λ_{b}(Λ_{v}) may not exist in pure diffusion, but a sufficiently small value of Λ_{b} significantly reduces the disparity in diffusion coefficient with the mass weight in ducts and in the presence of rectangular obstacles. Although Λ_{b} also controls the effective position of straight or curved boundaries, the double-Λ scheme deals with the lower-order effects. 
Its idea and construction may help understanding and amelioration of the anomalous, zero- and first-order behavior of the macroscopic solution in the presence of the bulk and boundary or interface discontinuities, commonly found in multiphase flow and heterogeneous transport.
Prediction of the moments in advection-diffusion lattice Boltzmann method. II. Attenuation of the boundary layers via double-Λ bounce-back flux scheme

NASA Astrophysics Data System (ADS)

Ginzburg, Irina

2017-01-01

Impact of the unphysical tangential advective-diffusion constraint of the bounce-back (BB) reflection on the impermeable solid surface is examined for the first four moments of concentration. Despite the number of recent improvements for the Neumann condition in the lattice Boltzmann method-advection-diffusion equation, the BB rule remains the only known local mass-conserving no-flux condition suitable for staircase porous geometry. We examine the closure relation of the BB rule in straight channel and cylindrical capillary analytically, and show that it excites the Knudsen-type boundary layers in the nonequilibrium solution for full-weight equilibrium stencil. Although the d2Q5 and d3Q7 coordinate schemes are sufficient for the modeling of isotropic diffusion, the full-weight stencils are appealing for their advanced stability, isotropy, anisotropy and anti-numerical-diffusion ability. The boundary layers are not covered by the Chapman-Enskog expansion around the expected equilibrium, but they accommodate the Chapman-Enskog expansion in the bulk with the closure relation of the bounce-back rule. We show that the induced boundary layers introduce first-order errors in two primary transport properties, namely, mean velocity (first moment) and molecular diffusion coefficient (second moment). As a side effect, the Taylor-dispersion coefficient (second moment), skewness (third moment), and kurtosis (fourth moment) deviate from their physical values and predictions of the fourth-order Chapman-Enskog analysis, even though the kurtosis error in pure diffusion does not depend on grid resolution. In two- and three-dimensional grid-aligned channels and open-tubular conduits, the errors of velocity and diffusion are proportional to the diagonal weight values of the corresponding equilibrium terms. The d2Q5 and d3Q7 schemes do not suffer from this deficiency in grid-aligned geometries but they cannot avoid it if the boundaries are not parallel to the coordinate lines. 
To eliminate or attenuate the disparity of the modeled transport coefficients with the equilibrium weights without any modification of the BB rule, we propose to use the two-relaxation-times collision operator with a freely tunable product of two eigenfunctions Λ. Two different values Λv and Λb are assigned for bulk and boundary nodes, respectively. The rationale behind this is that Λv is adjustable for stability, accuracy, or other purposes, while the corresponding Λb(Λv) controls the primary accommodation effects. Two distinct but similar functional relations Λb(Λv) are constructed analytically: they preserve advection velocity in parabolic profile, exactly in the two-dimensional channel and very accurately in a three-dimensional cylindrical capillary. For any velocity-weight stencil, the (local) double-Λ BB scheme produces quasi-identical solutions with the (nonlocal) specular-forward reflection for first four moments in a channel. In a capillary, this strategy allows for the accurate modeling of the Taylor-dispersion and non-Gaussian effects. As an illustrative example, it is shown that in the flow around a circular obstacle, the double-Λ scheme may also eliminate the dependency of mean velocity on the velocity weight; the required value for Λb(Λv) can be identified in a few bisection iterations in a given geometry. A positive solution for Λb(Λv) may not exist in pure diffusion, but a sufficiently small value of Λb significantly reduces the disparity in diffusion coefficient with the mass weight in ducts and in the presence of rectangular obstacles. Although Λb also controls the effective position of straight or curved boundaries, the double-Λ scheme deals with the lower-order effects. Its idea and construction may help in understanding and ameliorating the anomalous, zero- and first-order behavior of the macroscopic solution in the presence of the bulk and boundary or interface discontinuities, commonly found in multiphase flow and heterogeneous transport.
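The double-Λ idea in the abstract above can be illustrated with a minimal sketch. It assumes the common two-relaxation-times convention Λ = (1/ω⁺ − 1/2)(1/ω⁻ − 1/2), where the antisymmetric factor is fixed by the diffusion coefficient and Λ is then chosen freely. The function names and the numerical values of Λv and Λb are hypothetical; the abstract does not give the analytical relation Λb(Λv), so none is reproduced here.

```python
def trt_rates(lambda_minus, magic):
    """Return (omega_plus, omega_minus) for a two-relaxation-times operator.

    Assumed convention: lambda_minus = 1/omega_minus - 1/2 and
    magic = Lambda = lambda_plus * lambda_minus.
    """
    lambda_plus = magic / lambda_minus
    omega_plus = 1.0 / (lambda_plus + 0.5)
    omega_minus = 1.0 / (lambda_minus + 0.5)
    return omega_plus, omega_minus


def node_rates(lambda_minus, magic_bulk, magic_boundary, is_boundary):
    """Pick Lambda per node type: one value in the bulk, another at boundary nodes."""
    magic = magic_boundary if is_boundary else magic_bulk
    return trt_rates(lambda_minus, magic)


# Hypothetical values, for illustration only.
bulk = node_rates(lambda_minus=0.5, magic_bulk=0.25, magic_boundary=0.05, is_boundary=False)
wall = node_rates(lambda_minus=0.5, magic_bulk=0.25, magic_boundary=0.05, is_boundary=True)
```

Only the symmetric rate changes between bulk and boundary nodes in this sketch, so the antisymmetric rate that sets the diffusion coefficient is the same everywhere.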
A parallel algorithm for viewshed analysis in three-dimensional Digital Earth

NASA Astrophysics Data System (ADS)

Feng, Wang; Gang, Wang; Deji, Pan; Yuan, Liu; Liuzhong, Yang; Hongbo, Wang

2015-02-01

Viewshed analysis, often supported by geographic information systems, is widely used in the three-dimensional (3D) Digital Earth system. Many of the analyses involve the siting of features and real-time decision-making. Viewshed analysis is usually performed at a large scale, which poses substantial computational challenges, as geographic datasets continue to become increasingly large. Previous research on viewshed analysis has been generally limited to a single data structure (i.e., DEM), which cannot be used to analyze viewsheds in complicated scenes. In this paper, a real-time algorithm for viewshed analysis in Digital Earth is presented using the parallel computing of graphics processing units (GPUs). An occlusion for each geometric entity in the neighbor space of the viewshed point is generated according to line-of-sight. The region within the occlusion is marked by a stencil buffer within the programmable 3D visualization pipeline. The marked region is drawn with red color concurrently. In contrast to traditional algorithms based on line-of-sight, the new algorithm, in which the viewshed calculation is integrated with the rendering module, is more efficient and stable. This proposed method of viewshed generation is closer to the reality of the virtual geographic environment. No DEM interpolation, which is seen as a computational burden, is needed. The algorithm was implemented in a 3D Digital Earth system (GeoBeans3D) with the DirectX application programming interface (API) and has been widely used in a range of applications.
View looking straight up at ceiling in center of entrance portico ...

Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

View looking straight up at ceiling in center of entrance portico showing stenciled and painted panel saying "C.R.R. 1856." - Central of Georgia Railway, Gray Building, 227 West Broad Street, Savannah, Chatham County, GA
Advances in Distance-Based Hole Cuts on Overset Grids

NASA Technical Reports Server (NTRS)

Chan, William M.; Pandya, Shishir A.

2015-01-01

An automatic and efficient method to determine appropriate hole cuts based on distances to the wall and donor stencil maps for overset grids is presented. A new robust procedure is developed to create a closed surface triangulation representation of each geometric component for accurate determination of the minimum hole. Hole boundaries are then displaced away from the tight grid-spacing regions near solid walls to allow grid overlap to occur away from the walls where cell sizes from neighboring grids are more comparable. The placement of hole boundaries is efficiently determined using a mid-distance rule and Cartesian maps of potential valid donor stencils with minimal user input. Application of this procedure typically results in a spatially-variable offset of the hole boundaries from the minimum hole with only a small number of orphan points remaining. Test cases on complex configurations are presented to demonstrate the new scheme.
A Practical Approach To Lift-Off

NASA Astrophysics Data System (ADS)

Jones, Susan K.; Chapman, Richard C.; Pavelchek, Edward K.

1987-08-01

Lift-off technology provides an alternate metal patterning technology to that of subtractive etching. In this paper, we describe an image reversal process which provides a practical means for reliably producing resist stencils which are required for successful lift-off in a 2.0 μm metal pitch CMOS process, as well as for experimental submicron processing. Experimental data and PROSIM simulations are presented to show the effects of patterning exposure dose, flood exposure dose, develop time, and focus parameters on resist linewidths as well as for control of resist retrograde (undercut) sidewall angles. Deposition and subsequent lift-off of Al/Cu alloys and sandwich metallizations is demonstrated. Because the image reversal process enables pattern definition at the top of the resist film, it is demonstrated that thicker resist films can be used to produce finer resolution of lift-off stencils over topography than would have been expected without resorting to multilayer resist structures.
Stencil Nano Lithography Based on a Nanoscale Polymer Shadow Mask: Towards Organic Nanoelectronics

PubMed Central

Yun, Hoyeol; Kim, Sangwook; Kim, Hakseong; Lee, Junghyun; McAllister, Kirstie; Kim, Junhyung; Pyo, Sengmoon; Sung Kim, Jun; Campbell, Eleanor E. B.; Hyoung Lee, Wi; Wook Lee, Sang

2015-01-01

A stencil lithography technique has been developed to fabricate organic-material-based electronic devices with sub-micron resolution. Suspended polymethylmethacrylate (PMMA) membranes were used as shadow masks for defining organic channels and top electrodes. Arrays of pentacene field effect transistors (FETs) with various channel lengths from 50 μm down to 500 nm were successfully produced from the same batch using this technique. Electrical transport measurements showed that the electrical contacts of all devices were stable and the normalized contact resistances were much lower than previously studied organic FETs. Scaling effects, originating from the bulk space charge current, were investigated by analyzing the channel-length-dependent mobility and hysteresis behaviors. This novel lithography method provides a reliable means for studying the fundamental transport properties of organic materials at the nanoscale as well as enabling potential applications requiring the fabrication of integrated organic nanoelectronic devices. PMID:25959389

Stencil nano lithography based on a nanoscale polymer shadow mask: towards organic nanoelectronics.

PubMed

Yun, Hoyeol; Kim, Sangwook; Kim, Hakseong; Lee, Junghyun; McAllister, Kirstie; Kim, Junhyung; Pyo, Sengmoon; Sung Kim, Jun; Campbell, Eleanor E B; Hyoung Lee, Wi; Wook Lee, Sang

2015-05-11

A stencil lithography technique has been developed to fabricate organic-material-based electronic devices with sub-micron resolution. Suspended polymethylmethacrylate (PMMA) membranes were used as shadow masks for defining organic channels and top electrodes. Arrays of pentacene field effect transistors (FETs) with various channel lengths from 50 μm down to 500 nm were successfully produced from the same batch using this technique. Electrical transport measurements showed that the electrical contacts of all devices were stable and the normalized contact resistances were much lower than previously studied organic FETs. Scaling effects, originating from the bulk space charge current, were investigated by analyzing the channel-length-dependent mobility and hysteresis behaviors. This novel lithography method provides a reliable means for studying the fundamental transport properties of organic materials at the nanoscale as well as enabling potential applications requiring the fabrication of integrated organic nanoelectronic devices.
Self-assembly of organic monolayers as protective and conductive bridges for nanometric surface-mount applications.

PubMed

Platzman, Ilia; Haick, Hossam; Tannenbaum, Rina

2010-09-01

In this work, we present a novel surface-mount placement process that could potentially overcome the inadequacies of the currently used stencil-printing technology, when applied to devices in which either their lateral and/or their horizontal dimensions approach the nanometric scale. Our novel process is based on the "bottom-up" design of an adhesive layer, operative in the molecular/nanoscale level, through the use of self-assembled monolayers (SAMs) that could form protective and conductive bridges between pads and components. On the basis of previous results, 1,4-phenylene diisocyanide (PDI) and terephthalic acid (TPA) were chosen to serve as the best candidates for the achievement of this goal. The quality and stability of these SAMs on annealed Cu surfaces (Rrms=0.15-1.1 nm) were examined in detail. Measurements showed that the SAMs of TPA and PDI molecules formed on top of Cu substrates created thermally stable organic monolayers with high surface coverage (∼90%), in which the molecules were closely packed and well-ordered. Moreover, the molecules assumed a standing-up phase conformation, in which the molecules bonded to the Cu substrate through one terminal functional group, with the other terminal group residing away from the substrate. To examine the ability of these monolayers to serve as "molecular wires," i.e., the capability to provide electrical conductivity, we developed a novel fabrication method of a parallel plate junction (PPJ) in order to create symmetric Cu-SAM-Cu electrical junctions. The current-bias measurements of these junctions indicated high tunneling efficiency. These achievements imply that the SAMs used in this study can serve as conductive molecular bridges that can potentially bind circuital pads/components.
Perfect Color Registration Realized.

ERIC Educational Resources Information Center

Lovedahl, Gerald G.

1979-01-01

Describes apparatus and procedures to design and construct a "printing box" as a graphic arts project to make color prints on T-shirts using photography, indirect and direct photo screen methods, and other types of stencils. Step-by-step photographs illustrate the process. (MF)
AQUEOUS CLEANING OF PRINTED CIRCUIT BOARD STENCILS

EPA Science Inventory

The USEPA through NRMRL has partnered with the California Dept. of Toxic Substance Control under an ETV Pilot Project to verify pollution prevention, recycling and waste treatment technologies. One of the projects selected for verification was the ultrasonic aqueous cleaning tec...
Apparatus for Teaching Physics.

ERIC Educational Resources Information Center

Gottlieb, Herbert H., Ed.

1981-01-01

Describes an apparatus for plotting electric fields using burglar alarm window tape for electrodes and carbonized electronic stencil paper as sheet resistance. Also describes a simple pentode modulator circuit which will modulate a typical helium-neon gas laser, providing an audio channel for demonstration purposes. (SK)
Synthetic Proxy Infrastructure for Task Evaluation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Junghans, Christoph; Pavel, Robert

The Synthetic Proxy Infrastructure for Task Evaluation is a proxy application designed to support application developers in gauging the performance of various task granularities when determining how best to utilize task-based programming models. The infrastructure is designed to provide examples of common communication patterns with a synthetic workload intended to provide performance data to evaluate programming model and platform overheads for the purpose of determining task granularity for task decomposition purposes. This is presented as a reference implementation of a proxy application with run-time configurable input and output task dependencies ranging from an embarrassingly parallel scenario to patterns with stencil-like dependencies upon their nearest neighbors. Once all, if any, inputs are satisfied each task will execute a synthetic workload (a simple DGEMM in this case) of varying size and output all, if any, outputs to the next tasks. The intent is for this reference implementation to be implemented as a proxy app in different programming models so as to provide the same infrastructure and to allow for application developers to simulate their own communication needs to assist in task decomposition under various models on a given platform.
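The dependency patterns described in this record can be sketched as follows. This is a minimal illustration, not the proxy application's actual interface: the function names, the `radius` parameter, and the pure-Python matrix product standing in for DGEMM are all assumptions made for the example.

```python
def build_dependencies(num_tasks, num_steps, radius):
    """Map each task (step, i) to the tasks of the previous step it waits on.

    radius=None gives the embarrassingly parallel case (no inputs);
    radius=r gives stencil-like dependencies on neighbours within distance r.
    """
    deps = {}
    for step in range(num_steps):
        for i in range(num_tasks):
            if step == 0 or radius is None:
                deps[(step, i)] = []
            else:
                lo = max(0, i - radius)
                hi = min(num_tasks - 1, i + radius)
                deps[(step, i)] = [(step - 1, j) for j in range(lo, hi + 1)]
    return deps


def synthetic_workload(n):
    """Stand-in for DGEMM: multiply two n-by-n matrices of ones, return the sum of entries."""
    a = [[1.0] * n for _ in range(n)]
    b = [[1.0] * n for _ in range(n)]
    c = [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return sum(sum(row) for row in c)


def run(deps, n):
    """Execute tasks step by step, checking that every input is satisfied first."""
    done = set()
    for task in sorted(deps):
        assert all(d in done for d in deps[task]), "unsatisfied input"
        synthetic_workload(n)
        done.add(task)
    return len(done)
```

Varying `n` changes the task granularity, while `radius` changes how many inputs each task must wait for, which are the two knobs the abstract describes as run-time configurable.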
75 FR 34204 - Petition for Waiver of Compliance

Federal Register 2010, 2011, 2012, 2013, 2014

2010-06-16

... compliance from certain provisions of 49 CFR part 215, Railroad Freight Car Safety Standards, specifically 49 CFR 215.303 (Stenciling of restricted cars), which requires that restricted railroad freight cars... requirements of its safety standards. The individual petition is described below, including the party seeking...
Cost-effective accurate coarse-grid method for highly convective multidimensional unsteady flows

NASA Technical Reports Server (NTRS)

Leonard, B. P.; Niknafs, H. S.

1991-01-01

A fundamentally multidimensional convection scheme is described based on vector transient interpolation modeling rewritten in conservative control-volume form. Vector third-order upwinding is used as the basis of the algorithm; this automatically introduces important cross-difference terms that are absent from schemes using component-wise one-dimensional formulas. Third-order phase accuracy is good; this is important for coarse-grid large-eddy or full simulation. Potential overshoots or undershoots are avoided by using a recently developed universal limiter. Higher order accuracy is obtained locally, where needed, by the cost-effective strategy of adaptive stencil expansion in a direction normal to each control-volume face; this is controlled by monitoring the absolute normal gradient and curvature across the face. Higher (than third) order cross-terms do not appear to be needed. Since the wider stencil is used only in isolated narrow regions (near discontinuities), extremely high (in this case, seventh) order accuracy can be achieved for little more than the cost of a globally third-order scheme.
A cost-effective strategy for nonoscillatory convection without clipping

NASA Technical Reports Server (NTRS)

Leonard, B. P.; Niknafs, H. S.

1990-01-01

Clipping of narrow extrema and distortion of smooth profiles is a well known problem associated with so-called high resolution nonoscillatory convection schemes. A strategy is presented for accurately simulating highly convective flows containing discontinuities such as density fronts or shock waves, without distorting smooth profiles or clipping narrow local extrema. The convection algorithm is based on non-artificially diffusive third-order upwinding in smooth regions, with automatic adaptive stencil expansion to (in principle, arbitrarily) higher order upwinding locally, in regions of rapidly changing gradients. This is highly cost effective because the wider stencil is used only where needed-in isolated narrow regions. A recently developed universal limiter assures sharp monotonic resolution of discontinuities without introducing artificial diffusion or numerical compression. An adaptive discriminator is constructed to distinguish between spurious overshoots and physical peaks; this automatically relaxes the limiter near local turning points, thereby avoiding loss of resolution in narrow extrema. Examples are given for one-dimensional pure convection of scalar profiles at constant velocity.
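As a rough illustration of third-order upwinding with a monitor that triggers stencil expansion, the sketch below uses the well-known QUICK face interpolation as a stand-in; the abstract does not state which third-order formula or which thresholds are used, so the tolerances and the "either exceeds" switching rule are hypothetical. Flow is assumed to run from the upstream node U through the central node C to the downstream node D on a uniform 1-D grid.

```python
def quick_face_value(phi_u, phi_c, phi_d):
    """Third-order upwind-biased (QUICK) value at the face between C and D."""
    return 0.375 * phi_d + 0.75 * phi_c - 0.125 * phi_u


def gradient_and_curvature(phi_u, phi_c, phi_d):
    """Absolute normal gradient and curvature across the face (grid-spacing units)."""
    grad = abs(phi_d - phi_c)
    curv = abs(phi_d - 2.0 * phi_c + phi_u)
    return grad, curv


def wants_wider_stencil(phi_u, phi_c, phi_d, grad_tol, curv_tol):
    """Hypothetical switch: request a wider stencil where either monitor exceeds its tolerance."""
    grad, curv = gradient_and_curvature(phi_u, phi_c, phi_d)
    return grad > grad_tol or curv > curv_tol
```

On a quadratic profile the QUICK value is exact: with nodes at x = 0, 1, 2 carrying φ = x², the face at x = 1.5 gets 2.25. A step such as (0, 0, 1) trips the monitor, which is where a higher-order stencil would be substituted.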
Automated Approach to Very High-Order Aeroacoustic Computations. Revision

NASA Technical Reports Server (NTRS)

Dyson, Rodger W.; Goodrich, John W.

2001-01-01

Computational aeroacoustics requires efficient, high-resolution simulation tools. For smooth problems, this is best accomplished with very high-order in space and time methods on small stencils. However, the complexity of highly accurate numerical methods can inhibit their practical application, especially in irregular geometries. This complexity is reduced by using a special form of Hermite divided-difference spatial interpolation on Cartesian grids, and a Cauchy-Kowalewski recursion procedure for time advancement. In addition, a stencil constraint tree reduces the complexity of interpolating grid points that are located near wall boundaries. These procedures are used to develop automatically and to implement very high-order methods (> 15) for solving the linearized Euler equations that can achieve less than one grid point per wavelength resolution away from boundaries by including spatial derivatives of the primitive variables at each grid point. The accuracy of stable surface treatments is currently limited to 11th order for grid aligned boundaries and to 2nd order for irregular boundaries.
Suppressing Ionic Terms with Number-Counting Jastrow Factors in Real Space

DOE PAGES

Goetz, Brett Van Der; Neuscamman, Eric

2017-04-06

Here, we demonstrate that four-body real-space Jastrow factors are, with the right type of Jastrow basis function, capable of performing successful wave function stenciling to remove unwanted ionic terms from an overabundant Fermionic reference without unduly modifying the remaining components. In addition to greatly improving size consistency (restoring it exactly in the case of a geminal power), real-space wave function stenciling is, unlike its Hilbert-space predecessors, immediately compatible with diffusion Monte Carlo, allowing it to be used in the pursuit of compact, strongly correlated trial functions with reliable nodal surfaces. Furthermore, we demonstrate the efficacy of this approach in the context of a double bond dissociation by using it to extract a qualitatively correct nodal surface despite being paired with a restricted Slater determinant, that, due to ionic term errors, produces a ground state with a qualitatively incorrect nodal surface when used in the absence of the Jastrow.
Suppressing Ionic Terms with Number-Counting Jastrow Factors in Real Space

DOE Office of Scientific and Technical Information (OSTI.GOV)

Goetz, Brett Van Der; Neuscamman, Eric

Here, we demonstrate that four-body real-space Jastrow factors are, with the right type of Jastrow basis function, capable of performing successful wave function stenciling to remove unwanted ionic terms from an overabundant Fermionic reference without unduly modifying the remaining components. In addition to greatly improving size consistency (restoring it exactly in the case of a geminal power), real-space wave function stenciling is, unlike its Hilbert-space predecessors, immediately compatible with diffusion Monte Carlo, allowing it to be used in the pursuit of compact, strongly correlated trial functions with reliable nodal surfaces. Furthermore, we demonstrate the efficacy of this approach in the context of a double bond dissociation by using it to extract a qualitatively correct nodal surface despite being paired with a restricted Slater determinant, that, due to ionic term errors, produces a ground state with a qualitatively incorrect nodal surface when used in the absence of the Jastrow.
An Automated Approach to Very High Order Aeroacoustic Computations in Complex Geometries

NASA Technical Reports Server (NTRS)

Dyson, Rodger W.; Goodrich, John W.

2000-01-01

Computational aeroacoustics requires efficient, high-resolution simulation tools. And for smooth problems, this is best accomplished with very high order in space and time methods on small stencils. But the complexity of highly accurate numerical methods can inhibit their practical application, especially in irregular geometries. This complexity is reduced by using a special form of Hermite divided-difference spatial interpolation on Cartesian grids, and a Cauchy-Kowalewski recursion procedure for time advancement. In addition, a stencil constraint tree reduces the complexity of interpolating grid points that are located near wall boundaries. These procedures are used to automatically develop and implement very high order methods (>15) for solving the linearized Euler equations that can achieve less than one grid point per wavelength resolution away from boundaries by including spatial derivatives of the primitive variables at each grid point. The accuracy of stable surface treatments is currently limited to 11th order for grid aligned boundaries and to 2nd order for irregular boundaries.
Stencil lithography of superconducting contacts on MBE-grown topological insulator thin films

NASA Astrophysics Data System (ADS)

Schüffelgen, Peter; Rosenbach, Daniel; Neumann, Elmar; Stehno, Martin P.; Lanius, Martin; Zhao, Jialin; Wang, Meng; Sheehan, Brendan; Schmidt, Michael; Gao, Bo; Brinkman, Alexander; Mussler, Gregor; Schäpers, Thomas; Grützmacher, Detlev

2017-11-01

Topological insulator (Bi0.06Sb0.94)2Te3 thin films grown by molecular beam epitaxy have been capped in-situ with a 2 nm Al film to conserve the pristine topological surface states. Subsequently, a shadow mask - structured by means of focused ion beam - was in-situ placed underneath the sample to deposit a thick layer of Al on well-defined microscopically small areas. The 2 nm thin Al layer fully oxidizes after exposure to air and in this way protects the TI surface from degradation. The thick Al layer remains metallic underneath a 3-4 nm thick native oxide layer and therefore serves as (super-) conducting contacts. Superconductor-Topological Insulator-Superconductor junctions with lateral dimensions in the nm range have then been fabricated via an alternative stencil lithography technique. Despite the in-situ deposition, transport measurements and transmission electron microscope analysis indicate a low transparency, due to an intermixed region at the interface between topological insulator thin film and metallic Al.
Applying the miniaturization technologies for biosensor design.

PubMed

Derkus, Burak

2016-05-15

Microengineering technologies offer opportunities for developing high-tech sensing systems that operate with low sample volumes, integrate one or more laboratory functions on a single substrate, and enable automation. These millimeter-sized devices can be produced for only a few dollars, which makes them promising candidates for mass production. Besides electron beam lithography, stencil lithography, nano-imprint lithography, and dip pen lithography, basic photolithography is the technique that is extensively used for the design of microengineered sensing systems. This technique has several advantages: it is easy to use in manufacturing, does not require expensive instrumentation, and allows the creation of patterns in the lower micron range. This review focuses on three different types of microengineered sensing devices, which are developed using micro/nano-patterning techniques, microfluidic technology, and microelectromechanical-system-based technology. Copyright © 2016 Elsevier B.V. All rights reserved.
Hybrid thermal link-wise artificial compressibility method

NASA Astrophysics Data System (ADS)

Obrecht, Christian; Kuznik, Frédéric

2015-10-01

Thermal flow prediction is a subject of interest from both scientific and engineering points of view. Our motivation is to develop an accurate, easy-to-implement, and highly scalable method for convective flow simulation. To this end, we present an extension to the link-wise artificial compressibility method (LW-ACM) for thermal simulation of weakly compressible flows. The novel hybrid formulation uses second-order finite difference operators of the energy equation based on the same stencils as the LW-ACM. For validation purposes, the differentially heated cubic cavity was simulated. The simulations remained stable for Rayleigh numbers up to Ra = 10^8. The Nusselt numbers at isothermal walls and dynamics quantities are in good agreement with reference values from the literature. Our results show that the hybrid thermal LW-ACM is an effective and easy-to-use solution for simulating convective flows.
78 FR 1930 - Proposed Agency Information Collection Activities; Comment Request

Federal Register 2010, 2011, 2012, 2013, 2014

2013-01-09

... mandated by Federal regulations. In summary, FRA reasons that comments received will advance three... burden hours 229.47: Emergency Brake Valve-- 27 railroads....... 30 stencillings.... 1 minute 1.......... 1,000 tags/cards... 3 minutes 50 Equip. w/power brake defects: Limitations on movement found during...
77 FR 33266 - Petition for Waiver of Compliance

Federal Register 2010, 2011, 2012, 2013, 2014

2012-06-05

... letter an Exhibit A, which lists the subject cars' types, reporting marks, construction, designs, type... have originally purchased that type of car. HVRR further stated that stenciling the cars and adding... petition, as well as any written communications concerning the petition, is available for review online at...
78 FR 54951 - Petition for Waiver of Compliance

Federal Register 2010, 2011, 2012, 2013, 2014

2013-09-06

... waiver of compliance from certain provisions of the Federal railroad safety regulations contained at 49 CFR Part 215-Railroad Freight Car Safety Standards. FRA assigned the petition Docket Number FRA-2013- 0065. TEVR seeks relief from 49 CFR 215.303-Stenciling of restricted cars, which requires that...
Greening up Auto Part Manufacturing: A Collaboration between Academia and Industry

ERIC Educational Resources Information Center

Kneas, Kristi A.; Armstrong, Drew L.; Brank, Alice R.; Johnson, Amanda L.; Kissinger, Chelsea A.; Mabe, Adam R.; Sezer, Ozge; Fontinell, Mike

2009-01-01

Historically, manufacture of automotive electronic components and screen-printing of automotive instrument clusters at DENSO Manufacturing Tennessee, Inc. required washing of equipment such as screens, stencils, and jigs with sizable quantities of volatile organic compounds and hazardous air pollutants. Collaborative efforts between the Maryville…

Splashy Portfolios Kids Can Make Themselves.

ERIC Educational Resources Information Center

Booth, Virginia Humphreys

1994-01-01

A children's art project lets students create artistic portfolios in a Jackson Pollock style. The activity takes 60 minutes and requires posterboard, adhesive tape, spray paint, tempera paints, eyedroppers, newspapers, colored markers, and stencil letters. By inviting students to make their own portfolios, teachers are cultivating students'…
[Electric and hybrid vehicle site operators program]: Thinking of the future. Second year third quarter report, January 1--March 31, 1993

DOE Office of Scientific and Technical Information (OSTI.GOV)

Not Available

Kansas State University, with funding support from federal, state, public, and private companies, is participating in the Department of Energy's Electric Vehicle Site Operator Program. Through participation in this program, Kansas State is displaying, testing, and evaluating electric or hybrid vehicle technology. This participation will provide organizations the opportunity to examine the latest EHV prototypes under actual operating conditions. KSU proposes to purchase one (1) electric or hybrid van and two (2) electric cars during the first two years of this five-year program. KSU has purchased one G-Van built by Conceptor Industries, Toronto, Canada and has initiated a procurement order to purchase two (2) Soleq 1993 Ford EVcort station wagons. The G-Van has been signed in order for the public to be aware that this is an electric drive vehicle. Financial participants' names have been stenciled on the back door of the van. This vehicle is available for short term loan to interested utilities and companies. When other vehicles are obtained, the G-Van will be maintained on K-State's campus.
Nonuniform grid implicit spatial finite difference method for acoustic wave modeling in tilted transversely isotropic media

NASA Astrophysics Data System (ADS)

Chu, Chunlei; Stoffa, Paul L.

2012-01-01

Discrete earth models are commonly represented by uniform structured grids. In order to ensure accurate numerical description of all wave components propagating through these uniform grids, the grid size must be determined by the slowest velocity of the entire model. Consequently, high velocity areas are always oversampled, which inevitably increases the computational cost. A practical solution to this problem is to use nonuniform grids. We propose a nonuniform grid implicit spatial finite difference method which utilizes nonuniform grids to obtain high efficiency and relies on implicit operators to achieve high accuracy. We present a simple way of deriving implicit finite difference operators of arbitrary stencil widths on general nonuniform grids for the first and second derivatives and, as a demonstration example, apply these operators to the pseudo-acoustic wave equation in tilted transversely isotropic (TTI) media. We propose an efficient gridding algorithm that can be used to convert uniformly sampled models onto vertically nonuniform grids. We use a 2D TTI salt model to demonstrate its effectiveness and show that the nonuniform grid implicit spatial finite difference method can produce highly accurate seismic modeling results with enhanced efficiency, compared to uniform grid explicit finite difference implementations.
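To make the notion of finite difference operators of arbitrary stencil width on nonuniform grids concrete, here is a minimal sketch that computes explicit weights with Fornberg's classical recursion. It is a stand-in only: the method in the abstract uses implicit operators, whose derivation is not given there and is not reproduced here. The function name and argument order are assumptions.

```python
def fd_weights(z, x, m):
    """Explicit finite difference weights on arbitrary (nonuniform) nodes.

    Returns c with c[j][k] = weight of node x[j] in the approximation of the
    k-th derivative (k = 0..m) at the point z, via Fornberg's recursion.
    """
    n = len(x)
    c = [[0.0] * (m + 1) for _ in range(n)]
    c1 = 1.0
    c4 = x[0] - z
    c[0][0] = 1.0
    for i in range(1, n):
        mn = min(i, m)
        c2 = 1.0
        c5 = c4
        c4 = x[i] - z
        for j in range(i):
            c3 = x[i] - x[j]
            c2 *= c3
            if j == i - 1:
                # Weights for the newly added node x[i].
                for k in range(mn, 0, -1):
                    c[i][k] = c1 * (k * c[i - 1][k - 1] - c5 * c[i - 1][k]) / c2
                c[i][0] = -c1 * c5 * c[i - 1][0] / c2
            # Update the weights of the previously included node x[j].
            for k in range(mn, 0, -1):
                c[j][k] = (c4 * c[j][k] - k * c[j][k - 1]) / c3
            c[j][0] = c4 * c[j][0] / c3
        c1 = c2
    return c


# Three unevenly spaced depths; weights for the first and second derivative at z = 1.
nodes = [0.0, 1.0, 3.0]
w = fd_weights(1.0, nodes, 2)
first = [row[1] for row in w]
second = [row[2] for row in w]
```

With three nodes the weights are exact for quadratics, so applying `first` to the samples of x² at the nodes returns the derivative 2 at z = 1 despite the uneven spacing. Adding nodes to `x` widens the stencil and raises the order without changing the code.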
An Art of Resistance: From the Street to the Classroom

ERIC Educational Resources Information Center

Chung, Sheng Kuan

2009-01-01

Rooted in graffiti culture and its attitude toward the world, street art is regarded as a postgraffiti movement. Street art encompasses a wide array of media and techniques, such as traditional spray-painted tags, stickers, stencils, posters, photocopies, murals, paper cutouts, mosaics, street installations, performances, and video projections…
75 FR 34203 - Petition for Waiver of Compliance

Federal Register 2010, 2011, 2012, 2013, 2014

2010-06-16

... provisions of the Railroad Freight Car Safety Standards, 49 CFR 215.303, which requires stenciling of restricted cars. ITMZ owns four cabooses. They are car numbers: Monon 81528, C & O 90876, NKP 405, and W & LR... requirements of its safety standards. The individual petition is described below, including the party seeking...
75 FR 61561 - Petition for Waiver of Compliance

Federal Register 2010, 2011, 2012, 2013, 2014

2010-10-05

... provisions of the Railroad Freight Car Safety Standards, 49 CFR 215.303, which requires stenciling of restricted cars according to Sec. 215.203. SERA owns one gondola and four box cars modified as ``open air... requirements of its safety standards. The individual petition is described below, including the party seeking...
7 CFR 29.32 - Identification number.

Code of Federal Regulations, 2010 CFR

2010-01-01

... 7 Agriculture 2 2010-01-01 2010-01-01 false Identification number. 29.32 Section 29.32 Agriculture... INSPECTION Regulations Definitions § 29.32 Identification number. A number or a combination of letters and numbers in a design or mark approved by the Director, stamped, printed, or stenciled on a lot of tobacco...
7 CFR 29.32 - Identification number.

Code of Federal Regulations, 2013 CFR

2013-01-01

... 7 Agriculture 2 2013-01-01 2013-01-01 false Identification number. 29.32 Section 29.32 Agriculture... INSPECTION Regulations Definitions § 29.32 Identification number. A number or a combination of letters and numbers in a design or mark approved by the Director, stamped, printed, or stenciled on a lot of tobacco...
7 CFR 29.32 - Identification number.

Code of Federal Regulations, 2014 CFR

2014-01-01

... 7 Agriculture 2 2014-01-01 2014-01-01 false Identification number. 29.32 Section 29.32 Agriculture... INSPECTION Regulations Definitions § 29.32 Identification number. A number or a combination of letters and numbers in a design or mark approved by the Director, stamped, printed, or stenciled on a lot of tobacco...
7 CFR 29.32 - Identification number.

Code of Federal Regulations, 2011 CFR

2011-01-01

... 7 Agriculture 2 2011-01-01 2011-01-01 false Identification number. 29.32 Section 29.32 Agriculture... INSPECTION Regulations Definitions § 29.32 Identification number. A number or a combination of letters and numbers in a design or mark approved by the Director, stamped, printed, or stenciled on a lot of tobacco...
7 CFR 29.32 - Identification number.

Code of Federal Regulations, 2012 CFR

2012-01-01

... 7 Agriculture 2 2012-01-01 2012-01-01 false Identification number. 29.32 Section 29.32 Agriculture... INSPECTION Regulations Definitions § 29.32 Identification number. A number or a combination of letters and numbers in a design or mark approved by the Director, stamped, printed, or stenciled on a lot of tobacco...
48 CFR 252.247-7017 - Erroneous shipments.

Code of Federal Regulations, 2010 CFR

2010-10-01

... clause: Erroneous Shipments (DEC 1991) (a) The Contractor shall— (1) Forward to the rightful owner... incorrect stenciling by the Contractor, the Contractor shall forward it to its rightful owner. (3) Deliver... Contractor shall forward to the owner any pieces of one lot not included in delivery, and remaining at its...
48 CFR 252.247-7017 - Erroneous shipments.

Code of Federal Regulations, 2011 CFR

2011-10-01

... clause: Erroneous Shipments (DEC 1991) (a) The Contractor shall— (1) Forward to the rightful owner... incorrect stenciling by the Contractor, the Contractor shall forward it to its rightful owner. (3) Deliver... Contractor shall forward to the owner any pieces of one lot not included in delivery, and remaining at its...
27 CFR 26.40 - Marking containers of distilled spirits.

Code of Federal Regulations, 2010 CFR

2010-04-01

... spirits. The distiller, rectifier, or bottler shall serially number each case, barrel, cask, or similar... the container, the distiller, rectifier, or bottler shall plainly print, stamp, or stencil with..., rectifier, or bottler. (b) The brand name and kind of liquor; (c) The wine and proof gallon contents; or...
Progress Towards a Cartesian Cut-Cell Method for Viscous Compressible Flow

NASA Technical Reports Server (NTRS)

Berger, Marsha; Aftosmis, Michael J.

2011-01-01

The proposed paper reports advances in developing a method for high Reynolds number compressible viscous flow simulations using a Cartesian cut-cell method with embedded boundaries. This preliminary work focuses on accuracy of the discretization near solid wall boundaries. A model problem is used to investigate the accuracy of various difference stencils for second derivatives and to guide development of the discretization of the viscous terms in the Navier-Stokes equations. Near walls, quadratic reconstruction in the wall-normal direction is used to mitigate mesh irregularity and yields smooth skin friction distributions along the body. Multigrid performance is demonstrated using second-order coarse grid operators combined with second-order restriction and prolongation operators. Preliminary verification and validation for the method is demonstrated using flat-plate and airfoil examples at compressible Mach numbers. Simulations of flow on laminar and turbulent flat plates show skin friction and velocity profiles compared with those from boundary-layer theory. Airfoil simulations are performed at laminar and turbulent Reynolds numbers with results compared to both other simulations and experimental data.
Cell-Averaged discretization for incompressible Navier-Stokes with embedded boundaries and locally refined Cartesian meshes: a high-order finite volume approach

NASA Astrophysics Data System (ADS)

Bhalla, Amneet Pal Singh; Johansen, Hans; Graves, Dan; Martin, Dan; Colella, Phillip; Applied Numerical Algorithms Group Team

2017-11-01

We present a consistent cell-averaged discretization for incompressible Navier-Stokes equations on complex domains using embedded boundaries. The embedded boundary is allowed to freely cut the locally-refined background Cartesian grid. Implicit-function representation is used for the embedded boundary, which allows us to convert the required geometric moments in the Taylor series expansion (up to arbitrary order) of polynomials into an algebraic problem in lower dimensions. The computed geometric moments are then used to construct stencils for various operators like the Laplacian, divergence, gradient, etc., by solving a least-squares system locally. We also construct the inter-level data-transfer operators like prolongation and restriction for multigrid solvers using the same least-squares system approach. This allows us to retain high-order accuracy near coarse-fine interfaces and near embedded boundaries. Canonical problems like Taylor-Green vortex flow and flow past bluff bodies will be presented to demonstrate the proposed method. U.S. Department of Energy, Office of Science, ASCR (Award Number DE-AC02-05CH11231).
Research of paste transition to substrate in LTCC-technology

NASA Astrophysics Data System (ADS)

Litunov, S. N.; Yurkov, V. Y.

2018-01-01

Advances in electronics place growing demands on the accuracy of printing technologies, screen printing in particular. When a flat blade is used, the print form is deformed and the image is distorted relative to the original. A squeegee in the form of a smooth cylinder reduces distortion, but it gives satisfactory print quality only with high-density grids. The paper presents findings on the use of a roller squeegee with dosed ink supply. The roller squeegee has an elastic layer, and the ink is dosed by cells on the surface of that layer. Meshes 100-31 and 120-34 were used for the stencil. The experiments were carried out with layers of photopolymers and rubber. Calculations made it possible to choose the optimum printing pressure. Under the selected conditions, the printed image had minimal distortion. The findings support the conclusion that a roller squeegee can be used in chip manufacture with LTCC technology.
Pentadiagonal alternating-direction-implicit finite-difference time-domain method for two-dimensional Schrödinger equation

NASA Astrophysics Data System (ADS)

Tay, Wei Choon; Tan, Eng Leong

2014-07-01

In this paper, we have proposed a pentadiagonal alternating-direction-implicit (Penta-ADI) finite-difference time-domain (FDTD) method for the two-dimensional Schrödinger equation. Through the separation of complex wave function into real and imaginary parts, a pentadiagonal system of equations for the ADI method is obtained, which results in our Penta-ADI method. The Penta-ADI method is further simplified into pentadiagonal fundamental ADI (Penta-FADI) method, which has matrix-operator-free right-hand-sides (RHS), leading to the simplest and most concise update equations. As the Penta-FADI method involves five stencils in the left-hand-sides (LHS) of the pentadiagonal update equations, special treatments that are required for the implementation of the Dirichlet's boundary conditions will be discussed. Using the Penta-FADI method, a significantly higher efficiency gain can be achieved over the conventional Tri-ADI method, which involves a tridiagonal system of equations.
Broadcasting collective operation contributions throughout a parallel computer

DOEpatents

Faraj, Ahmad [Rochester, MN

2012-02-21

Methods, systems, and products are disclosed for broadcasting collective operation contributions throughout a parallel computer. The parallel computer includes a plurality of compute nodes connected together through a data communications network. Each compute node has a plurality of processors for use in collective parallel operations on the parallel computer. Broadcasting collective operation contributions throughout a parallel computer according to embodiments of the present invention includes: transmitting, by each processor on each compute node, that processor's collective operation contribution to the other processors on that compute node using intra-node communications; and transmitting on a designated network link, by each processor on each compute node according to a serial processor transmission sequence, that processor's collective operation contribution to the other processors on the other compute nodes using inter-node communications.
75 FR 61560 - Petition for Waiver of Compliance

Federal Register 2010, 2011, 2012, 2013, 2014

2010-10-05

... certain provisions of the Railroad Freight Car Safety Standards, 49 CFR 215.303, which requires stenciling of restricted cars. HVRM owns four cabooses (Car Numbers: B&LE 1989, EL C345, GTW 75072, and EJ&E 184... requirements of its safety standards. The individual petition is described below, including the party seeking...

76 FR 10086 - Petition for Waiver of Compliance

Federal Register 2010, 2011, 2012, 2013, 2014

2011-02-23

... Railroad Freight Car Safety Standards, i.e. Sec. Sec. 215.303 and 215.305, which require stenciling of restricted cars; as well as that of the Reflectorization of Rail Freight Rolling Stock, i.e. Sec. Sec. 224.3... requirements of its safety standards. The individual petition is described below, including the party seeking...
77 FR 4396 - Petition for Waiver of Compliance

Federal Register 2010, 2011, 2012, 2013, 2014

2012-01-27

... from the Railroad Freight Car Safety Standards, 49 CFR 215.303, which requires stenciling on restricted freight cars, for 13 freight cars. The list of these 13 cars is contained in the Exhibit A of the petition... compliance from certain provisions of the Federal railroad safety regulations contained at 49 CFR Part 215...
7 CFR 51.2927 - Marking and packing requirements.

Code of Federal Regulations, 2013 CFR

2013-01-01

... Requirements § 51.2927 Marking and packing requirements. The minimum size or numerical count of the apricots in any package shall be plainly labeled, stenciled, or otherwise marked on the package. (a) Numerical count. When the numerical count is used the fruit in any sample shall not vary more than one-fourth inch...
7 CFR 51.2927 - Marking and packing requirements.

Code of Federal Regulations, 2014 CFR

2014-01-01

... Requirements § 51.2927 Marking and packing requirements. The minimum size or numerical count of the apricots in any package shall be plainly labeled, stenciled, or otherwise marked on the package. (a) Numerical count. When the numerical count is used the fruit in any sample shall not vary more than one-fourth inch...
49 CFR 180.405 - Qualification of cargo tanks.

Code of Federal Regulations, 2012 CFR

2012-10-01

..., refrigerated liquid; or hydrogen chloride, refrigerated liquid shall remove the exemption number stenciled on... after July 1, 2001, or July 1, 2003, whichever is earlier. (n) Thermal activation. No later than the... compressed gas, other than carbon dioxide and chlorine, that has a water capacity of 13,247.5 L (3,500...
49 CFR 180.405 - Qualification of cargo tanks.

Code of Federal Regulations, 2014 CFR

2014-10-01

..., refrigerated liquid; or hydrogen chloride, refrigerated liquid shall remove the exemption number stenciled on... after July 1, 2001, or July 1, 2003, whichever is earlier. (n) Thermal activation. No later than the... compressed gas, other than carbon dioxide and chlorine, that has a water capacity of 13,247.5 L (3,500...
49 CFR 180.405 - Qualification of cargo tanks.

Code of Federal Regulations, 2011 CFR

2011-10-01

..., refrigerated liquid; or hydrogen chloride, refrigerated liquid shall remove the exemption number stenciled on... after July 1, 2001, or July 1, 2003, whichever is earlier. (n) Thermal activation. No later than the... compressed gas, other than carbon dioxide and chlorine, that has a water capacity of 13,247.5 L (3,500...
49 CFR 180.405 - Qualification of cargo tanks.

Code of Federal Regulations, 2013 CFR

2013-10-01

..., refrigerated liquid; or hydrogen chloride, refrigerated liquid shall remove the exemption number stenciled on... after July 1, 2001, or July 1, 2003, whichever is earlier. (n) Thermal activation. No later than the... compressed gas, other than carbon dioxide and chlorine, that has a water capacity of 13,247.5 L (3,500...
49 CFR 180.405 - Qualification of cargo tanks.

Code of Federal Regulations, 2010 CFR

2010-10-01

..., refrigerated liquid; or hydrogen chloride, refrigerated liquid shall remove the exemption number stenciled on... after July 1, 2001, or July 1, 2003, whichever is earlier. (n) Thermal activation. No later than the... compressed gas, other than carbon dioxide and chlorine, that has a water capacity of 13,247.5 L (3,500...
When the Future Becomes the Past: Where will our Print Collection Be in 2050?

DTIC Science & Technology

2015-04-01

acidification? No. All paper should be properly stored in low temperatures, low humidity, and dark storage environments. Many processes such as stencils...It is designed for sailors on submarines who have no wireless internet access, no space, and lots of security concerns as they move about the ocean
Design of an essentially non-oscillatory reconstruction procedure on finite-element type meshes

NASA Technical Reports Server (NTRS)

Abgrall, R.

1991-01-01

An essentially non-oscillatory reconstruction for functions defined on finite-element type meshes was designed. Two related problems are studied: the interpolation of possibly unsmooth multivariate functions on arbitrary meshes and the reconstruction of a function from its average in the control volumes surrounding the nodes of the mesh. Concerning the first problem, we have studied the behavior of the highest coefficients of the Lagrange interpolation function which may admit discontinuities of locally regular curves. This enables us to choose the best stencil for the interpolation. The choice of the smallest possible number of stencils is addressed. Concerning the reconstruction problem, because of the very nature of the mesh, the only method that may work is the so called reconstruction via deconvolution method. Unfortunately, it is well suited only for regular meshes as we show, but we also show how to overcome this difficulty. The global method has the expected order of accuracy but is conservative up to a high order quadrature formula only. Some numerical examples are given which demonstrate the efficiency of the method.
Involution and Difference Schemes for the Navier-Stokes Equations

NASA Astrophysics Data System (ADS)

Gerdt, Vladimir P.; Blinkov, Yuri A.

In the present paper we consider the Navier-Stokes equations for the two-dimensional viscous incompressible fluid flows and apply to these equations our earlier designed general algorithmic approach to generation of finite-difference schemes. In doing so, we complete first the Navier-Stokes equations to involution by computing their Janet basis and discretize this basis by its conversion into the integral conservation law form. Then we again complete the obtained difference system to involution with eliminating the partial derivatives and extracting the minimal Gröbner basis from the Janet basis. The elements in the obtained difference Gröbner basis that do not contain partial derivatives of the dependent variables compose a conservative difference scheme. By exploiting arbitrariness in the numerical integration approximation we derive two finite-difference schemes that are similar to the classical scheme by Harlow and Welch. Each of the two schemes is characterized by a 5×5 stencil on an orthogonal and uniform grid. We also demonstrate how an inconsistent difference scheme with a 3×3 stencil is generated by an inappropriate numerical approximation of the underlying integrals.
Modeling RF Fields in Hot Plasmas with Parallel Full Wave Code

NASA Astrophysics Data System (ADS)

Spencer, Andrew; Svidzinski, Vladimir; Zhao, Liangji; Galkin, Sergei; Kim, Jin-Soo

2016-10-01

FAR-TECH, Inc. is developing a suite of full wave RF plasma codes. It is based on a meshless formulation in configuration space with adapted cloud of computational points (CCP) capability and using the hot plasma conductivity kernel to model the nonlocal plasma dielectric response. The conductivity kernel is calculated by numerically integrating the linearized Vlasov equation along unperturbed particle trajectories. Work has been done on the following calculations: 1) the conductivity kernel in hot plasmas, 2) a monitor function based on analytic solutions of the cold-plasma dispersion relation, 3) an adaptive CCP based on the monitor function, 4) stencils to approximate the wave equations on the CCP, 5) the solution to the full wave equations in the cold-plasma model in tokamak geometry for ECRH and ICRH range of frequencies, and 6) the solution to the wave equations using the calculated hot plasma conductivity kernel. We will present results on using a meshless formulation on adaptive CCP to solve the wave equations and on implementing the non-local hot plasma dielectric response to the wave equations. The presentation will include numerical results of wave propagation and absorption in the cold and hot tokamak plasma RF models, using DIII-D geometry and plasma parameters. Work is supported by the U.S. DOE SBIR program.
Unconventional critical state in YBa2Cu3O7-δ thin films with a vortex-pin lattice fabricated by masked He+ ion beam irradiation

NASA Astrophysics Data System (ADS)

Zechner, G.; Mletschnig, K. L.; Lang, W.; Dosmailov, M.; Bodea, M. A.; Pedarnig, J. D.

2018-04-01

Thin superconducting YBa2Cu3O7-δ films are patterned with a vortex-pin lattice consisting of columnar defect regions (CDs) with 180 nm diameter and 300 nm spacing. They are fabricated by irradiation with 75 keV He+ ions through a stencil mask. Peaks of the critical current reveal the commensurate trapping of vortices in domains near the edges of the sample. Upon ramping an external magnetic field, the positions of the critical current peaks are shifted from their equilibrium values to lower magnetic fields in virgin and to higher fields in field-saturated down-sweep curves, respectively. Based on previous theoretical predictions, this irreversibility is interpreted as a nonuniform, terrace-like critical state, in which individual domains are occupied by a constant number of vortices per pinning site. The magnetoresistance, probed at low current densities, is hysteretic and angle dependent and exhibits minima that correspond to the peaks of the critical current. The minima’s positions scale with the component of the magnetic field parallel to the axes of the CDs, as long as the tilted vortices can be accommodated within the CDs. This behavior, different from unirradiated films, confirms that the CDs dominate the pinning.
A discontinuous Galerkin conservative level set scheme for interface capturing in multiphase flows

DOE Office of Scientific and Technical Information (OSTI.GOV)

Owkes, Mark, E-mail: mfc86@cornell.edu; Desjardins, Olivier

2013-09-15

The accurate conservative level set (ACLS) method of Desjardins et al. [O. Desjardins, V. Moureau, H. Pitsch, An accurate conservative level set/ghost fluid method for simulating turbulent atomization, J. Comput. Phys. 227 (18) (2008) 8395–8416] is extended by using a discontinuous Galerkin (DG) discretization. DG allows for the scheme to have an arbitrarily high order of accuracy with the smallest possible computational stencil resulting in an accurate method with good parallel scaling. This work includes a DG implementation of the level set transport equation, which moves the level set with the flow field velocity, and a DG implementation of the reinitialization equation, which is used to maintain the shape of the level set profile to promote good mass conservation. A near second order converging interface curvature is obtained by following a height function methodology (common amongst volume of fluid schemes) in the context of the conservative level set. Various numerical experiments are conducted to test the properties of the method and show excellent results, even on coarse meshes. The tests include Zalesak’s disk, two-dimensional deformation of a circle, time evolution of a standing wave, and a study of the Kelvin–Helmholtz instability. Finally, this novel methodology is employed to simulate the break-up of a turbulent liquid jet.
A staggered-grid convolutional differentiator for elastic wave modelling

NASA Astrophysics Data System (ADS)

Sun, Weijia; Zhou, Binzhong; Fu, Li-Yun

2015-11-01

The computation of derivatives in governing partial differential equations is one of the most investigated subjects in the numerical simulation of physical wave propagation. An analytical staggered-grid convolutional differentiator (CD) for first-order velocity-stress elastic wave equations is derived in this paper by inverse Fourier transformation of the band-limited spectrum of a first derivative operator. A taper window function is used to truncate the infinite staggered-grid CD stencil. The truncated CD operator is almost as accurate as the analytical solution, and as efficient as the finite-difference (FD) method. The selection of window functions will influence the accuracy of the CD operator in wave simulation. We search for the optimal Gaussian windows for different order CDs by minimizing the spectral error of the derivative and comparing the windows with the normal Hanning window function for tapering the CD operators. It is found that the optimal Gaussian window appears to be similar to the Hanning window function for tapering the same CD operator. We investigate the accuracy of the windowed CD operator and the staggered-grid FD method with different orders. Compared to the conventional staggered-grid FD method, a short staggered-grid CD operator achieves an accuracy equivalent to that of a long FD operator, with lower computational costs. For example, an 8th order staggered-grid CD operator can achieve the same accuracy as a 16th order staggered-grid FD algorithm but with half of the computational resources and time required. Numerical examples from a homogeneous model and a crustal waveguide model are used to illustrate the superiority of the CD operators over the conventional staggered-grid FD operators for the simulation of wave propagation.
Low-Cost Rapid Prototyping of Whole-Glass Microfluidic Devices

ERIC Educational Resources Information Center

Yuen, Po Ki; Goral, Vasiliy N.

2012-01-01

A low-cost, straightforward, rapid prototyping of whole-glass microfluidic devices is presented using glass-etching cream that can be easily purchased in local stores. A self-adhered vinyl stencil cut out by a desktop digital craft cutter was used as an etching mask for patterning microstructures in glass using the glass-etching cream. A specific…
International Communique. . . About Information, People, Places, Things. Printing Processes Issue P-8B.

ERIC Educational Resources Information Center

Peace Corps, Washington, DC. Information Collection and Exchange Div.

Focusing on the production and utilization of printing processes in constructing effective visuals for teaching, this bulletin contains articles on the silk screening stencil process, use of a similar process with a portable mimeograph, and the hectograph process. The first article lists equipment needed to make a silk screen, steps in building…
Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

DOEpatents

Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

2014-08-12

Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
47. DETAIL OF ORIGINAL VANE ASSEMBLY AND TWO WHEEL SECTIONS ...

Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

47. DETAIL OF ORIGINAL VANE ASSEMBLY AND TWO WHEEL SECTIONS FROM ELI WINDMILLS, THE VANE SHEET BEARING STENCILED PAINTED INSCRIPTION, 'KREGEL WINDMILL CO. ELI NEBRASKA CITY, NEB.' VISIBLE IN THE IMAGE ARE BOTH SIDES OF THE WHEEL SECTIONS, SHOWING THE METHOD OF BLADE MOUNTING FOR ELI WINDMILLS. - Kregel Windmill Company Factory, 1416 Central Avenue, Nebraska City, Otoe County, NE

DOE Office of Scientific and Technical Information (OSTI.GOV)

Donofrio, David

A method and apparatus for performing stencil computations efficiently are disclosed. In one embodiment, a processor receives an offset, and in response, retrieves a value from a memory via a single instruction, where the retrieving comprises: identifying, based on the offset, one of a plurality of registers of the processor; loading an address stored in the identified register; and retrieving from the memory the value at the address.
Shipboard Facilities Maintenance and Manpower Utilization: Problem and Approach

DTIC Science & Technology

1975-11-01

sweeping, butting, polishing, lacquering, stenciling, vacuuming and shampooing, garbage disposal and trash removal, and all manner of sanitary and...spaces, passageways, heads and showers, crew lounge, mess decks, exterior deck and ship sides, and all office spaces; and limited facilities...maintenance in all passageways, heads, mess decks, office spaces, and berthing areas. They will also perform sanitization and exterior deck and
49 CFR Appendix A to Part 231 - Schedule of Civil Penalties 1

Code of Federal Regulations, 2010 CFR

2010-10-01

... or Hand Brake Parts Wrong Design 2,500 5,000 114.B2 Hand Brake Wheel or Lever Has Insufficient Clearance Around Rim or Handle 2,500 5,000 114.B3 Hand Brake Wheel/Lever Clearance Insufficient to Vertical... Improperly Applied 2,500 5,000 146.A Notice or Stencil not Posted on Cabooses with Running Boards Removed 650...
Biomimetic, Self-Healing Nanocomposites for Aerospace Applications

NASA Technical Reports Server (NTRS)

Morse, Daniel E.

2003-01-01

This final report contains a summary of significant findings, and bibliographies of publications and patents resulting from the research. The findings are grouped as follows: A) Lustrin-Mimetic Self-Healing Polymer Networks; B) Nanostructure-Directing Catalysis of Synthesis of Electronically and Optoelectronically Active Metallo-oxanes and Organometallics; C) New Discovery that Molecular Stencils Control Directional Growth to Form Light-Weight Mineral Foams.
Dynamic effects of root system architecture improve root water uptake in 1-D process-based soil-root hydrodynamics

NASA Astrophysics Data System (ADS)

Bouda, Martin; Saiers, James E.

2017-12-01

Root system architecture (RSA) can significantly affect plant access to water, total transpiration, as well as its partitioning by soil depth, with implications for surface heat, water, and carbon budgets. Despite recent advances in land surface model (LSM) descriptions of plant hydraulics, descriptions of RSA have not been included because of their three-dimensional complexity, which makes them generally too computationally costly. Here we demonstrate a new, process-based 1D layered model that captures the dynamic shifts in water potential gradients of 3D RSA under different soil moisture conditions: the RSA stencil. Using root systems calibrated to the rooting profiles of four plant functional types (PFT) of the Community Land Model, we show that the RSA stencil predicts plant water potentials within 2% of the outputs of a full 3D model, under the same assumptions on soil moisture heterogeneity, despite its trivial computational cost, resulting in improved predictions of water uptake and soil moisture compared to a model without RSA in a transient simulation. Our results suggest that LSM predictions of soil moisture dynamics and dependent variables can be improved by the implementation of this model, calibrated for individual PFTs using field observations.
Bit-parallel arithmetic in a massively-parallel associative processor

NASA Technical Reports Server (NTRS)

Scherson, Isaac D.; Kramer, David A.; Alleyne, Brian D.

1992-01-01

A simple but powerful new architecture based on a classical associative processor model is presented. Algorithms for performing the four basic arithmetic operations both for integer and floating point operands are described. For m-bit operands, the proposed architecture makes it possible to execute complex operations in O(m) cycles as opposed to O(m^2) for bit-serial machines. A word-parallel, bit-parallel, massively-parallel computing system can be constructed using this architecture with VLSI technology. The operation of this system is demonstrated for the fast Fourier transform and matrix multiplication.
Broadcasting a message in a parallel computer

DOEpatents

Berg, Jeremy E [Rochester, MN; Faraj, Ahmad A [Rochester, MN

2011-08-02

Methods, systems, and products are disclosed for broadcasting a message in a parallel computer. The parallel computer includes a plurality of compute nodes connected together using a data communications network. The data communications network is optimized for point-to-point data communications and is characterized by at least two dimensions. The compute nodes are organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer. One compute node of the operational group is assigned to be a logical root. Broadcasting a message in a parallel computer includes: establishing a Hamiltonian path along all of the compute nodes in at least one plane of the data communications network and in the operational group; and broadcasting, by the logical root to the remaining compute nodes, the logical root's message along the established Hamiltonian path.
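A Hamiltonian path over one plane of a mesh network can be sketched in a few lines. This is only an illustration under stated assumptions, not the patented method: the plane is modeled as a rectangular `nx` by `ny` mesh, the path is a simple snake (boustrophedon) ordering, and the message is relayed hop by hop in both directions from the root. The names `snake_path` and `broadcast_along_path` are hypothetical.

```python
def snake_path(nx, ny):
    """Hamiltonian path over an nx-by-ny mesh plane (snake ordering).

    Consecutive nodes are always mesh neighbors, and every node is
    visited exactly once.
    """
    path = []
    for y in range(ny):
        # Even rows run left to right, odd rows right to left.
        xs = range(nx) if y % 2 == 0 else range(nx - 1, -1, -1)
        path.extend((x, y) for x in xs)
    return path


def broadcast_along_path(path, root, message):
    """Relay message from root to every node along the path.

    Assumption: the root forwards toward both ends of the path.
    Returns what each node received and the list of (sender, receiver) hops.
    """
    received = {root: message}
    hops = []
    i = path.index(root)
    for j in range(i + 1, len(path)):      # forward direction
        hops.append((path[j - 1], path[j]))
        received[path[j]] = received[path[j - 1]]
    for j in range(i - 1, -1, -1):         # backward direction
        hops.append((path[j + 1], path[j]))
        received[path[j]] = received[path[j + 1]]
    return received, hops


plane = snake_path(4, 3)
received, hops = broadcast_along_path(plane, root=(2, 1), message="payload")
```

Because each hop uses a single nearest-neighbor link, the broadcast needs exactly one point-to-point transfer per non-root node.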
CPDES3: A preconditioned conjugate gradient solver for linear asymmetric matrix equations arising from coupled partial differential equations in three dimensions

NASA Astrophysics Data System (ADS)

Anderson, D. V.; Koniges, A. E.; Shumaker, D. E.

1988-11-01

Many physical problems require the solution of coupled partial differential equations on three-dimensional domains. When the time scales of interest dictate an implicit discretization of the equations a rather complicated global matrix system needs solution. The exact form of the matrix depends on the choice of spatial grids and on the finite element or finite difference approximations employed. CPDES3 allows each spatial operator to have 7, 15, 19, or 27 point stencils and allows for general couplings between all of the component PDE's and it automatically generates the matrix structures needed to perform the algorithm. The resulting sparse matrix equation is solved by either the preconditioned conjugate gradient (CG) method or by the preconditioned biconjugate gradient (BCG) algorithm. An arbitrary number of component equations are permitted only limited by available memory. In the sub-band representation used, we generate an algorithm that is written compactly in terms of indirect indices which is vectorizable on some of the newer scientific computers.
CPDES2: A preconditioned conjugate gradient solver for linear asymmetric matrix equations arising from coupled partial differential equations in two dimensions

NASA Astrophysics Data System (ADS)

Anderson, D. V.; Koniges, A. E.; Shumaker, D. E.

1988-11-01

Many physical problems require the solution of coupled partial differential equations on two-dimensional domains. When the time scales of interest dictate an implicit discretization of the equations a rather complicated global matrix system needs solution. The exact form of the matrix depends on the choice of spatial grids and on the finite element or finite difference approximations employed. CPDES2 allows each spatial operator to have 5 or 9 point stencils and allows for general couplings between all of the component PDE's and it automatically generates the matrix structures needed to perform the algorithm. The resulting sparse matrix equation is solved by either the preconditioned conjugate gradient (CG) method or by the preconditioned biconjugate gradient (BCG) algorithm. An arbitrary number of component equations are permitted only limited by available memory. In the sub-band representation used, we generate an algorithm that is written compactly in terms of indirect indices which is vectorizable on some of the newer scientific computers.
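To make the combination of a 5-point stencil and a preconditioned conjugate gradient solver concrete, here is a minimal sketch. It is a simplification under stated assumptions: a single symmetric Laplacian operator rather than the coupled, asymmetric systems CPDES2 targets, a Jacobi (diagonal) preconditioner chosen only for brevity, and a matrix-free stencil application instead of the sub-band storage the record mentions. All names are hypothetical.

```python
def apply_laplacian(u, n):
    """Apply the 5-point stencil (4 on the diagonal, -1 for each neighbour)
    to an n x n grid stored row by row, with zero values outside the grid."""
    out = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            k = i * n + j
            s = 4.0 * u[k]
            if i > 0:
                s -= u[k - n]
            if i < n - 1:
                s -= u[k + n]
            if j > 0:
                s -= u[k - 1]
            if j < n - 1:
                s -= u[k + 1]
            out[k] = s
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pcg(b, n, tol=1e-10, max_iter=500):
    """Conjugate gradients with a Jacobi (diagonal) preconditioner."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for x = 0
    z = [ri / 4.0 for ri in r]       # M^{-1} r with M = diag(A) = 4 I
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, it
        ap = apply_laplacian(p, n)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [ri / 4.0 for ri in r]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


n = 8
b = [1.0] * (n * n)
x, iterations = pcg(b, n)
residual = [bi - ai for bi, ai in zip(b, apply_laplacian(x, n))]
```

For an asymmetric matrix the plain CG recurrence above is no longer valid, which is presumably why the package also offers a biconjugate gradient variant.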
Efficient implicit LES method for the simulation of turbulent cavitating flows

DOE Office of Scientific and Technical Information (OSTI.GOV)

Egerer, Christian P., E-mail: christian.egerer@aer.mw.tum.de; Schmidt, Steffen J.; Hickel, Stefan

2016-07-01

We present a numerical method for efficient large-eddy simulation of compressible liquid flows with cavitation based on an implicit subgrid-scale model. Phase change and subgrid-scale interface structures are modeled by a homogeneous mixture model that assumes local thermodynamic equilibrium. Unlike previous approaches, emphasis is placed on operating on a small stencil (at most four cells). The truncation error of the discretization is designed to function as a physically consistent subgrid-scale model for turbulence. We formulate a sensor functional that detects shock waves or pseudo-phase boundaries within the homogeneous mixture model for localizing numerical dissipation. In smooth regions of the flow field, a formally non-dissipative central discretization scheme is used in combination with a regularization term to model the effect of unresolved subgrid scales. The new method is validated by computing standard single- and two-phase test-cases. Comparison of results for a turbulent cavitating mixing layer obtained with the new method demonstrates its suitability for the target applications.
Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R

Methods, apparatuses, and computer program products for endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface ('PAMI') of a parallel computer are provided. Embodiments include establishing by a parallel application a data communications geometry, the geometry specifying a set of endpoints that are used in collective operations of the PAMI, including associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry. Embodiments also include registering in each endpoint in the geometry a dispatch callback function for a collective operation and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.
Army Logistician. Volume 38, Issue 4, July-August 2006

DTIC Science & Technology

2006-08-01

Relationships Effective joint logistics depends on clear roles, accountabilities, and relationships among the global players within the joint logistics...well-understood roles and accountabilities of the players involved in those processes, and shared JFC metrics shape this enabler. Domain-wide...DD [Department of Defense] Form 10) (when applicable) (for sensitive cargo accountability) X X X UIC and shipment unit number (stenciled) X (4
Tunable Infrared Metasurface on a Soft Polymer Scaffold.

PubMed

Reeves, Jeremy B; Jayne, Rachael K; Stark, Thomas J; Barrett, Lawrence K; White, Alice E; Bishop, David J

2018-05-09

The fabrication of metallic electromagnetic meta-atoms on a soft microstructured polymer scaffold using a MEMS-based stencil lithography technique is demonstrated. Using this technique, complex metasurfaces that are generally impossible to fabricate with traditional photolithographic techniques are created. By engineering the mechanical deformation of the polymer scaffold, the metasurface reflectivity in the mid-infrared can be tuned by the application of moderate strains.
4. Detail of inner side of northernmost door of Bunker ...

Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

4. Detail of inner side of northernmost door of Bunker 103 (seen from outside in photo WA-203-B-2). Stenciling on door includes warning: 'CAUTION: Do not drag or pull powder kegs over deck or other cans. Tanks must be lifted or carried.' - Puget Sound Naval Shipyard, Munitions Storage Bunker, Naval Ammunitions Depot, North of Campbell Trail, Bremerton, Kitsap County, WA
Comparison of Node-Centered and Cell-Centered Unstructured Finite-Volume Discretizations: Viscous Fluxes

NASA Technical Reports Server (NTRS)

Diskin, Boris; Thomas, James L.; Nielsen, Eric J.; Nishikawa, Hiroaki; White, Jeffery A.

2010-01-01

Discretization of the viscous terms in current finite-volume unstructured-grid schemes are compared using node-centered and cell-centered approaches in two dimensions. Accuracy and complexity are studied for four nominally second-order accurate schemes: a node-centered scheme and three cell-centered schemes - a node-averaging scheme and two schemes with nearest-neighbor and adaptive compact stencils for least-square face gradient reconstruction. The grids considered range from structured (regular) grids to irregular grids composed of arbitrary mixtures of triangles and quadrilaterals, including random perturbations of the grid points to bring out the worst possible behavior of the solution. Two classes of tests are considered. The first class of tests involves smooth manufactured solutions on both isotropic and highly anisotropic grids with discontinuous metrics, typical of those encountered in grid adaptation. The second class concerns solutions and grids varying strongly anisotropically over a curved body, typical of those encountered in high-Reynolds number turbulent flow simulations. Tests from the first class indicate the face least-square methods, the node-averaging method without clipping, and the node-centered method demonstrate second-order convergence of discretization errors with very similar accuracies per degree of freedom. The tests of the second class are more discriminating. The node-centered scheme is always second order with an accuracy and complexity in linearization comparable to the best of the cell-centered schemes. In comparison, the cell-centered node-averaging schemes may degenerate on mixed grids, have a higher complexity in linearization, and can fail to converge to the exact solution when clipping of the node-averaged values is used. The cell-centered schemes using least-square face gradient reconstruction have more compact stencils with a complexity similar to that of the node-centered scheme. 
For simulations on highly anisotropic curved grids, the least-square methods have to be amended either by introducing a local mapping based on a distance function commonly available in practical schemes or modifying the scheme stencil to reflect the direction of strong coupling. The major conclusion is that accuracies of the node centered and the best cell-centered schemes are comparable at equivalent number of degrees of freedom.
Implementing a Parallel Image Edge Detection Algorithm Based on the Otsu-Canny Operator on the Hadoop Platform.

PubMed

Cao, Jianfang; Chen, Lichao; Wang, Min; Tian, Yun

2018-01-01

The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images. Overall, our approach speeds up the system by approximately 3.4 times when processing large-scale datasets, which demonstrates the obvious superiority of our method. The proposed algorithm in this study demonstrates both better edge detection performance and improved time performance.
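As a rough illustration of how an Otsu threshold could feed a Canny dual threshold, the sketch below computes the classic Otsu level from a grey-level histogram. The record does not say how the authors derive the two thresholds; taking the Otsu level as the high threshold and half of it as the low threshold is an assumption made here for illustration only, and the function name is hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximises between-class variance when
    pixels <= t form one class and pixels > t the other."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0          # pixel count of the low class
    sum0 = 0        # intensity sum of the low class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


# Synthetic bimodal "image": dark pixels near 20-30, bright ones near 200-210.
pixels = [20] * 50 + [30] * 50 + [200] * 50 + [210] * 50
high = otsu_threshold(pixels)
low = high // 2     # assumed 1:2 ratio, not taken from the paper
```

In a MapReduce setting each map task could run such a routine on its own image independently, which is what makes the per-image work easy to distribute.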
DOE/KEURP site operator program. Year 3, Second Quarter Report, October 1--December 31, 1993

DOE Office of Scientific and Technical Information (OSTI.GOV)

Not Available

Kansas State University, with funding support from federal, state, public, and private companies, is participating in the Department of Energy's Electric Vehicle Site Operator Program. Through participation in this program, Kansas State is displaying, testing, and evaluating electric or hybrid vehicle technology. This participation will provide organizations the opportunity to examine the latest EHV prototypes under actual operating conditions. KSU has purchased several electric cars and proposes to purchase additional electric vehicles. KSU has purchased one G-Van built by Conceptor Industries, Toronto, Canada and has procured two (2) Soleq 1993 Ford EVcort station wagons. During calendar year 1994, the Kansas electric vehicle program expects to purchase a minimum of four and a maximum of eleven additional electric vehicles. The G-Van was signed in order for the public to be aware that it was an electric vehicle. Financial participants' names have been stenciled on the back door of the van. The Soleq EVcorts have not been signed. In order to demonstrate the technology as feasible, the EVcorts were deliberately not signed. The goal is to generate a public perception that this vehicle is no different from any similar internal combustion engine vehicle. Magnetic signs have been made for special functions to ensure sponsor support is recognized and acknowledged.
An algorithm for fast elastic wave simulation using a vectorized finite difference operator

NASA Astrophysics Data System (ADS)

Malkoti, Ajay; Vedanti, Nimisha; Tiwari, Ram Krishna

2018-07-01

Modern geophysical imaging techniques exploit the full wavefield information which can be simulated numerically. These numerical simulations are computationally expensive due to several factors, such as a large number of time steps and nodes, big size of the derivative stencil and huge model size. Besides these constraints, it is also important to reformulate the numerical derivative operator for improved efficiency. In this paper, we have introduced a vectorized derivative operator over the staggered grid with shifted coordinate systems. The operator increases the efficiency of simulation by exploiting the fact that each variable can be represented in the form of a matrix. This operator allows updating all nodes of a variable defined on the staggered grid, in a manner similar to the collocated grid scheme and thereby reducing the computational run-time considerably. Here we demonstrate an application of this operator to simulate the seismic wave propagation in elastic media (Marmousi model), by discretizing the equations on a staggered grid. We have compared the performance of this operator on three programming languages, which reveals that it can increase the execution speed by a factor of at least 2-3 times for FORTRAN and MATLAB; and nearly 100 times for Python. We have further carried out various tests in MATLAB to analyze the effect of model size and the number of time steps on total simulation run-time. We find that there is an additional, though small, computational overhead for each step and it depends on total number of time steps used in the simulation. A MATLAB code package, 'FDwave', for the proposed simulation scheme is available upon request.
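The gain from a vectorized derivative operator can be illustrated with NumPy. This is a minimal sketch, not the FDwave implementation: it assumes a two-point difference that places the derivative at the half-integer (staggered) positions between nodes, whereas the paper's operator may use a longer stencil, and both function names are hypothetical.

```python
import numpy as np


def ddx_loop(f, dx):
    """First derivative at half-integer (staggered) points, node by node."""
    nz, nx = f.shape
    out = np.zeros((nz, nx - 1))
    for i in range(nz):
        for j in range(nx - 1):
            out[i, j] = (f[i, j + 1] - f[i, j]) / dx
    return out


def ddx_vectorized(f, dx):
    """Same operator applied to the whole array with shifted slices."""
    return (f[:, 1:] - f[:, :-1]) / dx


x = np.linspace(0.0, 1.0, 11)
field = np.tile(x**2, (4, 1))        # f(x, z) = x^2 on a 4 x 11 grid
dx = x[1] - x[0]
loop_result = ddx_loop(field, dx)
vec_result = ddx_vectorized(field, dx)
```

Both functions return the same array; the sliced version moves the double loop out of the interpreter, which is consistent with the much larger speedup the abstract reports for Python than for FORTRAN.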
Improved locality of the phase-field lattice-Boltzmann model for immiscible fluids at high density ratios

NASA Astrophysics Data System (ADS)

Fakhari, Abbas; Mitchell, Travis; Leonardi, Christopher; Bolster, Diogo

2017-11-01

Based on phase-field theory, we introduce a robust lattice-Boltzmann equation for modeling immiscible multiphase flows at large density and viscosity contrasts. Our approach is built by modifying the method proposed by Zu and He [Phys. Rev. E 87, 043301 (2013), 10.1103/PhysRevE.87.043301] in such a way as to improve efficiency and numerical stability. In particular, we employ a different interface-tracking equation based on the so-called conservative phase-field model, a simplified equilibrium distribution that decouples pressure and velocity calculations, and a local scheme based on the hydrodynamic distribution functions for calculation of the stress tensor. In addition to two distribution functions for interface tracking and recovery of hydrodynamic properties, the only nonlocal variable in the proposed model is the phase field. Moreover, within our framework there is no need to use biased or mixed difference stencils for numerical stability and accuracy at high density ratios. This not only simplifies the implementation and efficiency of the model, but also leads to a model that is better suited to parallel implementation on distributed-memory machines. Several benchmark cases are considered to assess the efficacy of the proposed model, including the layered Poiseuille flow in a rectangular channel, Rayleigh-Taylor instability, and the rise of a Taylor bubble in a duct. The numerical results are in good agreement with available numerical and experimental data.

Implementing a Parallel Image Edge Detection Algorithm Based on the Otsu-Canny Operator on the Hadoop Platform

PubMed Central

Wang, Min; Tian, Yun

2018-01-01

The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images. Overall, our approach speeds up the system by approximately 3.4 times when processing large-scale datasets, which demonstrates the obvious superiority of our method. The proposed algorithm in this study demonstrates both better edge detection performance and improved time performance. PMID:29861711
Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

DOEpatents

Archer, Charles J; Blocksome, Michael E; Ratterman, Joseph D; Smith, Brian E

2014-02-11

Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Fabrication of microtemplates for the control of bacterial immobilization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Miyahara, Yasuhiro; Mitamura, Koji; Saito, Nagahiro

2009-09-15

The authors described a region-selective immobilization method for bacteria using superhydrophobic/superhydrophilic and superhydrophobic/poly(ethylene glycol) (PEG) micropatterns as culture scaffold templates. In the case of superhydrophobic/superhydrophilic micropatterns, the superhydrophobic surface was prepared first by microwave-plasma enhanced chemical vapor deposition (MPECVD) from trimethylmethoxysilane. Then the superhydrophilic regions were fabricated by irradiating the superhydrophobic surface with VUV light through a stencil mask. In the case of the superhydrophobic/PEG micropatterned surfaces, PEG surfaces were fabricated first by chemical reaction of ester groups of p-nitrophenyl PEG with the NH2 groups of an NH2-terminated self-assembled monolayer from n-6-hexyl-3-aminopropyltrimethoxysilane. The superhydrophobic regions were fabricated by MPECVD through a stencil mask. In this study four bacteria were selected from the viewpoint of peptidoglycan cell wall (E. coli versus B. subtilis), extracellular polysaccharide (E. coli versus P. stutzeri, P. aeruginosa), and growth rate (P. stutzeri versus P. aeruginosa). The former micropattern brought discrete adhesions of E. coli and B. subtilis specifically on the hydrophobic regions. Furthermore, using the superhydrophobic/PEG micropattern, adhesion of bacteria expanded for E. coli, B. subtilis, P. stutzeri, and P. aeruginosa. They observed a high bacterial adhesion onto superhydrophobic surfaces and the inhibitory effect of PEG surfaces on bacterial adhesion.
Pleistocene cave art from Sulawesi, Indonesia.

PubMed

Aubert, M; Brumm, A; Ramli, M; Sutikna, T; Saptomo, E W; Hakim, B; Morwood, M J; van den Bergh, G D; Kinsley, L; Dosseto, A

2014-10-09

Archaeologists have long been puzzled by the appearance in Europe ∼40-35 thousand years (kyr) ago of a rich corpus of sophisticated artworks, including parietal art (that is, paintings, drawings and engravings on immobile rock surfaces) and portable art (for example, carved figurines), and the absence or scarcity of equivalent, well-dated evidence elsewhere, especially along early human migration routes in South Asia and the Far East, including Wallacea and Australia, where modern humans (Homo sapiens) were established by 50 kyr ago. Here, using uranium-series dating of coralloid speleothems directly associated with 12 human hand stencils and two figurative animal depictions from seven cave sites in the Maros karsts of Sulawesi, we show that rock art traditions on this Indonesian island are at least compatible in age with the oldest European art. The earliest dated image from Maros, with a minimum age of 39.9 kyr, is now the oldest known hand stencil in the world. In addition, a painting of a babirusa ('pig-deer') made at least 35.4 kyr ago is among the earliest dated figurative depictions worldwide, if not the earliest one. Among the implications, it can now be demonstrated that humans were producing rock art by ∼40 kyr ago at opposite ends of the Pleistocene Eurasian world.
Design of an essentially non-oscillatory reconstruction procedure in finite-element type meshes

NASA Technical Reports Server (NTRS)

Abgrall, Remi

1992-01-01

An essentially non-oscillatory reconstruction for functions defined on finite-element type meshes is designed. Two related problems are studied: the interpolation of possibly unsmooth multivariate functions on arbitrary meshes and the reconstruction of a function from its averages in the control volumes surrounding the nodes of the mesh. Concerning the first problem, the behavior of the highest coefficients of two polynomial interpolations of a function that may admit discontinuities of locally regular curves is studied: the Lagrange interpolation and an approximation such that the mean of the polynomial on any control volume is equal to that of the function to be approximated. This enables the best stencil for the approximation to be chosen. The choice of the smallest possible number of stencils is addressed. Concerning the reconstruction problem, two methods were studied: one based on an adaptation of the so-called reconstruction via deconvolution method to irregular meshes and one that relies on the approximation on the mean as defined above. The first method is conservative up to a quadrature formula and the second one is exactly conservative. The two methods have the expected order of accuracy, but the second one is much less expensive than the first one. Some numerical examples are given which demonstrate the efficiency of the reconstruction.
Finite Volume Element (FVE) discretization and multilevel solution of the axisymmetric heat equation

NASA Astrophysics Data System (ADS)

Litaker, Eric T.

1994-12-01

The axisymmetric heat equation, resulting from a point-source of heat applied to a metal block, is solved numerically; both iterative and multilevel solutions are computed in order to compare the two processes. The continuum problem is discretized in two stages: finite differences are used to discretize the time derivatives, resulting in a fully implicit backward time-stepping scheme, and the Finite Volume Element (FVE) method is used to discretize the spatial derivatives. The application of the FVE method to a problem in cylindrical coordinates is new, and results in stencils which are analyzed extensively. Several iteration schemes are considered, including both Jacobi and Gauss-Seidel; a thorough analysis of these schemes is done, using both the spectral radii of the iteration matrices and local mode analysis. Using this discretization, a Gauss-Seidel relaxation scheme is used to solve the heat equation iteratively. A multilevel solution process is then constructed, including the development of intergrid transfer and coarse grid operators. Local mode analysis is performed on the components of the amplification matrix, resulting in the two-level convergence factors for various combinations of the operators. A multilevel solution process is implemented by using multigrid V-cycles; the iterative and multilevel results are compared and discussed in detail. The computational savings resulting from the multilevel process are then discussed.
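The difference between Jacobi and Gauss-Seidel relaxation can be seen on a much simpler problem than the one in the record. The sketch below assumes a one-dimensional Poisson model problem, -u'' = 1 with zero boundary values, rather than the axisymmetric FVE discretization of the thesis, and simply counts sweeps until the update stalls. The function names are hypothetical.

```python
def sweep(u, f, h, gauss_seidel):
    """One relaxation sweep for -u'' = f with u = 0 at both ends."""
    old = u if gauss_seidel else list(u)   # Jacobi reads only old values
    for i in range(1, len(u) - 1):
        u[i] = 0.5 * (old[i - 1] + old[i + 1] + h * h * f[i])
    return u


def sweeps_to_converge(n, gauss_seidel, tol=1e-8, max_sweeps=100000):
    """Count sweeps until the largest change in one sweep drops below tol."""
    h = 1.0 / n
    f = [1.0] * (n + 1)
    u = [0.0] * (n + 1)
    for k in range(1, max_sweeps + 1):
        before = list(u)
        sweep(u, f, h, gauss_seidel)
        change = max(abs(a - b) for a, b in zip(u, before))
        if change < tol:
            return k
    return max_sweeps


jacobi_sweeps = sweeps_to_converge(16, gauss_seidel=False)
gs_sweeps = sweeps_to_converge(16, gauss_seidel=True)
```

For this model problem the Gauss-Seidel spectral radius is the square of the Jacobi one, so Gauss-Seidel needs roughly half as many sweeps; both still slow down as the grid is refined, which is the motivation for the multilevel process.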
Synthesizing parallel imaging applications using the CAP (computer-aided parallelization) tool

NASA Astrophysics Data System (ADS)

Gennart, Benoit A.; Mazzariol, Marc; Messerli, Vincent; Hersch, Roger D.

1997-12-01

Imaging applications such as filtering, image transforms and compression/decompression require vast amounts of computing power when applied to large data sets. These applications would potentially benefit from the use of parallel processing. However, dedicated parallel computers are expensive and their processing power per node lags behind that of the most recent commodity components. Furthermore, developing parallel applications remains a difficult task: writing and debugging the application is difficult (deadlocks), programs may not be portable from one parallel architecture to the other, and performance often comes short of expectations. In order to facilitate the development of parallel applications, we propose the CAP computer-aided parallelization tool which enables application programmers to specify at a high-level of abstraction the flow of data between pipelined-parallel operations. In addition, the CAP tool supports the programmer in developing parallel imaging and storage operations. CAP enables combining efficiently parallel storage access routines and image processing sequential operations. This paper shows how processing and I/O intensive imaging applications must be implemented to take advantage of parallelism and pipelining between data access and processing. This paper's contribution is (1) to show how such implementations can be compactly specified in CAP, and (2) to demonstrate that CAP specified applications achieve the performance of custom parallel code. The paper analyzes theoretically the performance of CAP specified applications and demonstrates the accuracy of the theoretical analysis through experimental measurements.
Control and protection system for paralleled modular static inverter-converter systems

NASA Technical Reports Server (NTRS)

Birchenough, A. G.; Gourash, F.

1973-01-01

A control and protection system was developed for use with a paralleled 2.5-kWe-per-module static inverter-converter system. The control and protection system senses internal and external fault parameters such as voltage, frequency, current, and paralleling current unbalance. A logic system controls contactors to isolate defective power conditioners or loads. The system sequences contactor operation to automatically control parallel operation, startup, and fault isolation. Transient overload protection and fault checking sequences are included. The operation and performance of a control and protection system, with detailed circuit descriptions, are presented.
PUP: An Architecture to Exploit Parallel Unification in Prolog

DTIC Science & Technology

1988-03-01

environment stacking model similar to the Warren Abstract Machine [23] since it has been shown to be superior to other known models (see [21]). The storage...execute in groups of independent operations. Unifications belonging to different groups may not overlap. Also unification operations belonging to the...since all parallel operations on the unification units must complete before any of the units can start executing the next group of parallel
A stencil printed, high energy density silver oxide battery using a novel photopolymerizable poly(acrylic acid) separator.

PubMed

Braam, Kyle; Subramanian, Vivek

2015-01-27

A novel photopolymerized poly(acrylic acid) separator is demonstrated in a printed, high-energy-density silver oxide battery. The printed battery demonstrates a high capacity of 5.4 mA h cm(-2) at a discharge current density of 2.75 mA cm(-2) (C/2 rate) while delivering good mechanical flexibility and robustness. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Shock-Induced Turbulence and Acoustic Loading on Aerospace Structures

DTIC Science & Technology

2015-08-22

aerospace structures. Pulsating flows featuring unsteadiness attributed to SWTBLI can lead to fatigue and structural damages1. Advancing our understanding...transformed system of coordinates in order to minimize scaling effects that appear in stencils consisting of elements of different sizes, as well as to...preceding the separation bubble as the 5th-order MUSCL. An integral length scale of 2Δx in the streamwise direction was chosen for the digital filter
Fourier analysis of the SOR iteration

NASA Technical Reports Server (NTRS)

Leveque, R. J.; Trefethen, L. N.

1986-01-01

The SOR iteration for solving linear systems of equations depends upon an overrelaxation factor omega. It is shown that for the standard model problem of Poisson's equation on a rectangle, the optimal omega and corresponding convergence rate can be rigorously obtained by Fourier analysis. The trick is to tilt the space-time grid so that the SOR stencil becomes symmetrical. The tilted grid also gives insight into the relation between convergence rates of several variants.
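A small experiment shows how much the overrelaxation factor matters. The sketch assumes the model problem on the unit square with the 5-point stencil and grid spacing h = 1/n, for which the classical optimal factor is omega = 2 / (1 + sin(pi h)); it only counts sweeps and does not reproduce the Fourier analysis of the paper. The function name is hypothetical.

```python
import math


def sor_sweeps(n, omega, tol=1e-8, max_sweeps=20000):
    """Count SOR sweeps for -laplace(u) = 1 on the unit square, u = 0 on the
    boundary, using the 5-point stencil on an (n+1) x (n+1) grid."""
    h = 1.0 / n
    u = [[0.0] * (n + 1) for _ in range(n + 1)]
    for sweep in range(1, max_sweeps + 1):
        change = 0.0
        for i in range(1, n):
            for j in range(1, n):
                gs = 0.25 * (u[i - 1][j] + u[i + 1][j]
                             + u[i][j - 1] + u[i][j + 1] + h * h)
                new = (1.0 - omega) * u[i][j] + omega * gs
                change = max(change, abs(new - u[i][j]))
                u[i][j] = new
        if change < tol:
            return sweep
    return max_sweeps


n = 16
omega_opt = 2.0 / (1.0 + math.sin(math.pi / n))
gauss_seidel_sweeps = sor_sweeps(n, 1.0)     # omega = 1 is Gauss-Seidel
optimal_sweeps = sor_sweeps(n, omega_opt)
```

With omega = 1 the loop is plain Gauss-Seidel; with the optimal factor the sweep count drops sharply on this grid.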
Parallel Algorithms and Patterns

DOE Office of Scientific and Technical Information (OSTI.GOV)

Robey, Robert W.

2016-06-16

This is a PowerPoint presentation on parallel algorithms and patterns. A parallel algorithm is a well-defined, step-by-step computational procedure that emphasizes concurrency to solve a problem. Examples of problems include: sorting, searching, optimization, matrix operations. A parallel pattern is a computational step in a sequence of independent, potentially concurrent operations that occurs in diverse scenarios with some frequency. Examples are: reductions, prefix scans, ghost cell updates. We only touch on parallel patterns in this presentation. The topic really deserves its own detailed discussion, which Gabe Rockefeller would like to develop.
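Two of the patterns named above, the prefix scan and the reduction, can be sketched in a few lines. The code below is a sequential simulation under the assumption that each pass or tree level would be executed by concurrent workers; the Hillis-Steele scan and pairwise tree reduction are standard formulations chosen for illustration, not taken from the presentation.

```python
def inclusive_scan(values):
    """Hillis-Steele inclusive prefix sum.

    Every update inside one pass is independent, so a pass could run
    concurrently; there are about log2(n) passes.
    """
    data = list(values)
    step = 1
    while step < len(data):
        # Read from the previous pass only, as parallel workers would.
        data = [data[i] + data[i - step] if i >= step else data[i]
                for i in range(len(data))]
        step *= 2
    return data


def tree_reduce(values):
    """Pairwise (tree) sum reduction; each level halves the list."""
    data = list(values)
    while len(data) > 1:
        pairs = [data[i] + data[i + 1] for i in range(0, len(data) - 1, 2)]
        if len(data) % 2:
            pairs.append(data[-1])   # odd element carries to the next level
        data = pairs
    return data[0] if data else 0


scan = inclusive_scan([3, 1, 4, 1, 5, 9, 2, 6])
total = tree_reduce([3, 1, 4, 1, 5, 9, 2, 6])
```

The last element of the inclusive scan equals the reduction result, which is a handy consistency check when both patterns are implemented.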
Automatic recognition of vector and parallel operations in a higher level language

NASA Technical Reports Server (NTRS)

Schneck, P. B.

1971-01-01

A compiler for recognizing statements of a FORTRAN program which are suited for fast execution on a parallel or pipeline machine such as Illiac-4, Star or ASC is described. The technique employs interval analysis to provide flow information to the vector/parallel recognizer. Where profitable the compiler changes scalar variables to subscripted variables. The output of the compiler is an extension to FORTRAN which shows parallel and vector operations explicitly.
Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R

Endpoint-based parallel data processing with non-blocking collective instructions in a PAMI of a parallel computer is disclosed. The PAMI is composed of data communications endpoints, each including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task. The compute nodes are coupled for data communications through the PAMI. The parallel application establishes a data communications geometry specifying a set of endpoints that are used in collective operations of the PAMI by associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.
Parallel operation of NH3 screw compressors - the optimum way

NASA Astrophysics Data System (ADS)

Pijnenburg, B.; Ritmann, J.

2015-08-01

Using a larger number of smaller industrial NH3 screw compressors operating in parallel appears to be the best way to achieve maximum part-load efficiency, increased redundancy, and other features that are in high demand in the industrial refrigeration industry today. Parallel operation can be selected to secure continuous operation and can in most applications be configured to improve overall operating economy. New compressors are developed to meet requirements for flexibility in operation and are controlled in an intelligent way. The intelligent control system keeps track of all external demands while always striving for the lowest possible absorbed power, including in future scenarios with a connection to the smart grid.
High-Order Automatic Differentiation of Unmodified Linear Algebra Routines via Nilpotent Matrices

NASA Astrophysics Data System (ADS)

Dunham, Benjamin Z.

This work presents a new automatic differentiation method, Nilpotent Matrix Differentiation (NMD), capable of propagating any order of mixed or univariate derivative through common linear algebra functions--most notably third-party sparse solvers and decomposition routines, in addition to basic matrix arithmetic operations and power series--without changing data-type or modifying code line by line; this allows differentiation across sequences of arbitrarily many such functions with minimal implementation effort. NMD works by enlarging the matrices and vectors passed to the routines, replacing each original scalar with a matrix block augmented by derivative data; these blocks are constructed with special sparsity structures, termed "stencils," each designed to be isomorphic to a particular multidimensional hypercomplex algebra. The algebras are in turn designed such that Taylor expansions of hypercomplex function evaluations are finite in length and thus exactly track derivatives without approximation error. Although this use of the method in the "forward mode" is unique in its own right, it is also possible to apply it to existing implementations of the (first-order) discrete adjoint method to find high-order derivatives with lowered cost complexity; for example, for a problem with N inputs and an adjoint solver whose cost is independent of N--i.e., O(1)--the N x N Hessian can be found in O(N) time, which is comparable to existing second-order adjoint methods that require far more problem-specific implementation effort. Higher derivatives are likewise less expensive--e.g., an N x N x N rank-three tensor can be found in O(N^2). Alternatively, a Hessian-vector product can be found in O(1) time, which may open up many matrix-based simulations to a range of existing optimization or surrogate modeling approaches.
As a final corollary in parallel to the NMD-adjoint hybrid method, the existing complex-step differentiation (CD) technique is also shown to be capable of finding the Hessian-vector product. All variants are implemented on a stochastic diffusion problem and compared in-depth with various cost and accuracy metrics.
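The idea of replacing a scalar with a matrix block can be illustrated in its simplest first-order, single-variable form. The sketch below is an assumption-laden illustration and not the thesis's implementation: it lifts x to the 2 x 2 block x*I + N with N = [[0, 1], [0, 0]] (so N*N = 0), evaluates a polynomial using only matrix arithmetic, and reads f(x) and f'(x) from the result. The helper names and the test polynomial are hypothetical; the higher-order and multivariate stencils described in the abstract are not reproduced. A complex-step first derivative is included for comparison.

```python
def matmul(a, b):
    """Plain dense matrix product for small square matrices (lists of lists)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def lift(x):
    """Replace the scalar x by the block x*I + N, where N = [[0, 1], [0, 0]]."""
    return [[x, 1.0], [0.0, x]]

def poly_matrix(coeffs, m):
    """Evaluate sum(coeffs[k] * m**k) with Horner's rule on matrices.
    The routine itself knows nothing about derivatives."""
    n = len(m)
    result = [[0.0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = matmul(result, m)
        for i in range(n):
            result[i][i] += c  # add c times the identity
    return result

coeffs = [2.0, 3.0, 0.0, 1.0]      # f(x) = 2 + 3x + x^3, so f'(x) = 3 + 3x^2
block = poly_matrix(coeffs, lift(2.0))
value, derivative = block[0][0], block[0][1]
print(value, derivative)           # 16.0 15.0

# Complex-step first derivative of the same polynomial: f'(x) ~ Im f(x + ih) / h.
h = 1e-20
z = complex(2.0, h)
complex_step = (2.0 + 3.0 * z + z ** 3).imag / h
print(complex_step)
```

Because N*N = 0, the Taylor expansion of f(x*I + N) stops after the linear term, so the off-diagonal entry holds the derivative exactly rather than a finite-difference approximation.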
Deployable Integral Field Units, Multislits, and Image Slicer for the Goodman Imaging Spectrograph on the SOAR Telescope

NASA Astrophysics Data System (ADS)

Cecil, Gerald N.; Moffett, A. J.; Cui, Y.; Eckert, K. D.; McBride, J.; Kannappan, S.; Keller, K.; Barlow, B. N.; Dunlap, B.; Bland-Hawthorn, J.

2010-01-01

The Goodman Imager-Spectrograph on the 4.1m SOAR telescope has operated on Cerro Pachon, Chile with volume-phase holographic gratings in long-slit mode since its commissioning in 2008. Recently, UNC graduate students played key roles to implement robust upgrades for multi-object spectroscopy that will soon be available to US astronomers through the NOAO time share on SOAR: • Multislits over 3x5 arcmin, generated on PCB solder stencils with exceptional sharpness compared to conventional laser cuts, initially to survey globular clusters for pulsating hot sub-dwarfs • An image slicer to obtain 3 simultaneous parallel spectra 70-arcsec long, 1- or 2-arcsec wide, spanning 320-750 nm to map stellar and gaseous emission and mass over the 1500 galaxies in the RESOLVE survey underway on SOAR • Four integral field units, each composed of 5-arcsec diameter, fused bundles of 0.5-arcsec diameter thin-clad optical fiber, independently deployed over a 10x5 arcmin field targeted by an EMCCD also used for Lucky Imaging. Initially will study aperture effects in single fiber surveys, extragalactic globular clusters, and demonstrate technology prior to deployment on larger telescopes • New wheels supporting a large set of existing narrow-band and Sloan filters • A trombone-style atmospheric dispersion compensator that corrects the full 12-arcmin diameter science field down to 30 deg elevation. Working in UNC's Goodman Laboratory for Astronomical Instrumentation, students employed SolidWorks and ZEMAX to design parts for in-house CAM on CNC machines and a 3D printer. All motors are controlled by LabVIEW as is the SOAR TCS. The deployable IFU axes are controlled by Quicksilver Controls Inc. intelligent servos and $80 model robot (Firgelli Corp.) actuators driven by a PIC-microcontroller and a student designed custom PCB. 
Upgrades and students were supported by $200K from SOAR Corporation, Research Corporation, NSF, and UNC competitive funds, and NC NASA Space Grant, Sigma Xi, and NASA fellowships.
A Systematic Approach for Obtaining Performance on Matrix-Like Operations

NASA Astrophysics Data System (ADS)

Veras, Richard Michael

Scientific Computation plays a critical role in the scientific process because it allows us to ask complex queries and test predictions that would otherwise be infeasible to perform experimentally. Because of its power, Scientific Computing has helped drive advances in many fields ranging from Engineering and Physics to Biology and Sociology to Economics and Drug Development and even to Machine Learning and Artificial Intelligence. Common among these domains is the desire for timely computational results, thus a considerable amount of human expert effort is spent towards obtaining performance for these scientific codes. However, this is no easy task because each of these domains presents its own unique set of challenges to software developers, such as domain-specific operations, structurally complex data and ever-growing datasets. Compounding these problems are the myriads of constantly changing, complex and unique hardware platforms that an expert must target. Unfortunately, an expert is typically forced to reproduce their effort across multiple problem domains and hardware platforms. In this thesis, we demonstrate the automatic generation of expert-level high-performance scientific codes for Dense Linear Algebra (DLA), Structured Mesh (Stencil), Sparse Linear Algebra and Graph Analytics. In particular, this thesis seeks to address the issue of obtaining performance on many complex platforms for a certain class of matrix-like operations that span many scientific, engineering and social fields. We do this by automating a method used for obtaining high performance in DLA and extending it to structured, sparse and scale-free domains. We argue that it is the use of the underlying structure found in the data from these domains that enables this process. Thus, obtaining performance for most operations does not occur in isolation from the data being operated on, but instead depends significantly on the structure of the data.

Parallel dispatch: a new paradigm of electrical power system dispatch

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Jun Jason; Wang, Fei-Yue; Wang, Qiang

Modern power systems are evolving into sociotechnical systems with massive complexity, whose real-time operation and dispatch go beyond human capability. Thus, developing and applying new intelligent power system dispatch tools is of great practical significance. In this paper, we introduce the overall business model of power system dispatch, the top-level design approach of an intelligent dispatch system, and the parallel intelligent technology with its dispatch applications. We expect that a new dispatch paradigm, namely the parallel dispatch, can be established by incorporating various intelligent technologies, especially the parallel intelligent technology, to enable secure operation of complex power grids, extend system operators' capabilities, suggest optimal dispatch strategies, and provide decision-making recommendations according to power system operational goals.
Parallel-aware, dedicated job co-scheduling within/across symmetric multiprocessing nodes

DOEpatents

Jones, Terry R.; Watson, Pythagoras C.; Tuel, William; Brenner, Larry; Caffrey, Patrick; Fier, Jeffrey

2010-10-05

In a parallel computing environment comprising a network of SMP nodes each having at least one processor, a parallel-aware co-scheduling method and system for improving the performance and scalability of a dedicated parallel job having synchronizing collective operations. The method and system uses a global co-scheduler and an operating system kernel dispatcher adapted to coordinate interfering system and daemon activities on a node and across nodes to promote intra-node and inter-node overlap of said interfering system and daemon activities as well as intra-node and inter-node overlap of said synchronizing collective operations. In this manner, the impact of random short-lived interruptions, such as timer-decrement processing and periodic daemon activity, on synchronizing collective operations is minimized on large processor-count SPMD bulk-synchronous programming styles.
Optimal resonance configuration for ultrasonic wireless power transmission to millimeter-sized biomedical implants.

PubMed

Miao Meng; Kiani, Mehdi

2016-08-01

In order to achieve efficient wireless power transmission (WPT) to biomedical implants with millimeter (mm) dimensions, ultrasonic WPT links have recently been proposed. Operating both transmitter (Tx) and receiver (Rx) ultrasonic transducers at their resonance frequency (fr) is key in improving power transmission efficiency (PTE). In this paper, different resonance configurations for Tx and Rx transducers, including series and parallel resonance, have been studied to help the designers of ultrasonic WPT links to choose the optimal resonance configuration for Tx and Rx that maximizes PTE. The geometries for disk-shaped transducers of four different sets of links, operating at series-series, series-parallel, parallel-series, and parallel-parallel resonance configurations in Tx and Rx, have been found through finite-element method (FEM) simulation tools for operation at fr of 1.4 MHz. Our simulation results suggest that operating the Tx transducer with parallel resonance increases PTE, while the resonance configuration of the mm-sized Rx transducer highly depends on the load resistance, Rl. For applications that involve large Rl in the order of tens of kΩ, a parallel resonance for a mm-sized Rx leads to higher PTE, while series resonance is preferred for Rl in the order of several kΩ and below.
Matrix-Free Polynomial-Based Nonlinear Least Squares Optimized Preconditioning and its Application to Discontinuous Galerkin Discretizations of the Euler Equations

DTIC Science & Technology

2015-06-01

cient parallel code for applying the operator. Our method constructs a polynomial preconditioner using a nonlinear least squares (NLLS) algorithm. We show...apply the underlying operator. Such a preconditioner can be very attractive in scenarios where one has a highly efficient parallel code for applying...repeatedly solve a large system of linear equations where one has an extremely fast parallel code for applying an underlying fixed linear operator
Operation of high power converters in parallel

NASA Technical Reports Server (NTRS)

Decker, D. K.; Inouye, L. Y.

1993-01-01

High power converters that are used in space power subsystems are limited in power handling capability due to component and thermal limitations. For applications, such as Space Station Freedom, where multi-kilowatts of power must be delivered to user loads, parallel operation of converters becomes an attractive option when considering overall power subsystem topologies. TRW developed three different unequal power sharing approaches for parallel operation of converters. These approaches, known as droop, master-slave, and proportional adjustment, are discussed and test results are presented.
Thermoelectric Coolers with Sintered Silver Interconnects

NASA Astrophysics Data System (ADS)

Kähler, Julian; Stranz, Andrej; Waag, Andreas; Peiner, Erwin

2014-06-01

The fabrication and performance of a sintered Peltier cooler (SPC) based on bismuth telluride with sintered silver interconnects are described. Miniature SPC modules with a footprint of 20 mm² were assembled using pick-and-place pressure-assisted silver sintering at low pressure (5.5 N/mm²) and moderate temperature (250°C to 270°C). A modified flip-chip bonder combined with screen/stencil printing for paste transfer was used for the pick-and-place process, enabling high positioning accuracy, easy handling of the tiny bismuth telluride pellets, and immediate visual process control. A specific contact resistance of (1.4 ± 0.1) × 10⁻⁵ Ω cm² was found, which is in the range of values reported for high-temperature solder interconnects of bismuth telluride pellets. The realized SPCs were evaluated from room temperature to 300°C, considerably outperforming the operating temperature range of standard commercial Peltier coolers. Temperature cycling capability was investigated from 100°C to 235°C over more than 200 h, i.e., 850 cycles, during which no degradation of module resistance or cooling performance occurred.
Distributed Relaxation for Conservative Discretizations

NASA Technical Reports Server (NTRS)

Diskin, Boris; Thomas, James L.

2001-01-01

A multigrid method is defined as having textbook multigrid efficiency (TME) if the solutions to the governing system of equations are attained in a computational work that is a small (less than 10) multiple of the operation count in one target-grid residual evaluation. The way to achieve this efficiency is the distributed relaxation approach. TME solvers employing distributed relaxation have already been demonstrated for nonconservative formulations of high-Reynolds-number viscous incompressible and subsonic compressible flow regimes. The purpose of this paper is to provide foundations for applications of distributed relaxation to conservative discretizations. A direct correspondence between the primitive variable interpolations for calculating fluxes in conservative finite-volume discretizations and stencils of the discretized derivatives in the nonconservative formulation has been established. Based on this correspondence, one can arrive at a conservative discretization which is very efficiently solved with a nonconservative relaxation scheme and this is demonstrated for conservative discretization of the quasi one-dimensional Euler equations. Formulations for both staggered and collocated grid arrangements are considered and extensions of the general procedure to multiple dimensions are discussed.
Failure of Anisotropic Unstructured Mesh Adaption Based on Multidimensional Residual Minimization

NASA Technical Reports Server (NTRS)

Wood, William A.; Kleb, William L.

2003-01-01

An automated anisotropic unstructured mesh adaptation strategy is proposed, implemented, and assessed for the discretization of viscous flows. The adaption criterion is based upon the minimization of the residual fluctuations of a multidimensional upwind viscous flow solver. For scalar advection, this adaption strategy has been shown to use fewer grid points than gradient-based adaption, naturally aligning mesh edges with discontinuities and characteristic lines. The adaption utilizes a compact stencil and is local in scope, with four fundamental operations: point insertion, point deletion, edge swapping, and nodal displacement. Evaluation of the solution-adaptive strategy is performed for a two-dimensional blunt body laminar wind tunnel case at Mach 10. The results demonstrate that the strategy suffers from a lack of robustness, particularly with regard to alignment of the bow shock in the vicinity of the stagnation streamline. In general, constraining the adaption to such a degree as to maintain robustness results in negligible improvement to the solution. Because the present method fails to consistently or significantly improve the flow solution, it is rejected in favor of simple uniform mesh refinement.
An Angular Method with Position Control for Block Mesh Squareness Improvement

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yao, J.; Stillman, D.

We optimize a target function defined by angular properties with a position control term for a basic stencil with a block-structured mesh, to improve element squareness in 2D and 3D. Comparison with the condition number method shows that, besides achieving similar mesh quality with regard to orthogonality, the new method converges faster and provides a more uniform global mesh spacing in our numerical tests.
A high-order multi-zone cut-stencil method for numerical simulations of high-speed flows over complex geometries

NASA Astrophysics Data System (ADS)

Greene, Patrick T.; Eldredge, Jeff D.; Zhong, Xiaolin; Kim, John

2016-07-01

In this paper, we present a method for performing uniformly high-order direct numerical simulations of high-speed flows over arbitrary geometries. The method was developed with the goal of simulating and studying the effects of complex isolated roughness elements on the stability of hypersonic boundary layers. The simulations are carried out on Cartesian grids with the geometries imposed by a third-order cut-stencil method. A fifth-order hybrid weighted essentially non-oscillatory scheme was implemented to capture any steep gradients in the flow created by the geometries and a third-order Runge-Kutta method is used for time advancement. A multi-zone refinement method was also utilized to provide extra resolution at locations with expected complex physics. The combination results in a globally fourth-order scheme in space and third order in time. Results confirming the method's high order of convergence are shown. Two-dimensional and three-dimensional test cases are presented and show good agreement with previous results. A simulation of Mach 3 flow over the logo of the Ubuntu Linux distribution is shown to demonstrate the method's capabilities for handling complex geometries. Results for Mach 6 wall-bounded flow over a three-dimensional cylindrical roughness element are also presented. The results demonstrate that the method is a promising tool for the study of hypersonic roughness-induced transition.
Cell-centered high-order hyperbolic finite volume method for diffusion equation on unstructured grids

NASA Astrophysics Data System (ADS)

Lee, Euntaek; Ahn, Hyung Taek; Luo, Hong

2018-02-01

We apply a hyperbolic cell-centered finite volume method to solve a steady diffusion equation on unstructured meshes. This method, originally proposed by Nishikawa using a node-centered finite volume method, reformulates the elliptic nature of viscous fluxes into a set of augmented equations that makes the entire system hyperbolic. We introduce an efficient and accurate solution strategy for the cell-centered finite volume method. To obtain high-order accuracy for both solution and gradient variables, we use a successive order solution reconstruction: constant, linear, and quadratic (k-exact) reconstruction with an efficient reconstruction stencil, a so-called wrapping stencil. By the virtue of the cell-centered scheme, the source term evaluation was greatly simplified regardless of the solution order. For uniform schemes, we obtain the same order of accuracy, i.e., first, second, and third orders, for both the solution and its gradient variables. For hybrid schemes, recycling the gradient variable information for solution variable reconstruction makes one order of additional accuracy, i.e., second, third, and fourth orders, possible for the solution variable with less computational work than needed for uniform schemes. In general, the hyperbolic method can be an effective solution technique for diffusion problems, but instability is also observed for the discontinuous diffusion coefficient cases, which brings necessity for further investigation about the monotonicity preserving hyperbolic diffusion method.
Multiscale computations with a wavelet-adaptive algorithm

NASA Astrophysics Data System (ADS)

Rastigejev, Yevgenii Anatolyevich

A wavelet-based adaptive multiresolution algorithm for the numerical solution of multiscale problems governed by partial differential equations is introduced. The main features of the method include fast algorithms for the calculation of wavelet coefficients and approximation of derivatives on nonuniform stencils. The connection between the wavelet order and the size of the stencil is established. The algorithm is based on mathematically well-established wavelet theory. This allows us to provide error estimates of the solution, which are used in conjunction with an appropriate threshold criterion to adapt the collocation grid. Efficient data structures for grid representation, as well as related computational algorithms to support the grid rearrangement procedure, are developed. The algorithm is applied to the simulation of phenomena described by the Navier-Stokes equations. First, we undertake the study of the ignition and subsequent viscous detonation of a H2 : O2 : Ar mixture in a one-dimensional shock tube. Subsequently, we apply the algorithm to solve the two- and three-dimensional benchmark problem of incompressible flow in a lid-driven cavity at large Reynolds numbers. For these cases we show that solutions of accuracy comparable to the benchmarks are obtained with more than an order of magnitude reduction in degrees of freedom. The simulations show the striking ability of the algorithm to adapt to a solution having different scales at different spatial locations so as to produce accurate results at a relatively low computational cost.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Vincenti, H.; Vay, J. -L.

Due to discretization effects and truncation to finite domains, many electromagnetic simulations present non-physical modifications of Maxwell's equations in space that may generate spurious signals affecting the overall accuracy of the result. Such modifications occur, for instance, when Perfectly Matched Layers (PMLs) are used at simulation domain boundaries to simulate open media. Another example is the use of an arbitrary-order Maxwell solver with a domain decomposition technique that may under some conditions involve stencil truncations at subdomain boundaries, resulting in small spurious errors that do eventually build up. In each case, a careful evaluation of the characteristics and magnitude of the errors resulting from these approximations, and their impact at any frequency and angle, requires detailed analytical and numerical studies. To this end, we present a general analytical approach that enables the evaluation of numerical discretization errors of fully three-dimensional arbitrary-order finite-difference Maxwell solvers, with arbitrary modification of the local stencil in the simulation domain. The analytical model is validated against simulations of the domain decomposition technique and PMLs, when these are used with very high-order Maxwell solvers, as well as in the infinite-order limit of pseudo-spectral solvers. Results confirm that the new analytical approach enables exact predictions in each case. They also confirm that the domain decomposition technique can be used with very high-order Maxwell solvers and a reasonably low number of guard cells with negligible effects on the overall accuracy of the simulation.
Operator assistant to support deep space network link monitor and control

NASA Technical Reports Server (NTRS)

Cooper, Lynne P.; Desai, Rajiv; Martinez, Elmain

1992-01-01

Preparing the Deep Space Network (DSN) stations to support spacecraft missions (referred to as pre-cal, for pre-calibration) is currently an operator- and time-intensive activity. Operators are responsible for sending and monitoring several hundred operator directives, messages, and warnings. Operator directives are used to configure and calibrate the various subsystems (antenna, receiver, etc.) necessary to establish a spacecraft link. Messages and warnings are issued by the subsystems upon completion of an operation, changes of status, or an anomalous condition. Some points of pre-cal are logically parallel. Significant time savings could be realized if the existing Link Monitor and Control system (LMC) could support the operator in exploiting the parallelism inherent in pre-cal activities. Currently, operators may work on the individual subsystems in parallel; however, the burden of monitoring these parallel operations resides solely with the operator. Messages, warnings, and directives are all presented as they are received, without being correlated to the event that triggered them. Pre-cal is essentially an overhead activity. During pre-cal, no mission is supported, and no other activity can be performed using the equipment in the link. Therefore, it is highly desirable to reduce pre-cal time as much as possible. One approach to do this, as well as to increase efficiency and reduce errors, is the LMC Operator Assistant (OA). The LMC OA prototype demonstrates an architecture which can be used in concert with the existing LMC to exploit parallelism in pre-cal operations while providing the operators with a true monitoring capability, situational awareness and positive control. This paper presents an overview of the LMC OA architecture and the results from initial prototyping and test activities.
Parallel image logical operations using cross correlation

NASA Technical Reports Server (NTRS)

Strong, J. P., III

1972-01-01

Methods are presented for counting areas in an image in a parallel manner using noncoherent optical techniques. The techniques presented include the Levialdi algorithm for counting, optical techniques for binary operations, and cross-correlation.
Effecting a broadcast with an allreduce operation on a parallel computer

DOEpatents

Almasi, Gheorghe; Archer, Charles J.; Ratterman, Joseph D.; Smith, Brian E.

2010-11-02

A parallel computer comprises a plurality of compute nodes organized into at least one operational group for collective parallel operations. Each compute node is assigned a unique rank and is coupled for data communications through a global combining network. One compute node is assigned to be a logical root. A send buffer and a receive buffer are configured. Each element of a contribution of the logical root in the send buffer is contributed. One or more zeros corresponding to a size of the element are injected. An allreduce operation with a bitwise OR using the element and the injected zeros is performed. The result for the allreduce operation is determined and stored in each receive buffer.
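The mechanism rests on the identity x OR 0 = x. The following is a minimal single-process simulation of that idea and not the patented implementation: it assumes that every non-root rank contributes zeros of the same length as the root's data, and it models the global combining network with an ordinary element-wise OR. The function names and the example values are hypothetical.

```python
from functools import reduce

def allreduce_bitwise_or(send_buffers):
    """Simulated allreduce: OR the send buffers element by element and hand
    every rank its own copy of the combined result."""
    combined = [reduce(lambda a, b: a | b, column)
                for column in zip(*send_buffers)]
    return [list(combined) for _ in send_buffers]

def broadcast_via_allreduce(root_data, num_ranks, root):
    """The root contributes its data; every other rank contributes zeros
    (an assumption of this sketch), so the OR reproduces the root's data."""
    send_buffers = [list(root_data) if rank == root else [0] * len(root_data)
                    for rank in range(num_ranks)]
    return allreduce_bitwise_or(send_buffers)

receive_buffers = broadcast_via_allreduce([0x2A, 0xFF, 7], num_ranks=4, root=2)
print(receive_buffers)
```

After the call, each of the four simulated ranks holds the root's three values in its receive buffer.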
A Simulation Study of Instrument Meteorological Condition Approaches to Dual Parallel Runways Spaced 3400 and 2500 Feet Apart Using Flight-Deck-Centered Technology

NASA Technical Reports Server (NTRS)

Waller, Marvin C.; Scanlon, Charles H.

1999-01-01

A number of our nation's airports depend on closely spaced parallel runway operations to handle their normal traffic throughput when weather conditions are favorable. For safety these operations are curtailed in Instrument Meteorological Conditions (IMC) when the ceiling or visibility deteriorates and operations in many cases are limited to the equivalent of a single runway. Where parallel runway spacing is less than 2500 feet, capacity loss in IMC is on the order of 50 percent for these runways. Clearly, these capacity losses result in landing delays, inconveniences to the public, increased operational cost to the airlines, and general interruption of commerce. This document presents a description and the results of a fixed-base simulation study to evaluate an initial concept that includes a set of procedures for conducting safe flight in closely spaced parallel runway operations in IMC. Consideration of flight-deck information technology and displays to support the procedures is also included in the discussions. The procedures and supporting technology rely heavily on airborne capabilities operating in conjunction with the air traffic control system.
High-Order Methods for Computational Fluid Dynamics: A Brief Review of Compact Differential Formulations on Unstructured Grids

NASA Technical Reports Server (NTRS)

Huynh, H. T.; Wang, Z. J.; Vincent, P. E.

2013-01-01

Popular high-order schemes with compact stencils for Computational Fluid Dynamics (CFD) include Discontinuous Galerkin (DG), Spectral Difference (SD), and Spectral Volume (SV) methods. The recently proposed Flux Reconstruction (FR) approach or Correction Procedure using Reconstruction (CPR) is based on a differential formulation and provides a unifying framework for these high-order schemes. Here we present a brief review of recent developments for the FR/CPR schemes as well as some pacing items.
Further information on the prehistoric representations of human hands in the cave of Gargas

PubMed Central

Hooper, Alex

1980-01-01

This paper amends and adds recent information to Paul A. Janssens' earlier article on the prehistoric paintings of human hands in the cave of Gargas, France.1 Possible diagnoses for the deficiencies found in many of the hand pictures, and some non-medical theories of explanation, are reviewed. It is concluded that the hands used as stencils were mutilated and that the images were deliberately placed within the cave and were not the by-products of some other activity. PMID:6990130
Parallel Algorithms for Switching Edges in Heterogeneous Graphs.

PubMed

Bhuiyan, Hasanuzzaman; Khan, Maleq; Chen, Jiangzhuo; Marathe, Madhav

2017-06-01

An edge switch is an operation on a graph (or network) where two edges are selected randomly and one of their end vertices are swapped with each other. Edge switch operations have important applications in graph theory and network analysis, such as in generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and in studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. The dependencies among successive edge switch operations and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors leading to difficulties in achieving a good speedup by parallelization. In this paper, we present distributed memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors.
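
The edge switch described above can be illustrated with a small serial sketch. This is not the authors' distributed-memory algorithm; it only shows a single switch on a simple undirected graph, including the rejection of switches that would create a self-loop or a parallel edge. The names `norm`, `degrees`, and `try_edge_switch`, and the choice to store edges as a set of sorted tuples, are assumptions made for illustration.

```python
import random

def norm(u, v):
    """Store an undirected edge with its smaller endpoint first."""
    return (u, v) if u < v else (v, u)

def degrees(edges):
    """Return a dict mapping each vertex to its degree."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def try_edge_switch(edges, rng):
    """Attempt one edge switch in place; return True if it was applied.

    Two edges (a, b) and (c, d) are picked at random and replaced by
    (a, d) and (c, b). The switch is rejected if it would create a
    self-loop or a parallel edge, so the graph stays simple.
    """
    e1, e2 = rng.sample(sorted(edges), 2)
    a, b = e1
    c, d = e2
    # Randomly choose which end vertex of the second edge takes part.
    if rng.random() < 0.5:
        c, d = d, c
    if a == d or c == b:          # would create a self-loop
        return False
    n1, n2 = norm(a, d), norm(c, b)
    if n1 in edges or n2 in edges:  # would create a parallel edge
        return False
    edges.difference_update({e1, e2})
    edges.update({n1, n2})
    return True

# Usage: repeatedly switch edges of a 6-cycle; every vertex keeps degree 2.
rng = random.Random(0)
graph = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)}
for _ in range(100):
    try_edge_switch(graph, rng)
print(sorted(graph), degrees(graph))
```

Because each accepted switch removes one incident edge and adds one incident edge at each of the four vertices involved, the degree sequence is preserved, which is why the operation is useful for generating random networks with a given degree sequence.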

Parallel Algorithms for Switching Edges in Heterogeneous Graphs☆

PubMed Central

Khan, Maleq; Chen, Jiangzhuo; Marathe, Madhav

2017-01-01

An edge switch is an operation on a graph (or network) where two edges are selected randomly and one of their end vertices are swapped with each other. Edge switch operations have important applications in graph theory and network analysis, such as in generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and in studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. The dependencies among successive edge switch operations and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors leading to difficulties in achieving a good speedup by parallelization. In this paper, we present distributed memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors. PMID:28757680
National Centers for Environmental Prediction

Science.gov Websites

Operational Forecast Graphics Experimental Forecast Graphics Verification and Diagnostics Model Configuration /EXPERIMENTAL MODEL FORECAST GRAPHICS OPERATIONAL VERIFICATION / DIAGNOSTICS PARALLEL VERIFICATION / DIAGNOSTICS Developmental Air Quality Forecasts and Verification Back to Table of Contents 2. PARALLEL/EXPERIMENTAL GRAPHICS
Performing an allreduce operation on a plurality of compute nodes of a parallel computer

DOEpatents

Faraj, Ahmad [Rochester, MN]

2012-04-17

Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.
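
The three steps in the abstract (form logical rings, reduce globally within each ring, then reduce locally on each node) can be sketched as a small serial simulation. This is only an illustration under stated assumptions: the reduction operator is taken to be addition, ring `r` is assumed to consist of core `r` of every node, and the message passing around each ring is replaced by a plain loop. The names `ring_allreduce` and `hierarchical_allreduce` are hypothetical.

```python
def ring_allreduce(values):
    """Simulate an allreduce (sum) over the members of one logical ring.

    Every member of the ring ends up holding the same combined value.
    """
    total = 0
    for v in values:  # stands in for passing partial sums around the ring
        total += v
    return [total] * len(values)

def hierarchical_allreduce(contrib):
    """contrib[n][c] is the contribution of core c on compute node n.

    Returns result[n][c], the allreduce result held by every core.
    """
    num_nodes = len(contrib)
    cores_per_node = len(contrib[0])

    # Steps 1 and 2: ring r holds core r of each node (an assumption);
    # perform a global allreduce within each ring.
    global_result = [[0] * cores_per_node for _ in range(num_nodes)]
    for r in range(cores_per_node):
        ring_values = [contrib[n][r] for n in range(num_nodes)]
        reduced = ring_allreduce(ring_values)
        for n in range(num_nodes):
            global_result[n][r] = reduced[n]

    # Step 3: on each node, a local allreduce over its cores' global results.
    return [ring_allreduce(global_result[n]) for n in range(num_nodes)]

# Usage: three nodes with two cores each.
print(hierarchical_allreduce([[1, 2], [3, 4], [5, 6]]))
```

After the ring step each core holds the sum over its own ring only; the local step combines those per-ring sums, so every core finishes with the sum of all contributions.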
Linearly exact parallel closures for slab geometry

NASA Astrophysics Data System (ADS)

Ji, Jeong-Young; Held, Eric D.; Jhang, Hogun

2013-08-01

Parallel closures are obtained by solving a linearized kinetic equation with a model collision operator using the Fourier transform method. The closures expressed in wave number space are exact for time-dependent linear problems to within the limits of the model collision operator. In the adiabatic, collisionless limit, an inverse Fourier transform is performed to obtain integral (nonlocal) parallel closures in real space; parallel heat flow and viscosity closures for density, temperature, and flow velocity equations replace Braginskii's parallel closure relations, and parallel flow velocity and heat flow closures for density and temperature equations replace Spitzer's parallel transport relations. It is verified that the closures reproduce the exact linear response function of Hammett and Perkins [Phys. Rev. Lett. 64, 3019 (1990)] for Landau damping given a temperature gradient. In contrast to their approximate closures where the vanishing viscosity coefficient numerically gives an exact response, our closures relate the heat flow and nonvanishing viscosity to temperature and flow velocity (gradients).
A Concurrent Implementation of the Cascade-Correlation Algorithm, Using the Time Warp Operating System

NASA Technical Reports Server (NTRS)

Springer, P.

1993-01-01

This paper discusses the method in which the Cascade-Correlation algorithm was parallelized in such a way that it could be run using the Time Warp Operating System (TWOS). TWOS is a special purpose operating system designed to run parallel discrete event simulations with maximum efficiency on parallel or distributed computers.
Making almost commuting matrices commute

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hastings, Matthew B

Suppose two Hermitian matrices $A$, $B$ almost commute ($\|[A,B]\| \le \delta$). Are they close to a commuting pair of Hermitian matrices, $A'$, $B'$, with $\|A-A'\|, \|B-B'\| \le \epsilon$? A theorem of H. Lin shows that this is uniformly true, in that for every $\epsilon > 0$ there exists a $\delta > 0$, independent of the size $N$ of the matrices, for which almost commuting implies being close to a commuting pair. However, this theorem does not specify how $\delta$ depends on $\epsilon$. We give uniform bounds relating $\delta$ and $\epsilon$. The proof is constructive, giving an explicit algorithm to construct $A'$ and $B'$. We provide tighter bounds in the case of block tridiagonal and tridiagonal matrices. Within the context of quantum measurement, this implies an algorithm to construct a basis in which we can make a projective measurement that approximately measures two approximately commuting operators simultaneously. Finally, we comment briefly on the case of approximately measuring three or more approximately commuting operators using POVMs (positive operator-valued measures) instead of projective measurements.
An object-oriented approach to nested data parallelism

NASA Technical Reports Server (NTRS)

Sheffler, Thomas J.; Chatterjee, Siddhartha

1994-01-01

This paper describes an implementation technique for integrating nested data parallelism into an object-oriented language. Data-parallel programming employs sets of data called 'collections' and expresses parallelism as operations performed over the elements of a collection. When the elements of a collection are also collections, then there is the possibility for 'nested data parallelism.' Few current programming languages support nested data parallelism, however. In an object-oriented framework, a collection is a single object. Its type defines the parallel operations that may be applied to it. Our goal is to design and build an object-oriented data-parallel programming environment supporting nested data parallelism. Our initial approach is built upon three fundamental additions to C++. We add new parallel base types by implementing them as classes, and add a new parallel collection type called a 'vector' that is implemented as a template. Only one new language feature is introduced: the 'foreach' construct, which is the basis for exploiting elementwise parallelism over collections. The strength of the method lies in the compilation strategy, which translates nested data-parallel C++ into ordinary C++. Extracting the potential parallelism in nested 'foreach' constructs is called 'flattening' nested parallelism. We show how to flatten 'foreach' constructs using a simple program transformation. Our prototype system produces vector code which has been successfully run on workstations, a CM-2, and a CM-5.
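
The general idea behind flattening can be shown with a short sketch. This is not the authors' C++ transformation; it is a Python illustration of the commonly used representation in which a nested collection is stored as one flat data vector plus a list of segment lengths, so that a nested elementwise loop becomes a single flat elementwise operation. The function names `flatten`, `unflatten`, and `nested_foreach` are hypothetical.

```python
def flatten(nested):
    """Split a list of lists into a flat data vector and segment lengths."""
    lengths = [len(inner) for inner in nested]
    flat = [x for inner in nested for x in inner]
    return flat, lengths

def unflatten(flat, lengths):
    """Rebuild the nested structure from flat data and segment lengths."""
    nested, pos = [], 0
    for n in lengths:
        nested.append(flat[pos:pos + n])
        pos += n
    return nested

def nested_foreach(nested, f):
    """Apply f to every element of every inner collection.

    The two nested loops are replaced by one pass over the flat vector,
    which is the part a vector machine could execute in parallel.
    """
    flat, lengths = flatten(nested)
    flat = [f(x) for x in flat]  # single flat elementwise operation
    return unflatten(flat, lengths)

# Usage: square every element of an irregular nested collection.
print(nested_foreach([[1, 2], [], [3]], lambda x: x * x))
```

The segment lengths carry the nesting structure, so irregular inner collections (including empty ones) do not affect how evenly the flat work can be divided.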
Mechanically adjustable single-molecule transistors and stencil mask nanofabrication of high-resolution scanning probes

NASA Astrophysics Data System (ADS)

Champagne, Alexandre

This dissertation presents the development of two original experimental techniques to probe nanoscale objects. The first one studies electronic transport in single organic molecule transistors in which the source-drain electrode spacing is mechanically adjustable. The second involves the fabrication of high-resolution scanning probe microscopy sensors using a stencil mask lithography technique. We describe the fabrication of transistors in which a single organic molecule can be incorporated. The source and drain leads of these transistors are freely suspended above a flexible substrate, and their spacing can be adjusted by bending the substrate. We detail the technology developed to carry out measurements on these samples. We study electronic transport in single C60 molecules at low temperature. We observe Coulomb blockaded transport and can resolve the discrete energy spectrum of the molecule. We are able to mechanically tune the spacing between the electrodes (over a range of 5 Å) to modulate the lead-molecule coupling, and can electrostatically tune the energy levels on the molecule by up to 160 meV using a gate electrode. Initial progress in studying different transport regimes in other molecules is also discussed. We present a lithographic process that allows the deposition of metal nanostructures with a resolution down to 10 nm directly onto atomic force microscope (AFM) tips. We show that multiple layers of lithography can be deposited and aligned. We fabricate high-resolution magnetic force microscopy (MFM) probes using this method and discuss progress to fabricate other scanning probe microscopy (SPM) sensors.
Wavelet-based adaptation methodology combined with finite difference WENO to solve ideal magnetohydrodynamics

NASA Astrophysics Data System (ADS)

Do, Seongju; Li, Haojun; Kang, Myungjoo

2017-06-01

In this paper, we present an accurate and efficient wavelet-based adaptive weighted essentially non-oscillatory (WENO) scheme for hydrodynamics and ideal magnetohydrodynamics (MHD) equations arising from the hyperbolic conservation systems. The proposed method works with the finite difference weighted essentially non-oscillatory (FD-WENO) method in space and the third order total variation diminishing (TVD) Runge-Kutta (RK) method in time. The philosophy of this work is to use the lifted interpolating wavelets as not only detector for singularities but also interpolator. Especially, flexible interpolations can be performed by an inverse wavelet transformation. When the divergence cleaning method introducing auxiliary scalar field ψ is applied to the base numerical schemes for imposing divergence-free condition to the magnetic field in a MHD equation, the approximations to derivatives of ψ require the neighboring points. Moreover, the fifth order WENO interpolation requires large stencil to reconstruct high order polynomial. In such cases, an efficient interpolation method is necessary. The adaptive spatial differentiation method is considered as well as the adaptation of grid resolutions. In order to avoid the heavy computation of FD-WENO, in the smooth regions fixed stencil approximation without computing the non-linear WENO weights is used, and the characteristic decomposition method is replaced by a component-wise approach. Numerical results demonstrate that with the adaptive method we are able to resolve the solutions that agree well with the solution of the corresponding fine grid.
Proceedings from the Workshop on Large-Grained Parallelism (2nd) Held in Hidden Valley, Pennsylvania on October 11-14, 1987.

DTIC Science & Technology

1987-11-01

The purpose of the workshop was to bring together people whose interests lie in the areas of operating systems, programming languages, and formal... operating system support, and applications. There were parallel discussions on scheduling and distributed languages, and on real-time and operating ...number of key challenges: * Distributed systems, languages, environments - Make transactions efficient. Integrate them into the operating system
A Parallel Framework with Block Matrices of a Discrete Fourier Transform for Vector-Valued Discrete-Time Signals.

PubMed

Soto-Quiros, Pablo

2015-01-01

This paper presents a parallel implementation of a kind of discrete Fourier transform (DFT): the vector-valued DFT. The vector-valued DFT is a novel tool to analyze the spectra of vector-valued discrete-time signals. This parallel implementation is developed in terms of a mathematical framework with a set of block matrix operations. These block matrix operations contribute to analysis, design, and implementation of parallel algorithms in multicore processors. In this work, an implementation and experimental investigation of the mathematical framework are performed using MATLAB with the Parallel Computing Toolbox. We found that using multicore processors and a parallel computing environment is advantageous for reducing the high execution time. Additionally, speedup increases when the number of logical processors and length of the signal increase.
Parallel Electrochemical Treatment System and Application for Identifying Acid-Stable Oxygen Evolution Electrocatalysts

DOE PAGES

Jones, Ryan J. R.; Shinde, Aniketa; Guevarra, Dan; ...

2015-01-05

Many energy technologies require electrochemical stability or preactivation of functional materials. Due to the long experiment duration required for either electrochemical preactivation or evaluation of operational stability, parallel screening is required to enable high throughput experimentation. We found that imposing operational electrochemical conditions to a library of materials in parallel creates several opportunities for experimental artifacts. We discuss the electrochemical engineering principles and operational parameters that mitigate artifacts in the parallel electrochemical treatment system. We also demonstrate the effects of resistive losses within the planar working electrode through a combination of finite element modeling and illustrative experiments. Operation of the parallel-plate, membrane-separated electrochemical treatment system is demonstrated by exposing a composition library of mixed metal oxides to oxygen evolution conditions in 1M sulfuric acid for 2h. This application is particularly important because the electrolysis and photoelectrolysis of water are promising future energy technologies inhibited by the lack of highly active, acid-stable catalysts containing only earth abundant elements.
Resolutions of the Coulomb operator: VIII. Parallel implementation using the modern programming language X10.

PubMed

Limpanuparb, Taweetham; Milthorpe, Josh; Rendell, Alistair P

2014-10-30

Use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. By using X10, a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner including use of both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of integral calculation using X10's work stealing runtime, and report performance results for long-range HF energy calculation of large molecule/high quality basis running on up to 1024 cores of a high performance cluster machine. Copyright © 2014 Wiley Periodicals, Inc.
Reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application

DOEpatents

Archer, Charles J [Rochester, MN; Blocksome, Michael A [Rochester, MN; Peters, Amanda A [Rochester, MN; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN

2012-01-10

Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation.
Reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application

DOEpatents

Archer, Charles J [Rochester, MN; Blocksome, Michael A [Rochester, MN; Peters, Amanda E [Cambridge, MA; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN

2012-04-17

Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation.
Way for LEEPL technology to succeed in memory device application

NASA Astrophysics Data System (ADS)

Kim, In-Sung; Woo, Sang-Gyun; Cho, Han-Ku; Han, Woo-Sung; Moon, Joo-Tae

2004-05-01

Lithography for 65nm-node device is drawing a lot of attention these days especially because lithography solution for this node is not clear and even tool makers tend to wait for the consensus in lithography roadmap to avoid the risk of erroneous amount of investment. Recently proposed concept of low energy electron-beam proximity-projection lithography (LEEPL) technology has already released its first production machine in 2003, which is being expected to cover the design rule down to 65nm-node and even smaller. Although production of semiconductor device has been pursuing optical lithography, without any optical technology that is proved as a convincing solution for 65nm node and below, we need to take account of all the candidates. So we made an investigation on LEEPL technology and evaluated beta and first production tool to see the feasibility of printing sub-70nm resolution and of optic-first mix-and-match overlay from a chip maker's point of view. Two different kinds of stencil masks were fabricated for the evaluation, which are fabricated in SiC and Si membrane. The former mask is for sparse contact holes (C/H) and the latter for dense C/Hs. Beta-tool showed a good resolving power of sub-70nm sparse C/Hs of SRAM with negligibly small proximity effect. It implies that LEEPL does not require much effort for proximity correction compared to that required in optical lithography, which is one of the biggest issues in low-k1. LEEPL also showed a good capability of optic-first mix-and-match overlay correction and this is the most stringent and important functionality for optic-first mix-and-match application. However random intra-membrane image placement (IP) error that is a little bit larger than the requirement for sub-70nm node was observed, which is interpreted to come from the larger stress of 100MPa in 3×3 mm² dry-etched SiC unit membrane.
For dense C/Hs, we failed, to the contrary, to obtain any good quality of stencil masks for DRAM cell patterns because of e-beam proximity effect which is unavoidable in the reversed order of front-side forward direct writing and back-side later membrane formation. Pros and cons of LEEPL technology are discussed based on the evaluation results and estimation from the memory device standpoint. We also propose a novel concept of stencil mask that can be helpful in memory device application.
Parallel transjugular intrahepatic portosystemic shunt for controlling portal hypertension complications in cirrhotic patients.

PubMed

He, Fu-Liang; Wang, Lei; Yue, Zhen-Dong; Zhao, Hong-Wei; Liu, Fu-Quan

2014-09-07

To evaluate the feasibility of a second parallel transjugular intrahepatic portosystemic shunt (TIPS) to reduce portal venous pressure and control complications of portal hypertension. From January 2011 to December 2012, 10 cirrhotic patients were treated for complications of portal hypertension. The demographic data, operative data, postoperative recovery data, hemodynamic data, and complications were analyzed. Ten patients underwent a primary and parallel TIPS. Technical success rate was 100% with no technical complications. The mean duration of the first operation was 89.20 ± 29.46 min and the second operation was 57.0 ± 12.99 min. The mean portal system pressure decreased from 54.80 ± 4.16 mmHg to 39.0 ± 3.20 mmHg after the primary TIPS and from 44.40 ± 3.95 mmHg to 26.10 ± 4.07 mmHg after the parallel TIPS creation. The mean portosystemic pressure gradient decreased from 43.80 ± 6.18 mmHg to 31.90 ± 2.85 mmHg after the primary TIPS and from 35.60 ± 2.72 mmHg to 15.30 ± 3.27 mmHg after the parallel TIPS creation. Clinical improvement was seen in all patients after the parallel TIPS creation. One patient suffered from transient grade I hepatic encephalopathy (HE) after the primary TIPS and four patients experienced transient grade I-II after the parallel TIPS procedure. Mean hospital stay after the first and second operations were 15.0 ± 3.71 d and 16.90 ± 5.11 d (P = 0.014), respectively. After a mean 14.0 ± 3.13 mo follow-up, ascites and bleeding were well controlled and no stenosis of the stents was found. Parallel TIPS is an effective approach for controlling portal hypertension complications.
Parallel and Multivalued Logic by the Two-Dimensional Photon-Echo Response of a Rhodamine–DNA Complex

PubMed Central

2015-01-01

Implementing parallel and multivalued logic operations at the molecular scale has the potential to improve the miniaturization and efficiency of a new generation of nanoscale computing devices. Two-dimensional photon-echo spectroscopy is capable of resolving dynamical pathways on electronic and vibrational molecular states. We experimentally demonstrate the implementation of molecular decision trees, logic operations where all possible values of inputs are processed in parallel and the outputs are read simultaneously, by probing the laser-induced dynamics of populations and coherences in a rhodamine dye mounted on a short DNA duplex. The inputs are provided by the bilinear interactions between the molecule and the laser pulses, and the output values are read from the two-dimensional molecular response at specific frequencies. Our results highlight how ultrafast dynamics between multiple molecular states induced by light–matter interactions can be used as an advantage for performing complex logic operations in parallel, operations that are faster than electrical switching. PMID:25984269
The Flight Deck Perspective of the NASA Langley AILS Concept

NASA Technical Reports Server (NTRS)

Rine, Laura L.; Abbott, Terence S.; Lohr, Gary W.; Elliott, Dawn M.; Waller, Marvin C.; Perry, R. Brad

2000-01-01

Many US airports depend on parallel runway operations to meet the growing demand for day to day operations. In the current airspace system, Instrument Meteorological Conditions (IMC) reduce the capacity of close parallel runway operations; that is, runways spaced closer than 4300 ft. These capacity losses can result in landing delays causing inconveniences to the traveling public, interruptions in commerce, and increased operating costs to the airlines. This document presents the flight deck perspective component of the Airborne Information for Lateral Spacing (AILS) approaches to close parallel runways in IMC. It represents the ideas the NASA Langley Research Center (LaRC) AILS Development Team envisions to integrate a number of components and procedures into a workable system for conducting close parallel runway approaches. An initial documentation of the aspects of this concept was sponsored by LaRC and completed in 1996. Since that time a number of the aspects have evolved to a more mature state. This paper is an update of the earlier documentation.
A distributed parallel storage architecture and its potential application within EOSDIS

NASA Technical Reports Server (NTRS)

Johnston, William E.; Tierney, Brian; Feuquay, Jay; Butzer, Tony

1994-01-01

We describe the architecture, implementation, and use of a scalable, high performance, distributed-parallel data storage system developed in the ARPA funded MAGIC gigabit testbed. A collection of wide area distributed disk servers operate in parallel to provide logical block level access to large data sets. Operated primarily as a network-based cache, the architecture supports cooperation among independently owned resources to provide fast, large-scale, on-demand storage to support data handling, simulation, and computation.

Terminal Area Procedures for Paired Runways

NASA Technical Reports Server (NTRS)

Lozito, Sandra; Verma, Savita Arora

2011-01-01

Parallel runway operations have been found to increase capacity within the National Airspace, but poor visibility conditions reduce the use of these operations. The NextGen and SESAR Programs have identified the capacity benefits from increased use of closely spaced parallel runways. Previous research examined the concepts and procedures related to parallel runways; however, there has been no investigation of the procedures associated with the strategic and tactical pairing of aircraft for these operations. This simulation study developed and examined the pilot and controller procedures and information requirements for creating aircraft pairs for parallel runway operations. The goal was to achieve aircraft pairing with a temporal separation of 15s (+/- 10s error) at a coupling point that was about 12 nmi from the runway threshold. Two variables were explored for the pilot participants: two levels of flight deck automation (current-day flight deck automation and auto speed control future automation) as well as two flight deck displays that assisted in pilot conformance monitoring. The controllers were also provided with automation to help create and maintain aircraft pairs. Results show the operations in this study were acceptable and safe. Subjective workload, when using the pairing procedures and tools, was generally low for both controllers and pilots, and situation awareness was typically moderate to high. Pilot workload was influenced by display type and automation condition. Further research on pairing and off-nominal conditions is required; however, this investigation identified promising findings about the feasibility of closely spaced parallel runway operations.
Rapid code acquisition algorithms employing PN matched filters

NASA Technical Reports Server (NTRS)

Su, Yu T.

1988-01-01

The performance of four algorithms using pseudonoise matched filters (PNMFs), for direct-sequence spread-spectrum systems, is analyzed. They are: parallel search with fix dwell detector (PL-FDD), parallel search with sequential detector (PL-SD), parallel-serial search with fix dwell detector (PS-FDD), and parallel-serial search with sequential detector (PS-SD). The operation characteristic for each detector and the mean acquisition time for each algorithm are derived. All the algorithms are studied in conjunction with the noncoherent integration technique, which enables the system to operate in the presence of data modulation. Several previous proposals using PNMF are seen as special cases of the present algorithms.
Separating the Laparoscopic Camera Cord From the Monopolar "Bovie" Cord Reduces Unintended Thermal Injury From Antenna Coupling: A Randomized Controlled Trial.

PubMed

Robinson, Thomas N; Jones, Edward L; Dunn, Christina L; Dunne, Bruce; Johnson, Elizabeth; Townsend, Nicole T; Paniccia, Alessandro; Stiegmann, Greg V

2015-06-01

The monopolar "Bovie" is used in virtually every laparoscopic operation. The active electrode and its cord emit radiofrequency energy that couples (or transfers) to nearby conductive material without direct contact. This phenomenon is increased when the active electrode cord is oriented parallel to another wire/cord. The parallel orientation of the "Bovie" and laparoscopic camera cords causes transfer of energy to the camera cord, resulting in cutaneous burns at the camera trocar incision. We hypothesized that separating the active electrode/camera cords would reduce thermal injury occurring at the camera trocar incision in comparison to parallel oriented active electrode/camera cords. In this prospective, blinded, randomized controlled trial, patients undergoing standardized laparoscopic cholecystectomy were randomized to separated active electrode/camera cords or parallel oriented active electrode/camera cords. The primary outcome variable was thermal injury determined by histology from skin biopsied at the camera trocar incision. Eighty-four patients participated. Baseline demographics were similar in the groups for age, sex, preoperative diagnosis, operative time, and blood loss. Thermal injury at the camera trocar incision was lower in the separated versus parallel group (31% vs 57%; P = 0.027). Separation of the laparoscopic camera cord from the active electrode cord decreases thermal injury from antenna coupling at the camera trocar incision in comparison to the parallel orientation of these cords. Therefore, parallel orientation of these cords (an arrangement promoted by integrated operating rooms) should be abandoned. The findings of this study should influence the operating room setup for all laparoscopic cases.
Demonstration of an optoelectronic interconnect architecture for a parallel modified signed-digit adder and subtracter

NASA Astrophysics Data System (ADS)

Sun, Degui; Wang, Na-Xin; He, Li-Ming; Weng, Zhao-Heng; Wang, Daheng; Chen, Ray T.

1996-06-01

A space-position-logic-encoding scheme is proposed and demonstrated. This encoding scheme not only makes the best use of the convenience of binary logic operation, but is also suitable for the trinary property of modified signed-digit (MSD) numbers. Based on the space-position-logic-encoding scheme, a fully parallel modified signed-digit adder and subtracter is built using optoelectronic switch technologies in conjunction with fiber-multistage 3D optoelectronic interconnects. Thus an effective combination of a parallel algorithm and a parallel architecture is implemented. In addition, the performance of the optoelectronic switches used in this system is experimentally studied and verified. Both the 3-bit experimental model and the experimental results of a parallel addition and a parallel subtraction are provided and discussed. Finally, the speed ratio between the MSD adder and binary adders is discussed and the advantage of the MSD in operating speed is demonstrated.
Time-Accurate Local Time Stepping and High-Order Time CESE Methods for Multi-Dimensional Flows Using Unstructured Meshes

NASA Technical Reports Server (NTRS)

Chang, Chau-Lyan; Venkatachari, Balaji Shankar; Cheng, Gary

2013-01-01

With the wide availability of affordable multiple-core parallel supercomputers, next generation numerical simulations of flow physics are being focused on unsteady computations for problems involving multiple time scales and multiple physics. These simulations require higher solution accuracy than most algorithms and computational fluid dynamics codes currently available. This paper focuses on the developmental effort for high-fidelity multi-dimensional, unstructured-mesh flow solvers using the space-time conservation element, solution element (CESE) framework. Two approaches have been investigated in this research in order to provide high-accuracy, cross-cutting numerical simulations for a variety of flow regimes: 1) time-accurate local time stepping and 2) high-order CESE method. The first approach utilizes consistent numerical formulations in the space-time flux integration to preserve temporal conservation across the cells with different marching time steps. Such an approach relieves the stringent time step constraint associated with the smallest time step in the computational domain while preserving temporal accuracy for all the cells. For flows involving multiple scales, both numerical accuracy and efficiency can be significantly enhanced. The second approach extends the current CESE solver to higher-order accuracy. Unlike other existing explicit high-order methods for unstructured meshes, the CESE framework maintains a CFL condition of one for arbitrarily high-order formulations while retaining the same compact stencil as its second-order counterpart. For large-scale unsteady computations, this feature substantially enhances numerical efficiency. Numerical formulations and validations using benchmark problems are discussed in this paper along with realistic examples.
Plasma formed ion beam projection lithography system

DOEpatents

Leung, Ka-Ngo; Lee, Yung-Hee Yvette; Ngo, Vinh; Zahir, Nastaran

2002-01-01

A plasma-formed ion-beam projection lithography (IPL) system eliminates the acceleration stage between the ion source and stencil mask of a conventional IPL system. Instead a much thicker mask is used as a beam forming or extraction electrode, positioned next to the plasma in the ion source. Thus the entire beam forming electrode or mask is illuminated uniformly with the source plasma. The extracted beam passes through an acceleration and reduction stage onto the resist coated wafer. Low energy ions, about 30 eV, pass through the mask, minimizing heating, scattering, and sputtering.
High-Order Entropy Stable Finite Difference Schemes for Nonlinear Conservation Laws: Finite Domains

NASA Technical Reports Server (NTRS)

Fisher, Travis C.; Carpenter, Mark H.

2013-01-01

Developing stable and robust high-order finite difference schemes requires mathematical formalism and appropriate methods of analysis. In this work, nonlinear entropy stability is used to derive provably stable high-order finite difference methods with formal boundary closures for conservation laws. Particular emphasis is placed on the entropy stability of the compressible Navier-Stokes equations. A newly derived entropy stable weighted essentially non-oscillatory finite difference method is used to simulate problems with shocks and a conservative, entropy stable, narrow-stencil finite difference approach is used to approximate viscous terms.
Adaptive parallel logic networks

NASA Technical Reports Server (NTRS)

Martinez, Tony R.; Vidal, Jacques J.

1988-01-01

Adaptive, self-organizing concurrent systems (ASOCS) that combine self-organization with massive parallelism for such applications as adaptive logic devices, robotics, process control, and system malfunction management, are presently discussed. In ASOCS, an adaptive network composed of many simple computing elements operating in combinational and asynchronous fashion is used and problems are specified by presenting if-then rules to the system in the form of Boolean conjunctions. During data processing, which is a different operational phase from adaptation, the network acts as a parallel hardware circuit.
Hypercluster Parallel Processor

NASA Technical Reports Server (NTRS)

Blech, Richard A.; Cole, Gary L.; Milner, Edward J.; Quealy, Angela

1992-01-01

Hypercluster computer system includes multiple digital processors, operation of which coordinated through specialized software. Configurable according to various parallel-computing architectures of shared-memory or distributed-memory class, including scalar computer, vector computer, reduced-instruction-set computer, and complex-instruction-set computer. Designed as flexible, relatively inexpensive system that provides single programming and operating environment within which one can investigate effects of various parallel-computing architectures and combinations on performance in solution of complicated problems like those of three-dimensional flows in turbomachines. Hypercluster software and architectural concepts are in public domain.
Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers

DOE PAGES

Basu, Protonu; Williams, Samuel; Van Straalen, Brian; ...

2017-04-05

GPUs, with their high bandwidths and computational capabilities, are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required use of a GPU-specific programming model like CUDA, OpenCL, or OpenACC. Thus, in order to deliver portability across CPU-based and GPU-accelerated supercomputers, programmers are forced to write and maintain two versions of their applications or frameworks. In this paper, we explore the use of a compiler-based autotuning framework based on CUDA-CHiLL to deliver not only portability, but also performance portability across CPU- and GPU-accelerated platforms for the geometric multigrid linear solvers found in many scientific applications. We also show that with autotuning we can attain near Roofline (a performance bound for a computation and target architecture) performance across the key operations in the miniGMG benchmark for both CPU- and GPU-based architectures as well as for multiple stencil discretizations and smoothers. We show that our technology is readily interoperable with MPI resulting in performance at scale equal to that obtained via hand-optimized MPI+CUDA implementation.
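For readers unfamiliar with the Roofline bound mentioned above, a minimal sketch follows. In its basic form the model caps attainable performance at the smaller of peak compute throughput and memory bandwidth times arithmetic intensity (flops per byte moved). The function name `roofline_bound` and all machine and kernel numbers below are hypothetical and are not taken from the miniGMG study.

```python
def roofline_bound(peak_gflops, bandwidth_gbs, flops, bytes_moved):
    """Attainable GFLOP/s under the basic Roofline model.

    peak_gflops   -- peak compute rate of the machine (GFLOP/s)
    bandwidth_gbs -- sustained memory bandwidth (GB/s)
    flops         -- floating-point operations per grid point
    bytes_moved   -- bytes of memory traffic per grid point
    """
    intensity = flops / bytes_moved  # arithmetic intensity, flop/byte
    return min(peak_gflops, bandwidth_gbs * intensity)


# Hypothetical machine: 1000 GFLOP/s peak, 200 GB/s bandwidth.
# Hypothetical stencil sweep: 8 flops and 32 bytes of traffic per point.
print(roofline_bound(1000.0, 200.0, flops=8, bytes_moved=32))  # 50.0
```

With these assumed numbers the kernel is bandwidth-bound: its ceiling is 50 GFLOP/s, far below the compute peak, which is the typical situation for low-intensity stencil sweeps.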
A wearable stimulation bandage for electrotherapy studies in a rat ischemic wound model.

PubMed

Howe, Daniel S; Dunning, Jeremy L; Henzel, Mary K; Graebert, Jennifer K; Bogie, Kath M

2011-01-01

The clinical efficacy of electro-therapy in the treatment of chronic wounds is currently debated, and an in vivo evaluation of stimulation parameters will provide the statistical evidence needed to direct clinical guidelines. A low-cost, wearable electrical stimulation bandage has been developed for use with an established rat ischemic wound model. The bandage consists of a user-programmable stimulator PCB and a plastic bandage with two hydrogel electrodes. The battery-powered bandage may be used for up to seven days between dressing changes, and the stimulator may be reused. The microcontroller-based stimulator uses a boost converter circuit to generate pulses up to 90 V from a 3 V coin cell battery. Consistent operation of the boost converter over the wide input and output voltage ranges is achieved using voltage feedforward and soft-start techniques implemented in firmware. The bandages are laser-cut to shape, and electrical traces are applied using stencils and conductive nickel paint. Both the PCB and electrical traces are encapsulated to protect the animal. The device has been successfully demonstrated using the rat ischemic wound model for a period of seven days, and clinical experiments are ongoing.
Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Basu, Protonu; Williams, Samuel; Van Straalen, Brian

GPUs, with their high bandwidths and computational capabilities, are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required use of a GPU-specific programming model like CUDA, OpenCL, or OpenACC. Thus, in order to deliver portability across CPU-based and GPU-accelerated supercomputers, programmers are forced to write and maintain two versions of their applications or frameworks. In this paper, we explore the use of a compiler-based autotuning framework based on CUDA-CHiLL to deliver not only portability, but also performance portability across CPU- and GPU-accelerated platforms for the geometric multigrid linear solvers found in many scientific applications. We also show that with autotuning we can attain near Roofline (a performance bound for a computation and target architecture) performance across the key operations in the miniGMG benchmark for both CPU- and GPU-based architectures as well as for multiple stencil discretizations and smoothers. We show that our technology is readily interoperable with MPI resulting in performance at scale equal to that obtained via hand-optimized MPI+CUDA implementation.
46 CFR 111.12-7 - Voltage regulation and parallel operation.

Code of Federal Regulations, 2013 CFR

2013-10-01

... 46 Shipping 4 2013-10-01 2013-10-01 false Voltage regulation and parallel operation. 111.12-7 Section 111.12-7 Shipping COAST GUARD, DEPARTMENT OF HOMELAND SECURITY (CONTINUED) ELECTRICAL ENGINEERING ELECTRIC SYSTEMS-GENERAL REQUIREMENTS Generator Construction and Circuits § 111.12-7 Voltage regulation and...
46 CFR 111.12-7 - Voltage regulation and parallel operation.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 46 Shipping 4 2014-10-01 2014-10-01 false Voltage regulation and parallel operation. 111.12-7 Section 111.12-7 Shipping COAST GUARD, DEPARTMENT OF HOMELAND SECURITY (CONTINUED) ELECTRICAL ENGINEERING ELECTRIC SYSTEMS-GENERAL REQUIREMENTS Generator Construction and Circuits § 111.12-7 Voltage regulation and...
46 CFR 111.12-7 - Voltage regulation and parallel operation.

Code of Federal Regulations, 2012 CFR

2012-10-01

... 46 Shipping 4 2012-10-01 2012-10-01 false Voltage regulation and parallel operation. 111.12-7 Section 111.12-7 Shipping COAST GUARD, DEPARTMENT OF HOMELAND SECURITY (CONTINUED) ELECTRICAL ENGINEERING ELECTRIC SYSTEMS-GENERAL REQUIREMENTS Generator Construction and Circuits § 111.12-7 Voltage regulation and...
46 CFR 111.12-7 - Voltage regulation and parallel operation.

Code of Federal Regulations, 2011 CFR

2011-10-01

... 46 Shipping 4 2011-10-01 2011-10-01 false Voltage regulation and parallel operation. 111.12-7 Section 111.12-7 Shipping COAST GUARD, DEPARTMENT OF HOMELAND SECURITY (CONTINUED) ELECTRICAL ENGINEERING ELECTRIC SYSTEMS-GENERAL REQUIREMENTS Generator Construction and Circuits § 111.12-7 Voltage regulation and...
The Goddard Space Flight Center Program to develop parallel image processing systems

NASA Technical Reports Server (NTRS)

Schaefer, D. H.

1972-01-01

Parallel image processing which is defined as image processing where all points of an image are operated upon simultaneously is discussed. Coherent optical, noncoherent optical, and electronic methods are considered parallel image processing techniques.
Mine Hoist Operator Training System. Phase I Report.

DTIC Science & Technology

1978-11-01

Bodies of Knowledge Function Control speed of conveyances Hold conveyances in position Structural Components Types of brakes : * Disc * Drum - Jaw...Parallel motion Components of each type * Disc / drum * Pads/shoes * Operating mechanisms Operating mediums for braking * Hydraulic/pneumatic * Manual...SHAFT GUIDES Wood El BRAKES Steel Rails El Drum : Wire Rope: Jaw El Full Lock El Parallel Motion El Half Lock El Disc El LEVELS DRIVE MOTORS Single El
Experimental characterization of a binary actuated parallel manipulator

NASA Astrophysics Data System (ADS)

Giuseppe, Carbone

2016-05-01

This paper describes the BAPAMAN (Binary Actuated Parallel MANipulator) series of parallel manipulators that has been conceived at the Laboratory of Robotics and Mechatronics (LARM). Basic common characteristics of the BAPAMAN series are described. In particular, the paper outlines the use of a reduced number of active degrees of freedom and of design solutions with flexural joints and Shape Memory Alloy (SMA) actuators for achieving miniaturization, cost reduction and easy operation features. Given the peculiarities of the BAPAMAN architecture, specific experimental tests have been proposed and carried out with the aim of validating the proposed design and evaluating the practical operation performance and the characteristics of a built prototype, in particular in terms of operation and workspace characteristics.
Exploring types of play in an adapted robotics program for children with disabilities.

PubMed

Lindsay, Sally; Lam, Ashley

2018-04-01

Play is an important occupation in a child's development. Children with disabilities often have fewer opportunities to engage in meaningful play than typically developing children. The purpose of this study was to explore the types of play (i.e., solitary, parallel and co-operative) within an adapted robotics program for children with disabilities aged 6-8 years. This study draws on detailed observations of each of the six robotics workshops and interviews with 53 participants (21 children, 21 parents and 11 programme staff). Our findings showed that four children engaged in solitary play, where all but one showed signs of moving towards parallel play. Six children demonstrated parallel play during all workshops. The remainder of the children had mixed play types (solitary, parallel and/or co-operative) throughout the robotics workshops. We observed more parallel and co-operative, and less solitary play as the programme progressed. Ten different children displayed co-operative behaviours throughout the workshops. The interviews highlighted how staff supported children's engagement in the programme. Meanwhile, parents reported on their child's development of play skills. An adapted LEGO ® robotics program has potential to develop the play skills of children with disabilities in moving from solitary towards more parallel and co-operative play. Implications for rehabilitation: Educators and clinicians working with children who have disabilities should consider the potential of LEGO ® robotics programs for developing their play skills. Clinicians should consider how the extent of their involvement in prompting and facilitating children's engagement and play within a robotics program may influence their ability to interact with their peers. Educators and clinicians should incorporate both structured and unstructured free-play elements within a robotics program to facilitate children's social development.

Fast, Massively Parallel Data Processors

NASA Technical Reports Server (NTRS)

Heaton, Robert A.; Blevins, Donald W.; Davis, ED

1994-01-01

Proposed fast, massively parallel data processor contains 8x16 array of processing elements with efficient interconnection scheme and options for flexible local control. Processing elements communicate with each other on "X" interconnection grid with external memory via high-capacity input/output bus. This approach to conditional operation nearly doubles speed of various arithmetic operations.
Optimal expression evaluation for data parallel architectures

NASA Technical Reports Server (NTRS)

Gilbert, John R.; Schreiber, Robert

1990-01-01

A data parallel machine represents an array or other composite data structure by allocating one processor (at least conceptually) per data item. A pointwise operation can be performed between two such arrays in unit time, provided their corresponding elements are allocated in the same processors. If the arrays are not aligned in this fashion, the cost of moving one or both of them is part of the cost of the operation. The choice of where to perform the operation then affects this cost. If an expression with several operands is to be evaluated, there may be many choices of where to perform the intermediate operations. An efficient algorithm is given to find the minimum-cost way to evaluate an expression, for several different data parallel architectures. This algorithm applies to any architecture in which the metric describing the cost of moving an array is robust. This encompasses most of the common data parallel communication architectures, including meshes of arbitrary dimension and hypercubes. Remarks are made on several variations of the problem, some of which are solved and some of which remain open.
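The cost model in this abstract can be made concrete with a small dynamic program over the expression tree. This is a plain, brute-force sketch and not the efficient algorithm of the record above: for every subexpression it tabulates the cheapest way to have its value available at each candidate location. The nested-tuple tree encoding, the function name `min_eval_cost`, and the one-dimensional mesh with distance |a - b| are assumptions chosen for illustration; the constant per-operation cost is omitted because it does not depend on placement.

```python
def min_eval_cost(expr, locations, dist):
    """Return {loc: minimum movement cost to have expr's value at loc}.

    expr      -- an int (an array resident at that location) or a tuple
                 of subexpressions combined by one pointwise operation
    locations -- re-iterable collection of candidate locations
    dist      -- metric giving the cost of moving an array between two
                 locations (assumed to satisfy the triangle inequality)
    """
    if isinstance(expr, int):
        # Leaf: the only cost is moving the array from where it lives.
        return {loc: dist(expr, loc) for loc in locations}

    child_tables = [min_eval_cost(c, locations, dist) for c in expr]

    # Cost of performing the operation at p: every operand must be at p.
    at = {p: sum(table[p] for table in child_tables) for p in locations}

    # The result may then be moved from p to wherever it is needed next.
    return {
        loc: min(at[p] + dist(p, loc) for p in locations)
        for loc in locations
    }


# Arrays at positions 0, 4 and 10 of a 1-D mesh; expression (A op B) op C.
mesh = range(11)
table = min_eval_cost(((0, 4), 10), mesh, lambda a, b: abs(a - b))
print(min(table.values()))  # 10
print(table[0])             # 14
```

In this example the cheapest schedule costs 10 units of movement, whereas forcing the final result onto position 0 costs 14, which shows how the choice of where intermediate operations run changes the total cost.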
Degradation Characterization of Thermal Interface Greases

DOE Office of Scientific and Technical Information (OSTI.GOV)

DeVoto, Douglas J; Major, Joshua; Paret, Paul P

Thermal interface materials (TIMs) are used in power electronics packaging to minimize thermal resistance between the heat generating component and the heat sink. Thermal greases are one such class. The conformability and thin bond line thickness (BLT) of these TIMs can potentially provide low thermal resistance throughout the operation lifetime of a component. However, their performance degrades over time due to pump-out and dry-out during thermal and power cycling. The reliability performance of greases through operational cycling needs to be quantified to develop new materials with superior properties. NREL, in collaboration with DuPont, has performed thermal and reliability characterization of several commercially available thermal greases. Initial bulk and contact thermal resistance of grease samples were measured, and then the thermal degradation that occurred due to pump-out and dry-out during temperature cycling was monitored. The thermal resistances of five different grease materials were evaluated using NREL's steady-state thermal resistance tester based on the ASTM test method D5470. Greases were then applied, utilizing a 2.5 cm x 2.5 cm stencil, between invar and aluminum plates to compare the thermomechanical performance of the materials in a representative test fixture. Scanning acoustic microscopy, thermal, and compositional analyses were performed periodically during thermal cycling from -40 degrees Celsius to 125 degrees Celsius. Completion of this characterization has allowed for a comprehensive evaluation of thermal greases both for their initial bulk and contact thermal performance, as well as their degradation mechanisms under accelerated thermal cycling conditions.
Degradation Characterization of Thermal Interface Greases: Preprint

DOE Office of Scientific and Technical Information (OSTI.GOV)

DeVoto, Douglas J; Major, Joshua; Paret, Paul P

Thermal interface materials (TIMs) are used in power electronics packaging to minimize thermal resistance between the heat generating component and the heat sink. Thermal greases are one such class. The conformability and thin bond line thickness (BLT) of these TIMs can potentially provide low thermal resistance throughout the operation lifetime of a component. However, their performance degrades over time due to pump-out and dry-out during thermal and power cycling. The reliability performance of greases through operational cycling needs to be quantified to develop new materials with superior properties. NREL, in collaboration with DuPont, has performed thermal and reliability characterization of several commercially available thermal greases. Initial bulk and contact thermal resistance of grease samples were measured, and then the thermal degradation that occurred due to pump-out and dry-out during temperature cycling was monitored. The thermal resistances of five different grease materials were evaluated using NREL's steady-state thermal resistance tester based on the ASTM test method D5470. Greases were then applied, utilizing a 2.5 cm x 2.5 cm stencil, between invar and aluminum plates to compare the thermomechanical performance of the materials in a representative test fixture. Scanning acoustic microscopy, thermal, and compositional analyses were performed periodically during thermal cycling from -40 degrees Celsius to 125 degrees Celsius. Completion of this characterization has allowed for a comprehensive evaluation of thermal greases both for their initial bulk and contact thermal performance, as well as their degradation mechanisms under accelerated thermal cycling conditions.
Degradation Characterization of Thermal Interface Greases

DOE Office of Scientific and Technical Information (OSTI.GOV)

Major, Joshua; Narumanchi, Sreekant V; Paret, Paul P

Thermal interface materials (TIMs) are used in power electronics packaging to minimize thermal resistance between the heat generating component and the heat sink. Thermal greases are one such class. The conformability and thin bond line thickness (BLT) of these TIMs can potentially provide low thermal resistance throughout the operation lifetime of a component. However, their performance degrades over time due to pump-out and dry-out during thermal and power cycling. The reliability performance of greases through operational cycling needs to be quantified to develop new materials with superior properties. NREL, in collaboration with DuPont, has performed thermal and reliability characterization of several commercially available thermal greases. Initial bulk and contact thermal resistance of grease samples were measured, and then the thermal degradation that occurred due to pump-out and dry-out during temperature cycling was monitored. The thermal resistances of five different grease materials were evaluated using NREL's steady-state thermal resistance tester based on the ASTM test method D5470. Greases were then applied, utilizing a 2.5 cm x 2.5 cm stencil, between invar and aluminum plates to compare the thermomechanical performance of the materials in a representative test fixture. Scanning acoustic microscopy, thermal, and compositional analyses were performed periodically during thermal cycling from -40 degrees C to 125 degrees C. Completion of this characterization has allowed for a comprehensive evaluation of thermal greases both for their initial bulk and contact thermal performance, as well as their degradation mechanisms under accelerated thermal cycling conditions.
Multiple asynchronous stimulus- and task-dependent hierarchies (STDH) within the visual brain's parallel processing systems.

PubMed

Zeki, Semir

2016-10-01

Results from a variety of sources, some many years old, lead ineluctably to a re-appraisal of the twin strategies of hierarchical and parallel processing used by the brain to construct an image of the visual world. Contrary to common supposition, there are at least three 'feed-forward' anatomical hierarchies that reach the primary visual cortex (V1) and the specialized visual areas outside it, in parallel. These anatomical hierarchies do not conform to the temporal order with which visual signals reach the specialized visual areas through V1. Furthermore, neither the anatomical hierarchies nor the temporal order of activation through V1 predict the perceptual hierarchies. The latter shows that we see (and become aware of) different visual attributes at different times, with colour leading form (orientation) and directional visual motion, even though signals from fast-moving, high-contrast stimuli are among the earliest to reach the visual cortex (of area V5). Parallel processing, on the other hand, is much more ubiquitous than commonly supposed but is subject to a barely noticed but fundamental aspect of brain operations, namely that different parallel systems operate asynchronously with respect to each other and reach perceptual endpoints at different times. This re-assessment leads to the conclusion that the visual brain is constituted of multiple, parallel and asynchronously operating task- and stimulus-dependent hierarchies (STDH); which of these parallel anatomical hierarchies have temporal and perceptual precedence at any given moment is stimulus and task related, and dependent on the visual brain's ability to undertake multiple operations asynchronously. © 2016 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Data for polarization in charmless $B \to \phi K^*$: A signal for new physics?

DOE Office of Scientific and Technical Information (OSTI.GOV)

Das, Prasanta Kumar; Yang, K.-C.

2005-05-01

The recent observations of sizable transverse fractions of $B \to \phi K^*$ may hint at the existence of new physics. We analyze all possible new-physics four-quark operators and find that two classes of new-physics operators could offer resolutions to the $B \to \phi K^*$ polarization anomaly. The operators in the first class have structures $(1-\gamma_5) \times (1-\gamma_5)$, $\sigma(1-\gamma_5) \times \sigma(1-\gamma_5)$, and in the second class $(1+\gamma_5) \times (1+\gamma_5)$, $\sigma(1+\gamma_5) \times \sigma(1+\gamma_5)$. For each class, the new-physics effects can be lumped into a single parameter. Two possible experimental results of polarization phases, $\arg(A_\perp) - \arg(A_\parallel) \approx \pi$ or 0, originating from the phase ambiguity in data, could be separately accounted for by our two new-physics scenarios: the first (second) scenario with the first (second) class new-physics operators. The consistency between the data and our new-physics analysis suggests a small new-physics weak phase, together with a large(r) strong phase. We obtain sizable transverse fractions $\lambda_{\parallel\parallel} + \lambda_{\perp\perp} \approx \lambda_{00}$, in accordance with the observations. We find $\lambda_{\parallel\parallel} \approx 0.8\,\lambda_{\perp\perp}$ in the first scenario but $\lambda_{\parallel\parallel} \gtrsim \lambda_{\perp\perp}$ in the second scenario. We discuss the impact of the new-physics weak phase on observations.
Parallel computing techniques for rotorcraft aerodynamics

NASA Astrophysics Data System (ADS)

Ekici, Kivanc

The modification of unsteady three-dimensional Navier-Stokes codes for application on massively parallel and distributed computing environments is investigated. The Euler/Navier-Stokes code TURNS (Transonic Unsteady Rotor Navier-Stokes) was chosen as a test bed because of its wide use by universities and industry. For the efficient implementation of TURNS on parallel computing systems, two algorithmic changes are developed. First, the main modifications to the implicit operator, Lower-Upper Symmetric Gauss Seidel (LU-SGS), originally used in TURNS, are performed. Second, application of an inexact Newton method, coupled with a Krylov subspace iterative method (Newton-Krylov method), is carried out. Both techniques have been tried previously for the Euler equations mode of the code. In this work, we have extended the methods to the Navier-Stokes mode. Several new implicit operators were tried because of convergence problems of traditional operators with the high cell aspect ratio (CAR) grids needed for viscous calculations on structured grids. Promising results for both Euler and Navier-Stokes cases are presented for these operators. For the efficient implementation of Newton-Krylov methods in the Navier-Stokes mode of TURNS, efficient preconditioners must be used. The parallel implicit operators used in the previous step are employed as preconditioners and the results are compared. The Message Passing Interface (MPI) protocol has been used because of its portability to various parallel architectures. It should be noted that the proposed methodology is general and can be applied to several other CFD codes (e.g. OVERFLOW).
Scalable Failure Masking for Stencil Computations using Ghost Region Expansion and Cell to Rank Remapping

DOE PAGES

Gamell, Marc; Teranishi, Keita; Kolla, Hemanth; ...

2017-10-26

In order to achieve exascale systems, application resilience needs to be addressed. Some programming models, such as task-DAG (directed acyclic graphs) architectures, currently embed resilience features whereas traditional SPMD (single program, multiple data) and message-passing models do not. Since a large part of the community's code base follows the latter models, it is still required to take advantage of application characteristics to minimize the overheads of fault tolerance. To that end, this paper explores how recovering from hard process/node failures in a local manner is a natural approach for certain applications to obtain resilience at lower costs in faulty environments. In particular, this paper targets enabling online, semitransparent local recovery for stencil computations on current leadership-class systems as well as presents programming support and scalable runtime mechanisms. Also described and demonstrated in this paper is the effect of failure masking, which allows the effective reduction of impact on total time to solution due to multiple failures. Furthermore, we discuss, implement, and evaluate ghost region expansion and cell-to-rank remapping to increase the probability of failure masking. To conclude, this paper shows the integration of all aforementioned mechanisms with the S3D combustion simulation through an experimental demonstration (using the Titan system) of the ability to tolerate high failure rates (i.e., node failures every five seconds) with low overhead while sustaining performance at large scales. In addition, this demonstration also displays the failure masking probability increase resulting from the combination of both ghost region expansion and cell-to-rank remapping.
Producing intricate IPMC shapes by means of spray-painting and printing (Conference Presentation)

NASA Astrophysics Data System (ADS)

Trabia, Sarah; Olsen, Zakai; Hwang, Taeseon; Kim, Kwang Jin

2017-04-01

Ionic Polymer-Metal Composites (IPMC) are common soft actuators that are Nafion® based and plated with a conductive metal, such as platinum, gold, or palladium. Nafion® is available in three forms: sheets, pellets, and water dispersion. Nafion® sheets can be cut to the desired dimensions and are best for rectangular IPMCs. However, the user is not able to change the thickness of these sheets by stacking and melting because Nafion® does not melt. A solution to this is Nafion® pellets, which can melt. These can be used for extrusion and injection molding. Though Nafion® pellets can be melted, they are difficult to work with, making the process quite challenging to master. The last form is Nafion® Water Dispersion, which can be used for casting. Casting can produce the desired thickness, but it does not solve the problem of achieving complex contours. The current methods of fabrication do not allow for complex shapes and structures. To solve this problem, two methods are presented: painting and printing. The painting method uses Nafion® Water Dispersion, an airbrush, and vinyl stencils. The stencils can be made into any shape with detailed edges. The printing method uses Nafion® pellets that are extruded into filaments and a commercially available 3D printer. The models are drawn in a Computer-Aided Drawing (CAD) program, such as SolidWorks. The produced Nafion® membranes will be compared with a commercial Nafion® membrane through a variety of tests, including Fourier Transform Infrared Spectroscopy, Scanning Electron Microscope, Thermogravimetric Analysis, Dynamic Mechanical Analysis, and Optical Microscope.
Comparison between iteration schemes for three-dimensional coordinate-transformed saturated-unsaturated flow model

NASA Astrophysics Data System (ADS)

An, Hyunuk; Ichikawa, Yutaka; Tachikawa, Yasuto; Shiiba, Michiharu

2012-11-01

Three different iteration methods for a three-dimensional coordinate-transformed saturated-unsaturated flow model are compared in this study. The Picard and Newton iteration methods are the common approaches for solving Richards' equation. The Picard method is simple to implement and cost-efficient (on an individual iteration basis). However, it converges more slowly than the Newton method. On the other hand, although the Newton method converges faster, it is more complex to implement and consumes more CPU resources per iteration than the Picard method. The comparison of the two methods in the finite-element model (FEM) for saturated-unsaturated flow has been well evaluated in previous studies. However, the two iteration methods might exhibit different behavior in the coordinate-transformed finite-difference model (FDM). In addition, the Newton-Krylov method could be a suitable alternative for the coordinate-transformed FDM, because that model requires the evaluation of a 19-point stencil matrix. The formation of a 19-point stencil is quite a complex and laborious procedure. Instead, the Newton-Krylov method calculates the matrix-vector product, which can be easily approximated by calculating the differences of the original nonlinear function. In this respect, the Newton-Krylov method might be the most appropriate iteration method for coordinate-transformed FDM. However, this method involves the additional cost of computing that approximation at each Krylov iteration. In this paper, we evaluated the efficiency and robustness of three iteration methods (the Picard, Newton, and Newton-Krylov methods) for simulating saturated-unsaturated flow through porous media using a three-dimensional coordinate-transformed FDM.
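As a minimal illustration of the matrix-free product described in this abstract (a sketch, not the authors' implementation), the Jacobian-vector product J(u)v can be approximated by a forward difference of the nonlinear residual, (F(u + hv) - F(u))/h, so that no stencil matrix is ever assembled. The residual function, the step-size rule, and all names below are assumptions chosen for the example; the residual is a simple 1-D three-point problem, not Richards' equation.

```python
import math

def jvp_fd(F, u, v, eps=1e-7):
    """Approximate J(u) @ v by a forward difference of F; no Jacobian is formed."""
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_v == 0.0:
        return [0.0] * len(u)
    norm_u = math.sqrt(sum(x * x for x in u))
    h = eps * (1.0 + norm_u) / norm_v          # assumed step-size heuristic
    Fu = F(u)
    Fp = F([ui + h * vi for ui, vi in zip(u, v)])
    return [(a - b) / h for a, b in zip(Fp, Fu)]

def residual(u):
    """Hypothetical nonlinear residual: 3-point stencil plus a cubic term."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append(2.0 * u[i] - left - right + u[i] ** 3)
    return out

def exact_jvp(u, v):
    """Analytic J(u) @ v for the residual above, used only as a reference."""
    n = len(u)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append((2.0 + 3.0 * u[i] ** 2) * v[i] - left - right)
    return out
```

Each call to `jvp_fd` costs one extra residual evaluation, which corresponds to the per-Krylov-iteration overhead the abstract mentions.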
Steady and Unsteady Nozzle Simulations Using the Conservation Element and Solution Element Method

NASA Technical Reports Server (NTRS)

Friedlander, David Joshua; Wang, Xiao-Yen J.

2014-01-01

This paper presents results from computational fluid dynamic (CFD) simulations of a three-stream plug nozzle. Time-accurate, Euler, quasi-1D and 2D-axisymmetric simulations were performed as part of an effort to provide a CFD-based approach to modeling nozzle dynamics. The CFD code used for the simulations is based on the space-time Conservation Element and Solution Element (CESE) method. Steady-state results were validated using the Wind-US code and a code utilizing the MacCormack method, while the unsteady results were partially validated via an aeroacoustic benchmark problem. The CESE steady-state flow field solutions showed excellent agreement with solutions derived from the other methods and codes; preliminary unsteady results for the three-stream plug nozzle are also shown. Additionally, a study was performed to explore the sensitivity of gross thrust computations to the control surface definition. The results showed that most of the sensitivity in computing the gross thrust is attributable to the control surface stencil resolution and the choice of stencil end points, not to the control surface definition itself. Finally, comparisons between the quasi-1D and 2D-axisymmetric solutions were performed in order to gain insight into whether a quasi-1D solution can capture the steady and unsteady nozzle phenomena without the cost of a 2D-axisymmetric simulation. Initial results show that while the quasi-1D solutions are similar to the 2D-axisymmetric solutions, the inability of the quasi-1D simulations to predict two-dimensional phenomena limits their accuracy.
Scalable Failure Masking for Stencil Computations using Ghost Region Expansion and Cell to Rank Remapping

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gamell, Marc; Teranishi, Keita; Kolla, Hemanth

In order to achieve exascale systems, application resilience needs to be addressed. Some programming models, such as task-DAG (directed acyclic graphs) architectures, currently embed resilience features whereas traditional SPMD (single program, multiple data) and message-passing models do not. Since a large part of the community's code base follows the latter models, it is still required to take advantage of application characteristics to minimize the overheads of fault tolerance. To that end, this paper explores how recovering from hard process/node failures in a local manner is a natural approach for certain applications to obtain resilience at lower costs in faulty environments. In particular, this paper targets enabling online, semitransparent local recovery for stencil computations on current leadership-class systems as well as presents programming support and scalable runtime mechanisms. Also described and demonstrated in this paper is the effect of failure masking, which allows the effective reduction of impact on total time to solution due to multiple failures. Furthermore, we discuss, implement, and evaluate ghost region expansion and cell-to-rank remapping to increase the probability of failure masking. To conclude, this paper shows the integration of all aforementioned mechanisms with the S3D combustion simulation through an experimental demonstration (using the Titan system) of the ability to tolerate high failure rates (i.e., node failures every five seconds) with low overhead while sustaining performance at large scales. In addition, this demonstration also displays the failure masking probability increase resulting from the combination of both ghost region expansion and cell-to-rank remapping.
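The abstract does not spell out how ghost region expansion works, so the following is only one common reading, sketched under stated assumptions: with a ghost region of width g, a rank can advance a three-point stencil g time steps on its own cells without exchanging data, which loosens the coupling between neighboring ranks. The 1-D averaging stencil and the function names are hypothetical and unrelated to S3D.

```python
def step(u):
    """One 3-point averaging sweep; the two end cells lack a neighbor and are dropped."""
    return [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]

def advance_local(global_u, lo, hi, g):
    """Advance cells [lo, hi) by g steps using only a ghost region of width g.

    The rank copies g extra cells on each side once, then performs g sweeps
    with no further communication; each sweep consumes one ghost layer.
    """
    if lo - g < 0 or hi + g > len(global_u):
        raise ValueError("ghost region extends past the domain")
    local = global_u[lo - g:hi + g]
    for _ in range(g):
        local = step(local)
    return local
```

After the g sweeps the returned list has exactly `hi - lo` entries, matching what a globally synchronized computation would produce for those cells.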
Improving operating room productivity via parallel anesthesia processing.

PubMed

Brown, Michael J; Subramanian, Arun; Curry, Timothy B; Kor, Daryl J; Moran, Steven L; Rohleder, Thomas R

2014-01-01

Parallel processing of regional anesthesia may improve operating room (OR) efficiency in patients undergoing upper extremity surgical procedures. The purpose of this paper is to evaluate whether performing regional anesthesia outside the OR in parallel increases total cases per day and improves efficiency and productivity. Data from all adult patients who underwent regional anesthesia as their primary anesthetic for upper extremity surgery over a one-year period were used to develop a simulation model. The model evaluated pure operating modes of regional anesthesia performed within and outside the OR in a parallel manner. The scenarios were used to evaluate how many surgeries could be completed in a standard work day (555 minutes) and, assuming a standard three cases per day, what the predicted end-of-day overtime was. Modeling results show that parallel processing of regional anesthesia increases the average cases per day for all surgeons included in the study. The average increase was 0.42 surgeries per day. Where it was assumed that three cases per day would be performed by all surgeons, the number of days going to overtime was reduced by 43 percent with parallel block. The overtime with parallel anesthesia was also projected to be 40 minutes less per day per surgeon. Key limitations include the assumption that all cases used regional anesthesia in the comparisons. Many days may have both regional and general anesthesia. Also, as a case study, single-center research may limit generalizability. Perioperative care providers should consider parallel administration of regional anesthesia where there is a desire to increase daily upper extremity surgical case capacity. Where there are sufficient resources to do parallel anesthesia processing, efficiency and productivity can be significantly improved. Simulation modeling can be an effective tool to show practice change effects at a system-wide level.
Temperature Control with Two Parallel Small Loop Heat Pipes for GLM Program

NASA Technical Reports Server (NTRS)

Khrustalev, Dmitry; Stouffer, Chuck; Ku, Jentung; Hamilton, Jon; Anderson, Mark

2014-01-01

The concept of temperature control of an electronic component using a single Loop Heat Pipe (LHP) is well established for aerospace applications. Using two LHPs is often desirable for redundancy/reliability reasons or for increasing the overall heat source-sink thermal conductance. This effort elaborates on the temperature-controlling operation of a thermal system that includes two small ammonia LHPs thermally coupled together at the evaporator end as well as at the condenser end and operating "in parallel". A transient model of the LHP system was developed on the Thermal Desktop (TM) platform to understand some fundamental details of such parallel operation of the two LHPs. Extensive thermal-vacuum testing was conducted with two thermally coupled LHPs operating simultaneously as well as with only one LHP operating at a time. This paper outlines the temperature control procedures for two LHPs operating simultaneously with widely varying sink temperatures. The test data obtained during the thermal-vacuum testing, with both LHPs running simultaneously in comparison with only one LHP operating at a time, are presented with detailed explanations.
Runtime optimization of an application executing on a parallel computer

DOEpatents

None

2014-11-25

Identifying a collective operation within an application executing on a parallel computer; identifying a call site of the collective operation; determining whether the collective operation is root-based; if the collective operation is not root-based: establishing a tuning session and executing the collective operation in the tuning session; if the collective operation is root-based, determining whether all compute nodes executing the application identified the collective operation at the same call site; if all compute nodes identified the collective operation at the same call site, establishing a tuning session and executing the collective operation in the tuning session; and if all compute nodes executing the application did not identify the collective operation at the same call site, executing the collective operation without establishing a tuning session.
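The decision sequence in this abstract can be summarized in a short sketch. It is only an illustration of the stated branching logic; the function name, the boolean flag, and the representation of call sites as one identifier per compute node are all assumptions, not part of the patent or of any real runtime API.

```python
def should_establish_tuning_session(is_root_based, call_sites):
    """Decide whether to run a collective operation inside a tuning session.

    is_root_based: True if the collective has a distinguished root node.
    call_sites: hypothetical list with one call-site identifier per compute
                node, naming where that node identified the collective.
    """
    if not is_root_based:
        # Non-root-based collectives are always tuned.
        return True
    # Root-based collectives are tuned only if every node saw the same call site.
    return len(set(call_sites)) == 1
```

For example, `should_establish_tuning_session(True, ["f.c:10", "f.c:42"])` evaluates to `False`, which corresponds to executing the collective without a tuning session.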
Runtime optimization of an application executing on a parallel computer

DOEpatents

Faraj, Daniel A; Smith, Brian E

2014-11-18

Identifying a collective operation within an application executing on a parallel computer; identifying a call site of the collective operation; determining whether the collective operation is root-based; if the collective operation is not root-based: establishing a tuning session and executing the collective operation in the tuning session; if the collective operation is root-based, determining whether all compute nodes executing the application identified the collective operation at the same call site; if all compute nodes identified the collective operation at the same call site, establishing a tuning session and executing the collective operation in the tuning session; and if all compute nodes executing the application did not identify the collective operation at the same call site, executing the collective operation without establishing a tuning session.
Runtime optimization of an application executing on a parallel computer

DOEpatents

Faraj, Daniel A.; Smith, Brian E.

2013-01-29

Identifying a collective operation within an application executing on a parallel computer; identifying a call site of the collective operation; determining whether the collective operation is root-based; if the collective operation is not root-based: establishing a tuning session and executing the collective operation in the tuning session; if the collective operation is root-based, determining whether all compute nodes executing the application identified the collective operation at the same call site; if all compute nodes identified the collective operation at the same call site, establishing a tuning session and executing the collective operation in the tuning session; and if all compute nodes executing the application did not identify the collective operation at the same call site, executing the collective operation without establishing a tuning session.
PRAIS: Distributed, real-time knowledge-based systems made easy

NASA Technical Reports Server (NTRS)

Goldstein, David G.

1990-01-01

This paper discusses an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS). PRAIS strives for transparently parallelizing production (rule-based) systems, even when under real-time constraints. PRAIS accomplishes these goals by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors.
Air Traffic and Operational Data on Selected US Airports with Parallel Runways

NASA Technical Reports Server (NTRS)

Doyle, Thomas M.; McGee, Frank G.

1998-01-01

This report presents information on a number of airports in the country with parallel runways and focuses on those that have at least one pair of parallel runways closer than 4300 ft. Information contained in the report describes the airport's current operational activity as obtained through contact with the facility and from FAA air traffic tower activity data for FY 1997. The primary reason for this document is to provide a single source of information for research to determine airports where Airborne Information for Lateral Spacing (AILS) technology may be applicable.

An Overview of a Trajectory-Based Solution for En Route and Terminal Area Self-Spacing to Include Parallel Runway Operations

NASA Technical Reports Server (NTRS)

Abbott, Terence S.

2011-01-01

This paper presents an overview of an algorithm specifically designed to support NASA's Airborne Precision Spacing concept. This airborne self-spacing concept is trajectory-based, allowing for spacing operations prior to the aircraft being on a common path. This implementation provides the ability to manage spacing against two traffic aircraft, with one of these aircraft operating to a parallel dependent runway. Because this algorithm is trajectory-based, it also has the inherent ability to support required-time-of-arrival (RTA) operations.
Modelling parallel programs and multiprocessor architectures with AXE

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Fineman, Charles E.

1991-01-01

AXE, An Experimental Environment for Parallel Systems, was designed to model and simulate parallel systems at the process level. It provides an integrated environment for specifying computation models, multiprocessor architectures, data collection, and performance visualization. AXE is being used at NASA-Ames for developing resource management strategies, parallel problem formulation, multiprocessor architectures, and operating system issues related to the High Performance Computing and Communications Program. AXE's simple, structured user-interface enables the user to model parallel programs and machines precisely and efficiently. Its quick turn-around time keeps the user interested and productive. AXE models multicomputers. The user may easily modify various architectural parameters including the number of sites, connection topologies, and overhead for operating system activities. Parallel computations in AXE are represented as collections of autonomous computing objects known as players. Their use and behavior are described. Performance data of the multiprocessor model can be observed on a color screen. These include CPU and message routing bottlenecks, and the dynamic status of the software.
Architecture Adaptive Computing Environment

NASA Technical Reports Server (NTRS)

Dorband, John E.

2006-01-01

Architecture Adaptive Computing Environment (aCe) is a software system that includes a language, compiler, and run-time library for parallel computing. aCe was developed to enable programmers to write programs, more easily than was previously possible, for a variety of parallel computing architectures. Heretofore, it has been perceived to be difficult to write parallel programs for parallel computers and more difficult to port the programs to different parallel computing architectures. In contrast, aCe is supportable on all high-performance computing architectures. Currently, it is supported on LINUX clusters. aCe uses parallel programming constructs that facilitate writing of parallel programs. Such constructs were used in single-instruction/multiple-data (SIMD) programming languages of the 1980s, including Parallel Pascal, Parallel Forth, C*, *LISP, and MasPar MPL. In aCe, these constructs are extended and implemented for both SIMD and multiple-instruction/multiple-data (MIMD) architectures. Two new constructs incorporated in aCe are those of (1) scalar and virtual variables and (2) pre-computed paths. The scalar-and-virtual-variables construct increases flexibility in optimizing memory utilization in various architectures. The pre-computed-paths construct enables the compiler to pre-compute part of a communication operation once, rather than computing it every time the communication operation is performed.
Method and apparatus of parallel computing with simultaneously operating stream prefetching and list prefetching engines

DOEpatents

Boyle, Peter A.; Christ, Norman H.; Gara, Alan; Mawhinney, Robert D.; Ohmacht, Martin; Sugavanam, Krishnan

2012-12-11

A prefetch system improves a performance of a parallel computing system. The parallel computing system includes a plurality of computing nodes. A computing node includes at least one processor and at least one memory device. The prefetch system includes at least one stream prefetch engine and at least one list prefetch engine. The prefetch system operates those engines simultaneously. After the at least one processor issues a command, the prefetch system passes the command to a stream prefetch engine and a list prefetch engine. The prefetch system operates the stream prefetch engine and the list prefetch engine to prefetch data to be needed in subsequent clock cycles in the processor in response to the passed command.
A Tutorial on Parallel and Concurrent Programming in Haskell

NASA Astrophysics Data System (ADS)

Peyton Jones, Simon; Singh, Satnam

This practical tutorial introduces the features available in Haskell for writing parallel and concurrent programs. We first describe how to write semi-explicit parallel programs by using annotations to express opportunities for parallelism and to help control the granularity of parallelism for effective execution on modern operating systems and processors. We then describe the mechanisms provided by Haskell for writing explicitly parallel programs with a focus on the use of software transactional memory to help share information between threads. Finally, we show how nested data parallelism can be used to write deterministically parallel programs which allows programmers to use rich data types in data parallel programs which are automatically transformed into flat data parallel versions for efficient execution on multi-core processors.
Special purpose parallel computer architecture for real-time control and simulation in robotic applications

NASA Technical Reports Server (NTRS)

Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)

1993-01-01

This is a real-time robotic controller and simulator which is a MIMD-SIMD parallel architecture for interfacing with an external host computer and providing a high degree of parallelism in computations for robotic control and simulation. It includes a host processor for receiving instructions from the external host computer and for transmitting answers to the external host computer. There are a plurality of SIMD microprocessors, each SIMD processor being a SIMD parallel processor capable of exploiting fine grain parallelism and further being able to operate asynchronously to form a MIMD architecture. Each SIMD processor comprises a SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation. There is a system bus connecting the host processor to the plurality of SIMD microprocessors and a common clock providing a continuous sequence of clock pulses. There is also a ring structure interconnecting the plurality of SIMD microprocessors and connected to the clock for providing the clock pulses to the SIMD microprocessors and for providing a path for the flow of data and instructions between the SIMD microprocessors. The host processor includes logic for controlling the RRCS by interpreting instructions sent by the external host computer, decomposing the instructions into a series of computations to be performed by the SIMD microprocessors, using the system bus to distribute associated data among the SIMD microprocessors, and initiating activity of the SIMD microprocessors to perform the computations on the data by procedure call.
Design of a dataway processor for a parallel image signal processing system

NASA Astrophysics Data System (ADS)

Nomura, Mitsuru; Fujii, Tetsuro; Ono, Sadayasu

1995-04-01

Recently, demands for high-speed signal processing have been increasing, especially in the fields of image data compression, computer graphics, and medical imaging. To achieve sufficient power for real-time image processing, we have been developing parallel signal-processing systems. This paper describes a communication processor called 'dataway processor' designed for a new scalable parallel signal-processing system. The processor has six high-speed communication links (Dataways), a data-packet routing controller, a RISC CORE, and a DMA controller. Each communication link operates at 8-bit parallel in a full duplex mode at 50 MHz. Moreover, data routing, DMA, and CORE operations are processed in parallel. Therefore, sufficient throughput is available for high-speed digital video signals. The processor is designed in a top-down fashion using a CAD system called 'PARTHENON.' The hardware is fabricated using 0.5-micrometer CMOS technology and comprises about 200 K gates.
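The link figures quoted in this abstract imply a raw bandwidth that can be worked out directly. The sketch below assumes one 8-bit transfer per clock cycle and ignores packet and protocol overhead; neither assumption is confirmed by the abstract, and the function names are illustrative only.

```python
def link_bytes_per_second(width_bits, clock_hz):
    """Raw one-direction bandwidth of a parallel link, assuming one transfer per clock."""
    return width_bits * clock_hz // 8

def aggregate_bytes_per_second(n_links, width_bits, clock_hz, full_duplex=True):
    """Raw bandwidth summed over all links and, if full duplex, both directions."""
    directions = 2 if full_duplex else 1
    return n_links * directions * link_bytes_per_second(width_bits, clock_hz)

per_link = link_bytes_per_second(8, 50_000_000)            # 8 bits at 50 MHz
total = aggregate_bytes_per_second(6, 8, 50_000_000)       # six full-duplex links
```

Under those assumptions each link carries 50 MB/s per direction and the six links together 600 MB/s, which is an upper bound rather than a measured throughput.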
The effect of cell design and test criteria on the series/parallel performance of nickel cadmium cells and batteries

NASA Technical Reports Server (NTRS)

Halpert, G.; Webb, D. A.

1983-01-01

Three batteries were operated in parallel from a common bus during charge and discharge. SMM utilized NASA Standard 20AH cells and batteries, and LANDSAT-D NASA 50AH cells and batteries of a similar design. Each battery consisted of 22 series connected cells providing the nominal 28V bus. The three batteries were charged in parallel using the voltage limit/current taper mode wherein the voltage limit was temperature compensated. Discharge occurred on the demand of the spacecraft instruments and electronics. Both flights were planned for three to five year missions. The series/parallel configuration of cells and batteries for the 3-5 yr mission required a well controlled product with built-in reliability and uniformity. Examples of how component, cell and battery selection methods affect the uniformity of the series/parallel operation of the batteries both in testing and in flight are given.
A Parallel Numerical Algorithm To Solve Linear Systems Of Equations Emerging From 3D Radiative Transfer

NASA Astrophysics Data System (ADS)

Wichert, Viktoria; Arkenberg, Mario; Hauschildt, Peter H.

2016-10-01

Highly resolved state-of-the-art 3D atmosphere simulations will remain computationally extremely expensive for years to come. In addition to the need for more computing power, rethinking coding practices is necessary. We take a dual approach by introducing especially adapted, parallel numerical methods and correspondingly parallelizing critical code passages. In the following, we present our respective work on PHOENIX/3D. With new parallel numerical algorithms, there is a big opportunity for improvement when iteratively solving the system of equations emerging from the operator splitting of the radiative transfer equation J = ΛS. The narrow-banded approximate Λ-operator Λ*, which is used in PHOENIX/3D, occurs in each iteration step. By implementing a numerical algorithm which takes advantage of its characteristic traits, the parallel code's efficiency is further increased and a speed-up in computational time can be achieved.
Automated Long-Term Monitoring of Parallel Microfluidic Operations Applying a Machine Vision-Assisted Positioning Method

PubMed Central

Yip, Hon Ming; Li, John C. S.; Cui, Xin; Gao, Qiannan; Leung, Chi Chiu

2014-01-01

As microfluidics has been applied extensively in many cell and biochemical applications, monitoring the related processes is an important requirement. In this work, we design and fabricate a high-throughput microfluidic device which contains 32 microchambers to perform automated parallel microfluidic operations and monitoring on an automated stage of a microscope. Images are captured at multiple spots on the device during the operations for monitoring samples in microchambers in parallel; yet the device positions may vary at different time points throughout operations as the device moves back and forth on a motorized microscopic stage. Here, we report an image-based positioning strategy to realign the chamber position before every recording of a microscopic image. We fabricate alignment marks at defined locations next to the chambers in the microfluidic device as reference positions. We also develop image processing algorithms to recognize the chamber positions in real time, followed by realigning the chambers to their preset positions in the captured images. We perform experiments to validate and characterize the device functionality and the automated realignment operation. Together, this microfluidic realignment strategy can be a platform technology to achieve precise positioning of multiple chambers for general microfluidic applications requiring long-term parallel monitoring of cell and biochemical activities. PMID:25133248
The cognitive architecture for chaining of two mental operations.

PubMed

Sackur, Jérôme; Dehaene, Stanislas

2009-05-01

A simple view, which dates back to Turing, proposes that complex cognitive operations are composed of serially arranged elementary operations, each passing intermediate results to the next. However, whether and how such serial processing is achieved with a brain composed of massively parallel processors remains an open question. Here, we study the cognitive architecture for chained operations with an elementary arithmetic algorithm: we required participants to add two to (or subtract two from) a digit, and then compare the result with five. In four experiments, we probed the internal implementation of this task with chronometric analysis, the cued-response method, the priming method, and a subliminal forced-choice procedure. We found evidence for an approximately sequential processing, with an important qualification: the second operation in the algorithm appears to start before completion of the first operation. Furthermore, initially the second operation takes as input the stimulus number rather than the output of the first operation. Thus, operations that should be processed serially are in fact executed partially in parallel. Furthermore, although each elementary operation can proceed subliminally, their chaining does not occur in the absence of conscious perception. Overall, the results suggest that chaining is slow, effortful, imperfect (resulting partly in parallel rather than serial execution) and dependent on conscious control.
Simplex-stochastic collocation method with improved scalability

NASA Astrophysics Data System (ADS)

Edeling, W. N.; Dwight, R. P.; Cinnella, P.

2016-04-01

The Simplex-Stochastic Collocation (SSC) method is a robust tool used to propagate uncertain input distributions through a computer code. However, it becomes prohibitively expensive for problems with dimensions higher than 5. The main purpose of this paper is to identify bottlenecks and to improve upon this poor scalability. In order to do so, we propose an alternative interpolation stencil technique based upon the Set-Covering problem, and we integrate the SSC method in the High-Dimensional Model-Reduction framework. In addition, we address the issue of ill-conditioned sample matrices, and we present an analytical map to facilitate uniformly-distributed simplex sampling.
Development of an optical parallel logic device and a half-adder circuit for digital optical processing

NASA Technical Reports Server (NTRS)

Athale, R. A.; Lee, S. H.

1978-01-01

The paper describes the fabrication and operation of an optical parallel logic (OPAL) device which performs Boolean algebraic operations on binary images. Several logic operations on two input binary images were demonstrated using an 8 x 8 device with a CdS photoconductor and a twisted nematic liquid crystal. Two such OPAL devices can be interconnected to form a half-adder circuit which is one of the essential components of a CPU in a digital signal processor.
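For readers unfamiliar with the circuit named here, a half-adder in its textbook form takes two bits and produces a sum bit (XOR) and a carry bit (AND). The sketch below applies that textbook definition pixel by pixel to two binary images, in the spirit of the parallel logic described; how the OPAL devices physically realize each gate is not stated in the abstract, and the function name and list-of-rows image format are assumptions.

```python
def half_adder_images(a, b):
    """Pixel-wise half-adder on two equally sized binary images (lists of rows of 0/1).

    Returns (sum_image, carry_image), where sum = a XOR b and carry = a AND b.
    """
    sum_img = [[pa ^ pb for pa, pb in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]
    carry_img = [[pa & pb for pa, pb in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]
    return sum_img, carry_img
```

Every pixel is processed independently, so on parallel hardware all positions of an 8 x 8 image could be evaluated at once; the nested list comprehension here is only a serial stand-in.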
20 kHz main inverter unit. [for space station power supplies]

NASA Technical Reports Server (NTRS)

Hussey, S.

1989-01-01

A proof-of-concept main inverter unit has demonstrated the operation of a pulse-width-modulated parallel resonant power stage topology as a 20-kHz ac power source driver, showing simple output regulation, parallel operation, power sharing and short-circuit operation. The use of a two-stage dc input filter controls the electromagnetic compatibility (EMC) characteristics of the dc power bus, and the use of an ac harmonic trap controls the EMC characteristics of the 20-kHz ac power bus.
Research of the effectiveness of parallel multithreaded realizations of interpolation methods for scaling raster images

NASA Astrophysics Data System (ADS)

Vnukov, A. A.; Shershnev, M. B.

2018-01-01

The aim of this work is the software implementation of three image scaling algorithms using parallel computations, as well as the development of an application with a graphical user interface for the Windows operating system to demonstrate the operation of algorithms and to study the relationship between system performance, algorithm execution time and the degree of parallelization of computations. Three methods of interpolation were studied, formalized and adapted to scale images. The result of the work is a program for scaling images by different methods. Comparison of the quality of scaling by different methods is given.
Parallel multiphase microflows: fundamental physics, stabilization methods and applications.

PubMed

Aota, Arata; Mawatari, Kazuma; Kitamori, Takehiko

2009-09-07

Parallel multiphase microflows, which can integrate unit operations in a microchip under continuous flow conditions, are discussed. Fundamental physics, stabilization methods and some applications are shown.
Electronic scraps--recovering of valuable materials from parallel wire cables.

PubMed

de Araújo, Mishene Christie Pinheiro Bezerra; Chaves, Arthur Pinto; Espinosa, Denise Crocce Romano; Tenório, Jorge Alberto Soares

2008-11-01

Every year, the number of discarded electro-electronic products is increasing. For this reason recycling is needed, to avoid wasting non-renewable natural resources. The objective of this work is to study the recycling of materials from parallel wire cable through unit operations of mineral processing. Parallel wire cables are basically composed of polymer and copper. The following unit operations were tested: grinding, size classification, dense medium separation, electrostatic separation, scrubbing, panning, and elutriation. It was observed that the operations used obtained copper and PVC concentrates with a low degree of cross contamination. It was concluded that total liberation of the materials was accomplished after grinding to less than 3 mm, using a cage mill. Separation using panning and elutriation presented the best results in terms of recovery and cross contamination.
Monolithic Parallel Tandem Organic Photovoltaic Cell with Transparent Carbon Nanotube Interlayer

NASA Technical Reports Server (NTRS)

Tanaka, S.; Mielczarek, K.; Ovalle-Robles, R.; Wang, B.; Hsu, D.; Zakhidov, A. A.

2009-01-01

We demonstrate an organic photovoltaic cell with a monolithic tandem structure in parallel connection. Transparent multiwalled carbon nanotube sheets are used as an interlayer anode electrode for this parallel tandem. The characteristics of front and back cells are measured independently. The short circuit current density of the parallel tandem cell is larger than the currents of each individual cell. The wavelength dependence of photocurrent for the parallel tandem cell shows the superposition spectrum of the two spectral sensitivities of the front and back cells. The monolithic three-electrode photovoltaic cell indeed operates as a parallel tandem with improved efficiency.
Application of a Scalable, Parallel, Unstructured-Grid-Based Navier-Stokes Solver

NASA Technical Reports Server (NTRS)

Parikh, Paresh

2001-01-01

A parallel version of an unstructured-grid based Navier-Stokes solver, USM3Dns, previously developed for efficient operation on a variety of parallel computers, has been enhanced to incorporate upgrades made to the serial version. The resultant parallel code has been extensively tested on a variety of problems of aerospace interest and on two sets of parallel computers to understand and document its characteristics. An innovative grid renumbering construct and use of non-blocking communication are shown to produce superlinear computing performance. Preliminary results from parallelization of a recently introduced "porous surface" boundary condition are also presented.
Design of a massively parallel computer using bit serial processing elements

NASA Technical Reports Server (NTRS)

Aburdene, Maurice F.; Khouri, Kamal S.; Piatt, Jason E.; Zheng, Jianqing

1995-01-01

A 1-bit serial processor designed for a parallel computer architecture is described. This processor is used to develop a massively parallel computational engine, with a single instruction-multiple data (SIMD) architecture. The computer is simulated and tested to verify its operation and to measure its performance for further development.

Comparison between four dissimilar solar panel configurations

NASA Astrophysics Data System (ADS)

Suleiman, K.; Ali, U. A.; Yusuf, Ibrahim; Koko, A. D.; Bala, S. I.

2017-12-01

Several studies of photovoltaic systems have focused on how they operate and on the energy required to operate them. Little attention has been paid to their configurations, to the modeling of mean time to system failure, availability and cost benefit, or to comparisons of parallel and series-parallel designs. In this research work, four system configurations were studied. Configuration I consists of two sub-components arranged in parallel with 24 V each, configuration II consists of four sub-components arranged logically in parallel with 12 V each, configuration III consists of four sub-components arranged in series-parallel with 8 V each, and configuration IV has six sub-components with 6 V each arranged in series-parallel. Comparative analysis was made using the Chapman-Kolmogorov method. Explicit expressions for mean time to system failure, steady-state availability and cost-benefit analysis were derived for the comparison. A ranking method was used to determine the optimal configuration of the systems. The results of analytical and numerical solutions of system availability and mean time to system failure were determined and it was found that configuration I is the optimal configuration.
MIST: An Open Source Environmental Modelling Programming Language Incorporating Easy to Use Data Parallelism.

NASA Astrophysics Data System (ADS)

Bellerby, Tim

2014-05-01

Model Integration System (MIST) is an open-source environmental modelling programming language that directly incorporates data parallelism. The language is designed to enable straightforward programming structures, such as nested loops and conditional statements, to be directly translated into sequences of whole-array (or more generally whole data-structure) operations. MIST thus enables the programmer to use well-understood constructs, directly relating to the mathematical structure of the model, without having to explicitly vectorize code or worry about details of parallelization. A range of common modelling operations are supported by dedicated language structures operating on cell neighbourhoods rather than individual cells (e.g., the 3x3 local neighbourhood needed to implement an averaging image filter can be simply accessed from within a simple loop traversing all image pixels). This facility hides details of inter-process communication behind more mathematically relevant descriptions of model dynamics. The MIST automatic vectorization/parallelization process serves both to distribute work among available nodes and separately to control storage requirements for intermediate expressions, enabling operations on very large domains for which memory availability may be an issue. MIST is designed to facilitate efficient interpreter-based implementations. A prototype open source interpreter is available, coded in standard FORTRAN 95, with tools to rapidly integrate existing FORTRAN 77 or 95 code libraries. The language is formally specified and thus not limited to FORTRAN implementation or to an interpreter-based approach. A MIST to FORTRAN compiler is under development and volunteers are sought to create an ANSI-C implementation. Parallel processing is currently implemented using OpenMP. However, parallelization code is fully modularised and could be replaced with implementations using other libraries. GPU implementation is potentially possible.
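The 3x3 averaging filter mentioned in the abstract can illustrate the idea of turning a per-cell neighbourhood loop into whole-array steps. The sketch below is plain Python, not MIST syntax (which the abstract does not show). The function names and the edge handling (averaging only the neighbours that exist) are assumptions made for illustration.

```python
def mean_filter_per_cell(image):
    """Per-cell form: loop over pixels and read each 3x3 neighbourhood."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total, count = 0.0, 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    # Assumed boundary rule: skip neighbours outside the grid.
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += image[rr][cc]
                        count += 1
            out[r][c] = total / count
    return out


def mean_filter_shifted(image):
    """Whole-array form: accumulate nine shifted copies of the grid."""
    rows, cols = len(image), len(image[0])
    total = [[0.0] * cols for _ in range(rows)]
    count = [[0] * cols for _ in range(rows)]
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            # One whole-array step: add the grid shifted by (dr, dc),
            # restricted to the cells where the shifted index is valid.
            r_lo, r_hi = max(0, -dr), min(rows, rows - dr)
            c_lo, c_hi = max(0, -dc), min(cols, cols - dc)
            for r in range(r_lo, r_hi):
                for c in range(c_lo, c_hi):
                    total[r][c] += image[r + dr][c + dc]
                    count[r][c] += 1
    return [[total[r][c] / count[r][c] for c in range(cols)]
            for r in range(rows)]
```

Each of the nine shifted additions in the second form is independent of pixel order, which is what makes this formulation a natural target for data-parallel execution.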
EOS: A project to investigate the design and construction of real-time distributed Embedded Operating Systems

NASA Technical Reports Server (NTRS)

Campbell, R. H.; Essick, Ray B.; Johnston, Gary; Kenny, Kevin; Russo, Vince

1987-01-01

Project EOS is studying the problems of building adaptable real-time embedded operating systems for the scientific missions of NASA. Choices (A Class Hierarchical Open Interface for Custom Embedded Systems) is an operating system designed and built by Project EOS to address the following specific issues: the software architecture for adaptable embedded parallel operating systems, the achievement of high-performance and real-time operation, the simplification of interprocess communications, the isolation of operating system mechanisms from one another, and the separation of mechanisms from policy decisions. Choices is written in C++ and runs on a ten processor Encore Multimax. The system is intended for use in constructing specialized computer applications and research on advanced operating system features including fault tolerance and parallelism.
Low-Cost 3D Printing Orbital Implant Templates in Secondary Orbital Reconstructions.

PubMed

Callahan, Alison B; Campbell, Ashley A; Petris, Carisa; Kazim, Michael

Despite its increasing use in craniofacial reconstructions, three-dimensional (3D) printing of customized orbital implants has not been widely adopted. Limitations include the cost of 3D printers able to print in a biocompatible material suitable for implantation in the orbit and the breadth of available implant materials. The authors report the technique of low-cost 3D printing of orbital implant templates used in complex, often secondary, orbital reconstructions. A retrospective case series of 5 orbital reconstructions utilizing a technique of 3D printed orbital implant templates is presented. Each patient's Digital Imaging and Communications in Medicine data were uploaded and processed to create 3D renderings upon which a customized implant was designed and sent electronically to printers open for student use at our affiliated institutions. The mock implants were sterilized and used intraoperatively as a stencil and mold. The final implant material was chosen by the surgeons based on the requirements of the case. Five orbital reconstructions were performed with this technique: 3 tumor reconstructions and 2 orbital fractures. Four of the 5 cases were secondary reconstructions. Molded Medpor Titan (Stryker, Kalamazoo, MI) implants were used in 4 cases and titanium mesh in 1 case. The stenciled and molded implants were adjusted no more than 2 times before being anchored in place (mean 1). No case underwent further revision. The technique and cases presented demonstrate 1) the feasibility and accessibility of low-cost, independent use of 3D printing technology to fashion patient-specific implants in orbital reconstructions, 2) the ability to apply this technology to the surgeon's preference of any routinely implantable material, and 3) the utility of this technique in complex, secondary reconstructions.
Positivity-preserving dual time stepping schemes for gas dynamics

NASA Astrophysics Data System (ADS)

Parent, Bernard

2018-05-01

A new approach to discretizing the temporal derivative of the Euler equations is presented here which can be used with dual time stepping. The temporal discretization stencil is derived along the lines of the Cauchy-Kowalevski procedure, resulting in cross differences in spacetime but with some novel modifications which ensure the positivity of the discretization coefficients. It is then shown that the so-obtained spacetime cross differences result in changes to the wave speeds and can thus be incorporated within Roe or Steger-Warming schemes (with and without reconstruction-evolution) simply by altering the eigenvalues. The proposed approach has an advantage over alternatives in that it is positivity-preserving for the Euler equations. Further, it yields monotone solutions near discontinuities while exhibiting a truncation error in smooth regions less than that of the second- or third-order accurate backward-difference-formula (BDF) for either small or large time steps. The high resolution and positivity preservation of the proposed discretization stencils are independent of the convergence acceleration technique which can be set to multigrid, preconditioning, Jacobian-free Newton-Krylov, block-implicit, etc. Thus, the current paper also offers the first implicit integration of the time-accurate Euler equations that is positivity-preserving in the strict sense (that is, the density and temperature are guaranteed to remain positive). This is in contrast to all previous positivity-preserving implicit methods which only guaranteed the positivity of the density, not of the temperature or pressure. Several stringent reacting and inert test cases confirm the positivity-preserving property of the proposed method as well as its higher resolution and higher computational efficiency over other second-order and third-order implicit temporal discretization strategies.
Point-of-need simultaneous electrochemical detection of lead and cadmium using low-cost stencil-printed transparency electrodes.

PubMed

Martín-Yerga, Daniel; Álvarez-Martos, Isabel; Blanco-López, M Carmen; Henry, Charles S; Fernández-Abedul, M Teresa

2017-08-15

In this work, we report a simple and yet efficient stencil-printed electrochemical platform that can be integrated into the caps of sample containers and thus allows in-field quantification of Cd(II) and Pb(II) in river water samples. The device exploits the low-cost features of carbon (as electrode material) and paper/polyester transparency sheets (as substrate). Electrochemical analysis of the working electrodes prepared on different substrates (polyester transparency sheets, chromatographic, tracing and office papers) with hexaammineruthenium(III) showed that their electroactive area and electron transfer kinetics are highly affected by the porosity of the material. Electrodes prepared on transparency substrates showed the best electroanalytical performance for the simultaneous determination of Cd(II) and Pb(II) by square-wave anodic stripping voltammetry. Interestingly, the temperature and time at which the carbon ink was cured had a significant effect on the electrochemical response, especially the capacitive current. The amount of Cd and Pb on the electrode surface can be increased about 20% by in situ electrodeposition of bismuth. The electrochemical platform showed a linear range between 1 and 200 μg/L for both metals, sensitivity of analysis of 0.22 and 0.087 μA/ppb and limits of detection of 0.2 and 0.3 μg/L for Cd(II) and Pb(II), respectively. The analysis of river water samples was done directly in the container where the sample was collected, which simplifies the procedure and approaches field analysis. The developed point-of-need detection system allowed simultaneous determination of Cd(II) and Pb(II) in those samples using the standard addition method with precise and accurate results.
Two-Phase Contiguous Supported Lipid Bilayer Model for Membrane Rafts via Polymer Blotting and Stenciling.

PubMed

Richards, Mark J; Daniel, Susan

2017-02-07

The supported lipid bilayer has been portrayed as a useful model of the cell membrane compatible with many biophysical tools and techniques that demonstrate its appeal in learning about the basic features of the plasma membrane. However, some of its potential has yet to be realized, particularly in the area of bilayer patterning and phase/composition heterogeneity. In this work, we generate contiguous bilayer patterns as a model system that captures the general features of membrane domains and lipid rafts. Micropatterned polymer templates of two types are investigated for generating patterned bilayer formation: polymer blotting and polymer lift-off stenciling. While these approaches have been used previously to create bilayer arrays by corralling bilayer patches with various types of boundaries impenetrable to bilayer diffusion, unique to the methods presented here, there are no physical barriers to diffusion. In this work, interfaces between contiguous lipid phases define the pattern shapes, with continuity between them allowing transfer of membrane-bound biomolecules between the phases. We examine effectors of membrane domain stability including temperature and cholesterol content to investigate domain dynamics. Contiguous patterning of supported bilayers as a model of lipid rafts expands the application of the SLB to an area with current appeal and brings with it a useful toolset for characterization and analysis. These combined tools should be helpful to researchers investigating lipid raft dynamics and function and biomolecule partitioning studies. Additionally, this patterning technique may be useful for applications such as bioseparations that exploit differences in lipid phase partitioning or creation of membranes that bind species like viruses preferentially at lipid phase boundaries, to name a few.
Numerical Simulations of Hypersonic Boundary Layer Transition

NASA Astrophysics Data System (ADS)

Bartkowicz, Matthew David

Numerical schemes for supersonic flows tend to use large amounts of artificial viscosity for stability. This tends to damp out the small scale structures in the flow. Recently some low-dissipation methods have been proposed which selectively eliminate the artificial viscosity in regions which do not require it. This work builds upon the low-dissipation method of Subbareddy and Candler which uses the flux vector splitting method of Steger and Warming but identifies the dissipation portion to eliminate it. Computing accurate fluxes typically relies on large grid stencils or coupled linear systems that become computationally expensive to solve. Unstructured grids allow for CFD solutions to be obtained on complex geometries; unfortunately, it then becomes difficult to create a large stencil or the coupled linear system. Accurate solutions require grids that quickly become too large to be feasible. In this thesis a method is proposed to obtain more accurate solutions using relatively local data, making it suitable for unstructured grids composed of hexahedral elements. Fluxes are reconstructed using local gradients to extend the range of data used. The method is then validated on several test problems. Simulations of boundary layer transition are then performed. An elliptic cone at Mach 8 is simulated based on an experiment at the Princeton Gasdynamics Laboratory. A simulated acoustic noise boundary condition is imposed to model the noisy conditions of the wind tunnel and the transitioning boundary layer observed. A computation of an isolated roughness element is done based on an experiment in Purdue's Mach 6 quiet wind tunnel. The mechanism for transition is identified as an instability in the upstream separation region and a comparison is made to experimental data. In the CFD a fully turbulent boundary layer is observed downstream.
An adaptive discretization of incompressible flow using a multitude of moving Cartesian grids

NASA Astrophysics Data System (ADS)

English, R. Elliot; Qiu, Linhai; Yu, Yue; Fedkiw, Ronald

2013-12-01

We present a novel method for discretizing the incompressible Navier-Stokes equations on a multitude of moving and overlapping Cartesian grids each with an independently chosen cell size to address adaptivity. Advection is handled with first and second order accurate semi-Lagrangian schemes in order to alleviate any time step restriction associated with small grid cell sizes. Likewise, an implicit temporal discretization is used for the parabolic terms including Navier-Stokes viscosity which we address separately through the development of a method for solving the heat diffusion equations. The most intricate aspect of any such discretization is the method used in order to solve the elliptic equation for the Navier-Stokes pressure or that resulting from the temporal discretization of parabolic terms. We address this by first removing any degrees of freedom which duplicately cover spatial regions due to overlapping grids, and then providing a discretization for the remaining degrees of freedom adjacent to these regions. We observe that a robust second order accurate symmetric positive definite readily preconditioned discretization can be obtained by constructing a local Voronoi region on the fly for each degree of freedom in question in order to obtain both its stencil (logically connected neighbors) and stencil weights. Internal curved boundaries such as at solid interfaces are handled using a simple immersed boundary approach which is directly applied to the Voronoi mesh in both the viscosity and pressure solves. We independently demonstrate each aspect of our approach on test problems in order to show efficacy and convergence before finally addressing a number of common test cases for incompressible flow with stationary and moving solid bodies.
Tracking moving radar targets with parallel, velocity-tuned filters

DOEpatents

Bickel, Douglas L.; Harmony, David W.; Bielek, Timothy P.; Hollowell, Jeff A.; Murray, Margaret S.; Martinez, Ana

2013-04-30

Radar data associated with radar illumination of a movable target is processed to monitor motion of the target. A plurality of filter operations are performed in parallel on the radar data so that each filter operation produces target image information. The filter operations are defined to have respectively corresponding velocity ranges that differ from one another. The target image information produced by one of the filter operations represents the target more accurately than the target image information produced by the remainder of the filter operations when a current velocity of the target is within the velocity range associated with the one filter operation. In response to the current velocity of the target being within the velocity range associated with the one filter operation, motion of the target is tracked based on the target image information produced by the one filter operation.
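The selection logic described in this abstract can be sketched as a bank of filters, each tuned to its own velocity range, where tracking uses the output of whichever filter's range contains the current target velocity. Everything in the sketch is hypothetical: the `VelocityFilter` class, the half-open ranges in m/s, and the placeholder `process` method stand in for filter operations the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class VelocityFilter:
    v_min: float  # lower edge of the tuned velocity range (inclusive)
    v_max: float  # upper edge of the tuned velocity range (exclusive)

    def process(self, radar_data):
        # Placeholder for a real velocity-tuned filter operation: it only
        # tags the data with the centre of the range it is tuned to.
        return {"tuned_velocity": 0.5 * (self.v_min + self.v_max),
                "samples": list(radar_data)}


def track_step(filters, radar_data, current_velocity):
    """Run every filter on the same data, then pick the matching output."""
    # Conceptually these run in parallel; a list comprehension keeps the
    # sketch short.
    images = [f.process(radar_data) for f in filters]
    for f, image in zip(filters, images):
        if f.v_min <= current_velocity < f.v_max:
            return image  # this filter represents the target most accurately
    return None  # velocity falls outside every tuned range


bank = [VelocityFilter(0.0, 5.0), VelocityFilter(5.0, 10.0),
        VelocityFilter(10.0, 15.0)]
best = track_step(bank, [0.1, 0.2, 0.3], current_velocity=7.0)
```

Because all filters process the data regardless of the current velocity, a change in target speed only changes which output is consumed, not which filters run.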
Relaxation Revisited: A Fresh Look at Multigrid for Steady Flows

NASA Technical Reports Server (NTRS)

Roberts, Thomas W.; Swanson, R. C.; Sidilkover, David

1997-01-01

The year 1971 saw the publication of one of the landmark papers in computational aerodynamics, that of Murman and Cole. As with many seminal works, its significance lies not so much in the specific problem that it addressed - small disturbance, plane transonic flow - but in the identification of a general approach to the solution of a technically important and theoretically difficult problem. The key features of Murman and Cole's work were the use of type-dependent differencing to correctly account for the proper domain of dependence of a mixed elliptic/hyperbolic equation, and the introduction of line relaxation to solve the steady flow equation. All subsequent work in transonic potential flows was based on these concepts. Jameson extended Murman and Cole's ideas to the full potential equation with two important contributions. First, he introduced the rotated difference stencil, which generalized the Murman and Cole type-dependent difference operator to general coordinates. Second, he used the interpretation, introduced by Garabedian, of relaxation as an iteration in artificial time to construct stable relaxation schemes, generalizing the original line relaxation method of Reference. The decade of the 1970s saw an explosion of activity in the solution of transonic potential flows, which has been summarized in the review article of Caughey.
Acceleration of Linear Finite-Difference Poisson-Boltzmann Methods on Graphics Processing Units.

PubMed

Qi, Ruxi; Botello-Smith, Wesley M; Luo, Ray

2017-07-11

Electrostatic interactions play crucial roles in biophysical processes such as protein folding and molecular recognition. Poisson-Boltzmann equation (PBE)-based models have emerged as widely used in modeling these important processes. Though great efforts have been put into developing efficient PBE numerical models, challenges still remain due to the high dimensionality of typical biomolecular systems. In this study, we implemented and analyzed commonly used linear PBE solvers for the ever-improving graphics processing units (GPU) for biomolecular simulations, including both standard and preconditioned conjugate gradient (CG) solvers with several alternative preconditioners. Our implementation utilizes the standard Nvidia CUDA libraries cuSPARSE, cuBLAS, and CUSP. Extensive tests show that good numerical accuracy can be achieved even though single precision is often used for numerical applications on GPU platforms. The optimal GPU performance was observed with the Jacobi-preconditioned CG solver, with a significant speedup over the standard CG solver on CPU in our diversified test cases. Our analysis further shows that different matrix storage formats also considerably affect the efficiency of different linear PBE solvers on GPU, with the diagonal format best suited for our standard finite-difference linear systems. Further efficiency may be possible with matrix-free operations and integrated grid stencil setup specifically tailored for the banded matrices in PBE-specific linear systems.
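As a rough illustration of two ideas in this abstract, a Jacobi-preconditioned CG solver and matrix-free stencil application, here is a minimal CPU sketch in plain Python. It is not the authors' GPU implementation and uses none of the CUDA libraries they name. The model operator (a 2-D 5-point Laplacian plus a non-negative diagonal term `kappa2`, zero Dirichlet boundaries, unit grid spacing) and all function names are assumptions chosen to keep the example small.

```python
def apply_operator(u, n, kappa2):
    """Matrix-free y = A u on an n x n grid, A = -Laplacian + diag(kappa2)."""
    y = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            k = i * n + j
            s = (4.0 + kappa2[k]) * u[k]      # diagonal entry
            if i > 0:
                s -= u[k - n]                 # neighbour in previous row
            if i < n - 1:
                s -= u[k + n]                 # neighbour in next row
            if j > 0:
                s -= u[k - 1]                 # left neighbour
            if j < n - 1:
                s -= u[k + 1]                 # right neighbour
            y[k] = s
    return y


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def jacobi_pcg(b, n, kappa2, tol=1e-10, max_iter=500):
    """Conjugate gradient with the inverse diagonal as preconditioner."""
    inv_diag = [1.0 / (4.0 + k2) for k2 in kappa2]   # Jacobi: M^-1 = D^-1
    x = [0.0] * (n * n)
    r = list(b)                                      # residual for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]
    p = list(z)
    rz = dot(r, z)
    if rz == 0.0:
        return x, 0                                  # b is zero: x = 0 solves it
    b_norm = dot(b, b) ** 0.5
    for it in range(1, max_iter + 1):
        Ap = apply_operator(p, n, kappa2)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


n = 8
kappa2 = [0.1 * (k % 3) for k in range(n * n)]       # made-up coefficients
rhs = [1.0] * (n * n)
solution, iterations = jacobi_pcg(rhs, n, kappa2)
```

The operator is never stored as a matrix: `apply_operator` reads the stencil directly from the grid, which is the kind of matrix-free formulation the abstract suggests may bring further efficiency.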
An Efficient Fuzzy Controller Design for Parallel Connected Induction Motor Drives

NASA Astrophysics Data System (ADS)

Usha, S.; Subramani, C.

2018-04-01

Induction motors are highly non-linear and have complex time-varying dynamics. This makes the speed control of an induction motor a challenging issue in industry. However, owing to recent trends in power electronic devices and intelligent controllers, speed control of the induction motor can be achieved while also accounting for its non-linear characteristics. Conventionally, a single inverter is used to run one induction motor in industry. In traction applications, two or more induction motors are operated in parallel to reduce the size and cost of the induction motors. In this application, the parallel-connected induction motors can be driven by a single inverter unit. Stability problems may arise in parallel operation under low-speed operating conditions. Hence, the speed deviations should be reduced with the help of suitable controllers. The speed control of the parallel-connected system is performed by a PID controller and a fuzzy logic controller. In this paper, the speed responses of an induction motor rated at 1 HP, 1440 rpm, and 50 Hz with these controllers are compared in terms of time-domain specifications. The stability analysis of the system is also performed under low speed using the MATLAB platform. A hardware model is developed for speed control using the fuzzy logic controller, which exhibited superior performance over the other controller.
Parallel Reconstruction Using Null Operations (PRUNO)

PubMed Central

Zhang, Jian; Liu, Chunlei; Moseley, Michael E.

2011-01-01

A novel iterative k-space data-driven technique, namely Parallel Reconstruction Using Null Operations (PRUNO), is presented for parallel imaging reconstruction. In PRUNO, both data calibration and image reconstruction are formulated into linear algebra problems based on a generalized system model. An optimal data calibration strategy is demonstrated by using Singular Value Decomposition (SVD). An iterative conjugate-gradient approach is proposed to efficiently solve for missing k-space samples during reconstruction. With its generalized formulation and precise mathematical model, PRUNO reconstruction yields good accuracy, flexibility, and stability. Both computer simulation and in vivo studies have shown that PRUNO produces much better reconstruction quality than autocalibrating partially parallel acquisition (GRAPPA), especially under high accelerating rates. With the aid of PRUNO reconstruction, ultra high accelerating parallel imaging can be performed with decent image quality. For example, we have done successful PRUNO reconstruction at a reduction factor of 6 (effective factor of 4.44) with 8 coils and only a few autocalibration signal (ACS) lines. PMID:21604290
Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zuo, Wangda; McNeil, Andrew; Wetter, Michael

2011-09-06

We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We used OpenCL, which is a cross-platform parallel programming language. Numerical experiments show that the combination of the above measures can speed up the annual daylighting simulations 101.7 times or 28.6 times when the sky vector has 146 or 2306 elements, respectively.
Solving the multiple-set split equality common fixed-point problem of firmly quasi-nonexpansive operators.

PubMed

Zhao, Jing; Zong, Haili

2018-01-01

In this paper, we propose parallel and cyclic iterative algorithms for solving the multiple-set split equality common fixed-point problem of firmly quasi-nonexpansive operators. We also combine the process of cyclic and parallel iterative methods and propose two mixed iterative algorithms. Our several algorithms do not need any prior information about the operator norms. Under mild assumptions, we prove weak convergence of the proposed iterative sequences in Hilbert spaces. As applications, we obtain several iterative algorithms to solve the multiple-set split equality problem.
Multibus-based parallel processor for simulation

NASA Technical Reports Server (NTRS)

Ogrady, E. P.; Wang, C.-H.

1983-01-01

A Multibus-based parallel processor simulation system is described. The system is intended to serve as a vehicle for gaining hands-on experience, testing system and application software, and evaluating parallel processor performance during development of a larger system based on the horizontal/vertical-bus interprocessor communication mechanism. The prototype system consists of up to seven Intel iSBC 86/12A single-board computers which serve as processing elements, a multiple transmission controller (MTC) designed to support system operation, and an Intel Model 225 Microcomputer Development System which serves as the user interface and input/output processor. All components are interconnected by a Multibus/IEEE 796 bus. An important characteristic of the system is that it provides a mechanism for a processing element to broadcast data to other selected processing elements. This parallel transfer capability is provided through the design of the MTC and a minor modification to the iSBC 86/12A board. The operation of the MTC, the basic hardware-level operation of the system, and pertinent details about the iSBC 86/12A and the Multibus are described.
Evaluation of fault-tolerant parallel-processor architectures over long space missions

NASA Technical Reports Server (NTRS)

Johnson, Sally C.

1989-01-01

The impact of a five-year space mission environment on fault-tolerant parallel processor architectures is examined. The target application is a Strategic Defense Initiative (SDI) satellite requiring 256 parallel processors to provide the computation throughput. The reliability requirements are that the system still be operational after five years with 0.99 probability and that the probability of system failure during one-half hour of full operation be less than 10^-7. The fault tolerance features an architecture must possess to meet these reliability requirements are presented, many potential architectures are briefly evaluated, and one candidate architecture, the Charles Stark Draper Laboratory's Fault-Tolerant Parallel Processor (FTPP) is evaluated in detail. A methodology for designing a preliminary system configuration to meet the reliability and performance requirements of the mission is then presented and demonstrated by designing an FTPP configuration.
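To get a feel for the two reliability requirements stated above, they can be converted into equivalent system-level failure rates. The sketch assumes a constant failure rate (exponential model) and a 365.25-day year. Both are simplifying assumptions that do not come from the abstract, and the result says nothing about how any particular architecture would meet the targets.

```python
import math

HOURS_PER_YEAR = 365.25 * 24  # assumed calendar convention


def max_failure_rate(p_fail, hours):
    """Largest constant rate (per hour) with 1 - exp(-rate * hours) <= p_fail."""
    # log1p keeps precision when p_fail is tiny, as in the half-hour case.
    return -math.log1p(-p_fail) / hours


# Requirement 1: still operational after five years with probability 0.99.
mission_rate = max_failure_rate(0.01, 5 * HOURS_PER_YEAR)

# Requirement 2: failure probability below 1e-7 over half an hour of full operation.
full_operation_rate = max_failure_rate(1e-7, 0.5)

print(f"five-year requirement: {mission_rate:.3e} failures/hour")
print(f"half-hour requirement: {full_operation_rate:.3e} failures/hour")
```

Under these assumptions both requirements come out at roughly 2e-7 system failures per hour, so neither one is obviously the looser constraint.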
Fast parallel molecular algorithms for DNA-based computation: factoring integers.

PubMed

Chang, Weng-Long; Guo, Minyi; Ho, Michael Shan-Hui

2005-06-01

The RSA public-key cryptosystem is an algorithm that converts input data to an unrecognizable encrypted form and converts the unrecognizable data back into its original decrypted form. The security of the RSA public-key cryptosystem is based on the difficulty of factoring the product of two large prime numbers. This paper demonstrates how to factor the product of two large prime numbers, and is a breakthrough in basic biological operations using a molecular computer. In order to achieve this, we propose three DNA-based algorithms for parallel subtractor, parallel comparator, and parallel modular arithmetic that formally verify our designed molecular solutions for factoring the product of two large prime numbers. Furthermore, this work indicates that public-key cryptosystems are perhaps insecure and also presents clear evidence of the ability of molecular computing to perform complicated mathematical operations.
Dispatching packets on a global combining network of a parallel computer

DOEpatents

Almasi, Gheorghe [Ardsley, NY]; Archer, Charles J [Rochester, MN]

2011-07-19

Methods, apparatus, and products are disclosed for dispatching packets on a global combining network of a parallel computer comprising a plurality of nodes connected for data communications using the network capable of performing collective operations and point to point operations that include: receiving, by an origin system messaging module on an origin node from an origin application messaging module on the origin node, a storage identifier and an operation identifier, the storage identifier specifying storage containing an application message for transmission to a target node, and the operation identifier specifying a message passing operation; packetizing, by the origin system messaging module, the application message into network packets for transmission to the target node, each network packet specifying the operation identifier and an operation type for the message passing operation specified by the operation identifier; and transmitting, by the origin system messaging module, the network packets to the target node.
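The packetizing step in this abstract can be sketched as follows: the messaging module looks up the application message by its storage identifier, splits it into fixed-size payloads, and stamps every packet with the operation identifier and the operation type. The `Packet` fields, the `OPERATION_TYPES` table, the `sequence` number, the dictionary used as storage, and the 256-byte payload size are all hypothetical, since the abstract does not define a packet layout.

```python
from dataclasses import dataclass

# Hypothetical table mapping an operation identifier to its operation type.
OPERATION_TYPES = {1: "point_to_point", 2: "collective"}


@dataclass(frozen=True)
class Packet:
    operation_id: int     # identifies the message passing operation
    operation_type: str   # type of that operation
    sequence: int         # position of this payload within the message
    payload: bytes


def packetize(storage, storage_id, operation_id, payload_size=256):
    """Split the message held in storage[storage_id] into network packets."""
    if payload_size <= 0:
        raise ValueError("payload_size must be positive")
    message = storage[storage_id]
    op_type = OPERATION_TYPES[operation_id]
    return [
        Packet(operation_id, op_type, seq, message[off:off + payload_size])
        for seq, off in enumerate(range(0, len(message), payload_size))
    ]


def reassemble(packets):
    """Target-side view: rebuild the message from packets in sequence order."""
    ordered = sorted(packets, key=lambda p: p.sequence)
    return b"".join(p.payload for p in ordered)


storage = {7: bytes(range(256)) * 3}          # a 768-byte application message
packets = packetize(storage, storage_id=7, operation_id=2)
```

Carrying the operation identifier and type in every packet means the receiving side can classify a packet without needing to have seen any other packet of the same message.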

Program For Parallel Discrete-Event Simulation

NASA Technical Reports Server (NTRS)

Beckman, Brian C.; Blume, Leo R.; Geiselman, John S.; Presley, Matthew T.; Wedel, John J., Jr.; Bellenot, Steven F.; Diloreto, Michael; Hontalas, Philip J.; Reiher, Peter L.; Weiland, Frederick P.

1991-01-01

User does not have to add any special logic to aid in synchronization. Time Warp Operating System (TWOS) computer program is special-purpose operating system designed to support parallel discrete-event simulation. Complete implementation of Time Warp mechanism. Supports only simulations and other computations designed for virtual time. Time Warp Simulator (TWSIM) subdirectory contains sequential simulation engine interface-compatible with TWOS. TWOS and TWSIM written in, and support simulations in, C programming language.
Role of the Controller in an Integrated Pilot-Controller Study for Parallel Approaches

NASA Technical Reports Server (NTRS)

Verma, Savvy; Kozon, Thomas; Ballinger, Debbi; Lozito, Sandra; Subramanian, Shobana

2011-01-01

Closely spaced parallel runway operations have been found to increase capacity within the National Airspace System but poor visibility conditions reduce the use of these operations [1]. Previous research examined the concepts and procedures related to parallel runways [2][4][5]. However, there has been no investigation of the procedures associated with the strategic and tactical pairing of aircraft for these operations. This study developed and examined the pilot's and controller's procedures and information requirements for creating aircraft pairs for closely spaced parallel runway operations. The goal was to achieve aircraft pairing with a temporal separation of 15 s (+/- 10 s error) at a coupling point that was 12 nmi from the runway threshold. In this paper, the role of the controller, as examined in an integrated study of controllers and pilots, is presented. The controllers utilized a pairing scheduler and new pairing interfaces to help create and maintain aircraft pairs in a high-fidelity, human-in-the-loop simulation experiment. Results show that the controllers worked as a team to achieve pairing between aircraft and that the level of inter-controller coordination increased when the aircraft in the pair belonged to different sectors. Controller feedback did not reveal overreliance on the automation or complacency with the pairing automation or pairing procedures.
Plasmon-enhanced scattering and charge transfer in few-layer graphene interacting with buried printed 2D-pattern of silver nanoparticles

NASA Astrophysics Data System (ADS)

Carles, R.; Bayle, M.; Bonafos, C.

2018-04-01

Hybrid structures combining silver nanoparticles and few-layer graphene have been synthesized by combining low-energy ion beam synthesis and stencil techniques. A single plane of metallic nanoparticles plays the role of an embedded plasmonic enhancer located in dedicated areas at a controlled nanometer distance from deposited graphene layers. Optical imaging, reflectance and Raman scattering mapping are used to measure the enhancement of electronic and vibrational properties of these layers. In particular, electronic Raman scattering is shown to be notably efficient for analyzing the optical transfer of charge carriers between the systems and the presence of intrinsic and extrinsic defects.
A-posteriori error estimation for the finite point method with applications to compressible flow

NASA Astrophysics Data System (ADS)

Ortega, Enrique; Flores, Roberto; Oñate, Eugenio; Idelsohn, Sergio

2017-08-01

An a-posteriori error estimate with application to inviscid compressible flow problems is presented. The estimate is a surrogate measure of the discretization error, obtained from an approximation to the truncation terms of the governing equations. This approximation is calculated from the discrete nodal differential residuals using a reconstructed solution field on a modified stencil of points. Both the error estimation methodology and the flow solution scheme are implemented using the Finite Point Method, a meshless technique enabling higher-order approximations and reconstruction procedures on general unstructured discretizations. The performance of the proposed error indicator is studied and applications to adaptive grid refinement are presented.
Surface Engineering and Patterning Using Parylene for Biological Applications

PubMed Central

Tan, Christine P.; Craighead, Harold G.

2010-01-01

Parylene is a family of chemical-vapour-deposited polymers with material properties that are attractive for biomedicine and nanobiotechnology. Chemically inert parylene “peel-off” stencils have been demonstrated for micropatterning biomolecular arrays with high uniformity and precise spatial control down to nanoscale resolution. Such micropatterned surfaces are beneficial in engineering biosensors and biological microenvironments. A variety of substituted precursors enables direct coating of functionalised parylenes onto biomedical implants and microfluidics, providing a convenient method for designing biocompatible and bioactive surfaces. This article will review the emerging role and applications of parylene as a biomaterial for surface chemical modification and provide a future outlook.
Plasmon-enhanced scattering and charge transfer in few-layer graphene interacting with buried printed 2D-pattern of silver nanoparticles.

PubMed

Carles, R; Bayle, M; Bonafos, C

2018-04-27

Hybrid structures combining silver nanoparticles and few-layer graphene have been synthesized by combining low-energy ion beam synthesis and stencil techniques. A single plane of metallic nanoparticles plays the role of an embedded plasmonic enhancer located in dedicated areas at a controlled nanometer distance from deposited graphene layers. Optical imaging, reflectance and Raman scattering mapping are used to measure the enhancement of electronic and vibrational properties of these layers. In particular, electronic Raman scattering is shown to be notably efficient for analyzing the optical transfer of charge carriers between the systems and the presence of intrinsic and extrinsic defects.
Determining collective barrier operation skew in a parallel computer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Faraj, Daniel A.

2015-11-24

Determining collective barrier operation skew in a parallel computer that includes a number of compute nodes organized into an operational group includes: for each of the nodes until each node has been selected as a delayed node: selecting one of the nodes as a delayed node; entering, by each node other than the delayed node, a collective barrier operation; entering, after a delay by the delayed node, the collective barrier operation; receiving an exit signal from a root of the collective barrier operation; and measuring, for the delayed node, a barrier completion time. The barrier operation skew is calculated by: identifying, from the compute nodes' barrier completion times, a maximum barrier completion time and a minimum barrier completion time and calculating the barrier operation skew as the difference of the maximum and the minimum barrier completion time.
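The final calculation in this abstract (skew as the difference between the maximum and minimum barrier completion times) is simple enough to sketch. The function name, the dictionary keyed by node rank, and the sample timings below are hypothetical; the measurement procedure itself, which needs a real parallel machine, is not reproduced.

```python
def barrier_operation_skew(completion_times):
    """Return the skew: maximum minus minimum barrier completion time.

    completion_times maps a (hypothetical) node rank to the completion
    time measured when that node was the delayed node.
    """
    if not completion_times:
        raise ValueError("need at least one barrier completion time")
    times = list(completion_times.values())
    return max(times) - min(times)


# Made-up measurements in microseconds, one per delayed node.
measured = {0: 12.5, 1: 14.0, 2: 11.75, 3: 13.25}
skew = barrier_operation_skew(measured)
```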
Labeled trees and the efficient computation of derivations

NASA Technical Reports Server (NTRS)

Grossman, Robert; Larson, Richard G.

1989-01-01

The effective parallel symbolic computation of operators under composition is discussed. Examples include differential operators under composition and vector fields under the Lie bracket. Data structures consisting of formal linear combinations of rooted labeled trees are discussed. A multiplication on rooted labeled trees is defined, thereby making the set of these data structures into an associative algebra. An algebra homomorphism is defined from the original algebra of operators into this algebra of trees. An algebra homomorphism from the algebra of trees into the algebra of differential operators is then described. The cancellation which occurs when noncommuting operators are expressed in terms of commuting ones occurs naturally when the operators are represented using this data structure. This leads to an algorithm which, for operators which are derivations, speeds up the computation exponentially in the degree of the operator. It is shown that the algebra of trees leads naturally to a parallel version of the algorithm.
Determining collective barrier operation skew in a parallel computer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Faraj, Daniel A.

Determining collective barrier operation skew in a parallel computer that includes a number of compute nodes organized into an operational group includes: for each of the nodes until each node has been selected as a delayed node: selecting one of the nodes as a delayed node; entering, by each node other than the delayed node, a collective barrier operation; entering, after a delay by the delayed node, the collective barrier operation; receiving an exit signal from a root of the collective barrier operation; and measuring, for the delayed node, a barrier completion time. The barrier operation skew is calculated by: identifying, from the compute nodes' barrier completion times, a maximum barrier completion time and a minimum barrier completion time and calculating the barrier operation skew as the difference of the maximum and the minimum barrier completion time.
Parallel Tensor Compression for Large-Scale Scientific Data.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kolda, Tamara G.; Ballard, Grey; Austin, Woody Nathan

As parallel computing trends towards the exascale, scientific data produced by high-fidelity simulations are growing increasingly massive. For instance, a simulation on a three-dimensional spatial grid with 512 points per dimension that tracks 64 variables per grid point for 128 time steps yields 8 TB of data. By viewing the data as a dense five-way tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving compression ratios of up to 10000 on real-world data sets with negligible loss in accuracy. So that we can operate on such massive data, we present the first-ever distributed-memory parallel implementation for the Tucker decomposition, whose key computations correspond to parallel linear algebra operations, albeit with nonstandard data layouts. Our approach specifies a data distribution for tensors that avoids any tensor data redistribution, either locally or in parallel. We provide accompanying analysis of the computation and communication costs of the algorithms. To demonstrate the compression and accuracy of the method, we apply our approach to real-world data sets from combustion science simulations. We also provide detailed performance results, including parallel performance in both weak and strong scaling experiments.
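The 8 TB figure in this abstract can be checked with a few lines of arithmetic. The sketch below assumes 8 bytes per stored value (double precision), which the abstract does not state, and reads "TB" as 2^40 bytes.

```python
GRID_POINTS_PER_DIM = 512
SPATIAL_DIMS = 3
VARIABLES = 64
TIME_STEPS = 128
BYTES_PER_VALUE = 8  # assumption: double precision, not stated in the abstract

# Shape of the dense five-way tensor: three spatial modes, variables, time.
tensor_shape = (GRID_POINTS_PER_DIM,) * SPATIAL_DIMS + (VARIABLES, TIME_STEPS)

values = 1
for extent in tensor_shape:
    values *= extent

total_bytes = values * BYTES_PER_VALUE
tebibytes = total_bytes / 2**40
```

Under these assumptions the tensor holds 2^40 values and occupies 2^43 bytes, which matches the quoted 8 TB.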
A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale

DOE Office of Scientific and Technical Information (OSTI.GOV)

Moreland, Kenneth; Geveci, Berk

2014-11-01

The evolution of the computing world from teraflop to petaflop has been relatively effortless, with several of the existing programming models scaling effectively to the petascale. The migration to exascale, however, poses considerable challenges. All industry trends indicate that the exascale machine will be built using processors containing hundreds to thousands of cores per chip. It can be inferred that efficient concurrency on exascale machines requires a massive number of concurrent threads, each performing many operations on a localized piece of data. Currently, visualization libraries and applications are based on what is known as the visualization pipeline. In the pipeline model, algorithms are encapsulated as filters with inputs and outputs. These filters are connected by setting the output of one component to the input of another. Parallelism in the visualization pipeline is achieved by replicating the pipeline for each processing thread. This works well for today’s distributed memory parallel computers but cannot be sustained when operating on processors with thousands of cores. Our project investigates a new visualization framework designed to exhibit the pervasive parallelism necessary for extreme scale machines. Our framework achieves this by defining algorithms in terms of worklets, which are localized stateless operations. Worklets are atomic operations that execute when invoked, unlike filters, which execute when a pipeline request occurs. The worklet design allows execution on a massive number of lightweight threads with minimal overhead. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for efficient computation on an exascale machine.
Tensor contraction engine: Abstraction and automated parallel implementation of configuration-interaction, coupled-cluster, and many-body perturbation theories

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hirata, So

2003-11-20

We develop a symbolic manipulation program and program generator (Tensor Contraction Engine or TCE) that automatically derives the working equations of a well-defined model of second-quantized many-electron theories and synthesizes efficient parallel computer programs on the basis of these equations. Provided an ansatz of a many-electron theory model, TCE performs valid contractions of creation and annihilation operators according to Wick's theorem, consolidates identical terms, and reduces the expressions into the form of multiple tensor contractions acted on by permutation operators. Subsequently, it determines the binary contraction order for each multiple tensor contraction with the minimal operation and memory cost, factorizes common binary contractions (defines intermediate tensors), and identifies reusable intermediates. The resulting ordered list of binary tensor contractions, additions, and index permutations is translated into an optimized program that is combined with the NWChem and UTChem computational chemistry software packages. The programs synthesized by TCE take advantage of spin symmetry, Abelian point-group symmetry, and index permutation symmetry at every stage of calculations to minimize the number of arithmetic operations and storage requirement, adjust the peak local memory usage by index range tiling, and support parallel I/O interfaces and dynamic load balancing for parallel executions. We demonstrate the utility of TCE through automatic derivation and implementation of parallel programs for various models of configuration-interaction theory (CISD, CISDT, CISDTQ), many-body perturbation theory [MBPT(2), MBPT(3), MBPT(4)], and coupled-cluster theory (LCCD, CCD, LCCSD, CCSD, QCISD, CCSDT, and CCSDTQ).
Parallel computations and control of adaptive structures

NASA Technical Reports Server (NTRS)

Park, K. C.; Alvin, Kenneth F.; Belvin, W. Keith; Chong, K. P. (Editor); Liu, S. C. (Editor); Li, J. C. (Editor)

1991-01-01

The equations of motion for structures with adaptive elements for vibration control are presented for parallel computations to be used as a software package for real-time control of flexible space structures. A brief introduction to the state-of-the-art parallel computational capability is also presented. Time marching strategies are developed for an effective use of massively parallel mapping, partitioning, and the necessary arithmetic operations. An example is offered for the simulation of control-structure interaction on a parallel computer, and the impact of the approach presented for applications in disciplines other than the aerospace industry is assessed.
Linux Kernel Co-Scheduling and Bulk Synchronous Parallelism

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jones, Terry R

2012-01-01

This paper describes a kernel scheduling algorithm that is based on coscheduling principles and that is intended for parallel applications running on 1000 cores or more. Experimental results for a Linux implementation on a Cray XT5 machine are presented. The results indicate that Linux is a suitable operating system for this new scheduling scheme, and that this design provides a dramatic improvement in scaling performance for synchronizing collective operations at scale.
Partitioning Rectangular and Structurally Nonsymmetric Sparse Matrices for Parallel Processing

DOE Office of Scientific and Technical Information (OSTI.GOV)

B. Hendrickson; T.G. Kolda

1998-09-01

A common operation in scientific computing is the multiplication of a sparse, rectangular or structurally nonsymmetric matrix and a vector. In many applications the matrix-transpose-vector product is also required. This paper addresses the efficient parallelization of these operations. We show that the problem can be expressed in terms of partitioning bipartite graphs. We then introduce several algorithms for this partitioning problem and compare their performance on a set of test matrices.
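To make the two operations concrete, here is a minimal serial sketch of the matrix-vector and matrix-transpose-vector products on one sparse rectangular matrix stored as coordinate triplets, together with the bipartite graph (row vertices, column vertices, one edge per nonzero) that a partitioner would work on. The function names and the storage format are illustrative assumptions; the paper's partitioning algorithms are not reproduced.

```python
def matvec(shape, entries, x):
    """y = A x for a sparse matrix given as (row, col, value) triplets."""
    n_rows, _ = shape
    y = [0.0] * n_rows
    for i, j, v in entries:
        y[i] += v * x[j]
    return y


def transpose_matvec(shape, entries, w):
    """z = A^T w using the same triplets, without forming A^T."""
    _, n_cols = shape
    z = [0.0] * n_cols
    for i, j, v in entries:
        z[j] += v * w[i]
    return z


def bipartite_edges(entries):
    """One edge between a row vertex and a column vertex per nonzero."""
    return [(("row", i), ("col", j)) for i, j, _ in entries]


shape = (2, 3)  # rectangular: 2 rows, 3 columns
entries = [(0, 0, 2.0), (0, 2, 1.0), (1, 1, 3.0)]
y = matvec(shape, entries, [1.0, 2.0, 3.0])
z = transpose_matvec(shape, entries, [1.0, 1.0])
edges = bipartite_edges(entries)
```

Each nonzero touches one entry of the input vector and one of the output vector in either product, which is why a single partition of the row and column vertices can serve both operations.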
Accurate reaction-diffusion operator splitting on tetrahedral meshes for parallel stochastic molecular simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hepburn, I.; De Schutter, E. [Theoretical Neurobiology & Neuroengineering, University of Antwerp, Antwerp 2610]

Spatial stochastic molecular simulations in biology are limited by the intense computation required to track molecules in space either in a discrete time or discrete space framework, which has led to the development of parallel methods that can take advantage of the power of modern supercomputers in recent years. We systematically test suggested components of stochastic reaction-diffusion operator splitting in the literature and discuss their effects on accuracy. We introduce an operator splitting implementation for irregular meshes that enhances accuracy with minimal performance cost. We test a range of models in small-scale MPI simulations from simple diffusion models to realistic biological models and find that multi-dimensional geometry partitioning is an important consideration for optimum performance. We demonstrate performance gains of 1-3 orders of magnitude in the parallel implementation, with peak performance strongly dependent on model specification.
Collectively loading programs in a multiple program multiple data environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.

Techniques are disclosed for loading programs efficiently in a parallel computing system. In one embodiment, nodes of the parallel computing system receive a load description file which indicates, for each program of a multiple program multiple data (MPMD) job, nodes which are to load the program. The nodes determine, using collective operations, a total number of programs to load and a number of programs to load in parallel. The nodes further generate a class route for each program to be loaded in parallel, where the class route generated for a particular program includes only those nodes on which the program needs to be loaded. For each class route, a node is selected using a collective operation to be a load leader which accesses a file system to load the program associated with a class route and broadcasts the program via the class route to other nodes which require the program.
Extensions to the Parallel Real-Time Artificial Intelligence System (PRAIS) for fault-tolerant heterogeneous cycle-stealing reasoning

NASA Technical Reports Server (NTRS)

Goldstein, David

1991-01-01

Extensions to an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS) are discussed. PRAIS strives for transparently parallelizing production (rule-based) systems, even under real-time constraints. PRAIS accomplished these goals (presented at the first annual C Language Integrated Production System (CLIPS) conference) by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors. Results using the original PRAIS architecture over a network of Sun 3's, Sun 4's and VAX's are presented. Mechanisms using the producer-consumer model to extend the architecture for fault-tolerance and distributed truth maintenance initiation are also discussed.
A Parallel Vector Machine for the PM Programming Language

NASA Astrophysics Data System (ADS)

Bellerby, Tim

2016-04-01

PM is a new programming language which aims to make the writing of computational geoscience models on parallel hardware accessible to scientists who are not themselves expert parallel programmers. It is based around the concept of communicating operators: language constructs that enable variables local to a single invocation of a parallelised loop to be viewed as if they were arrays spanning the entire loop domain. This mechanism enables different loop invocations (which may or may not be executing on different processors) to exchange information in a manner that extends the successful Communicating Sequential Processes idiom from single messages to collective communication. Communicating operators avoid the additional synchronisation mechanisms, such as atomic variables, required when programming using the Partitioned Global Address Space (PGAS) paradigm. Using a single loop invocation as the fundamental unit of concurrency enables PM to uniformly represent different levels of parallelism from vector operations through shared memory systems to distributed grids. This paper describes an implementation of PM based on a vectorised virtual machine. On a single processor node, concurrent operations are implemented using masked vector operations. Virtual machine instructions operate on vectors of values and may be unmasked, masked using a Boolean field, or masked using an array of active vector cell locations. Conditional structures (such as if-then-else or while statement implementations) calculate and apply masks to the operations they control. A shift in mask representation from Boolean to location-list occurs when active locations become sufficiently sparse. Parallel loops unfold data structures (or vectors of data structures for nested loops) into vectors of values that may additionally be distributed over multiple computational nodes and then split into micro-threads compatible with the size of the local cache. 
Inter-node communication is accomplished using standard OpenMP and MPI. Performance analyses of the PM vector machine, demonstrating its scaling properties with respect to domain size and the number of processor nodes will be presented for a range of hardware configurations. The PM software and language definition are being made available under unrestrictive MIT and Creative Commons Attribution licenses respectively: www.pm-lang.org.
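The two mask representations mentioned in this abstract (a Boolean field and a list of active cell locations) can be sketched in plain Python. This is not the PM virtual machine: the function names and the 25% sparsity threshold are hypothetical, chosen only to show a masked operation and a switch of representation once active cells become sparse.

```python
def masked_add_boolean(dst, a, b, mask):
    """dst[i] = a[i] + b[i] wherever the Boolean mask is True."""
    for i, active in enumerate(mask):
        if active:
            dst[i] = a[i] + b[i]


def masked_add_locations(dst, a, b, locations):
    """Same operation, driven by a list of active cell indices."""
    for i in locations:
        dst[i] = a[i] + b[i]


def choose_mask(mask, sparse_threshold=0.25):
    """Return ("locations", indices) when active cells are sparse,
    otherwise ("boolean", mask). The threshold is an assumption."""
    locations = [i for i, active in enumerate(mask) if active]
    if len(locations) < sparse_threshold * len(mask):
        return "locations", locations
    return "boolean", mask


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10] * 8
condition = [value > 7 for value in a]  # mask computed by an "if" branch

kind, representation = choose_mask(condition)

out_boolean = list(a)
masked_add_boolean(out_boolean, a, b, condition)

out_locations = list(a)
if kind == "locations":
    masked_add_locations(out_locations, a, b, representation)
else:
    masked_add_boolean(out_locations, a, b, representation)
```

With only one of eight cells active, the location-list form visits a single index, while the Boolean form still scans the whole vector; both leave inactive cells unchanged.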
A Robust and Scalable Software Library for Parallel Adaptive Refinement on Unstructured Meshes

NASA Technical Reports Server (NTRS)

Lou, John Z.; Norton, Charles D.; Cwik, Thomas A.

1999-01-01

The design and implementation of Pyramid, a software library for performing parallel adaptive mesh refinement (PAMR) on unstructured meshes, is described. This software library can be easily used in a variety of unstructured parallel computational applications, including parallel finite element, parallel finite volume, and parallel visualization applications using triangular or tetrahedral meshes. The library contains a suite of well-designed and efficiently implemented modules that perform operations in a typical PAMR process. Among these are mesh quality control during successive parallel adaptive refinement (typically guided by a local-error estimator), parallel load-balancing, and parallel mesh partitioning using the ParMeTiS partitioner. The Pyramid library is implemented in Fortran 90 with an interface to the Message-Passing Interface (MPI) library, supporting code efficiency, modularity, and portability. An EM waveguide filter application, adaptively refined using the Pyramid library, is illustrated.

Parallelized direct execution simulation of message-passing parallel programs

NASA Technical Reports Server (NTRS)

Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

1994-01-01

As massively parallel computers proliferate, there is growing interest in finding ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
Wake vortex effects on parallel runway operations

DOT National Transportation Integrated Search

2003-01-06

Aircraft wake vortex behavior in ground effect between two parallel runways at Frankfurt/Main International Airport was studied. The distance and time of vortex demise were examined as a function of crosswind, aircraft type, and a measure of atmosphe...
Efficiency Analysis of the Parallel Implementation of the SIMPLE Algorithm on Multiprocessor Computers

NASA Astrophysics Data System (ADS)

Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.

2017-12-01

This paper describes the details of the parallel implementation of the SIMPLE algorithm for numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors under the condition of the grid model decomposition using fictitious cells. We discuss the specific features for the storage of distributed matrices and implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of the multigrid SLAE solver tuning on the general efficiency of the algorithm; the tuning involves the types of the cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of efficiency evaluation for parallelization of the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems with the evaluation of parallelization efficiency by two algorithms. It is shown that the proposed parallel implementation enables efficient computations for the problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.
Design considerations for parallel graphics libraries

NASA Technical Reports Server (NTRS)

Crockett, Thomas W.

1994-01-01

Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.
Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++

NASA Technical Reports Server (NTRS)

Bodin, Francois; Priol, Thierry; Mehrotra, Piyush; Gannon, Dennis

1994-01-01

Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism on Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++ which is based on a concurrent aggregate parallel model.
National Centers for Environmental Prediction

Science.gov Websites

Terminal Area Procedures for Paired Runways

NASA Technical Reports Server (NTRS)

Lozito, Sandy

2011-01-01

Parallel runway operations have been found to increase capacity within the National Airspace System (NAS); however, poor visibility conditions reduce this capacity [1]. Much research has been conducted to examine the concepts and procedures related to parallel runways; however, there has been no investigation of the procedures associated with the strategic and tactical pairing of aircraft for these operations. This study developed and examined the pilot and controller procedures and information requirements for creating aircraft pairs for parallel runway operations. The goal was to achieve aircraft pairing with a temporal separation of 15 s (+/- 10 s error) at a coupling point that is about 12 nmi from the runway threshold. Two variables were explored for the pilot participants: two levels of flight deck automation (current-day flight deck automation and a prototype future automation) as well as two flight deck displays that assisted in pilot conformance monitoring. The controllers were also provided with automation to help create and maintain aircraft pairs. Data showed that the operations in this study were acceptable and safe. Workload when using the pairing procedures and tools was generally low for both controllers and pilots, and situation awareness (SA) was typically moderate to high. There were some differences based upon the display and automation conditions for the pilots. Future research should consider the refinement of the concepts and tools for pilot and controller displays and automation for parallel runway concepts.
Study on the water resources optimal operation based on riverbed wind erosion control in West Liaohe River plain

NASA Astrophysics Data System (ADS)

Wanguang, Sun; Chengzhen, Li; Baoshan, Fan

2018-06-01

Rivers in the West Liaohe River plain dry up frequently, and the bare riverbeds form belts of fine sand on land. These sand belts, which yield heavy dust on windy days, place severe stress on the local environment as the riverbeds are eroded by wind. The optimal operation of water resources is thus one of the most important methods for preventing wind erosion of riverbeds. In this paper, an optimal operation model for water resources based on riverbed wind erosion control is established, comprising an objective function, constraints, and a solution method. The objective function considers factors including the water volume diverted into reservoirs, river length, and the lower threshold of flow rate. On the basis of ensuring the water requirement of each reservoir, destruction of riverbed vegetation by frequent river flow is avoided. A multi-core parallel solving method for optimal water resources operation in the West Liaohe River Plain is proposed, in which the optimal solution is found by the DPSA method under the POA framework and the parallel computing program is designed in Fork/Join mode. Based on the optimal operation results, the basic rules of water resources operation in the West Liaohe River Plain are summarized. Calculation results show that, while the water volume requirement of every reservoir is met, the frequency of river flow in the reach from Taihekou to the Talagan Water Diversion Project on the Xinkai River is reduced effectively. The speedup and parallel efficiency of the parallel algorithm are 1.51 and 0.76, respectively, and the computing time is significantly decreased. The research results presented in this paper can provide technical support for the prevention and control of riverbed wind erosion in the West Liaohe River plain.
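The speedup and efficiency figures quoted above can be related with a small worked sketch. It assumes the conventional definition of parallel efficiency (speedup divided by the number of workers); the abstract does not state the core count, so the value of two cores below is an inference from the reported numbers, and the function names are hypothetical.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Conventional definition: efficiency = speedup / number of workers."""
    return speedup / workers


def implied_workers(speedup: float, efficiency: float) -> float:
    """Invert the definition to estimate how many workers were used."""
    return speedup / efficiency


# Figures reported in the abstract: speedup 1.51, efficiency 0.76.
estimate = implied_workers(1.51, 0.76)   # close to 2
cores = round(estimate)                  # assumption: a dual-core run
check = parallel_efficiency(1.51, cores)
print(cores, round(check, 2))
```

Under that assumption the two reported numbers are mutually consistent: a speedup of 1.51 on two cores corresponds to an efficiency of about 0.76.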
How to Build an AppleSeed: A Parallel Macintosh Cluster for Numerically Intensive Computing

NASA Astrophysics Data System (ADS)

Decyk, V. K.; Dauger, D. E.

We have constructed a parallel cluster consisting of a mixture of Apple Macintosh G3 and G4 computers running the Mac OS, and have achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. This enables us to move parallel computing from the realm of experts to the mainstream of computing.
Balancing exploration, uncertainty and computational demands in many objective reservoir optimization

NASA Astrophysics Data System (ADS)

Zatarain Salazar, Jazmin; Reed, Patrick M.; Quinn, Julianne D.; Giuliani, Matteo; Castelletti, Andrea

2017-11-01

Reservoir operations are central to our ability to manage river basin systems serving conflicting multi-sectoral demands under increasingly uncertain futures. These challenges motivate the need for new solution strategies capable of effectively and efficiently discovering the multi-sectoral tradeoffs that are inherent to alternative reservoir operation policies. Evolutionary many-objective direct policy search (EMODPS) is gaining importance in this context due to its capability of addressing multiple objectives and its flexibility in incorporating multiple sources of uncertainties. This simulation-optimization framework has high potential for addressing the complexities of water resources management, and it can benefit from current advances in parallel computing and meta-heuristics. This study contributes a diagnostic assessment of state-of-the-art parallel strategies for the auto-adaptive Borg Multi Objective Evolutionary Algorithm (MOEA) to support EMODPS. Our analysis focuses on the Lower Susquehanna River Basin (LSRB) system where multiple sectoral demands from hydropower production, urban water supply, recreation and environmental flows need to be balanced. Using EMODPS with different parallel configurations of the Borg MOEA, we optimize operating policies over different size ensembles of synthetic streamflows and evaporation rates. As we increase the ensemble size, we increase the statistical fidelity of our objective function evaluations at the cost of higher computational demands. This study demonstrates how to overcome the mathematical and computational barriers associated with capturing uncertainties in stochastic multiobjective reservoir control optimization, where parallel algorithmic search serves to reduce the wall-clock time in discovering high quality representations of key operational tradeoffs. Our results show that emerging self-adaptive parallelization schemes exploiting cooperative search populations are crucial. 
Such strategies provide a promising new set of tools for effectively balancing exploration, uncertainty, and computational demands when using EMODPS.
Why not make a PC cluster of your own? 5. AppleSeed: A Parallel Macintosh Cluster for Scientific Computing

NASA Astrophysics Data System (ADS)

Decyk, Viktor K.; Dauger, Dean E.

We have constructed a parallel cluster consisting of Apple Macintosh G4 computers running both Classic Mac OS as well as the Unix-based Mac OS X, and have achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. Unlike other Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. This enables us to move parallel computing from the realm of experts to the mainstream of computing.
Research on Parallel Three Phase PWM Converters base on RTDS

NASA Astrophysics Data System (ADS)

Xia, Yan; Zou, Jianxiao; Li, Kai; Liu, Jingbo; Tian, Jun

2018-01-01

Parallel operation of converters can increase system capacity, but it may give rise to zero-sequence circulating current, so controlling the circulating current is an important goal in the design of parallel inverters. In this paper, the Real Time Digital Simulator (RTDS) is used to model the parallel converter system in real time and to study circulating current suppression. An equivalent model of two parallel converters and the zero-sequence circulating current (ZSCC) was established and analyzed, and a strategy using variable zero vector control was then proposed to suppress the circulating current. For two parallel modular converters, a hardware-in-the-loop (HIL) study based on RTDS and a practical experiment were carried out; the results show that the proposed control strategy is feasible and effective.
Ropes: Support for collective operations among distributed threads

NASA Technical Reports Server (NTRS)

Haines, Matthew; Mehrotra, Piyush; Cronk, David

1995-01-01

Lightweight threads are becoming increasingly useful in supporting parallelism and asynchronous control structures in applications and language implementations. Recently, systems have been designed and implemented to support interprocessor communication between lightweight threads so that threads can be exploited in a distributed memory system. Their use, in this setting, has been largely restricted to supporting latency hiding techniques and functional parallelism within a single application. However, to execute data parallel codes independent of other threads in the system, collective operations and relative indexing among threads are required. This paper describes the design of ropes: a scoping mechanism for collective operations and relative indexing among threads. We present the design of ropes in the context of the Chant system, and provide performance results evaluating our initial design decisions.
Resonance-induced sensitivity enhancement method for conductivity sensors

NASA Technical Reports Server (NTRS)

Tai, Yu-Chong (Inventor); Shih, Chi-yuan (Inventor); Li, Wei (Inventor); Zheng, Siyang (Inventor)

2009-01-01

Methods and systems for improving the sensitivity of a variety of conductivity sensing devices, in particular capacitively-coupled contactless conductivity detectors. A parallel inductor is added to the conductivity sensor. The sensor with the parallel inductor is operated at a resonant frequency of the equivalent circuit model. At the resonant frequency, parasitic capacitances that are either in series or in parallel with the conductance (and possibly a series resistance) are substantially removed from the equivalent circuit, leaving a purely resistive impedance. An appreciably higher sensor sensitivity results. Experimental verification shows that sensitivity improvements of the order of 10,000-fold are possible. Examples of detecting particulates with high precision by application of the apparatus and methods of operation are described.
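The resonance idea can be sketched numerically under a deliberately simplified assumption: the sensor is modeled as a conductance G with a single parasitic capacitance C in parallel, and an ideal inductor L is added in parallel. The series-capacitance case described in the abstract is not covered, and the component values and function names are hypothetical.

```python
import math


def resonant_frequency(c_farads: float, l_henries: float) -> float:
    """Frequency at which an ideal parallel L cancels a parallel C."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henries * c_farads))


def admittance(g_siemens: float, c_farads: float, l_henries: float, f_hz: float) -> complex:
    """Admittance of G, C and L in parallel: Y = G + j(wC - 1/(wL))."""
    w = 2.0 * math.pi * f_hz
    return complex(g_siemens, w * c_farads - 1.0 / (w * l_henries))


# Hypothetical values: 1 uS conductance, 1 pF parasitic, 1 mH inductor.
G, C, L = 1e-6, 1e-12, 1e-3
f0 = resonant_frequency(C, L)
y_at_resonance = admittance(G, C, L, f0)
y_off_resonance = admittance(G, C, L, 2.0 * f0)
print(f0, y_at_resonance, y_off_resonance)
```

At f0 the imaginary part vanishes and only G remains, so a change in conductance is no longer masked by the capacitive term; away from f0 the reactive part dominates this example.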
Overview of the DART project

DOE Office of Scientific and Technical Information (OSTI.GOV)

Berry, K.R.; Hansen, F.R.; Napolitano, L.M.

1992-01-01

DART (DSP Array for Reconfigurable Tasks) is a parallel architecture of two high-performance DSP (digital signal processing) chips with the flexibility to handle a wide range of real-time applications. Each of the 32-bit floating-point DSP processors in DART is programmable in a high-level language (C or Ada). We have added extensions to the real-time operating system used by DART in order to support parallel processing. The combination of high-level language programmability, a real-time operating system, and parallel processing support significantly reduces the development cost of application software for signal processing and control applications. We have demonstrated this capability by using DART to reconstruct images in the prototype VIP (Video Imaging Projectile) groundstation.
Image Processing Using a Parallel Architecture.

DTIC Science & Technology

1987-12-01

ENG/87D-25 Abstract: This study developed a set of low-level image processing tools on a parallel computer that allows concurrent processing of images... environment, the set of tools offers a significant reduction in the time required to perform some commonly used image processing operations... step toward developing these systems, a structured set of image processing tools was implemented using a parallel computer. More important than
Multi-threading: A new dimension to massively parallel scientific computation

NASA Astrophysics Data System (ADS)

Nielsen, Ida M. B.; Janssen, Curtis L.

2000-06-01

Multi-threading is becoming widely available for Unix-like operating systems, and the application of multi-threading opens new ways for performing parallel computations with greater efficiency. We here briefly discuss the principles of multi-threading and illustrate the application of multi-threading for a massively parallel direct four-index transformation of electron repulsion integrals. Finally, other potential applications of multi-threading in scientific computing are outlined.
Configuring compute nodes of a parallel computer in an operational group into a plurality of independent non-overlapping collective networks

DOEpatents

Archer, Charles J.; Inglett, Todd A.; Ratterman, Joseph D.; Smith, Brian E.

2010-03-02

Methods, apparatus, and products are disclosed for configuring compute nodes of a parallel computer in an operational group into a plurality of independent non-overlapping collective networks, the compute nodes in the operational group connected together for data communications through a global combining network, that include: partitioning the compute nodes in the operational group into a plurality of non-overlapping subgroups; designating one compute node from each of the non-overlapping subgroups as a master node; and assigning, to the compute nodes in each of the non-overlapping subgroups, class routing instructions that organize the compute nodes in that non-overlapping subgroup as a collective network such that the master node is a physical root.
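The partitioning and master-designation steps above can be sketched as follows. This is a minimal illustration, not the patented method: the equal-size slicing, the choice of the first node in each subgroup as master, and all names are assumptions, and the class routing instructions are not modeled.

```python
def partition_into_subgroups(node_ids, subgroup_size):
    """Split an operational group into non-overlapping subgroups.

    Each subgroup records one master node, standing in for the
    physical root of that subgroup's collective network.
    """
    if subgroup_size < 1:
        raise ValueError("subgroup_size must be positive")
    subgroups = []
    for start in range(0, len(node_ids), subgroup_size):
        members = list(node_ids[start:start + subgroup_size])
        subgroups.append({"master": members[0], "members": members})
    return subgroups


# Hypothetical operational group of ten compute nodes.
groups = partition_into_subgroups(list(range(10)), 4)
for g in groups:
    print(g["master"], g["members"])
```

Because the subgroups are built from disjoint slices, every node belongs to exactly one collective network.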
Distributed intelligence for supervisory control

NASA Technical Reports Server (NTRS)

Wolfe, W. J.; Raney, S. D.

1987-01-01

Supervisory control systems must deal with various types of intelligence distributed throughout the layers of control. Typical layers are real-time servo control, off-line planning and reasoning subsystems and finally, the human operator. Design methodologies must account for the fact that the majority of the intelligence will reside with the human operator. Hierarchical decompositions and feedback loops as conceptual building blocks that provide a common ground for man-machine interaction are discussed. Examples of types of parallelism and parallel implementation on several classes of computer architecture are also discussed.

Cryogenic parallel, single phase flows: an analytical approach

NASA Astrophysics Data System (ADS)

Eichhorn, R.

2017-02-01

Managing the cryogenic flows inside a state-of-the-art accelerator cryomodule has become a demanding endeavour: In order to build highly efficient modules, all heat transfers are usually intercepted at various temperatures. For a multi-cavity module, operated at 1.8 K, this requires intercepts at 4 K and at 80 K at different locations with sometimes strongly varying heat loads which for simplicity reasons are operated in parallel. This contribution will describe an analytical approach, based on optimization theories.
Linux Kernel Co-Scheduling For Bulk Synchronous Parallel Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jones, Terry R

2011-01-01

This paper describes a kernel scheduling algorithm that is based on co-scheduling principles and that is intended for parallel applications running on 1000 cores or more where inter-node scalability is key. Experimental results for a Linux implementation on a Cray XT5 machine are presented. The results indicate that Linux is a suitable operating system for this new scheduling scheme, and that this design provides a dramatic improvement in scaling performance for synchronizing collective operations at scale.
Queueing Network Models for Parallel Processing of Task Systems: an Operational Approach

NASA Technical Reports Server (NTRS)

Mak, Victor W. K.

1986-01-01

Computer performance modeling of possibly complex computations running on highly concurrent systems is considered. Earlier works in this area either dealt with a very simple program structure or resulted in methods with exponential complexity. An efficient procedure is developed to compute the performance measures for series-parallel-reducible task systems using queueing network models. The procedure is based on the concept of hierarchical decomposition and a new operational approach. Numerical results for three test cases are presented and compared to those of simulations.
Evaluating Nextgen Closely Spaced Parallel Operations Concepts with Validated Human Performance Models: Flight Deck Guidelines

NASA Technical Reports Server (NTRS)

Hooey, Becky Lee; Gore, Brian Francis; Mahlstedt, Eric; Foyle, David C.

2013-01-01

The objectives of the current research were to develop valid human performance models (HPMs) of approach and land operations; use these models to evaluate the impact of NextGen Closely Spaced Parallel Operations (CSPO) on pilot performance; and draw conclusions regarding flight deck display design and pilot-ATC roles and responsibilities for NextGen CSPO concepts. This document presents guidelines and implications for flight deck display designs and candidate roles and responsibilities. A companion document (Gore, Hooey, Mahlstedt, & Foyle, 2013) provides complete scenario descriptions and results including predictions of pilot workload, visual attention and time to detect off-nominal events.
The BLAZE language: A parallel language for scientific programming

NASA Technical Reports Server (NTRS)

Mehrotra, P.; Vanrosendale, J.

1985-01-01

A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described, along with how the language would be used in typical scientific programming.
Research in Parallel Algorithms and Software for Computational Aerosciences

NASA Technical Reports Server (NTRS)

Domel, Neal D.

1996-01-01

Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
The Data Transfer Kit: A geometric rendezvous-based tool for multiphysics data transfer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Slattery, S. R.; Wilson, P. P. H.; Pawlowski, R. P.

2013-07-01

The Data Transfer Kit (DTK) is a software library designed to provide parallel data transfer services for arbitrary physics components based on the concept of geometric rendezvous. The rendezvous algorithm provides a means to geometrically correlate two geometric domains that may be arbitrarily decomposed in a parallel simulation. By repartitioning both domains such that they have the same geometric domain on each parallel process, efficient and load balanced search operations and data transfer can be performed at a desirable algorithmic time complexity with low communication overhead relative to other types of mapping algorithms. With the increased development efforts in multiphysics simulation and other multiple mesh and geometry problems, generating parallel topology maps for transferring fields and other data between geometric domains is a common operation. The algorithms used to generate parallel topology maps based on the concept of geometric rendezvous as implemented in DTK are described with an example using a conjugate heat transfer calculation and thermal coupling with a neutronics code. In addition, we provide the results of initial scaling studies performed on the Jaguar Cray XK6 system at Oak Ridge National Laboratory for a worst-case-scenario problem in terms of algorithmic complexity that shows good scaling on O(1 x 10^4) cores for topology map generation and excellent scaling on O(1 x 10^5) cores for the data transfer operation with meshes of O(1 x 10^9) elements. (authors)
A Domain Decomposition Parallelization of the Fast Marching Method

NASA Technical Reports Server (NTRS)

Herrmann, M.

2003-01-01

In this paper, the first domain decomposition parallelization of the Fast Marching Method for level sets is presented. Parallel speedup has been demonstrated in both the optimal and non-optimal domain decomposition case. The parallel performance of the proposed method depends strongly on separately load balancing the number of nodes on each side of the interface. A load imbalance of nodes on either side of the domain leads to an increase in communication and rollback operations. Furthermore, the amount of inter-domain communication can be reduced by aligning the inter-domain boundaries with the interface normal vectors. In the case of optimal load balancing and aligned inter-domain boundaries, the proposed parallel FMM algorithm is highly efficient, reaching efficiency factors of up to 0.98. Future work will focus on the extension of the proposed parallel algorithm to higher order accuracy. Also, to further enhance parallel performance, the coupling of the domain decomposition parallelization to the G(sub 0)-based parallelization will be investigated.
Synchronizing compute node time bases in a parallel computer

DOEpatents

Chen, Dong; Faraj, Daniel A; Gooding, Thomas M; Heidelberger, Philip

2015-01-27

Synchronizing time bases in a parallel computer that includes compute nodes organized for data communications in a tree network, where one compute node is designated as a root, and, for each compute node: calculating data transmission latency from the root to the compute node; configuring a thread as a pulse waiter; initializing a wakeup unit; and performing a local barrier operation; upon each node completing the local barrier operation, entering, by all compute nodes, a global barrier operation; upon all nodes entering the global barrier operation, sending, to all the compute nodes, a pulse signal; and for each compute node upon receiving the pulse signal: waking, by the wakeup unit, the pulse waiter; setting a time base for the compute node equal to the data transmission latency between the root node and the compute node; and exiting the global barrier operation.
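The final step above, setting each node's time base to its latency from the root, can be illustrated with a toy model. It assumes the pulse leaves the root at global time zero, that latencies are known exactly, and that local clocks tick at the same rate afterward; the node names, latency values, and function name are hypothetical, and barriers and wakeup units are not modeled.

```python
def clocks_after_pulse(latency_from_root_ns, observe_at_ns):
    """Return each node's local clock reading at a later global instant.

    A node receives the pulse at global time equal to its latency and
    sets its time base to that same latency value at that moment.
    """
    readings = {}
    for node, latency in latency_from_root_ns.items():
        if observe_at_ns < latency:
            raise ValueError("node has not received the pulse yet")
        elapsed_since_pulse = observe_at_ns - latency
        readings[node] = latency + elapsed_since_pulse
    return readings


# Hypothetical tree latencies in nanoseconds.
latencies = {"root": 0, "n1": 120, "n2": 250, "n3": 410}
readings = clocks_after_pulse(latencies, observe_at_ns=1000)
print(readings)
```

Each node starts counting from a different instant but from a correspondingly offset value, so in this model all local clocks agree with one another at any later time.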
Synchronizing compute node time bases in a parallel computer

DOEpatents

Chen, Dong; Faraj, Daniel A; Gooding, Thomas M; Heidelberger, Philip

2014-12-30

Synchronizing time bases in a parallel computer that includes compute nodes organized for data communications in a tree network, where one compute node is designated as a root, and, for each compute node: calculating data transmission latency from the root to the compute node; configuring a thread as a pulse waiter; initializing a wakeup unit; and performing a local barrier operation; upon each node completing the local barrier operation, entering, by all compute nodes, a global barrier operation; upon all nodes entering the global barrier operation, sending, to all the compute nodes, a pulse signal; and for each compute node upon receiving the pulse signal: waking, by the wakeup unit, the pulse waiter; setting a time base for the compute node equal to the data transmission latency between the root node and the compute node; and exiting the global barrier operation.
FAST TRACK COMMUNICATION: Poly(methyl methacrylate)-palladium clusters nanocomposite formation by supersonic cluster beam deposition: a method for microstructured metallization of polymer surfaces

NASA Astrophysics Data System (ADS)

Ravagnan, Luca; Divitini, Giorgio; Rebasti, Sara; Marelli, Mattia; Piseri, Paolo; Milani, Paolo

2009-04-01

Nanocomposite films were fabricated by supersonic cluster beam deposition (SCBD) of palladium clusters on poly(methyl methacrylate) (PMMA) surfaces. The evolution of the electrical conductance with cluster coverage and microscopy analysis show that Pd clusters are implanted in the polymer and form a continuous layer extending for several tens of nanometres beneath the polymer surface. This allows the deposition, using stencil masks, of cluster-assembled Pd microstructures on PMMA showing a remarkably high adhesion compared with metallic films obtained by thermal evaporation. These results suggest that SCBD is a promising tool for the fabrication of metallic microstructures on flexible polymeric substrates.
Stretchable Dry Electrodes with Concentric Ring Geometry for Enhancing Spatial Resolution in Electrophysiology.

PubMed

Wang, Kaiping; Parekh, Udit; Pailla, Tejaswy; Garudadri, Harinath; Gilja, Vikash; Ng, Tse Nga

2017-10-01

The multichannel concentric-ring electrodes are stencil printed on stretchable elastomers modified to improve adhesion to skin and minimize motion artifacts for electrophysiological recordings of electroencephalography, electromyography, and electrocardiography. These dry electrodes with a poly(3,4-ethylenedioxythiophene) polystyrene sulfonate interface layer are optimized to show lower noise level than that of commercial gel disc electrodes. The concentric ring geometry enables Laplacian filtering to pinpoint the bioelectric potential source with spatial resolution determined by the ring distance. This work shows a new fabrication approach to integrate and create designs that enhance spatial resolution for high-quality electrophysiology monitoring devices. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Fine line structures of ceramic films formed by patterning of metalorganic precursors using photolithography and ion beams

NASA Astrophysics Data System (ADS)

Hung, L. S.; Zheng, L. R.

1992-05-01

Fine line structures of ceramic thin films were fabricated by patterning of metalorganic precursors using photolithography and ion beams. A trilevel structure was developed with an outer resist layer to transfer patterns, a silver delineated layer as an implantation mask, and a planar resist layer protecting the precursor film from chemical attacking and sputtering. Ion irradiation through the Ag stencil rendered metal carboxylates insoluble in 2-ethylhexanoic acid, permitting patterning of the precursor film with patterning features on micron scales. The potential of this technique was demonstrated in patterning of Bi2Sr2CaCu2O(8+x) and Pb(Zr(0.53)Ti(0.47) thin films.
Self-aligned grating couplers on template-stripped metal pyramids via nanostencil lithography

DOE Office of Scientific and Technical Information (OSTI.GOV)

Klemme, Daniel J.; Johnson, Timothy W.; Mohr, Daniel A.

2016-05-23

We combine nanostencil lithography and template stripping to create self-aligned patterns about the apex of ultrasmooth metal pyramids with high throughput. Three-dimensional patterns such as spiral and asymmetric linear gratings, which can couple incident light into a hot spot at the tip, are presented as examples of this fabrication method. Computer simulations demonstrate that spiral and linear diffraction grating patterns are both effective at coupling light to the tip. The self-aligned stencil lithography technique can be useful for integrating plasmonic couplers with sharp metallic tips for applications such as near-field optical spectroscopy, tip-based optical trapping, plasmonic sensing, and heat-assisted magnetic recording.
Operator Finds Control at His Fingertips.

ERIC Educational Resources Information Center

Goscicki, Edward

1979-01-01

Discussed are the advantages associated with the use of computer systems in wastewater treatment facilities. The system parallels plant organization and considers operations, maintenance, and plant management. (CS)
Array processor architecture

NASA Technical Reports Server (NTRS)

Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)

1983-01-01

A high speed parallel array data processing architecture fashioned under a computational envelope approach includes a data base memory for secondary storage of programs and data, and a plurality of memory modules interconnected to a plurality of processing modules by a connection network of the Omega gender. Programs and data are fed from the data base memory to the plurality of memory modules, and from there the programs are fed through the connection network to the array of processors (one copy of each program for each processor). Execution of the programs occurs with the processors normally operating quite independently of each other in a multiprocessing fashion. For data dependent operations and other suitable operations, all processors are instructed to finish one given task or program branch before all are instructed to proceed in parallel processing fashion on the next instruction. Even when functioning in the parallel processing mode, however, the processors are not in lock-step but execute their own copy of the program individually unless or until another overall processor array synchronization instruction is issued.
Managing Algorithmic Skeleton Nesting Requirements in Realistic Image Processing Applications: The Case of the SKiPPER-II Parallel Programming Environment's Operating Model

NASA Astrophysics Data System (ADS)

Coudarcher, Rémi; Duculty, Florent; Serot, Jocelyn; Jurie, Frédéric; Derutin, Jean-Pierre; Dhome, Michel

2005-12-01

SKiPPER is a SKeleton-based Parallel Programming EnviRonment that has been under development since 1996 at the LASMEA Laboratory, Blaise-Pascal University, France. The main goal of the project was to demonstrate the applicability of skeleton-based parallel programming techniques to the fast prototyping of reactive vision applications. This paper deals with the special features embedded in the latest version of the project: algorithmic skeleton nesting capabilities and a fully dynamic operating model. Through the case study of a complete and realistic image processing application, in which we point out the requirement for skeleton nesting, we present the operating model of this feature. The work described here is one of the few reported experiments showing the application of skeleton nesting facilities for the parallelisation of a realistic application, especially in the area of image processing. The image processing application we have chosen is a 3D face-tracking algorithm from appearance.
An efficient parallel-processing method for transposing large matrices in place.

PubMed

Portnoff, M R

1999-01-01

We have developed an efficient algorithm for transposing large matrices in place. The algorithm is efficient because data are accessed either sequentially in blocks or randomly within blocks small enough to fit in cache, and because the same indexing calculations are shared among identical procedures operating on independent subsets of the data. This inherent parallelism makes the method well suited for a multiprocessor computing environment. The algorithm is easy to implement because the same two procedures are applied to the data in various groupings to carry out the complete transpose operation. Using only a single processor, we have demonstrated nearly an order of magnitude increase in speed over the previously published algorithm by Cate and Twigg for transposing a large rectangular matrix in place. With multiple processors operating in parallel, the processing speed increases almost linearly with the number of processors. A simplified version of the algorithm for square matrices is presented as well as an extension for matrices large enough to require virtual memory.
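As a rough illustration of what transposing a rectangular matrix in place involves, the sketch below uses plain cycle-following on a row-major flat list. It is not the blocked, cache-aware method of this record; the function name `transpose_in_place` is hypothetical, and the `visited` list is a simplification that costs extra memory a strictly in-place method would avoid.

```python
def transpose_in_place(a, rows, cols):
    """Transpose a rows x cols matrix stored row-major in the flat list `a`.

    Simplified sketch: the element at index r*cols + c moves to c*rows + r,
    and each permutation cycle is followed once.
    """
    n = rows * cols
    visited = [False] * n          # assumption: extra bookkeeping for clarity
    for start in range(n):
        if visited[start]:
            continue
        current = start
        value = a[start]           # element waiting to be placed
        while True:
            visited[current] = True
            r, c = divmod(current, cols)
            target = c * rows + r  # destination index in the transposed layout
            value, a[target] = a[target], value
            current = target
            if current == start:
                break
    return a


matrix = [1, 2, 3,
          4, 5, 6]                 # 2 x 3, row-major
print(transpose_in_place(matrix, 2, 3))  # [1, 4, 2, 5, 3, 6], i.e. 3 x 2
```

The cycles are independent of one another, which is the property that makes this kind of permutation amenable to parallel execution.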
Spectral factorization of wavefields and wave operators

NASA Astrophysics Data System (ADS)

Rickett, James Edward

Spectral factorization is the problem of finding a minimum-phase function with a given power spectrum. Minimum phase functions have the property that they are causal with a causal (stable) inverse. In this thesis, I factor multidimensional systems into their minimum-phase components. Helical boundary conditions resolve any ambiguities over causality, allowing me to factor multi-dimensional systems with conventional one-dimensional spectral factorization algorithms. In the first part, I factor passive seismic wavefields recorded in two-dimensional spatial arrays. The result provides an estimate of the acoustic impulse response of the medium that has higher bandwidth than autocorrelation-derived estimates. Also, the function's minimum-phase nature mimics the physics of the system better than the zero-phase autocorrelation model. I demonstrate this on helioseismic data recorded by the satellite-based Michelson Doppler Imager (MDI) instrument, and shallow seismic data recorded at Long Beach, California. In the second part of this thesis, I take advantage of the stable-inverse property of minimum-phase functions to solve wave-equation partial differential equations. By factoring multi-dimensional finite-difference stencils into minimum-phase components, I can invert them efficiently, facilitating rapid implicit extrapolation without the azimuthal anisotropy that is observed with splitting approximations. The final part of this thesis describes how to calculate diagonal weighting functions that approximate the combined operation of seismic modeling and migration. These weighting functions capture the effects of irregular subsurface illumination, which can be the result of either the surface-recording geometry, or focusing and defocusing of the seismic wavefield as it propagates through the earth. Since they are diagonal, they can be easily both factored and inverted to compensate for uneven subsurface illumination in migrated images. Experimental results show that applying these weighting functions after migration leads to significantly improved estimates of seismic reflectivity.

A finite-volume module for all-scale Earth-system modelling at ECMWF

NASA Astrophysics Data System (ADS)

Kühnlein, Christian; Malardel, Sylvie; Smolarkiewicz, Piotr

2017-04-01

We highlight recent advancements in the development of the finite-volume module (FVM) (Smolarkiewicz et al., 2016) for the IFS at ECMWF. FVM represents an alternative dynamical core that complements the operational spectral dynamical core of the IFS with new capabilities. Most notably, these include a compact-stencil finite-volume discretisation, flexible meshes, conservative non-oscillatory transport and all-scale governing equations. As a default, FVM solves the compressible Euler equations in a geospherical framework (Szmelter and Smolarkiewicz, 2010). The formulation incorporates a generalised terrain-following vertical coordinate. A hybrid computational mesh, fully unstructured in the horizontal and structured in the vertical, enables efficient global atmospheric modelling. Moreover, a centred two-time-level semi-implicit integration scheme is employed with 3D implicit treatment of acoustic, buoyant, and rotational modes. The associated 3D elliptic Helmholtz problem is solved using a preconditioned Generalised Conjugate Residual approach. The solution procedure employs the non-oscillatory finite-volume MPDATA advection scheme that is bespoke for the compressible dynamics on the hybrid mesh (Kühnlein and Smolarkiewicz, 2017). The recent progress of FVM is illustrated with results of benchmark simulations of intermediate complexity, and comparison to the operational spectral dynamical core of the IFS. C. Kühnlein, P.K. Smolarkiewicz: An unstructured-mesh finite-volume MPDATA for compressible atmospheric dynamics, J. Comput. Phys. (2017), in press. P.K. Smolarkiewicz, W. Deconinck, M. Hamrud, C. Kühnlein, G. Mozdzynski, J. Szmelter, N.P. Wedi: A finite-volume module for simulating global all-scale atmospheric flows, J. Comput. Phys. 314 (2016) 287-304. J. Szmelter, P.K. Smolarkiewicz: An edge-based unstructured mesh discretisation in geospherical framework, J. Comput. Phys. 229 (2010) 4980-4995.
Composing Data Parallel Code for a SPARQL Graph Engine

DOE Office of Scientific and Technical Information (OSTI.GOV)

Castellana, Vito G.; Tumeo, Antonino; Villa, Oreste

Big data analytics processes large amounts of data to extract knowledge from them. Semantic databases are big data applications that adopt the Resource Description Framework (RDF) to structure metadata through a graph-based representation. The graph-based representation provides several benefits, such as the possibility to perform in-memory processing with large amounts of parallelism. SPARQL is a language used to perform queries on RDF-structured data through graph matching. In this paper we present a tool that automatically translates SPARQL queries to parallel graph crawling and graph matching operations. The tool also supports complex SPARQL constructs, which require more than basic graph matching for their implementation. The tool generates parallel code annotated with OpenMP pragmas for x86 Shared-memory Multiprocessors (SMPs). With respect to commercial database systems such as Virtuoso, our approach reduces memory occupation due to join operations and provides higher performance. We show the scaling of the automatically generated graph-matching code on a 48-core SMP.
Solving Partial Differential Equations in a data-driven multiprocessor environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gaudiot, J.L.; Lin, C.M.; Hosseiniyar, M.

1988-12-31

Partial differential equations can be found in a host of engineering and scientific problems. The emergence of new parallel architectures has spurred research in the definition of parallel PDE solvers. Concurrently, highly programmable systems such as data-flow architectures have been proposed for the exploitation of large scale parallelism. The implementation of some Partial Differential Equation solvers (such as the Jacobi method) on a tagged token data-flow graph is demonstrated here. Asynchronous methods (chaotic relaxation) are studied and new scheduling approaches (the Token No-Labeling scheme) are introduced in order to support the implementation of the asynchronous methods in a data-driven environment. New high-level data-flow language program constructs are introduced in order to handle chaotic operations. Finally, the performance of the program graphs is demonstrated by a deterministic simulation of a message passing data-flow multiprocessor. An analysis of the overhead in the data-flow graphs is undertaken to demonstrate the limits of parallel operations in dataflow PDE program graphs.
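For readers unfamiliar with the Jacobi method named in this record, the following minimal sketch applies it to the Laplace equation on a small 2-D grid with fixed boundary values. The test problem, the function names, and the stopping rule are assumptions made for illustration; nothing here reproduces the data-flow implementation the abstract describes.

```python
def jacobi_sweep(u):
    """Return a new grid after one Jacobi sweep of the 5-point Laplace stencil."""
    n, m = len(u), len(u[0])
    new = [row[:] for row in u]    # boundary values are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            # Each update reads only the OLD grid, so all interior points
            # could be computed independently (and hence in parallel).
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                + u[i][j - 1] + u[i][j + 1])
    return new


def solve_laplace(u, tol=1e-8, max_sweeps=10_000):
    """Repeat sweeps until the largest pointwise change drops below `tol`."""
    for sweep in range(max_sweeps):
        new = jacobi_sweep(u)
        change = max(abs(new[i][j] - u[i][j])
                     for i in range(len(u)) for j in range(len(u[0])))
        u = new
        if change < tol:
            return u, sweep + 1
    return u, max_sweeps


grid = [[0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0]]
solution, sweeps = solve_laplace(grid)
print(solution[1][1], sweeps)      # 0.25 2
```

Chaotic relaxation, as mentioned in the abstract, relaxes the requirement that every update read values from the same sweep; the synchronous version above is the simpler baseline.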
Parallel processing in finite element structural analysis

NASA Technical Reports Server (NTRS)

Noor, Ahmed K.

1987-01-01

A brief review is made of the fundamental concepts and basic issues of parallel processing. Discussion focuses on parallel numerical algorithms, performance evaluation of machines and algorithms, and parallelism in finite element computations. A computational strategy is proposed for maximizing the degree of parallelism at different levels of the finite element analysis process including: 1) formulation level (through the use of mixed finite element models); 2) analysis level (through additive decomposition of the different arrays in the governing equations into the contributions to a symmetrized response plus correction terms); 3) numerical algorithm level (through the use of operator splitting techniques and application of iterative processes); and 4) implementation level (through the effective combination of vectorization, multitasking and microtasking, whenever available).
HPCC Methodologies for Structural Design and Analysis on Parallel and Distributed Computing Platforms

NASA Technical Reports Server (NTRS)

Farhat, Charbel

1998-01-01

In this grant, we have proposed a three-year research effort focused on developing High Performance Computation and Communication (HPCC) methodologies for structural analysis on parallel processors and clusters of workstations, with emphasis on reducing the structural design cycle time. Besides consolidating and further improving the FETI solver technology to address plate and shell structures, we have proposed to tackle the following design related issues: (a) parallel coupling and assembly of independently designed and analyzed three-dimensional substructures with non-matching interfaces, (b) fast and smart parallel re-analysis of a given structure after it has undergone design modifications, (c) parallel evaluation of sensitivity operators (derivatives) for design optimization, and (d) fast parallel analysis of mildly nonlinear structures. While our proposal was accepted, support was provided only for one year.
A systematic approach to numerical dispersion in Maxwell solvers

NASA Astrophysics Data System (ADS)

Blinne, Alexander; Schinkel, David; Kuschel, Stephan; Elkina, Nina; Rykovanov, Sergey G.; Zepf, Matt

2018-03-01

The finite-difference time-domain (FDTD) method is a well established method for solving the time evolution of Maxwell's equations. Unfortunately the scheme introduces numerical dispersion and therefore phase and group velocities which deviate from the correct values. The solution to Maxwell's equations in more than one dimension results in non-physical predictions such as numerical dispersion or numerical Cherenkov radiation emitted by a relativistic electron beam propagating in vacuum. Improved solvers, which keep the staggered Yee-type grid for electric and magnetic fields, generally modify the spatial derivative operator in the Maxwell-Faraday equation by increasing the computational stencil. These modified solvers can be characterized by different sets of coefficients, leading to different dispersion properties. In this work we introduce a norm function to rewrite the choice of coefficients into a minimization problem. We solve this problem numerically and show that the minimization procedure leads to phase and group velocities that are considerably closer to c as compared to schemes with manually set coefficients available in the literature. Depending on a specific problem at hand (e.g. electron beam propagation in plasma, high-order harmonic generation from plasma surfaces, etc.), the norm function can be chosen accordingly, for example, to minimize the numerical dispersion in a certain given propagation direction. Particle-in-cell simulations of an electron beam propagating in vacuum using our solver are provided.
Characterization of robotics parallel algorithms and mapping onto a reconfigurable SIMD machine

NASA Technical Reports Server (NTRS)

Lee, C. S. G.; Lin, C. T.

1989-01-01

The kinematics, dynamics, Jacobian, and their corresponding inverse computations are six essential problems in the control of robot manipulators. Efficient parallel algorithms for these computations are discussed and analyzed. Their characteristics are identified and a scheme on the mapping of these algorithms to a reconfigurable parallel architecture is presented. Based on the characteristics including type of parallelism, degree of parallelism, uniformity of the operations, fundamental operations, data dependencies, and communication requirement, it is shown that most of the algorithms for robotic computations possess highly regular properties and some common structures, especially the linear recursive structure. Moreover, they are well-suited to be implemented on a single-instruction-stream multiple-data-stream (SIMD) computer with reconfigurable interconnection network. The model of a reconfigurable dual network SIMD machine with internal direct feedback is introduced. A systematic procedure to map these computations to the proposed machine is presented. A new scheduling problem for SIMD machines is investigated and a heuristic algorithm, called neighborhood scheduling, that reorders the processing sequence of subtasks to reduce the communication time is described. Mapping results of a benchmark algorithm are illustrated and discussed.
Parallel computation with molecular-motor-propelled agents in nanofabricated networks.

PubMed

Nicolau, Dan V; Lard, Mercy; Korten, Till; van Delft, Falco C M J M; Persson, Malin; Bengtsson, Elina; Månsson, Alf; Diez, Stefan; Linke, Heiner; Nicolau, Dan V

2016-03-08

The combinatorial nature of many important mathematical problems, including nondeterministic-polynomial-time (NP)-complete problems, places a severe limitation on the problem size that can be solved with conventional, sequentially operating electronic computers. There have been significant efforts in conceiving parallel-computation approaches in the past, for example: DNA computation, quantum computation, and microfluidics-based computation. However, these approaches have not proven, so far, to be scalable and practical from a fabrication and operational perspective. Here, we report the foundations of an alternative parallel-computation system in which a given combinatorial problem is encoded into a graphical, modular network that is embedded in a nanofabricated planar device. Exploring the network in a parallel fashion using a large number of independent, molecular-motor-propelled agents then solves the mathematical problem. This approach uses orders of magnitude less energy than conventional computers, thus addressing issues related to power consumption and heat dissipation. We provide a proof-of-concept demonstration of such a device by solving, in a parallel fashion, the small instance {2, 5, 9} of the subset sum problem, which is a benchmark NP-complete problem. Finally, we discuss the technical advances necessary to make our system scalable with presently available technology.
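The subset sum instance {2, 5, 9} mentioned in this record is small enough to enumerate directly. The sketch below lists every sum reachable by including or skipping each number, which is the set of answers a parallel exploration of the problem would have to find; the function name `reachable_sums` is hypothetical and the code says nothing about how the physical device encodes the problem.

```python
def reachable_sums(numbers):
    """Return all sums obtainable from subsets of `numbers`, in sorted order."""
    sums = {0}                                   # the empty subset
    for n in numbers:
        # Every existing sum branches into two: skip n, or add n.
        sums = sums | {s + n for s in sums}
    return sorted(sums)


print(reachable_sums([2, 5, 9]))  # [0, 2, 5, 7, 9, 11, 14, 16]
```

The number of subsets doubles with each added element, which is the combinatorial growth the abstract refers to.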
GPU accelerated cell-based adaptive mesh refinement on unstructured quadrilateral grid

NASA Astrophysics Data System (ADS)

Luo, Xisheng; Wang, Luying; Ran, Wei; Qin, Fenghua

2016-10-01

A GPU accelerated inviscid flow solver is developed on an unstructured quadrilateral grid in the present work. For the first time, the cell-based adaptive mesh refinement (AMR) is fully implemented on GPU for the unstructured quadrilateral grid, which greatly reduces the frequency of data exchange between GPU and CPU. Specifically, the AMR is processed with atomic operations to parallelize list operations, and null memory recycling is realized to improve the efficiency of memory utilization. It is found that results obtained by GPUs agree very well with the exact or experimental results in literature. An acceleration ratio of 4 is obtained between the parallel code running on the old GPU GT9800 and the serial code running on E3-1230 V2. With the optimization of configuring a larger L1 cache and adopting Shared Memory based atomic operations on the newer GPU C2050, an acceleration ratio of 20 is achieved. The parallelized cell-based AMR processes have achieved 2x speedup on GT9800 and 18x on Tesla C2050, which demonstrates that parallel running of the cell-based AMR method on GPU is feasible and efficient. Our results also indicate that the new development of GPU architecture benefits the fluid dynamics computing significantly.
A parallel approximate string matching under Levenshtein distance on graphics processing units using warp-shuffle operations

PubMed Central

Ho, ThienLuan; Oh, Seung-Rohk

2017-01-01

Approximate string matching with k-differences has a number of practical applications, ranging from pattern recognition to computational biology. This paper proposes an efficient memory-access algorithm for parallel approximate string matching with k-differences on Graphics Processing Units (GPUs). In the proposed algorithm, all threads in the same GPU warp share data using warp-shuffle operations instead of accessing the shared memory. Moreover, we implement the proposed algorithm by exploiting the memory structure of GPUs to optimize its performance. Experimental results for real DNA packages revealed that the proposed algorithm and its implementation achieved speedups of up to 122.64 and 1.53 times over the sequential algorithm on CPU and a previous parallel approximate string matching algorithm on GPUs, respectively. PMID:29016700
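To make the k-differences problem concrete, here is a sequential reference sketch based on the standard dynamic-programming formulation (often attributed to Sellers), in which a match may begin at any text position. It is an assumption that this matches the sequential baseline used in the record; the function name is hypothetical and no GPU or warp-shuffle behaviour is modelled.

```python
def k_difference_matches(pattern, text, k):
    """Return 1-based end positions j such that some substring of `text`
    ending at j is within Levenshtein distance k of `pattern`."""
    m = len(pattern)
    col = list(range(m + 1))       # distances against the empty text prefix
    matches = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]         # D[i-1][j-1] for i = 1
        col[0] = 0                 # a match may start at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new = min(col[i] + 1,        # extra text character
                      col[i - 1] + 1,    # missing pattern character
                      prev_diag + cost)  # match or substitution
            prev_diag = col[i]
            col[i] = new
        if col[m] <= k:
            matches.append(j)
    return matches


print(k_difference_matches("abc", "xxabcxx", 0))  # [5]
print(k_difference_matches("abc", "xxabcxx", 1))  # [4, 5, 6]
```

Each column depends only on the previous one, so the working set is a single array of length m + 1; that small, regularly accessed state is what makes register-level data sharing attractive on a GPU.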
Multitasking the three-dimensional transport code TORT on CRAY platforms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Azmy, Y.Y.; Barnett, D.A.; Burre, C.A.

1996-04-01

The multitasking options in the three-dimensional neutral particle transport code TORT originally implemented for Cray's CTSS operating system are revived and extended to run on Cray Y/MP and C90 computers using the UNICOS operating system. These include two coarse-grained domain decompositions: across octants, and across directions within an octant, termed Octant Parallel (OP) and Direction Parallel (DP), respectively. Parallel performance of the DP is significantly enhanced by increasing the task grain size and reducing load imbalance via dynamic scheduling of the discrete angles among the participating tasks. Substantial wall-clock speedup factors, approaching 4.5 using 8 tasks, have been measured in a time-sharing environment, and generally depend on the test problem specifications, number of tasks, and machine loading during execution.
Determining when a set of compute nodes participating in a barrier operation on a parallel computer are ready to exit the barrier operation

DOEpatents

Blocksome, Michael A [Rochester, MN

2011-12-20

Methods, apparatus, and products are disclosed for determining when a set of compute nodes participating in a barrier operation on a parallel computer are ready to exit the barrier operation that includes, for each compute node in the set: initializing a barrier counter with no counter underflow interrupt; configuring, upon entering the barrier operation, the barrier counter with a value in dependence upon a number of compute nodes in the set; broadcasting, by a DMA engine on the compute node to each of the other compute nodes upon entering the barrier operation, a barrier control packet; receiving, by the DMA engine from each of the other compute nodes, a barrier control packet; modifying, by the DMA engine, the value for the barrier counter in dependence upon each of the received barrier control packets; exiting the barrier operation if the value for the barrier counter matches the exit value.
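The counter-based exit test described in this record can be pictured with a small single-process simulation. The specifics below are assumptions chosen for illustration: the counter starts at the number of other nodes, each received packet decrements it, and the exit value is zero. The record itself only says the value depends on the number of nodes and is modified per received packet. The class and function names are hypothetical, and a plain method call stands in for the DMA broadcast.

```python
class Node:
    """One compute node taking part in a barrier (illustrative model only)."""

    def __init__(self, rank, size):
        self.rank = rank
        self.counter = size - 1    # assumption: packets still expected
        self.exit_value = 0        # assumption: exit when nothing is pending

    def receive_packet(self):
        self.counter -= 1          # one more peer has entered the barrier

    def ready_to_exit(self):
        return self.counter == self.exit_value


def run_barrier(size):
    """Let every node enter the barrier and report who may exit afterwards."""
    nodes = [Node(rank, size) for rank in range(size)]
    for sender in nodes:
        for receiver in nodes:
            if receiver is not sender:
                receiver.receive_packet()   # stands in for the DMA broadcast
    return [node.ready_to_exit() for node in nodes]


print(run_barrier(4))  # [True, True, True, True]
```

A node that has heard from only some of its peers still holds a non-zero counter, so it cannot leave the barrier early.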
Evolving binary classifiers through parallel computation of multiple fitness cases.

PubMed

Cagnoni, Stefano; Bergenti, Federico; Mordonini, Monica; Adorni, Giovanni

2005-06-01

This paper describes two versions of a novel approach to developing binary classifiers, based on two evolutionary computation paradigms: cellular programming and genetic programming. Such an approach achieves high computation efficiency both during evolution and at runtime. Evolution speed is optimized by allowing multiple solutions to be computed in parallel. Runtime performance is optimized explicitly using parallel computation in the case of cellular programming or implicitly taking advantage of the intrinsic parallelism of bitwise operators on standard sequential architectures in the case of genetic programming. The approach was tested on a digit recognition problem and compared with a reference classifier.
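The "intrinsic parallelism of bitwise operators" can be shown with a toy example: if the value of each input across all fitness cases is packed into one integer, a single bitwise expression evaluates a candidate Boolean classifier on every case at once. The packing layout, the helper names, and the XOR target below are assumptions for illustration and are not taken from the paper.

```python
def pack(bits):
    """Pack a list of 0/1 values into one integer, with case i stored in bit i."""
    word = 0
    for i, b in enumerate(bits):
        word |= (b & 1) << i
    return word


def fitness(candidate, inputs, target, n_cases):
    """Count the fitness cases on which `candidate` reproduces `target`."""
    mask = (1 << n_cases) - 1
    output = candidate(*inputs) & mask   # one call evaluates all cases
    errors = (output ^ target) & mask    # a set bit marks a misclassified case
    return n_cases - bin(errors).count("1")


a = pack([0, 0, 1, 1])
b = pack([0, 1, 0, 1])
target = pack([0, 1, 1, 0])              # XOR truth table

print(fitness(lambda x, y: (x | y) & ~(x & y), [a, b], target, 4))  # 4
print(fitness(lambda x, y: x | y, [a, b], target, 4))               # 3
```

On a machine word of 32 or 64 bits, the same idea evaluates that many fitness cases per instruction on an ordinary sequential processor.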
Synchronization Of Parallel Discrete Event Simulations

NASA Technical Reports Server (NTRS)

Steinman, Jeffrey S.

1992-01-01

Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages. Algorithm processes events optimistically in time cycles adapting while simulation in progress. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.
Learning and Parallelization Boost Constraint Search

ERIC Educational Resources Information Center

Yun, Xi

2013-01-01

Constraint satisfaction problems are a powerful way to abstract and represent academic and real-world problems from both artificial intelligence and operations research. A constraint satisfaction problem is typically addressed by a sequential constraint solver running on a single processor. Rather than construct a new, parallel solver, this work…
The BLAZE language - A parallel language for scientific programming

NASA Technical Reports Server (NTRS)

Mehrotra, Piyush; Van Rosendale, John

1987-01-01

A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.
Parallel Computation of the Jacobian Matrix for Nonlinear Equation Solvers Using MATLAB

NASA Technical Reports Server (NTRS)

Rose, Geoffrey K.; Nguyen, Duc T.; Newman, Brett A.

2017-01-01

Demonstrating speedup for parallel code on a multicore shared memory PC can be challenging in MATLAB due to underlying parallel operations that are often opaque to the user. This can limit potential for improvement of serial code even for the so-called embarrassingly parallel applications. One such application is the computation of the Jacobian matrix inherent to most nonlinear equation solvers. Computation of this matrix represents the primary bottleneck in nonlinear solver speed such that commercial finite element (FE) and multi-body-dynamic (MBD) codes attempt to minimize computations. A timing study using MATLAB's Parallel Computing Toolbox was performed for numerical computation of the Jacobian. Several approaches for implementing parallel code were investigated while only the single program multiple data (spmd) method using composite objects provided positive results. Parallel code speedup is demonstrated but the goal of linear speedup through the addition of processors was not achieved due to PC architecture.
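The reason Jacobian evaluation is considered embarrassingly parallel is that each column of a finite-difference Jacobian needs only its own perturbed function evaluation. The sketch below shows that structure in Python rather than MATLAB; the forward-difference scheme, the step size `h`, and the function names are assumptions, and a thread pool is used only to show the independent tasks (in CPython it would not by itself yield a speedup for pure-Python functions).

```python
from concurrent.futures import ThreadPoolExecutor


def jacobian_column(f, x, j, h=1e-6):
    """Forward-difference estimate of dF/dx_j at x (one independent task)."""
    xp = list(x)
    xp[j] += h
    f0, f1 = f(x), f(xp)
    return [(b - a) / h for a, b in zip(f0, f1)]


def jacobian(f, x, workers=4):
    """Assemble the Jacobian by computing its columns as separate tasks."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        cols = list(pool.map(lambda j: jacobian_column(f, x, j), range(len(x))))
    # cols[j][i] holds dF_i/dx_j; transpose into the usual row layout.
    return [list(row) for row in zip(*cols)]


def residual(x):
    """Hypothetical test system with exact Jacobian [[2*x0, 1], [0, 3]]."""
    return [x[0] ** 2 + x[1], 3.0 * x[1]]


J = jacobian(residual, [1.0, 2.0])
print([[round(v, 3) for v in row] for row in J])  # [[2.0, 1.0], [0.0, 3.0]]
```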
Performance Improvements of the CYCOFOS Flow Model

NASA Astrophysics Data System (ADS)

Radhakrishnan, Hari; Moulitsas, Irene; Syrakos, Alexandros; Zodiatis, George; Nikolaides, Andreas; Hayes, Daniel; Georgiou, Georgios C.

2013-04-01

The CYCOFOS-Cyprus Coastal Ocean Forecasting and Observing System has been operational since early 2002, providing daily sea current, temperature, salinity and sea level forecasting data for the next 4 and 10 days to end-users in the Levantine Basin, necessary for operational application in marine safety, particularly concerning oil spills and floating objects predictions. The CYCOFOS flow model, similar to most of the coastal and sub-regional operational hydrodynamic forecasting systems of the MONGOOS-Mediterranean Oceanographic Network for Global Ocean Observing System, is based on the POM-Princeton Ocean Model. CYCOFOS is nested with the MyOcean Mediterranean regional forecasting data and with SKIRON and ECMWF for surface forcing. The increasing demand for higher and higher resolution data to meet coastal and offshore downstream applications motivated the parallelization of the CYCOFOS POM model. This development was carried out in the frame of the IPcycofos project, funded by the Cyprus Research Promotion Foundation. Parallel processing provides a viable solution to satisfy these demands without sacrificing accuracy or omitting any physical phenomena. Prior to the IPcycofos project, there have been several attempts to parallelise the POM, as for example the MP-POM. The existing parallel code models rely on the use of specific outdated hardware architectures and associated software. The objective of the IPcycofos project is to produce an operational parallel version of the CYCOFOS POM code that can replicate the results of the serial version of the POM code used in CYCOFOS. The parallelization of the CYCOFOS POM model uses the Message Passing Interface (MPI), implemented on commodity computing clusters running open source software and not depending on any specialized vendor hardware. The parallel CYCOFOS POM code is constructed in a modular fashion, allowing a fast re-locatable downscaled implementation. The MPI implementation takes advantage of the Cartesian nature of the POM mesh, and uses the built-in functionality of MPI routines to split the mesh, using a weighting scheme, along longitude and latitude among the processors. Each server processor works on the model based on domain decomposition techniques. The new parallel CYCOFOS POM code has been benchmarked against the serial POM version of CYCOFOS for speed, accuracy, and resolution and the results are more than satisfactory. With a higher resolution CYCOFOS Levantine model domain the forecasts need much less time than the serial CYCOFOS POM coarser version, both with identical accuracy.
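Splitting a Cartesian mesh along longitude and latitude among processors can be sketched without MPI as a pure index calculation. The version below divides each axis as evenly as possible; the weighting scheme mentioned in the record is not specified there, so it is not modelled, and the function names and grid sizes are hypothetical.

```python
def split_axis(n_points, n_parts):
    """Split range(n_points) into n_parts contiguous half-open (start, stop) ranges."""
    base, extra = divmod(n_points, n_parts)
    ranges, start = [], 0
    for part in range(n_parts):
        size = base + (1 if part < extra else 0)  # first `extra` parts get one more
        ranges.append((start, start + size))
        start += size
    return ranges


def decompose(n_lon, n_lat, p_lon, p_lat):
    """Map each processor coordinate (i, j) to its (longitude, latitude) index ranges."""
    lon_ranges = split_axis(n_lon, p_lon)
    lat_ranges = split_axis(n_lat, p_lat)
    return {(i, j): (lon, lat)
            for i, lon in enumerate(lon_ranges)
            for j, lat in enumerate(lat_ranges)}


print(split_axis(10, 3))            # [(0, 4), (4, 7), (7, 10)]
print(decompose(10, 6, 2, 2)[(1, 0)])  # ((5, 10), (0, 3))
```

In an actual MPI code each subdomain would also exchange boundary (halo) rows and columns with its neighbours after every time step.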
Global interrupt and barrier networks

DOEpatents

Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E; Heidelberger, Philip; Kopcsay, Gerard V.; Steinmacher-Burow, Burkhard D.; Takken, Todd E.

2008-10-28

A system and method for generating global asynchronous signals in a computing structure. Particularly, a global interrupt and barrier network is implemented that implements logic for generating global interrupt and barrier signals for controlling global asynchronous operations performed by processing elements at selected processing nodes of a computing structure in accordance with a processing algorithm; and includes the physical interconnecting of the processing nodes for communicating the global interrupt and barrier signals to the elements via low-latency paths. The global asynchronous signals respectively initiate interrupt and barrier operations at the processing nodes at times selected for optimizing performance of the processing algorithms. In one embodiment, the global interrupt and barrier network is implemented in a scalable, massively parallel supercomputing device structure comprising a plurality of processing nodes interconnected by multiple independent networks, with each node including one or more processing elements for performing computation or communication activity as required when performing parallel algorithm operations. One multiple independent network includes a global tree network for enabling high-speed global tree communications among global tree network nodes or sub-trees thereof. The global interrupt and barrier network may operate in parallel with the global tree network for providing global asynchronous sideband signals.
Collective network for computer structures

DOEpatents

Blumrich, Matthias A; Coteus, Paul W; Chen, Dong; Gara, Alan; Giampapa, Mark E; Heidelberger, Philip; Hoenicke, Dirk; Takken, Todd E; Steinmacher-Burow, Burkhard D; Vranas, Pavlos M

2014-01-07

A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to the needs of a processing algorithm.

Airborne Precision Spacing (APS) Dependent Parallel Arrivals (DPA)

NASA Technical Reports Server (NTRS)

Smith, Colin L.

2012-01-01

The Airborne Precision Spacing (APS) team at the NASA Langley Research Center (LaRC) has been developing a concept of operations to extend the current APS concept to support dependent approaches to parallel or converging runways along with the required pilot and controller procedures and pilot interfaces. A staggered operations capability for the Airborne Spacing for Terminal Arrival Routes (ASTAR) tool was developed and designated as ASTAR10. ASTAR10 has reached a sufficient level of maturity to be validated and tested through a fast-time simulation. The purpose of the experiment was to identify and resolve any remaining issues in the ASTAR10 algorithm, as well as put the concept of operations through a practical test.
Parallel Transport Quantum Logic Gates with Trapped Ions.

PubMed

de Clercq, Ludwig E; Lo, Hsiang-Yu; Marinelli, Matteo; Nadlinger, David; Oswald, Robin; Negnevitsky, Vlad; Kienzler, Daniel; Keitch, Ben; Home, Jonathan P

2016-02-26

We demonstrate single-qubit operations by transporting a beryllium ion with a controlled velocity through a stationary laser beam. We use these to perform coherent sequences of quantum operations, and to perform parallel quantum logic gates on two ions in different processing zones of a multiplexed ion trap chip using a single recycled laser beam. For the latter, we demonstrate individually addressed single-qubit gates by local control of the speed of each ion. The fidelities we observe are consistent with operations performed using standard methods involving static ions and pulsed laser fields. This work therefore provides a path to scalable ion trap quantum computing with reduced requirements on the optical control complexity.
Collective network for computer structures

DOEpatents

Blumrich, Matthias A [Ridgefield, CT; Coteus, Paul W [Yorktown Heights, NY; Chen, Dong [Croton On Hudson, NY; Gara, Alan [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Takken, Todd E [Brewster, NY; Steinmacher-Burow, Burkhard D [Wernau, DE; Vranas, Pavlos M [Bedford Hills, NY

2011-08-16

A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.
Parallel Digital Phase-Locked Loops

NASA Technical Reports Server (NTRS)

Sadr, Ramin; Shah, Biren N.; Hinedi, Sami M.

1995-01-01

Wide-band microwave receivers of proposed type include digital phase-locked loops in which band-pass filtering and down-conversion of input signals implemented by banks of multirate digital filters operating in parallel. Called "parallel digital phase-locked loops" to distinguish them from other digital phase-locked loops. Systems conceived as cost-effective solution to problem of filtering signals at high sampling rates needed to accommodate wide input frequency bands. Each of M filters processes 1/M of spectrum of signal.
A survey of parallel programming tools

NASA Technical Reports Server (NTRS)

Cheng, Doreen Y.

1991-01-01

This survey examines 39 parallel programming tools. Focus is placed on those tool capabilities needed for parallel scientific programming rather than for general computer science. The tools are classified with current and future needs of the Numerical Aerodynamic Simulator (NAS) in mind: existing and anticipated NAS supercomputers and workstations; operating systems; programming languages; and applications. They are divided into four categories: suggested acquisitions; tools already brought in; tools worth tracking; and tools eliminated from further consideration at this time.
Model-Based Systems Engineering in the Execution of Search and Rescue Operations

DTIC Science & Technology

2015-09-01

OSC can fulfill the duties of an ACO but it may make sense to split the duties if there are no communication links between the OSC and participating...parallel mode. This mode is the most powerful option because it creates sequence diagrams that generate parallel “swim lanes” for each asset...greater flexibility is desired, sequence mode generates diagrams based purely on sequential action and activity diagrams without the parallel “swim lanes
Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.

PubMed

Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo

2016-07-19

Computing alignments between two or more sequences is a common operation in computational molecular biology. The continuing growth of biological sequence databases establishes the need for efficient parallel implementations on modern accelerators. This paper presents new approaches to high performance biological sequence database scanning with the Smith-Waterman algorithm and the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture; i.e. cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi .
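For readers unfamiliar with the kernel being parallelized, the following is a minimal serial sketch of Smith-Waterman local alignment scoring and a database scan. It is not the authors' implementation: the function names, the match/mismatch/gap values, and the linear gap penalty are assumptions chosen for illustration (protein searches normally use a substitution matrix). Each database sequence is scored independently, which is what makes the coarse-grained, per-sequence parallelism described above possible.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap penalty).

    Scoring parameters are illustrative assumptions, not values from the paper.
    """
    prev = [0] * (len(b) + 1)          # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # One "cell update": the unit counted by the GCUPS metric.
            curr[j] = max(0,
                          prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                          prev[j] + gap,       # gap in b
                          curr[j - 1] + gap)   # gap in a
            best = max(best, curr[j])
        prev = curr
    return best


def scan_database(query, database):
    """Score the query against every sequence; iterations are independent."""
    scores = [(smith_waterman_score(query, seq), seq) for seq in database]
    return sorted(scores, reverse=True)


if __name__ == "__main__":
    for score, seq in scan_database("ACGT", ["TTTT", "ACGT", "ACG"]):
        print(score, seq)
```

Only two rows of the dynamic-programming matrix are kept, since the scan needs the score and not the alignment itself.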
Executing scatter operation to parallel computer nodes by repeatedly broadcasting content of send buffer partition corresponding to each node upon bitwise OR operation

DOEpatents

Archer, Charles J [Rochester, MN; Ratterman, Joseph D [Rochester, MN

2009-11-06

Executing a scatter operation on a parallel computer includes: configuring a send buffer on a logical root, the send buffer having positions, each position corresponding to a ranked node in an operational group of compute nodes and for storing contents scattered to that ranked node; and repeatedly for each position in the send buffer: broadcasting, by the logical root to each of the other compute nodes on a global combining network, the contents of the current position of the send buffer using a bitwise OR operation, determining, by each compute node, whether the current position in the send buffer corresponds with the rank of that compute node, if the current position corresponds with the rank, receiving the contents and storing the contents in a reception buffer of that compute node, and if the current position does not correspond with the rank, discarding the contents.
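The steps in this abstract can be simulated in a few lines. The sketch below is only an illustration under stated assumptions: node contents are modeled as non-negative integers, the function name is hypothetical, and the non-root nodes are assumed to contribute zeros to the bitwise OR so that the combined value equals the root's contribution (the abstract does not spell out this detail).

```python
from functools import reduce
from operator import or_


def scatter_by_or_broadcast(send_buffer, root=0):
    """Simulate a scatter built from repeated OR-broadcasts.

    send_buffer[k] holds the contents destined for the node of rank k.
    Returns the reception buffer of every node, indexed by rank.
    """
    n = len(send_buffer)
    reception = [None] * n
    for position, contents in enumerate(send_buffer):
        # Assumption: the root contributes the contents, every other node
        # contributes 0, so the OR across all nodes reproduces the contents.
        contributions = [contents if rank == root else 0 for rank in range(n)]
        combined = reduce(or_, contributions)
        for rank in range(n):
            if rank == position:
                reception[rank] = combined   # position matches rank: keep
            # otherwise the node discards the broadcast contents
    return reception


if __name__ == "__main__":
    print(scatter_by_or_broadcast([10, 20, 30, 40]))
```

The cost of this scheme is one network-wide broadcast per rank, which trades bandwidth for the simplicity of reusing the combining hardware.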
Chaining direct memory access data transfer operations for compute nodes in a parallel computer

DOEpatents

Archer, Charles J.; Blocksome, Michael A.

2010-09-28

Methods, systems, and products are disclosed for chaining DMA data transfer operations for compute nodes in a parallel computer that include: receiving, by an origin DMA engine on an origin node in an origin injection FIFO buffer for the origin DMA engine, a RGET data descriptor specifying a DMA transfer operation data descriptor on the origin node and a second RGET data descriptor on the origin node, the second RGET data descriptor specifying a target RGET data descriptor on the target node, the target RGET data descriptor specifying an additional DMA transfer operation data descriptor on the origin node; creating, by the origin DMA engine, an RGET packet in dependence upon the RGET data descriptor, the RGET packet containing the DMA transfer operation data descriptor and the second RGET data descriptor; and transferring, by the origin DMA engine to a target DMA engine on the target node, the RGET packet.
Performing an allreduce operation on a plurality of compute nodes of a parallel computer

DOEpatents

Faraj, Ahmad

2013-07-09

Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: establishing, for each node, a plurality of logical rings, each ring including a different set of at least one core on that node, each ring including the cores on at least two of the nodes; iteratively for each node: assigning each core of that node to one of the rings established for that node to which the core has not previously been assigned, and performing, for each ring for that node, a global allreduce operation using contribution data for the cores assigned to that ring or any global allreduce results from previous global allreduce operations, yielding current global allreduce results for each core; and performing, for each node, a local allreduce operation using the global allreduce results.
Performing an allreduce operation on a plurality of compute nodes of a parallel computer

DOEpatents

Faraj, Ahmad

2013-02-12

Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: performing, for each node, a local reduction operation using allreduce contribution data for the cores of that node, yielding, for each node, a local reduction result for one or more representative cores for that node; establishing one or more logical rings among the nodes, each logical ring including only one of the representative cores from each node; performing, for each logical ring, a global allreduce operation using the local reduction result for the representative cores included in that logical ring, yielding a global allreduce result for each representative core included in that logical ring; and performing, for each node, a local broadcast operation using the global allreduce results for each representative core on that node.
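The three phases named in this abstract (local reduction, ring allreduce among representatives, local broadcast) can be sketched as a small simulation. This is a simplified illustration, not the patented method: it assumes a single representative core and a single logical ring per node, a commutative and associative operator, and a simple pass-around ring algorithm; all function names are hypothetical.

```python
import operator
from functools import reduce


def ring_allreduce(values, op):
    """Each ring member forwards what it last received to its neighbour.

    After n - 1 steps every member has combined all n contributions.
    Assumes op is commutative and associative.
    """
    n = len(values)
    result = list(values)
    in_flight = list(values)
    for _ in range(n - 1):
        in_flight = [in_flight[(i - 1) % n] for i in range(n)]
        result = [op(result[i], in_flight[i]) for i in range(n)]
    return result


def hierarchical_allreduce(nodes, op=operator.add):
    """nodes[k] is the list of per-core contributions on node k."""
    # Phase 1: local reduction onto one representative core per node.
    local = [reduce(op, cores) for cores in nodes]
    # Phase 2: global allreduce over the ring of representatives.
    global_results = ring_allreduce(local, op)
    # Phase 3: local broadcast of the global result to every core.
    return [[g] * len(cores) for g, cores in zip(global_results, nodes)]


if __name__ == "__main__":
    print(hierarchical_allreduce([[1, 2], [3, 4], [5, 6]]))
```

Reducing locally first means only one value per node crosses the network, which is the motivation for the hierarchy on multi-core nodes.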
Reactanceless synthesized impedance bandpass amplifier

NASA Technical Reports Server (NTRS)

Kleinberg, L. L. (Inventor)

1985-01-01

An active R bandpass filter network is formed by four operational amplifier stages interconnected by discrete resistances. One pair of stages synthesize an equivalent input impedance of an inductance (L sub eq) in parallel with a discrete resistance (R sub o) while the second pair of stages synthesizes an equivalent input impedance of a capacitance (C sub eq) serially coupled to another discrete resistance (R sub i) coupled in parallel with the first two stages. The equivalent input impedances aggregately define a tuned resonant bandpass filter in the roll-off regions of the operational amplifiers.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Yang, C.

Almost every computer architect dreams of achieving high system performance with low implementation costs. A multigauge machine can reconfigure its data-path width, provide parallelism, achieve better resource utilization, and sometimes can trade computational precision for increased speed. A simple experimental method is used here to capture the main characteristics of multigauging. The measurements indicate evidence of near-optimal speedups. Adapting these ideas in designing parallel processors incurs low costs and provides flexibility. Several operational aspects of designing a multigauge machine are discussed as well. Thus, this research reports the technical, economical, and operational feasibility studies of multigauging.
National Centers for Environmental Prediction

Science.gov Websites

Reference List Table of Contents NCEP OPERATIONAL MODEL FORECAST GRAPHICS PARALLEL/EXPERIMENTAL MODEL Developmental Air Quality Forecasts and Verification Back to Table of Contents 2. PARALLEL/EXPERIMENTAL GRAPHICS VERIFICATION (GRID VS.OBS) WEB PAGE (NCEP EXPERIMENTAL PAGE, INTERNAL USE ONLY) Interactive web page tool for
Hardware packet pacing using a DMA in a parallel computer

DOEpatents

Chen, Dong; Heidelberger, Phillip; Vranas, Pavlos

2013-08-13

Method and system for hardware packet pacing using a direct memory access controller in a parallel computer which, in one aspect, keeps track of a total number of bytes put on the network as a result of a remote get operation, using a hardware token counter.
Comparison of Node-Centered and Cell-Centered Unstructured Finite-Volume Discretizations. Part 1; Viscous Fluxes

NASA Technical Reports Server (NTRS)

Diskin, Boris; Thomas, James L.; Nielsen, Eric J.; Nishikawa, Hiroaki; White, Jeffery A.

2009-01-01

Discretizations of the viscous terms in current finite-volume unstructured-grid schemes are compared using node-centered and cell-centered approaches in two dimensions. Accuracy and efficiency are studied for six nominally second-order accurate schemes: a node-centered scheme, cell-centered node-averaging schemes with and without clipping, and cell-centered schemes with unweighted, weighted, and approximately mapped least-square face gradient reconstruction. The grids considered range from structured (regular) grids to irregular grids composed of arbitrary mixtures of triangles and quadrilaterals, including random perturbations of the grid points to bring out the worst possible behavior of the solution. Two classes of tests are considered. The first class of tests involves smooth manufactured solutions on both isotropic and highly anisotropic grids with discontinuous metrics, typical of those encountered in grid adaptation. The second class concerns solutions and grids varying strongly anisotropically over a curved body, typical of those encountered in high-Reynolds number turbulent flow simulations. Results from the first class indicate the face least-square methods, the node-averaging method without clipping, and the node-centered method demonstrate second-order convergence of discretization errors with very similar accuracies per degree of freedom. The second class of tests is more discriminating. The node-centered scheme is always second order with an accuracy and complexity in linearization comparable to the best of the cell-centered schemes. In comparison, the cell-centered node-averaging schemes are less accurate, have a higher complexity in linearization, and can fail to converge to the exact solution when clipping of the node-averaged values is used. The cell-centered schemes using least-square face gradient reconstruction have more compact stencils with a complexity similar to the complexity of the node-centered scheme. For simulations on highly anisotropic curved grids, the least-square methods have to be amended either by introducing a local mapping of the surface anisotropy or modifying the scheme stencil to reflect the direction of strong coupling.
Parallel implementation of an adaptive and parameter-free N-body integrator

NASA Astrophysics Data System (ADS)

Pruett, C. David; Ingham, William H.; Herman, Ralph D.

2011-05-01

Previously, Pruett et al. (2003) [3] described an N-body integrator of arbitrarily high order M with an asymptotic operation count of O(MN). The algorithm's structure lends itself readily to data parallelization, which we document and demonstrate here in the integration of point-mass systems subject to Newtonian gravitation. High order is shown to benefit parallel efficiency. The resulting N-body integrator is robust, parameter-free, highly accurate, and adaptive in both time-step and order. Moreover, it exhibits linear speedup on distributed parallel processors, provided that each processor is assigned at least a handful of bodies. Program summary: Program title: PNB.f90; Catalogue identifier: AEIK_v1_0; Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEIK_v1_0.html ; Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland; Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html ; No. of lines in distributed program, including test data, etc.: 3052; No. of bytes in distributed program, including test data, etc.: 68 600; Distribution format: tar.gz; Programming language: Fortran 90 and OpenMPI; Computer: All shared or distributed memory parallel processors; Operating system: Unix/Linux; Has the code been vectorized or parallelized?: The code has been parallelized but has not been explicitly vectorized; RAM: Dependent upon N; Classification: 4.3, 4.12, 6.5; Nature of problem: High accuracy numerical evaluation of trajectories of N point masses each subject to Newtonian gravitation; Solution method: Parallel and adaptive extrapolation in time via power series of arbitrary degree; Running time: 5.1 s for the demo program supplied with the package.
Providing nearest neighbor point-to-point communications among compute nodes of an operational group in a global combining network of a parallel computer

DOEpatents

Archer, Charles J.; Faraj, Ahmad A.; Inglett, Todd A.; Ratterman, Joseph D.

2012-10-23

Methods, apparatus, and products are disclosed for providing nearest neighbor point-to-point communications among compute nodes of an operational group in a global combining network of a parallel computer, each compute node connected to each adjacent compute node in the global combining network through a link, that include: identifying each link in the global combining network for each compute node of the operational group; designating one of a plurality of point-to-point class routing identifiers for each link such that no compute node in the operational group is connected to two adjacent compute nodes in the operational group with links designated for the same class routing identifiers; and configuring each compute node of the operational group for point-to-point communications with each adjacent compute node in the global combining network through the link between that compute node and that adjacent compute node using that link's designated class routing identifier.
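The constraint in this abstract (no node may have two links carrying the same class routing identifier) amounts to a proper edge coloring of the network. The sketch below shows one way such an assignment could be computed; it is an assumption-laden illustration, not the patented procedure. It assumes the global combining network is a rooted tree given as a hypothetical `children` mapping, and it uses a simple greedy traversal of my own choosing.

```python
from collections import deque


def assign_class_routes(children, root):
    """Give each parent-child link an identifier so that no node touches
    two links with the same identifier.

    children: dict mapping a node to the list of its child nodes (a tree).
    Returns a dict {(parent, child): identifier}.
    """
    route = {}
    queue = deque([(root, None)])      # (node, id of the link to its parent)
    while queue:
        node, parent_id = queue.popleft()
        next_id = 0
        for child in children.get(node, []):
            if next_id == parent_id:   # never reuse the parent link's id
                next_id += 1
            route[(node, child)] = next_id
            queue.append((child, next_id))
            next_id += 1
    return route


if __name__ == "__main__":
    tree = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
    for link, ident in sorted(assign_class_routes(tree, 0).items()):
        print(link, ident)
```

On a tree this greedy pass needs only as many identifiers as the largest number of links at any one node, three for the binary tree in the usage example.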
The characteristics and limitations of the MPS/MMS battery charging system

NASA Technical Reports Server (NTRS)

Ford, F. E.; Palandati, C. F.; Davis, J. F.; Tasevoli, C. M.

1980-01-01

A series of tests was conducted on two 12 ampere hour nickel cadmium batteries under a simulated cycle regime using the multiple voltage versus temperature levels designed into the modular power system (MPS). These tests included: battery recharge as a function of voltage control level; temperature imbalance between two parallel batteries; a shorted or partially shorted cell in one of the two parallel batteries; impedance imbalance of one of the parallel battery circuits; and disabling and enabling one of the batteries from the bus at various charge and discharge states. The results demonstrate that the eight commandable voltage versus temperature levels designed into the MPS provide a very flexible system that not only can accommodate a wide range of normal power system operation, but also provides a high degree of flexibility in responding to abnormal operating conditions.
Broadband piezoelectric energy harvesting devices using multiple bimorphs with different operating frequencies.

PubMed

Xue, Huan; Hu, Yuantai; Wang, Qing-Ming

2008-09-01

This paper presents a novel approach for designing broadband piezoelectric harvesters by integrating multiple piezoelectric bimorphs (PBs) with different aspect ratios into a system. The effect of 2 connecting patterns among PBs, in series and in parallel, on improving energy harvesting performance is discussed. It is found for multifrequency spectra ambient vibrations: 1) the operating frequency band (OFB) of a harvesting structure can be widened by connecting multiple PBs with different aspect ratios in series; 2) the OFB of a harvesting structure can be shifted to the dominant frequency domain of the ambient vibrations by increasing or decreasing the number of PBs in parallel. Numerical results show that the OFB of the piezoelectric energy harvesting devices can be tailored by the connection patterns (i.e., in series and in parallel) among PBs.

A rapid parallelization of cone-beam projection and back-projection operator based on texture fetching interpolation

NASA Astrophysics Data System (ADS)

Xie, Lizhe; Hu, Yining; Chen, Yang; Shi, Luyao

2015-03-01

Projection and back-projection are the most computationally expensive parts of Computed Tomography (CT) reconstruction. Parallelization strategies using GPU computing techniques have been introduced. In this paper we present a new parallelization scheme for both projection and back-projection. The proposed method is based on CUDA technology developed by NVIDIA Corporation. Instead of building a complex model, we aimed at optimizing the existing algorithm and making it suitable for CUDA implementation so as to gain fast computation speed. Besides making use of the texture fetching operation, which helps gain faster interpolation speed, we fixed the sampling numbers in the computation of projection to ensure the synchronization of blocks and threads, thus preventing the latency caused by inconsistent computation complexity. Experimental results demonstrate the computational efficiency and imaging quality of the proposed method.
Reliability models for dataflow computer systems

NASA Technical Reports Server (NTRS)

Kavi, K. M.; Buckles, B. P.

1985-01-01

The demands for concurrent operation within a computer system and the representation of parallelism in programming languages have yielded a new form of program representation known as data flow (DENN 74, DENN 75, TREL 82a). A new model based on data flow principles for parallel computations and parallel computer systems is presented. Necessary conditions for liveness and deadlock freeness in data flow graphs are derived. The data flow graph is used as a model to represent asynchronous concurrent computer architectures including data flow computers.
Method for resource control in parallel environments using program organization and run-time support

NASA Technical Reports Server (NTRS)

Ekanadham, Kattamuri (Inventor); Moreira, Jose Eduardo (Inventor); Naik, Vijay Krishnarao (Inventor)

2001-01-01

A system and method for dynamic scheduling and allocation of resources to parallel applications during the course of their execution. By establishing well-defined interactions between an executing job and the parallel system, the system and method support dynamic reconfiguration of processor partitions, dynamic distribution and redistribution of data, communication among cooperating applications, and various other monitoring actions. The interactions occur only at specific points in the execution of the program where the aforementioned operations can be performed efficiently.
Method for resource control in parallel environments using program organization and run-time support

NASA Technical Reports Server (NTRS)

Ekanadham, Kattamuri (Inventor); Moreira, Jose Eduardo (Inventor); Naik, Vijay Krishnarao (Inventor)

1999-01-01

A system and method for dynamic scheduling and allocation of resources to parallel applications during the course of their execution. By establishing well-defined interactions between an executing job and the parallel system, the system and method support dynamic reconfiguration of processor partitions, dynamic distribution and redistribution of data, communication among cooperating applications, and various other monitoring actions. The interactions occur only at specific points in the execution of the program where the aforementioned operations can be performed efficiently.
Organizing Compression of Hyperspectral Imagery to Allow Efficient Parallel Decompression

NASA Technical Reports Server (NTRS)

Klimesh, Matthew A.; Kiely, Aaron B.

2014-01-01

A family of schemes has been devised for organizing the output of an algorithm for predictive data compression of hyperspectral imagery so as to allow efficient parallelization in both the compressor and decompressor. In these schemes, the compressor performs a number of iterations, during each of which a portion of the data is compressed via parallel threads operating on independent portions of the data. The general idea is that for each iteration it is predetermined how much compressed data will be produced from each thread.
Selective, Embedded, Just-In-Time Specialization (SEJITS): Portable Parallel Performance from Sequential, Productive, Embedded Domain-Specific Languages

DTIC Science & Technology

2012-12-01

identity operation SIMD Single instruction, multiple datastream parallel computing Scala A byte-compiled programming language featuring dynamic type...Specific Languages 5a. CONTRACT NUMBER FA8750-10-1-0191 5b. GRANT NUMBER N/A 5c. PROGRAM ELEMENT NUMBER 61101E 6. AUTHOR(S) Armando Fox 5d...application performance, but usually must rely on efficiency programmers who are experts in explicit parallel programming to achieve it. Since such efficiency
Options for Parallelizing a Planning and Scheduling Algorithm

NASA Technical Reports Server (NTRS)

Clement, Bradley J.; Estlin, Tara A.; Bornstein, Benjamin D.

2011-01-01

Space missions have a growing interest in putting multi-core processors onboard spacecraft. For many missions, limited processing power significantly slows operations. We investigate how continual planning and scheduling algorithms can exploit multi-core processing and outline different potential design decisions for a parallelized planning architecture. This organization of choices and challenges helps us with an initial design for parallelizing the CASPER planning system for a mesh multi-core processor. This work extends that presented at another workshop with some preliminary results.
Experiences with hypercube operating system instrumentation

NASA Technical Reports Server (NTRS)

Reed, Daniel A.; Rudolph, David C.

1989-01-01

The difficulties in conceptualizing the interactions among a large number of processors make it difficult both to identify the sources of inefficiencies and to determine how a parallel program could be made more efficient. This paper describes an instrumentation system that can trace the execution of distributed memory parallel programs by recording the occurrence of parallel program events. The resulting event traces can be used to compile summary statistics that provide a global view of program performance. In addition, visualization tools permit the graphic display of event traces. Visual presentation of performance data is particularly useful, indeed, necessary for large-scale parallel computers; the enormous volume of performance data mandates visual display.
Solution of the within-group multidimensional discrete ordinates transport equations on massively parallel architectures

NASA Astrophysics Data System (ADS)

Zerr, Robert Joseph

2011-12-01

The integral transport matrix method (ITMM) has been used as the kernel of new parallel solution methods for the discrete ordinates approximation of the within-group neutron transport equation. The ITMM abandons the repetitive mesh sweeps of the traditional source iterations (SI) scheme in favor of constructing stored operators that account for the direct coupling factors among all the cells and between the cells and boundary surfaces. The main goals of this work were to develop the algorithms that construct these operators and employ them in the solution process, determine the most suitable way to parallelize the entire procedure, and evaluate the behavior and performance of the developed methods for increasing number of processes. This project compares the effectiveness of the ITMM with the SI scheme parallelized with the Koch-Baker-Alcouffe (KBA) method. The primary parallel solution method involves a decomposition of the domain into smaller spatial sub-domains, each with their own transport matrices, and coupled together via interface boundary angular fluxes. Each sub-domain has its own set of ITMM operators and represents an independent transport problem. Multiple iterative parallel solution methods have been investigated, including parallel block Jacobi (PBJ), parallel red/black Gauss-Seidel (PGS), and parallel GMRES (PGMRES). The fastest observed parallel solution method, PGS, was used in a weak scaling comparison with the PARTISN code. Compared to the state-of-the-art SI-KBA with diffusion synthetic acceleration (DSA), this new method without acceleration/preconditioning is not competitive for any problem parameters considered. The best comparisons occur for problems that are difficult for SI DSA, namely highly scattering and optically thick. SI DSA execution time curves are generally steeper than the PGS ones. 
However, until further testing is performed it cannot be concluded that SI DSA does not outperform the ITMM with PGS even on several thousand or tens of thousands of processors. The PGS method does outperform SI DSA for the periodic heterogeneous layers (PHL) configuration problems. Although this demonstrates a relative strength/weakness between the two methods, the practicality of these problems is much less, further limiting instances where it would be beneficial to select ITMM over SI DSA. The results strongly indicate a need for a robust, stable, and efficient acceleration method (or preconditioner for PGMRES). The spatial multigrid (SMG) method is currently incomplete in that it does not work for all cases considered and does not effectively improve the convergence rate for all values of scattering ratio c or cell dimension h. Nevertheless, it does display the desired trend for highly scattering, optically thin problems. That is, it tends to lower the rate of growth of number of iterations with increasing number of processes, P, while not increasing the number of additional operations per iteration to the extent that the total execution time of the rapidly converging accelerated iterations exceeds that of the slower unaccelerated iterations. A predictive parallel performance model has been developed for the PBJ method. Timing tests were performed such that trend lines could be fitted to the data for the different components and used to estimate the execution times. Applied to the weak scaling results, the model notably underestimates construction time, but combined with a slight overestimation in iterative solution time, the model predicts total execution time very well for large P. It also does a decent job with the strong scaling results, closely predicting the construction time and time per iteration, especially as P increases. 
Although not shown to be competitive up to 1,024 processing elements with the current state of the art, the parallelized ITMM exhibits promising scaling trends. Ultimately, compared to the KBA method, the parallelized ITMM may be found to be a very attractive option for transport calculations spatially decomposed over several tens of thousands of processes. Acceleration/preconditioning of the parallelized ITMM once developed will improve the convergence rate and improve its competitiveness. (Abstract shortened by UMI.)
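The red/black ordering behind the PGS method mentioned in this abstract can be illustrated on a much simpler problem. The sketch below is a generic example, not the ITMM solver: it assumes a 2-D Laplace problem with a 5-point stencil and fixed boundary values, and individual grid cells stand in for the spatial sub-domains of the dissertation. The function name and grid layout are hypothetical.

```python
def red_black_gauss_seidel(u, sweeps):
    """In-place red/black Gauss-Seidel sweeps for the 5-point Laplace stencil.

    u is a list of lists; its outermost rows and columns are fixed boundary
    values. Cells with (i + j) even are "red", the others "black".
    """
    n, m = len(u), len(u[0])
    for _ in range(sweeps):
        for colour in (0, 1):                  # 0 = red pass, 1 = black pass
            # Every cell of one colour depends only on cells of the other
            # colour, so this double loop could run fully in parallel.
            for i in range(1, n - 1):
                for j in range(1, m - 1):
                    if (i + j) % 2 == colour:
                        u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                          + u[i][j - 1] + u[i][j + 1])
    return u


if __name__ == "__main__":
    grid = [[1.0] * 5 for _ in range(5)]       # boundary held at 1.0
    for i in range(1, 4):
        for j in range(1, 4):
            grid[i][j] = 0.0                   # interior initial guess
    red_black_gauss_seidel(grid, 50)
    print(grid[2][2])
```

Block Jacobi differs only in that all cells are updated from the previous sweep's values, which exposes more parallelism per pass at the cost of slower convergence.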
The Extended Parallel Process Model: Illuminating the Gaps in Research

ERIC Educational Resources Information Center

Popova, Lucy

2012-01-01

This article examines constructs, propositions, and assumptions of the extended parallel process model (EPPM). Review of the EPPM literature reveals that its theoretical concepts are thoroughly developed, but the theory lacks consistency in operational definitions of some of its constructs. Out of the 12 propositions of the EPPM, a few have not…
Quadruple parallel mass Spectrometry for analysis of vitamin D and triacylglycerols in a dietary supplement

USDA-ARS's Scientific Manuscript database

A ‘dilute-and-shoot’ method for vitamin D and triacylglycerols is demonstrated that employed four mass spectrometers, operating in different ionization modes, for a ‘quadruple parallel mass spectrometry’ analysis, plus three other detectors, for seven detectors overall. Sets of five samples of diet...
Towards Energy-Performance Trade-off Analysis of Parallel Applications

ERIC Educational Resources Information Center

Korthikanti, Vijay Anand Reddy

2011-01-01

Energy consumption by computer systems has emerged as an important concern, both at the level of individual devices (limited battery capacity in mobile systems) and at the societal level (the production of Green House Gases). In parallel architectures, applications may be executed on a variable number of cores and these cores may operate at…
Proteus: a reconfigurable computational network for computer vision

NASA Astrophysics Data System (ADS)

Haralick, Robert M.; Somani, Arun K.; Wittenbrink, Craig M.; Johnson, Robert; Cooper, Kenneth; Shapiro, Linda G.; Phillips, Ihsin T.; Hwang, Jenq N.; Cheung, William; Yao, Yung H.; Chen, Chung-Ho; Yang, Larry; Daugherty, Brian; Lorbeski, Bob; Loving, Kent; Miller, Tom; Parkins, Larye; Soos, Steven L.

1992-04-01

The Proteus architecture is a highly parallel MIMD (multiple-instruction, multiple-data) machine optimized for large-granularity tasks such as machine vision and image processing. The system can achieve 20 Giga-flops (80 Giga-flops peak). It accepts data via multiple serial links at a rate of up to 640 megabytes/second. The system employs a hierarchical reconfigurable interconnection network with the highest level being a circuit switched Enhanced Hypercube serial interconnection network for internal data transfers. The system is designed to use 256 to 1,024 RISC processors. The processors use one megabyte external Read/Write Allocating Caches for reduced multiprocessor contention. The system detects, locates, and replaces faulty subsystems using redundant hardware to facilitate fault tolerance. The parallelism is directly controllable through an advanced software system for partitioning, scheduling, and development. System software includes a translator for the INSIGHT language, a parallel debugger, low and high level simulators, and a message passing system for all control needs. Image processing application software includes a variety of point operators, neighborhood operators, convolution, and the mathematical morphology operations of binary and gray scale dilation, erosion, opening, and closing.
Parallel-Processing Equalizers for Multi-Gbps Communications

NASA Technical Reports Server (NTRS)

Gray, Andrew; Ghuman, Parminder; Hoy, Scott; Satorius, Edgar H.

2004-01-01

Architectures have been proposed for the design of frequency-domain least-mean-square complex equalizers that would be integral parts of parallel-processing digital receivers of multi-gigahertz radio signals and other quadrature-phase-shift-keying (QPSK) or 16-quadrature-amplitude-modulation (16-QAM) of data signals at rates of multiple gigabits per second. "Equalizers," as used here, denotes receiver subsystems that compensate for distortions in the phase and frequency responses of the broad-band radio-frequency channels typically used to convey such signals. The proposed architectures are suitable for realization in very-large-scale integrated (VLSI) circuitry and, in particular, complementary metal oxide semiconductor (CMOS) application-specific integrated circuits (ASICs) operating at frequencies lower than modulation symbol rates. A digital receiver of the type to which the proposed architecture applies (see Figure 1) would include an analog-to-digital converter (A/D) operating at a rate, fs, of 4 samples per symbol period. To obtain the high speed necessary for sampling, the A/D and a 1:16 demultiplexer immediately following it would be constructed as GaAs integrated circuits. The parallel-processing circuitry downstream of the demultiplexer, including a demodulator followed by an equalizer, would operate at a rate of only fs/16 (in other words, at 1/4 of the symbol rate). The output from the equalizer would be four parallel streams of in-phase (I) and quadrature (Q) samples.
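The rate arithmetic in this abstract can be checked with a few lines of Python. This is only a worked illustration: the function name `receiver_rates` and the 1 Gsymbol/s figure are assumptions made for the example, not values from the record.

```python
def receiver_rates(symbol_rate_hz, samples_per_symbol=4, demux_ways=16):
    """Return (A/D sample rate, per-lane processing rate) in Hz."""
    fs = samples_per_symbol * symbol_rate_hz  # A/D rate: 4 samples per symbol period
    lane_rate = fs / demux_ways               # clock rate after the 1:16 demultiplexer
    return fs, lane_rate


# Hypothetical 1 Gsymbol/s link (illustrative value, not taken from the abstract).
symbol_rate = 1.0e9
fs, lane = receiver_rates(symbol_rate)
print(fs, lane, lane / symbol_rate)
```

The last printed value is the ratio of the downstream clock to the symbol rate, which is the "1/4 of the symbol rate" quoted in the abstract.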
Efficient Scalable Median Filtering Using Histogram-Based Operations.

PubMed

Green, Oded

2018-05-01

Median filtering is a smoothing technique for noise removal in images. While there are various implementations of median filtering for a single-core CPU, there are few implementations for accelerators and multi-core systems. Many parallel implementations of median filtering use a sorting algorithm for rearranging the values within a filtering window and taking the median of the sorted value. While using sorting algorithms allows for simple parallel implementations, the cost of the sorting becomes prohibitive as the filtering windows grow. This makes such algorithms, sequential and parallel alike, inefficient. In this work, we introduce the first software parallel median filtering that is non-sorting-based. The new algorithm uses efficient histogram-based operations. These reduce the computational requirements of the new algorithm while also accessing the image fewer times. We show an implementation of our algorithm for both the CPU and NVIDIA's CUDA supported graphics processing unit (GPU). The new algorithm is compared with several other leading CPU and GPU implementations. The CPU implementation has near perfect linear scaling with a speedup on a quad-core system. The GPU implementation is several orders of magnitude faster than the other GPU implementations for mid-size median filters. For small kernels, and , comparison-based approaches are preferable as fewer operations are required. Lastly, the new algorithm is open-source and can be found in the OpenCV library.
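To illustrate what "histogram-based" median filtering means in general, here is a minimal sketch of the classic sliding-window histogram approach for one row of an 8-bit image. It is not the algorithm of the paper above, whose details are not given in the abstract; the function names and the border handling (interior pixels only) are assumptions made for the example.

```python
def median_from_histogram(hist, count):
    """Return the median by walking the cumulative histogram (no sorting)."""
    target = count // 2 + 1  # rank of the median in a window with an odd pixel count
    running = 0
    for value, freq in enumerate(hist):
        running += freq
        if running >= target:
            return value
    raise ValueError("empty histogram")


def median_filter_row(image, row, radius):
    """Median-filter the interior pixels of one row of an 8-bit image."""
    width = len(image[0])
    size = 2 * radius + 1
    hist = [0] * 256
    # Build the histogram of the first full window (columns 0 .. 2*radius).
    for r in range(row - radius, row + radius + 1):
        for c in range(size):
            hist[image[r][c]] += 1
    out = [median_from_histogram(hist, size * size)]
    # Slide right: remove the column that leaves, add the column that enters.
    for c in range(radius + 1, width - radius):
        for r in range(row - radius, row + radius + 1):
            hist[image[r][c - radius - 1]] -= 1
            hist[image[r][c + radius]] += 1
        out.append(median_from_histogram(hist, size * size))
    return out


image = [[1, 2, 3, 4, 5],
         [6, 7, 8, 9, 10],
         [11, 12, 13, 14, 15]]
print(median_filter_row(image, row=1, radius=1))
```

Each step touches only the two columns that change, so the per-pixel cost grows with the window side rather than with the window area, which is the usual motivation for avoiding a full sort.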
A Parallel Biological Optimization Algorithm to Solve the Unbalanced Assignment Problem Based on DNA Molecular Computing.

PubMed

Wang, Zhaocai; Pu, Jun; Cao, Liling; Tan, Jian

2015-10-23

The unbalanced assignment problem (UAP) is to optimally resolve the problem of assigning n jobs to m individuals (m < n), such that minimum cost or maximum profit is obtained. It is a vitally important Non-deterministic Polynomial (NP) complete problem in operation management and applied mathematics, having numerous real-life applications. In this paper, we present a new parallel DNA algorithm for solving the unbalanced assignment problem using DNA molecular operations. We reasonably design flexible-length DNA strands representing different jobs and individuals, take appropriate steps, and get the solutions of the UAP in the proper length range and O(mn) time. We extend the application of DNA molecular operations and simultaneity to simplify the complexity of the computation.
Use of a residual distribution Euler solver to study the occurrence of transonic flow in Wells turbine rotor blades

NASA Astrophysics Data System (ADS)

Henriques, J. C. C.; Gato, L. M. C.

The aim of the present study is to investigate the occurrence of transonic flow in several cascade geometries and blade sections that have been considered in the design of Wells turbine rotor blades. The calculations were performed using an implicit Euler solver for two-dimensional flow. The numerical method uses a multi-dimensional upwind matrix residual distribution scheme formulated on a new symmetrized form of the Euler equations, both in time and in space, that decouples the entropy and the enthalpy equations. Second-order accurate steady-state solutions were obtained using a compact three-point stencil. The results show that unwanted transonic flow may occur in the turbine rotor at relatively low mean-flow Mach numbers.
Accurate finite difference methods for time-harmonic wave propagation

NASA Technical Reports Server (NTRS)

Harari, Isaac; Turkel, Eli

1994-01-01

Finite difference methods for solving problems of time-harmonic acoustics are developed and analyzed. Multidimensional inhomogeneous problems with variable, possibly discontinuous, coefficients are considered, accounting for the effects of employing nonuniform grids. A weighted-average representation is less sensitive to transition in wave resolution (due to variable wave numbers or nonuniform grids) than the standard pointwise representation. Further enhancement in method performance is obtained by basing the stencils on generalizations of Pade approximation, or generalized definitions of the derivative, reducing spurious dispersion, anisotropy and reflection, and by improving the representation of source terms. The resulting schemes have fourth-order accurate local truncation error on uniform grids and third order in the nonuniform case. Guidelines for discretization pertaining to grid orientation and resolution are presented.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Kim, K.; Petersson, N. A.; Rodgers, A.

Acoustic waveform modeling is a computationally intensive task and full three-dimensional simulations are often impractical for some geophysical applications such as long-range wave propagation and high-frequency sound simulation. In this study, we develop a two-dimensional high-order accurate finite-difference code for acoustic wave modeling. We solve the linearized Euler equations by discretizing them with the sixth order accurate finite difference stencils away from the boundary and the third order summation-by-parts (SBP) closure near the boundary. Non-planar topographic boundary is resolved by formulating the governing equation in curvilinear coordinates following the interface. We verify the implementation of the algorithm by numerical examples and demonstrate the capability of the proposed method for practical acoustic wave propagation problems in the atmosphere.
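As a small illustration of what a sixth-order interior stencil looks like, the sketch below applies the standard seven-point central-difference weights for a first derivative on a uniform grid. It covers only the interior formula; the SBP boundary closure and curvilinear mapping mentioned in the abstract are not reproduced, and the function name is an assumption made for the example.

```python
import math

# Standard sixth-order central-difference weights for d/dx on a uniform grid,
# listed for offsets -3, -2, -1, 0, 1, 2, 3.
WEIGHTS = (-1 / 60, 3 / 20, -3 / 4, 0.0, 3 / 4, -3 / 20, 1 / 60)


def ddx_sixth_order(f, x, h):
    """Approximate f'(x) with the seven-point, sixth-order central stencil."""
    return sum(w * f(x + k * h) for w, k in zip(WEIGHTS, range(-3, 4))) / h


# Halving h should cut the error by roughly 2**6 = 64 for a smooth function.
err_coarse = abs(ddx_sixth_order(math.sin, 1.0, 0.1) - math.cos(1.0))
err_fine = abs(ddx_sixth_order(math.sin, 1.0, 0.05) - math.cos(1.0))
print(err_coarse, err_fine, err_coarse / err_fine)
```

The printed ratio gives an empirical check of the convergence order for this test function.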
Multigrid Method for Modeling Multi-Dimensional Combustion with Detailed Chemistry

NASA Technical Reports Server (NTRS)

Zheng, Xiaoqing; Liu, Chaoqun; Liao, Changming; Liu, Zhining; McCormick, Steve

1996-01-01

A highly accurate and efficient numerical method is developed for modeling 3-D reacting flows with detailed chemistry. A contravariant velocity-based governing system is developed for general curvilinear coordinates to maintain simplicity of the continuity equation and compactness of the discretization stencil. A fully-implicit backward Euler technique and a third-order monotone upwind-biased scheme on a staggered grid are used for the respective temporal and spatial terms. An efficient semi-coarsening multigrid method based on line-distributive relaxation is used as the flow solver. The species equations are solved in a fully coupled way and the chemical reaction source terms are treated implicitly. Example results are shown for a 3-D gas turbine combustor with strong swirling inflows.

Extended bounds limiter for high-order finite-volume schemes on unstructured meshes

NASA Astrophysics Data System (ADS)

Tsoutsanis, Panagiotis

2018-06-01

This paper explores the impact of the definition of the bounds of the limiter proposed by Michalak and Ollivier-Gooch in [56] (2009), for higher-order Monotone-Upstream Central Scheme for Conservation Laws (MUSCL) numerical schemes on unstructured meshes in the finite-volume (FV) framework. A new modification of the limiter is proposed where the bounds are redefined by utilising all the spatial information provided by all the elements in the reconstruction stencil. Numerical results obtained on smooth and discontinuous test problems of the Euler equations on unstructured meshes highlight that the newly proposed extended bounds limiter exhibits superior performance in terms of accuracy and mesh sensitivity compared to the cell-based or vertex-based bounds implementations.
Aircraft Configuration and Flight Crew Compliance with Procedures While Conducting Flight Deck Based Interval Management (FIM) Operations

NASA Technical Reports Server (NTRS)

Shay, Rick; Swieringa, Kurt A.; Baxley, Brian T.

2012-01-01

Flight deck based Interval Management (FIM) applications using ADS-B are being developed to improve both the safety and capacity of the National Airspace System (NAS). FIM is expected to improve the safety and efficiency of the NAS by giving pilots the technology and procedures to precisely achieve an interval behind the preceding aircraft by a specific point. Concurrently but independently, Optimized Profile Descents (OPD) are being developed to help reduce fuel consumption and noise; however, the range of speeds available when flying an OPD results in a decrease in the delivery precision of aircraft to the runway. This requires the addition of a spacing buffer between aircraft, reducing system throughput. FIM addresses this problem by providing pilots with speed guidance to achieve a precise interval behind another aircraft, even while flying optimized descents. The Interval Management with Spacing to Parallel Dependent Runways (IMSPiDR) human-in-the-loop experiment employed 24 commercial pilots to explore the use of FIM equipment to conduct spacing operations behind two aircraft arriving to parallel runways, while flying an OPD during high-density operations. This paper describes the impact of variations in pilot operations; in particular configuring the aircraft, their compliance with FIM operating procedures, and their response to changes of the FIM speed. An example of the displayed FIM speeds used incorrectly by a pilot is also discussed. Finally, this paper examines the relationship between achieving airline operational goals for individual aircraft and the need for ATC to deliver aircraft to the runway with greater precision. The results show that aircraft can fly an OPD and conduct FIM operations to dependent parallel runways, enabling operational goals to be achieved efficiently while maintaining system throughput.
A new parallel-vector finite element analysis software on distributed-memory computers

NASA Technical Reports Server (NTRS)

Qin, Jiangning; Nguyen, Duc T.

1993-01-01

A new parallel-vector finite element analysis software package MPFEA (Massively Parallel-vector Finite Element Analysis) is developed for large-scale structural analysis on massively parallel computers with distributed memory. MPFEA is designed for parallel generation and assembly of the global finite element stiffness matrices as well as parallel solution of the simultaneous linear equations, since these are often the major time-consuming parts of a finite element analysis. A block-skyline storage scheme along with vector-unrolling techniques is used to enhance the vector performance. Communications among processors are carried out concurrently with arithmetic operations to reduce the total execution time. Numerical results on the Intel iPSC/860 computers (such as the Intel Gamma with 128 processors and the Intel Touchstone Delta with 512 processors) are presented, including an aircraft structure and some very large truss structures, to demonstrate the efficiency and accuracy of MPFEA.
High order parallel numerical schemes for solving incompressible flows

NASA Technical Reports Server (NTRS)

Lin, Avi; Milner, Edward J.; Liou, May-Fun; Belch, Richard A.

1992-01-01

The use of parallel computers for numerically solving flow fields has gained much importance in recent years. This paper introduces a new high order numerical scheme for computational fluid dynamics (CFD) specifically designed for parallel computational environments. A distributed MIMD system gives the flexibility of treating different elements of the governing equations with totally different numerical schemes in different regions of the flow field. The parallel decomposition of the governing operator to be solved is the primary parallel split. The primary parallel split was studied using a hypercube like architecture having clusters of shared memory processors at each node. The approach is demonstrated using examples of simple steady state incompressible flows. Future studies should investigate the secondary split because, depending on the numerical scheme that each of the processors applies and the nature of the flow in the specific subdomain, it may be possible for a processor to seek better, or higher order, schemes for its particular subcase.
System and method for representing and manipulating three-dimensional objects on massively parallel architectures

DOEpatents

Karasick, Michael S.; Strip, David R.

1996-01-01

A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modelling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modelling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modelling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication.
Handling Big Data in Medical Imaging: Iterative Reconstruction with Large-Scale Automated Parallel Computation

PubMed Central

Lee, Jae H.; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T.; Seo, Youngho

2014-01-01

The primary goal of this project is to implement the iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximization (MLEM) used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to-program software platform that can handle large amounts of data in parallel. GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing the MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge in computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains with the goal to eventually make it usable in a clinical setting. PMID:27081299
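For readers unfamiliar with MLEM, the sketch below shows the standard multiplicative update on a tiny dense system in plain Python. It is a serial toy version only, not the Spark/GraphX implementation described above; the function name, the dense list-of-rows system matrix `A`, and the test data are assumptions made for the example.

```python
def mlem(A, y, iterations=50):
    """Standard MLEM update for y ~ A x.

    A is a dense system matrix given as a list of rows (detector bins x voxels),
    y is the list of measured counts. Returns the estimated voxel values.
    """
    n_bins, n_vox = len(A), len(A[0])
    # Sensitivity image: column sums of A (back-projection of all ones).
    sens = [sum(A[i][j] for i in range(n_bins)) for j in range(n_vox)]
    x = [1.0] * n_vox  # strictly positive starting estimate
    for _ in range(iterations):
        # Forward-project the current estimate.
        proj = [sum(A[i][j] * x[j] for j in range(n_vox)) for i in range(n_bins)]
        # Ratio of measured to estimated counts, guarded against division by zero.
        ratio = [y[i] / proj[i] if proj[i] > 0 else 0.0 for i in range(n_bins)]
        # Back-project the ratio and apply the multiplicative correction.
        back = [sum(A[i][j] * ratio[i] for i in range(n_bins)) for j in range(n_vox)]
        x = [x[j] * back[j] / sens[j] for j in range(n_vox)]
    return x


A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, 3.0, 5.0]  # noise-free data generated from x = [2, 3]
estimate = mlem(A, y)
print(estimate)
```

The forward and back projections are sparse matrix-vector products in practice, which is the part a graph or sparse linear algebra framework would distribute.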
Handling Big Data in Medical Imaging: Iterative Reconstruction with Large-Scale Automated Parallel Computation.

PubMed

Lee, Jae H; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T; Seo, Youngho

2014-11-01

The primary goal of this project is to implement the iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximization (MLEM) used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to-program software platform that can handle large amounts of data in parallel. GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing the MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge in computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains with the goal to eventually make it usable in a clinical setting.
Research on battery-operated electric road vehicles

NASA Technical Reports Server (NTRS)

Varpetian, V. S.

1977-01-01

Mathematical analysis of battery-operated electric vehicles is presented. Attention is focused on assessing the influence of the battery on the mechanical and dynamical characteristics of dc electric motors with series and parallel excitation, as well as on evaluating the influence of the excitation mode and speed control system on the performance of the battery. The superiority of series excitation over parallel excitation with respect to vehicle performance is demonstrated. It is also shown that pulsed control of the electric motor, as compared to potentiometric control, provides a more effective use of the battery and decreases the cost of recharging.
Distributed Computing for Signal Processing: Modeling of Asynchronous Parallel Computation. Appendix G. On the Design and Modeling of Special Purpose Parallel Processing Systems.

DTIC Science & Technology

1985-05-01

unit in the data base, with knowing one generic assembly language. °-’--a 139 The 5-tuple describing single operation execution time of the operations...TSi-- generate , random eventi ( ,.0-15 tieit tmls - ((floa egus ()16 274 r Ispt imet imel I at :EVE’JS- II ktime=0.0; /0 present time 0/ rrs ptime=0.0...computing machinery capable of performing these tasks within a given time constraint. Because the majority of the available computing machinery is general
Paging memory from random access memory to backing storage in a parallel computer

DOEpatents

Archer, Charles J; Blocksome, Michael A; Inglett, Todd A; Ratterman, Joseph D; Smith, Brian E

2013-05-21

Paging memory from random access memory (`RAM`) to backing storage in a parallel computer that includes a plurality of compute nodes, including: executing a data processing application on a virtual machine operating system in a virtual machine on a first compute node; providing, by a second compute node, backing storage for the contents of RAM on the first compute node; and swapping, by the virtual machine operating system in the virtual machine on the first compute node, a page of memory from RAM on the first compute node to the backing storage on the second compute node.
10-Qubit Entanglement and Parallel Logic Operations with a Superconducting Circuit

NASA Astrophysics Data System (ADS)

Song, Chao; Xu, Kai; Liu, Wuxin; Yang, Chui-ping; Zheng, Shi-Biao; Deng, Hui; Xie, Qiwei; Huang, Keqiang; Guo, Qiujiang; Zhang, Libo; Zhang, Pengfei; Xu, Da; Zheng, Dongning; Zhu, Xiaobo; Wang, H.; Chen, Y.-A.; Lu, C.-Y.; Han, Siyuan; Pan, Jian-Wei

2017-11-01

Here we report on the production and tomography of genuinely entangled Greenberger-Horne-Zeilinger states with up to ten qubits connecting to a bus resonator in a superconducting circuit, where the resonator-mediated qubit-qubit interactions are used to controllably entangle multiple qubits and to operate on different pairs of qubits in parallel. The resulting 10-qubit density matrix is probed by quantum state tomography, with a fidelity of 0.668 ± 0.025. Our results demonstrate the largest entanglement created so far in solid-state architectures and pave the way to large-scale quantum computation.
10-Qubit Entanglement and Parallel Logic Operations with a Superconducting Circuit.

PubMed

Song, Chao; Xu, Kai; Liu, Wuxin; Yang, Chui-Ping; Zheng, Shi-Biao; Deng, Hui; Xie, Qiwei; Huang, Keqiang; Guo, Qiujiang; Zhang, Libo; Zhang, Pengfei; Xu, Da; Zheng, Dongning; Zhu, Xiaobo; Wang, H; Chen, Y-A; Lu, C-Y; Han, Siyuan; Pan, Jian-Wei

2017-11-03

Here we report on the production and tomography of genuinely entangled Greenberger-Horne-Zeilinger states with up to ten qubits connecting to a bus resonator in a superconducting circuit, where the resonator-mediated qubit-qubit interactions are used to controllably entangle multiple qubits and to operate on different pairs of qubits in parallel. The resulting 10-qubit density matrix is probed by quantum state tomography, with a fidelity of 0.668±0.025. Our results demonstrate the largest entanglement created so far in solid-state architectures and pave the way to large-scale quantum computation.
A Fast parallel tridiagonal algorithm for a class of CFD applications

NASA Technical Reports Server (NTRS)

Moitra, Stuti; Sun, Xian-He

1996-01-01

The parallel diagonal dominant (PDD) algorithm is an efficient tridiagonal solver. This paper presents for study a variation of the PDD algorithm, the reduced PDD algorithm. The new algorithm maintains the minimum communication provided by the PDD algorithm, but has a reduced operation count. The PDD algorithm also has a smaller operation count than the conventional sequential algorithm for many applications. Accuracy analysis is provided for the reduced PDD algorithm for symmetric Toeplitz tridiagonal (STT) systems. Implementation results on Langley's Intel Paragon and IBM SP2 show that both the PDD and reduced PDD algorithms are efficient and scalable.
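For reference, the "conventional sequential algorithm" that parallel tridiagonal solvers are usually compared against is the Thomas algorithm. The sketch below is that serial baseline, assuming a diagonally dominant system so that no pivoting is needed; it is not the PDD or reduced PDD algorithm, and the function and argument names are illustrative.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system with the serial Thomas algorithm.

    a: sub-diagonal (a[0] unused), b: main diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side.
    Assumes diagonal dominance, so no pivoting is performed.
    """
    n = len(b)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward elimination.
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


# Symmetric Toeplitz tridiagonal test system with known solution [1, 2, 3, 4].
n = 4
solution = thomas_solve([1.0] * n, [4.0] * n, [1.0] * n, [6.0, 12.0, 18.0, 19.0])
print(solution)
```

The forward sweep is inherently sequential, which is why parallel variants partition the system and pay a small communication cost at the partition boundaries.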
Extended RF shimming: Sequence‐level parallel transmission optimization applied to steady‐state free precession MRI of the heart

PubMed Central

Price, Anthony N.; Padormo, Francesco; Hajnal, Joseph V.; Malik, Shaihan J.

2017-01-01

Cardiac magnetic resonance imaging (MRI) at high field presents challenges because of the high specific absorption rate and significant transmit field (B1+) inhomogeneities. Parallel transmission MRI offers the ability to correct for both issues at the level of individual radiofrequency (RF) pulses, but must operate within strict hardware and safety constraints. The constraints are themselves affected by sequence parameters, such as the RF pulse duration and TR, meaning that an overall optimal operating point exists for a given sequence. This work seeks to obtain optimal performance by performing a ‘sequence‐level’ optimization in which pulse sequence parameters are included as part of an RF shimming calculation. The method is applied to balanced steady‐state free precession cardiac MRI with the objective of minimizing TR, hence reducing the imaging duration. Results are demonstrated using an eight‐channel parallel transmit system operating at 3 T, with an in vivo study carried out on seven male subjects of varying body mass index (BMI). Compared with single‐channel operation, a mean‐squared‐error shimming approach leads to reduced imaging durations of 32 ± 3% with simultaneous improvement in flip angle homogeneity of 32 ± 8% within the myocardium. PMID:28195684
Extended RF shimming: Sequence-level parallel transmission optimization applied to steady-state free precession MRI of the heart.

PubMed

Beqiri, Arian; Price, Anthony N; Padormo, Francesco; Hajnal, Joseph V; Malik, Shaihan J

2017-06-01

Cardiac magnetic resonance imaging (MRI) at high field presents challenges because of the high specific absorption rate and significant transmit field (B1+) inhomogeneities. Parallel transmission MRI offers the ability to correct for both issues at the level of individual radiofrequency (RF) pulses, but must operate within strict hardware and safety constraints. The constraints are themselves affected by sequence parameters, such as the RF pulse duration and TR, meaning that an overall optimal operating point exists for a given sequence. This work seeks to obtain optimal performance by performing a 'sequence-level' optimization in which pulse sequence parameters are included as part of an RF shimming calculation. The method is applied to balanced steady-state free precession cardiac MRI with the objective of minimizing TR, hence reducing the imaging duration. Results are demonstrated using an eight-channel parallel transmit system operating at 3 T, with an in vivo study carried out on seven male subjects of varying body mass index (BMI). Compared with single-channel operation, a mean-squared-error shimming approach leads to reduced imaging durations of 32 ± 3% with simultaneous improvement in flip angle homogeneity of 32 ± 8% within the myocardium. © 2017 The Authors. NMR in Biomedicine published by John Wiley & Sons Ltd.
Hierarchical fractional-step approximations and parallel kinetic Monte Carlo algorithms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Arampatzis, Giorgos, E-mail: garab@math.uoc.gr; Katsoulakis, Markos A., E-mail: markos@math.umass.edu; Plechac, Petr, E-mail: plechac@math.udel.edu

2012-10-01

We present a mathematical framework for constructing and analyzing parallel algorithms for lattice kinetic Monte Carlo (KMC) simulations. The resulting algorithms have the capacity to simulate a wide range of spatio-temporal scales in spatially distributed, non-equilibrium physiochemical processes with complex chemistry and transport micro-mechanisms. Rather than focusing on constructing exactly the stochastic trajectories, our approach relies on approximating the evolution of observables, such as density, coverage, correlations and so on. More specifically, we develop a spatial domain decomposition of the Markov operator (generator) that describes the evolution of all observables according to the kinetic Monte Carlo algorithm. This domain decomposition corresponds to a decomposition of the Markov generator into a hierarchy of operators and can be tailored to specific hierarchical parallel architectures such as multi-core processors or clusters of Graphical Processing Units (GPUs). Based on this operator decomposition, we formulate parallel Fractional step kinetic Monte Carlo algorithms by employing the Trotter Theorem and its randomized variants; these schemes, (a) are partially asynchronous on each fractional step time-window, and (b) are characterized by their communication schedule between processors. The proposed mathematical framework allows us to rigorously justify the numerical and statistical consistency of the proposed algorithms, showing the convergence of our approximating schemes to the original serial KMC. The approach also provides a systematic evaluation of different processor communicating schedules. We carry out a detailed benchmarking of the parallel KMC schemes using available exact solutions, for example, in Ising-type systems and we demonstrate the capabilities of the method to simulate complex spatially distributed reactions at very large scales on GPUs. Finally, we discuss work load balancing between processors and propose a re-balancing scheme based on probabilistic mass transport methods.
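The Trotter theorem invoked in this abstract can be stated compactly. In the notation assumed here (not taken from the record), the generator is split into two parts, $\mathcal{L} = \mathcal{L}_1 + \mathcal{L}_2$, and the product formula reads:

```latex
e^{t(\mathcal{L}_1 + \mathcal{L}_2)}
  = \lim_{n \to \infty}
    \left( e^{\frac{t}{n}\mathcal{L}_1} \, e^{\frac{t}{n}\mathcal{L}_2} \right)^{n}
```

A fractional-step scheme truncates this limit at a finite $n$, alternating short evolutions under each part; how the abstract's hierarchy of operators maps onto $\mathcal{L}_1$ and $\mathcal{L}_2$ is not specified in the record.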
Automatic mesh refinement and parallel load balancing for Fokker-Planck-DSMC algorithm

NASA Astrophysics Data System (ADS)

Küchlin, Stephan; Jenny, Patrick

2018-06-01

Recently, a parallel Fokker-Planck-DSMC algorithm for rarefied gas flow simulation in complex domains at all Knudsen numbers was developed by the authors. Fokker-Planck-DSMC (FP-DSMC) is an augmentation of the classical DSMC algorithm, which mitigates the near-continuum deficiencies in terms of computational cost of pure DSMC. At each time step, based on a local Knudsen number criterion, the discrete DSMC collision operator is dynamically switched to the Fokker-Planck operator, which is based on the integration of continuous stochastic processes in time, and has fixed computational cost per particle, rather than per collision. In this contribution, we present an extension of the previous implementation with automatic local mesh refinement and parallel load-balancing. In particular, we show how the properties of discrete approximations to space-filling curves enable an efficient implementation. Exemplary numerical studies highlight the capabilities of the new code.
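One common way to use a discrete space-filling curve for load balancing is to sort cells by a curve key and cut the sorted list into contiguous chunks. The sketch below uses a Morton (Z-order) key on a 2-D grid purely as an assumed example; the abstract does not say which curve or weighting the authors use, and the function names are hypothetical.

```python
def morton_key_2d(ix, iy, bits=16):
    """Interleave the bits of (ix, iy) into a single Z-order key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)      # x bits go to even positions
        key |= ((iy >> b) & 1) << (2 * b + 1)  # y bits go to odd positions
    return key


def partition_cells(cells, n_ranks):
    """Sort cells along the curve and cut into contiguous, near-equal chunks."""
    ordered = sorted(cells, key=lambda c: morton_key_2d(c[0], c[1]))
    chunk = -(-len(ordered) // n_ranks)  # ceiling division
    return [ordered[i:i + chunk] for i in range(0, len(ordered), chunk)]


cells = [(x, y) for x in range(4) for y in range(4)]
parts = partition_cells(cells, 4)
print(parts[0])
```

Because nearby keys tend to correspond to nearby cells, each chunk is spatially compact; a production code would typically cut by accumulated work (for example particle counts) rather than by cell count.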
Quantum statistics and squeezing for a microwave-driven interacting magnon system.

PubMed

Haghshenasfard, Zahra; Cottam, Michael G

2017-02-01

Theoretical studies are reported for the statistical properties of a microwave-driven interacting magnon system. Both the magnetic dipole-dipole and the exchange interactions are included and the theory is developed for the case of parallel pumping allowing for the inclusion of the nonlinear processes due to the four-magnon interactions. The method of second quantization is used to transform the total Hamiltonian from spin operators to boson creation and annihilation operators. By using the coherent magnon state representation we have studied the magnon occupation number and the statistical behavior of the system. In particular, it is shown that the nonlinearities introduced by the parallel pumping field and the four-magnon interactions lead to non-classical quantum statistical properties of the system, such as magnon squeezing. Also control of the collapse-and-revival phenomena for the time evolution of the average magnon number is demonstrated by varying the parallel pumping amplitude and the four-magnon coupling.
Tolerant (parallel) Programming

NASA Technical Reports Server (NTRS)

DiNucci, David C.; Bailey, David H. (Technical Monitor)

1997-01-01

In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package for implementing communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2(sup 3) is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.
A method for real-time implementation of HOG feature extraction

NASA Astrophysics Data System (ADS)

Luo, Hai-bo; Yu, Xin-rong; Liu, Hong-mei; Ding, Qing-hai

2011-08-01

Histogram of oriented gradients (HOG) is an efficient feature extraction scheme, and HOG descriptors are feature descriptors that are widely used in computer vision and image processing for biometrics, target tracking, automatic target detection (ATD), automatic target recognition (ATR), etc. However, HOG feature extraction is poorly suited to hardware implementation because it includes complicated operations. In this paper, an optimal design method and theoretical framework for real-time HOG feature extraction based on FPGA are proposed. The main principle is as follows: first, a parallel gradient computing unit circuit based on a parallel pipeline structure was designed. Second, the calculation of the arctangent and square root operations was simplified. Finally, a histogram generator based on a parallel pipeline structure was designed to calculate the histogram of each sub-region. Experimental results showed that HOG extraction can be completed within one pixel period by these computing units.
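The three stages named in this abstract (gradient computation, magnitude and orientation, per-cell histogram) can be illustrated in software. The following is a minimal sketch only, not the authors' FPGA design: it uses `math.hypot` and `math.atan2` directly rather than the simplified arctangent and square root the paper describes, and the function name `cell_histogram`, the 9-bin unsigned layout, and the central-difference gradient are all assumptions made for illustration.

```python
import math


def cell_histogram(cell, bins=9):
    """Unsigned-orientation gradient histogram for one cell.

    `cell` is a list of rows of floats. Only interior pixels are used,
    so no border padding is needed (an illustrative simplification).
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    rows, cols = len(cell), len(cell[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            # Stage 1: central-difference gradients.
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            # Stage 2: magnitude (square root) and orientation (arctangent).
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Stage 3: accumulate the magnitude into the orientation bin.
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


# A horizontal intensity ramp has a purely horizontal gradient,
# so all of the weight should land in the first orientation bin.
ramp = [[float(x) for x in range(6)] for _ in range(6)]
hist = cell_histogram(ramp)
```

In a pipelined hardware version each of the three stages would presumably run on a different pixel at the same time; the loop above processes them sequentially for clarity.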

A Concept for Airborne Precision Spacing for Dependent Parallel Approaches

NASA Technical Reports Server (NTRS)

Barmore, Bryan E.; Baxley, Brian T.; Abbott, Terence S.; Capron, William R.; Smith, Colin L.; Shay, Richard F.; Hubbs, Clay

2012-01-01

The Airborne Precision Spacing concept of operations has been previously developed to support the precise delivery of aircraft landing successively on the same runway. The high-precision and consistent delivery of inter-aircraft spacing allows for increased runway throughput and the use of energy-efficient arrival routes such as Continuous Descent Arrivals and Optimized Profile Descents. This paper describes an extension to the Airborne Precision Spacing concept to enable dependent parallel approach operations, where the spacing aircraft must manage both their in-trail spacing from a leading aircraft on approach to the same runway and their spacing from an aircraft on approach to a parallel runway. Functionality for supporting automation is discussed, as well as procedures for pilots and controllers. An analysis is performed to identify the required information, and a new ADS-B report is proposed to support these information needs. Finally, several scenarios are described in detail.
Conceptual design and kinematic analysis of a novel parallel robot for high-speed pick-and-place operations

NASA Astrophysics Data System (ADS)

Meng, Qizhi; Xie, Fugui; Liu, Xin-Jun

2018-06-01

This paper deals with the conceptual design, kinematic analysis and workspace identification of a novel four degrees-of-freedom (DOFs) high-speed spatial parallel robot for pick-and-place operations. The proposed spatial parallel robot consists of a base, four arms and a 1½ mobile platform. The mobile platform is a major innovation that avoids output singularity and offers the advantages of both single and double platforms. To investigate the characteristics of the robot's DOFs, a line graph method based on Grassmann line geometry is adopted in mobility analysis. In addition, the inverse kinematics is derived, and the constraint conditions to identify the correct solution are also provided. On the basis of the proposed concept, the workspace of the robot is identified using a set of presupposed parameters by taking input and output transmission index as the performance evaluation criteria.
Acoustooptic linear algebra processors - Architectures, algorithms, and applications

NASA Technical Reports Server (NTRS)

Casasent, D.

1984-01-01

Architectures, algorithms, and applications for systolic processors are described with attention to the realization of parallel algorithms on various optical systolic array processors. Systolic processors for matrices with special structure and matrices of general structure, and the realization of matrix-vector, matrix-matrix, and triple-matrix products on such architectures, are described. Parallel algorithms for direct and indirect solutions to systems of linear algebraic equations and their implementation on optical systolic processors are detailed with attention to the pipelining and flow of data and operations. Parallel algorithms and their optical realization for LU and QR matrix decomposition are specifically detailed. These represent the fundamental operations necessary in the implementation of least squares, eigenvalue, and SVD solutions. Specific applications (e.g., the solution of partial differential equations, adaptive noise cancellation, and optimal control) are described to typify the use of matrix processors in modern advanced signal processing.
Rapid automated classification of anesthetic depth levels using GPU based parallelization of neural networks.

PubMed

Peker, Musa; Şen, Baha; Gürüler, Hüseyin

2015-02-01

The effect of anesthesia on the patient is referred to as depth of anesthesia. Rapid classification of the appropriate depth level of anesthesia is a matter of great importance in surgical operations. Similarly, accelerating classification algorithms is important for the rapid solution of problems in the field of biomedical signal processing. However, numerous time-consuming mathematical operations are required in the training and testing stages of classification algorithms, especially in neural networks. In this study, to accelerate the process, a parallel programming and computing platform (Nvidia CUDA), which facilitates dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU), was utilized. The system was employed to detect anesthetic depth level on a related electroencephalogram (EEG) data set. This dataset is rather complex and large. Moreover, distinguishing more anesthetic levels with rapid response is critical in anesthesia. The proposed parallelization method yielded highly accurate classification results in less time.
Institutional Computing Executive Group Review of Multi-programmatic & Institutional Computing, Fiscal Year 2005 and 2006

DOE Office of Scientific and Technical Information (OSTI.GOV)

Langer, S; Rotman, D; Schwegler, E

The Institutional Computing Executive Group (ICEG) review of FY05-06 Multiprogrammatic and Institutional Computing (M and IC) activities is presented in the attached report. In summary, we find that the M and IC staff does an outstanding job of acquiring and supporting a wide range of institutional computing resources to meet the programmatic and scientific goals of LLNL. The responsiveness and high quality of support given to users and the programs investing in M and IC reflects the dedication and skill of the M and IC staff. M and IC has successfully managed serial capacity, parallel capacity, and capability computing resources. Serial capacity computing supports a wide range of scientific projects which require access to a few high performance processors within a shared memory computer. Parallel capacity computing supports scientific projects that require a moderate number of processors (up to roughly 1000) on a parallel computer. Capability computing supports parallel jobs that push the limits of simulation science. M and IC has worked closely with Stockpile Stewardship, and together they have made LLNL a premier institution for computational and simulation science. Such a standing is vital to the continued success of laboratory science programs and to the recruitment and retention of top scientists. This report provides recommendations to build on M and IC's accomplishments and improve simulation capabilities at LLNL.
We recommend that the institution fully fund (1) operation of the atlas cluster purchased in FY06 to support a few large projects; (2) operation of the thunder and zeus clusters to enable 'mid-range' parallel capacity simulations during normal operation and a limited number of large simulations during dedicated application time; (3) operation of the new yana cluster to support a wide range of serial capacity simulations; (4) improvements to the reliability and performance of the Lustre parallel file system; (5) support for the new GDO petabyte-class storage facility on the green network for use in data intensive external collaborations; and (6) continued support for visualization and other methods for analyzing large simulations. We also recommend that M and IC begin planning in FY07 for the next upgrade of its parallel clusters. LLNL investments in M and IC have resulted in a world-class simulation capability leading to innovative science. We thank the LLNL management for its continued support and thank the M and IC staff for its vision and dedicated efforts to make it all happen.
Self-pacing direct memory access data transfer operations for compute nodes in a parallel computer

DOEpatents

Blocksome, Michael A

2015-02-17

Methods, apparatus, and products are disclosed for self-pacing DMA data transfer operations for nodes in a parallel computer that include: transferring, by an origin DMA on an origin node, an RTS message to a target node, the RTS message specifying a message on the origin node for transfer to the target node; receiving, in an origin injection FIFO for the origin DMA from a target DMA on the target node in response to transferring the RTS message, a target RGET descriptor followed by a DMA transfer operation descriptor, the DMA descriptor for transmitting a message portion to the target node, the target RGET descriptor specifying an origin RGET descriptor on the origin node that specifies an additional DMA descriptor for transmitting an additional message portion to the target node; processing, by the origin DMA, the target RGET descriptor; and processing, by the origin DMA, the DMA transfer operation descriptor.
An Advanced Framework for Improving Situational Awareness in Electric Power Grid Operation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Yousu; Huang, Zhenyu; Zhou, Ning

With the deployment of new smart grid technologies and the penetration of renewable energy in power systems, significant uncertainty and variability are being introduced into power grid operation. Traditionally, the Energy Management System (EMS) operates the power grid in a deterministic mode, and thus will not be sufficient for the future control center in a stochastic environment with faster dynamics. One of the main challenges is to improve situational awareness. This paper reviews the current status of power grid operation and presents a vision of improving wide-area situational awareness for a future control center. An advanced framework, consisting of parallel state estimation, state prediction, parallel contingency selection, parallel contingency analysis, and advanced visual analytics, is proposed to provide capabilities needed for better decision support by utilizing high performance computing (HPC) techniques and advanced visual analytic techniques. Research results are presented to support the proposed vision and framework.
Signal processing applications of massively parallel charge domain computing devices

NASA Technical Reports Server (NTRS)

Fijany, Amir (Inventor); Barhen, Jacob (Inventor); Toomarian, Nikzad (Inventor)

1999-01-01

The present invention is embodied in a charge coupled device (CCD)/charge injection device (CID) architecture capable of performing a Fourier transform by simultaneous matrix vector multiplication (MVM) operations in respective plural CCD/CID arrays in parallel in O(1) steps. For example, in one embodiment, a first CCD/CID array stores charge packets representing a first matrix operator based upon permutations of a Hartley transform and computes the Fourier transform of an incoming vector. A second CCD/CID array stores charge packets representing a second matrix operator based upon different permutations of a Hartley transform and computes the Fourier transform of an incoming vector. The incoming vector is applied to the inputs of the two CCD/CID arrays simultaneously, and the real and imaginary parts of the Fourier transform are produced simultaneously in the time required to perform a single MVM operation in a CCD/CID array.
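The idea of obtaining the real and imaginary parts of a Fourier transform from two matrix-vector products built on a Hartley transform can be checked numerically. The sketch below is one plausible reading of "permutations of a Hartley transform", not the patented circuit: it assumes the two operators are formed by combining the Hartley matrix with its row-reversed copy (row k swapped with row -k mod N). All function names are hypothetical, and the two products are computed one after the other here, whereas the device performs them simultaneously in separate arrays.

```python
import math


def cas(theta):
    """Hartley kernel: cas(theta) = cos(theta) + sin(theta)."""
    return math.cos(theta) + math.sin(theta)


def hartley_matrix(n):
    """N x N discrete Hartley transform matrix, H[k][j] = cas(2*pi*k*j/N)."""
    return [[cas(2 * math.pi * k * j / n) for j in range(n)] for k in range(n)]


def matvec(matrix, vector):
    """Plain matrix-vector multiplication (the MVM step)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def fourier_via_hartley(x):
    """Return (real, imag) parts of the DFT of x using two MVMs."""
    n = len(x)
    h = hartley_matrix(n)
    # Row permutation k -> (-k mod n); cas(-theta) = cos(theta) - sin(theta).
    h_rev = [h[(-k) % n] for k in range(n)]
    # (H + H_rev)/2 holds cos terms; (H_rev - H)/2 holds -sin terms.
    real_op = [[(a + b) / 2 for a, b in zip(r, rr)] for r, rr in zip(h, h_rev)]
    imag_op = [[(b - a) / 2 for a, b in zip(r, rr)] for r, rr in zip(h, h_rev)]
    return matvec(real_op, x), matvec(imag_op, x)


real_part, imag_part = fourier_via_hartley([1.0, 2.0, 0.0, -1.0, 3.5])
```

Because both operators are fixed once N is chosen, they can be stored ahead of time, and the whole transform reduces to applying the same input vector to two stored matrices.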
User's guide to the Parallel Processing Extension of the Prognosis Model

Treesearch

Nicholas L. Crookston; Albert R. Stage

1991-01-01

The Parallel Processing Extension (PPE) of the Prognosis Model was designed to analyze responses of numerous stands to coordinated management and pest impacts that operate at the landscape level of forests. Vegetation-related resource supply analysis can be readily performed for a thousand or more sample stands for projections 400 years into the future. Capabilities...
Competitive Parallel Processing For Compression Of Data

NASA Technical Reports Server (NTRS)

Diner, Daniel B.; Fender, Antony R. H.

1990-01-01

Momentarily-best compression algorithm selected. Proposed competitive-parallel-processing system compresses data for transmission in channel of limited bandwidth. Likely application for compression lies in high-resolution, stereoscopic color-television broadcasting. Data from information-rich source like color-television camera compressed by several processors, each operating with different algorithm. Referee processor selects momentarily-best compressed output.
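The competitive scheme described here (several processors compress the same data with different algorithms, and a referee keeps the best result) can be sketched in a few lines. This is an illustration under assumptions, not the proposed system: general-purpose `zlib`, `bz2`, and `lzma` codecs stand in for the unspecified video compression algorithms, "best" is taken to mean the smallest output, and the names `COMPRESSORS`, `compress_competitively`, and `decompress` are hypothetical.

```python
import bz2
import lzma
import zlib
from concurrent.futures import ThreadPoolExecutor

# Each entry pairs a compressor with its matching decompressor.
COMPRESSORS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def compress_competitively(block):
    """Compress one block with every algorithm in parallel; return the winner.

    Returns (algorithm_name, payload). The name must travel with the
    payload so that the receiver knows which decompressor to apply.
    """
    with ThreadPoolExecutor(max_workers=len(COMPRESSORS)) as pool:
        futures = {
            name: pool.submit(compress, block)
            for name, (compress, _) in COMPRESSORS.items()
        }
        results = {name: future.result() for name, future in futures.items()}
    # Referee step: choose the smallest output for this particular block.
    winner = min(results, key=lambda name: len(results[name]))
    return winner, results[winner]


def decompress(name, payload):
    """Invert compress_competitively for a single block."""
    return COMPRESSORS[name][1](payload)


block = b"stereoscopic colour frame " * 400
name, payload = compress_competitively(block)
restored = decompress(name, payload)
```

Running the referee per block rather than per stream is what makes the selection "momentary": the winning algorithm can change as the statistics of the source change.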
A C++ Thread Package for Concurrent and Parallel Programming

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jie Chen; William Watson

1999-11-01

Recently, thread libraries have become a common entity on various operating systems such as Unix, Windows NT and VxWorks. Those thread libraries offer significant performance enhancement by allowing applications to use multiple threads running either concurrently or in parallel on multiprocessors. However, the incompatibilities between native libraries introduce challenges for those who wish to develop portable applications.
A comparison between orthogonal and parallel plating methods for distal humerus fractures: a prospective randomized trial.

PubMed

Lee, Sang Ki; Kim, Kap Jung; Park, Kyung Hoon; Choy, Won Sik

2014-10-01

With the continuing improvements in implants for distal humerus fractures, it is expected that newer types of plates, which are anatomically precontoured, thinner and less irritating to soft tissue, would have comparable outcomes when used in a clinical study. The purpose of this study was to compare the clinical and radiographic outcomes in patients with distal humerus fractures who were treated with orthogonal and parallel plating methods using precontoured distal humerus plates. Sixty-seven patients with a mean age of 55.4 years (range 22-90 years) were included in this prospective study. The subjects were randomly assigned to receive 1 of 2 treatments: orthogonal or parallel plating. The following results were assessed: operating time, time to fracture union, presence of a step or gap at the articular margin, varus-valgus angulation, functional recovery, and complications. No intergroup differences were observed based on radiological and clinical results between the groups. In our practice, no significant differences were found between the orthogonal and parallel plating methods in terms of clinical outcomes, mean operation time, union time, or complication rates. There were no cases of fracture nonunion in either group; heterotopic ossification was found in 3 patients in the orthogonal plating group and 2 patients in the parallel plating group. In our practice, no significant differences were found between the orthogonal and parallel plating methods in terms of clinical outcomes or complication rates. However, the orthogonal plating method may be preferred in cases of coronal shear fractures, where posterior to anterior fixation may provide additional stability to the intraarticular fractures. Additionally, the parallel plating method may be the preferred technique for fractures that occur at the most distal end of the humerus.
Generalized kinetic-neoclassical closure for parallel viscosity in a tokamak.

NASA Astrophysics Data System (ADS)

Smolyakov, A.; Callen, J. D.; Hegna, C.

2000-10-01

We develop a drift-kinetic equation for Chapman-Enskog-type calculations of the parallel viscosity in a tokamak. This approach allows us to uniformly obtain closure relations for the parallel viscosity that include the kinetic effects of wave-particle interactions, such as those of Hammett-Perkins closures, as well as standard neoclassical moment closures induced by collisions and the magnetic field strength variation along field lines. Closures for both these cases can be obtained from our expressions; also, their mutual influences can be investigated. The developed equations allow calculation of parallel viscosity in general kinetic-neoclassical regimes while the main conservation properties remain correct even with an approximate treatment of the collisional operator.
Design and Performance of a 1 ms High-Speed Vision Chip with 3D-Stacked 140 GOPS Column-Parallel PEs.

PubMed

Nose, Atsushi; Yamazaki, Tomohiro; Katayama, Hironobu; Uehara, Shuji; Kobayashi, Masatsugu; Shida, Sayaka; Odahara, Masaki; Takamiya, Kenichi; Matsumoto, Shizunori; Miyashita, Leo; Watanabe, Yoshihiro; Izawa, Takashi; Muramatsu, Yoshinori; Nitta, Yoshikazu; Ishikawa, Masatoshi

2018-04-24

We have developed a high-speed vision chip using 3D stacking technology to address the increasing demand for high-speed vision chips in diverse applications. The chip comprises a 1/3.2-inch, 1.27 Mpixel, 500 fps (0.31 Mpixel, 1000 fps, 2 × 2 binning) vision chip with 3D-stacked column-parallel Analog-to-Digital Converters (ADCs) and 140 Giga Operation per Second (GOPS) programmable Single Instruction Multiple Data (SIMD) column-parallel PEs for new sensing applications. The 3D-stacked structure and column parallel processing architecture achieve high sensitivity, high resolution, and high-accuracy object positioning.
Trace-Driven Debugging of Message Passing Programs

NASA Technical Reports Server (NTRS)

Frumkin, Michael; Hood, Robert; Lopez, Louis; Bailey, David (Technical Monitor)

1998-01-01

In this paper we report on features added to a parallel debugger to simplify the debugging of parallel message passing programs. These features include replay, setting consistent breakpoints based on interprocess event causality, a parallel undo operation, and communication supervision. These features all use trace information collected during the execution of the program being debugged. We used a number of different instrumentation techniques to collect traces. We also implemented trace displays using two different trace visualization systems. The implementation was tested on an SGI Power Challenge cluster and a network of SGI workstations.
Multiple resonant railgun power supply

DOEpatents

Honig, E.M.; Nunnally, W.C.

1985-06-19

A multiple repetitive resonant railgun power supply provides energy for repetitively propelling projectiles from a pair of parallel rails. A plurality of serially connected paired parallel rails are powered by similar power supplies. Each supply comprises an energy storage capacitor, a storage inductor to form a resonant circuit with the energy storage capacitor and a magnetic switch to transfer energy between the resonant circuit and the pair of parallel rails for the propelling of projectiles. The multiple serial operation permits relatively small energy components to deliver overall relatively large amounts of energy to the projectiles being propelled.
Execution of parallel algorithms on a heterogeneous multicomputer

NASA Astrophysics Data System (ADS)

Isenstein, Barry S.; Greene, Jonathon

1995-04-01

Many aerospace/defense sensing and dual-use applications require high-performance computing, extensive high-bandwidth interconnect and realtime deterministic operation. This paper will describe the architecture of a scalable multicomputer that includes DSP and RISC processors. A single chassis implementation is capable of delivering in excess of 10 GFLOPS of DSP processing power with 2 Gbytes/s of realtime sensor I/O. A software approach to implementing parallel algorithms called the Parallel Application System (PAS) is also presented. An example of applying PAS to a DSP application is shown.
Multiple resonant railgun power supply

DOEpatents

Honig, Emanuel M.; Nunnally, William C.

1988-01-01

A multiple repetitive resonant railgun power supply provides energy for repetitively propelling projectiles from a pair of parallel rails. A plurality of serially connected paired parallel rails are powered by similar power supplies. Each supply comprises an energy storage capacitor, a storage inductor to form a resonant circuit with the energy storage capacitor and a magnetic switch to transfer energy between the resonant circuit and the pair of parallel rails for the propelling of projectiles. The multiple serial operation permits relatively small energy components to deliver overall relatively large amounts of energy to the projectiles being propelled.
Interval Management with Spacing to Parallel Dependent Runways (IMSPIDR) Experiment and Results

NASA Technical Reports Server (NTRS)

Baxley, Brian T.; Swieringa, Kurt A.; Capron, William R.

2012-01-01

An area in aviation operations that may offer an increase in efficiency is the use of continuous descent arrivals (CDA), especially during dependent parallel runway operations. However, variations in aircraft descent angle and speed can cause inaccuracies in estimated time of arrival calculations, requiring an increase in the size of the buffer between aircraft. This in turn reduces airport throughput and limits the use of CDAs during high-density operations, particularly to dependent parallel runways. The Interval Management with Spacing to Parallel Dependent Runways (IMSPiDR) concept uses a trajectory-based spacing tool onboard the aircraft to achieve, by the runway, an air traffic control assigned spacing interval behind the previous aircraft. This paper describes the first ever experiment and results of this concept at NASA Langley. Pilots flew CDAs to the Dallas/Fort Worth airport using airspeed calculations from the spacing tool to achieve either a Required Time of Arrival (RTA) or Interval Management (IM) spacing interval at the runway threshold. Results indicate flight crews were able to land aircraft on the runway with a mean of 2 seconds and less than 4 seconds standard deviation of the air traffic control assigned time, even in the presence of forecast wind error and large time delay. Statistically significant differences in delivery precision and number of speed changes as a function of stream position were observed; however, there was no trend to the difference and the error did not increase during the operation. Two areas the flight crew indicated as not acceptable were the additional number of speed changes required during the wind shear event and the issuing of an IM clearance via data link while at low altitude. A number of refinements and future spacing algorithm capabilities were also identified.
Parallelizing serial code for a distributed processing environment with an application to high frequency electromagnetic scattering

NASA Astrophysics Data System (ADS)

Work, Paul R.

1991-12-01

This thesis investigates the parallelization of existing serial programs in computational electromagnetics for use in a parallel environment. Existing algorithms for calculating the radar cross section of an object are covered, and a ray-tracing code is chosen for implementation on a parallel machine. Current parallel architectures are introduced and a suitable parallel machine is selected for the implementation of the chosen ray-tracing algorithm. The standard techniques for the parallelization of serial codes are discussed, including load balancing and decomposition considerations, and appropriate methods for the parallelization effort are selected. A load balancing algorithm is modified to increase the efficiency of the application, and a high level design of the structure of the serial program is presented. A detailed design of the modifications for the parallel implementation is also included, with both the high level and the detailed design specified in a high level design language called UNITY. The correctness of the design is proven using UNITY and standard logic operations. The theoretical and empirical results show that it is possible to achieve an efficient parallel application for a serial computational electromagnetic program where the characteristics of the algorithm and the target architecture critically influence the development of such an implementation.
