Sample records for layers running parallel

  1. Armor structures

    DOEpatents

    Chu, Henry Shiu-Hung [Idaho Falls, ID]; Lacy, Jeffrey M [Idaho Falls, ID]

    2008-04-01

    An armor structure includes first and second layers, each containing a plurality of i-beams. Each i-beam has a pair of longitudinal flanges interconnected by a longitudinal crosspiece and defining opposing longitudinal channels between the pair of flanges. The i-beams within each of the first and second layers run parallel. The laterally outermost faces of the flanges of adjacent i-beams face one another. One of the longitudinal channels in each of the first and second layers faces one of the longitudinal channels in the other of the first and second layers. The channels of the first layer run parallel with the channels of the second layer. The flanges of the first and second layers overlap with the crosspieces of the other of the first and second layers, and portions of said flanges are received within the facing channels of the i-beams of the other of the first and second layers.

  2. Epithelial innervation of human cornea: a three-dimensional study using confocal laser scanning fluorescence microscopy.

    PubMed

    Guthoff, Rudolf F; Wienss, Holger; Hahnel, Christian; Wree, Andreas

    2005-07-01

    Evaluation of a new method to visualize distribution and morphology of human corneal nerves (Aδ- and C-fibers) by means of fluorescence staining, confocal laser scanning microscopy, and three-dimensional (3D) reconstruction. Trephinates of corneas with a diagnosis of Fuchs corneal dystrophy were sliced into layers of 200 μm thickness using a Draeger microkeratome (Storz, Germany). The anterior lamella was stained with the Live/Dead kit (Molecular Probes Inc.), examined by the confocal laser scanning microscope "Odyssey XL" with a step size between 0.5 and 1 μm, and the optical sections were digitally 3D-reconstructed. Immediate staining of explanted corneas with the Live/Dead kit gave a complete picture of the nerves in the central human cornea. Thin nerves running parallel to the Bowman layer in the subepithelial plexus perforate the Bowman layer orthogonally through tube-like structures. Passing the Bowman layer, Aδ- and C-fibers can be clearly distinguished by fiber diameter and, while running in the basal epithelial plexus, by their spatial arrangement. Aδ-fibers run straight and parallel to the Bowman layer underneath the basal cell layer. C-fibers, after a short run parallel to the Bowman layer, send off multiple branches penetrating the epithelial cell layers orthogonally, ending blindly in invaginations of the superficial cells. In contrast to C-fibers, Aδ-fibers show characteristic bulbous formations when kinking into the basal epithelial plexus. Ex vivo fluorescence staining of the cornea and 3D reconstruction of confocal scans provide a fast and easily reproducible tool to visualize nerves of the anterior living cornea at high resolution. This may help to clarify gross variations of nerve fiber patterns under various clinical and experimental conditions.

  3. Development of Cranial Bone Surrogate Structures Using Stereolithographic Additive Manufacturing

    DTIC Science & Technology

    2017-09-29

    shown in Fig. 5. With each cycle, a blade is passed across the platform to create a uniform layer of resin. The resin layer is exposed to a UV laser...due to the direction in which the layers are deposited. In both cases, the sequential layers run parallel to the loading direction of the tensile

  4. Memory access in shared virtual memory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berrendorf, R.

    1992-01-01

    Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.

  5. Memory access in shared virtual memory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berrendorf, R.

    1992-09-01

    Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.
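
The page-size and cold/warm-start effects that both records above study can be sketched with a toy model. The function below is an illustrative assumption, not code from the paper: each processor sweeps its own loop-level chunk of a shared array, faulting once for every page it has not yet replicated locally.

```python
# Toy model of a shared-virtual-memory (SVM) layer: one address space,
# but pages must be fetched over the network on a miss and are then
# replicated locally.  All names and the access pattern are illustrative.

def svm_page_faults(n_words, page_size, n_procs, warm_start=False):
    """Count remote page fetches when each processor sweeps its own
    contiguous chunk of a shared array (loop-level parallelism)."""
    n_pages = (n_words + page_size - 1) // page_size
    # A warm start models a repeated run: every processor already holds
    # a local replica of each page from the previous iteration.
    cached = [set(range(n_pages)) if warm_start else set()
              for _ in range(n_procs)]
    faults = 0
    chunk = n_words // n_procs
    for p in range(n_procs):
        for addr in range(p * chunk, (p + 1) * chunk):
            page = addr // page_size
            if page not in cached[p]:
                faults += 1          # page miss: remote fetch
                cached[p].add(page)  # replicate locally
    return faults

print(svm_page_faults(4096, 64, 8))                   # cold start: 64
print(svm_page_faults(4096, 64, 8, warm_start=True))  # warm start: 0
print(svm_page_faults(4096, 16, 8))                   # smaller pages: 256
```

Halving the page size doubles the fault count for this sequential sweep, while a warm start removes cold misses entirely; page size, cold or warm start, and page replication are exactly the parameters the abstract says were studied.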

  6. Wavelength-selective ultraviolet (Mg,Zn)O photodiodes: Tuning of parallel composition gradients with oxygen pressure

    NASA Astrophysics Data System (ADS)

    Zhang, Zhipeng; von Wenckstern, Holger; Lenzner, Jörg; Grundmann, Marius

    2016-06-01

    We report on ultraviolet photodiodes with an integrated optical filter based on wurtzite (Mg,Zn)O thin films. Tuning of the bandgap of the filter and active layers was realized by employing a continuous composition spread approach relying on the ablation of a single segmented target in pulsed-laser deposition. Filter and active layers of the device were deposited on opposite sides of a sapphire substrate with nearly parallel compositional gradients. Different oxygen pressures during the two deposition runs ensure that, for each sample position, the bandgap of the filter layer blocking the high-energy radiation is higher than that of the active layer. The absorption edge is tuned over 360 meV, and the spectral bandwidth of the photodiodes is typically 100 meV and as low as 50 meV.

  7. Research in Parallel Computing: 1987-1990

    DTIC Science & Technology

    1994-08-05

    emulation, we layered UNIX BSD 4.3 functionality above the kernel primitives, but packaged both as a monolithic unit running in privileged state. This...further, so that only a "pure kernel" or "microkernel" runs in privileged mode, while the other components of the environment execute as one or more client...

  8. Real-time SHVC software decoding with multi-threaded parallel processing

    NASA Astrophysics Data System (ADS)

    Gudumasu, Srinivas; He, Yuwen; Ye, Yan; He, Yong; Ryu, Eun-Seok; Dong, Jie; Xiu, Xiaoyu

    2014-09-01

    This paper proposes a parallel decoding framework for scalable HEVC (SHVC). Various optimization technologies are implemented on the basis of the SHVC reference software SHM-2.0 to achieve real-time decoding speed for the two-layer spatial scalability configuration. SHVC decoder complexity is analyzed with profiling information. The decoding process at each layer and the up-sampling process are designed in parallel and scheduled by a high-level application task manager. Within each layer, multi-threaded decoding is applied to accelerate the layer decoding speed. Entropy decoding, reconstruction, and in-loop processing are pipelined with multiple threads based on groups of coding tree units (CTUs). A group of CTUs is treated as a processing unit in each pipeline stage to achieve a better trade-off between parallelism and synchronization. Motion compensation, inverse quantization, and inverse transform modules are further optimized with SSE4 SIMD instructions. Simulations on a desktop with an Intel i7-2600 processor running at 3.4 GHz show that the parallel SHVC software decoder is able to decode 1080p spatial 2x streams at up to 60 fps (frames per second) and 1080p spatial 1.5x streams at up to 50 fps for bitstreams generated with the SHVC common test conditions of the JCT-VC standardization group. The decoding performance at various bitrates, with different optimization technologies and different numbers of threads, is compared in terms of decoding speed and resource usage, including processor and memory.
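
The CTU-group pipelining described above can be mimicked in a few lines: three stages (entropy decoding, reconstruction, in-loop filtering) run as separate threads and hand groups of CTUs downstream through queues. This is a structural sketch only; the stage bodies are placeholders, not SHM-2.0 code.

```python
# Three-stage decode pipeline over groups of CTUs, one thread per stage.
# Each stage pulls a group from its input queue, "processes" it, and
# pushes it downstream; None is the end-of-stream sentinel.
import queue
import threading

def run_pipeline(ctu_groups):
    q1, q2 = queue.Queue(), queue.Queue()
    out = []

    def entropy_decode():
        for g in ctu_groups:
            q1.put(g)            # stand-in for parsing the bitstream
        q1.put(None)

    def reconstruct():
        while (g := q1.get()) is not None:
            q2.put(g)            # stand-in for prediction + residual add
        q2.put(None)

    def inloop_filter():
        while (g := q2.get()) is not None:
            out.append(("filtered", g))  # stand-in for deblocking/SAO

    threads = [threading.Thread(target=f)
               for f in (entropy_decode, reconstruct, inloop_filter)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out

print(run_pipeline([0, 1, 2]))  # groups leave the pipeline in order
```

Because each stage is a single thread and the queues are FIFO, output order is preserved while the three stages overlap in time, which is the trade-off between parallelism and synchronization the abstract describes.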

  9. Parallel Tracks as Quasi-steady States for the Magnetic Boundary Layers in Neutron-star Low-mass X-Ray Binaries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Erkut, M. Hakan; Çatmabacak, Onur, E-mail: mherkut@gmail.com

    The neutron stars in low-mass X-ray binaries (LMXBs) are usually thought to be weakly magnetized objects accreting matter from their low-mass companions in the form of a disk. Albeit weak compared to those in young neutron-star systems, the neutron-star magnetospheres in LMXBs can play an important role in determining the correlations between spectral and temporal properties. Parallel tracks appearing in the kilohertz (kHz) quasi-periodic oscillation (QPO) frequency versus X-ray flux plane can be used as a tool to study the magnetosphere–disk interaction in neutron-star LMXBs. For dynamically important weak fields, the formation of a non-Keplerian magnetic boundary layer at the innermost disk truncated near the surface of the neutron star is highly likely. Such a boundary region may harbor oscillatory modes of frequencies in the kHz range. We generate parallel tracks using the boundary region model of kHz QPOs. We also present the direct application of our model to the reproduction of the observed parallel tracks of individual sources such as 4U 1608–52, 4U 1636–53, and Aql X-1. We reveal how the radial width of the boundary layer must vary in the long-term flux evolution of each source to regenerate the parallel tracks. The run of the radial width looks similar for different sources and can be fitted by a generic model function describing the average steady behavior of the boundary region over the long term. The parallel tracks then correspond to the possible quasi-steady states the source can occupy around the average trend.

  10. Parallel Tracks as Quasi-steady States for the Magnetic Boundary Layers in Neutron-star Low-mass X-Ray Binaries

    NASA Astrophysics Data System (ADS)

    Erkut, M. Hakan; Çatmabacak, Onur

    2017-11-01

    The neutron stars in low-mass X-ray binaries (LMXBs) are usually thought to be weakly magnetized objects accreting matter from their low-mass companions in the form of a disk. Albeit weak compared to those in young neutron-star systems, the neutron-star magnetospheres in LMXBs can play an important role in determining the correlations between spectral and temporal properties. Parallel tracks appearing in the kilohertz (kHz) quasi-periodic oscillation (QPO) frequency versus X-ray flux plane can be used as a tool to study the magnetosphere-disk interaction in neutron-star LMXBs. For dynamically important weak fields, the formation of a non-Keplerian magnetic boundary layer at the innermost disk truncated near the surface of the neutron star is highly likely. Such a boundary region may harbor oscillatory modes of frequencies in the kHz range. We generate parallel tracks using the boundary region model of kHz QPOs. We also present the direct application of our model to the reproduction of the observed parallel tracks of individual sources such as 4U 1608-52, 4U 1636-53, and Aql X-1. We reveal how the radial width of the boundary layer must vary in the long-term flux evolution of each source to regenerate the parallel tracks. The run of the radial width looks similar for different sources and can be fitted by a generic model function describing the average steady behavior of the boundary region over the long term. The parallel tracks then correspond to the possible quasi-steady states the source can occupy around the average trend.

  11. Discrete sensitivity derivatives of the Navier-Stokes equations with a parallel Krylov solver

    NASA Technical Reports Server (NTRS)

    Ajmani, Kumud; Taylor, Arthur C., III

    1994-01-01

    This paper solves an 'incremental' form of the sensitivity equations derived by differentiating the discretized thin-layer Navier-Stokes equations with respect to certain design variables of interest. The equations are solved with a parallel, preconditioned Generalized Minimal RESidual (GMRES) solver on a distributed-memory architecture. The 'serial' sensitivity analysis code is parallelized by using the Single Program Multiple Data (SPMD) programming model, domain decomposition techniques, and message-passing tools. Sensitivity derivatives are computed for low and high Reynolds number flows over a NACA 1406 airfoil on a 32-processor Intel Hypercube, and found to be identical to those computed on a single-processor Cray Y-MP. It is estimated that the parallel sensitivity analysis code has to be run on 40-50 processors of the Intel Hypercube in order to match the single-processor processing time of a Cray Y-MP.

  12. Structure of a magnetic flux annihilation layer formed by the collision of supersonic, magnetized plasma flows

    DOE PAGES

    Suttle, L. G.; Hare, J. D.; Lebedev, S. V.; ...

    2016-05-31

    We present experiments characterizing the detailed structure of a current layer, generated by the collision of two counter-streaming, supersonic and magnetized aluminum plasma flows. The antiparallel magnetic fields advected by the flows are found to be mutually annihilated inside the layer, giving rise to a bifurcated current structure: two narrow current sheets running along the outside surfaces of the layer. Measurements with Thomson scattering show a fast outflow of plasma along the layer and a high ion temperature (Ti ≈ Z̄Te, with average ionization Z̄ = 7). Lastly, analysis of the spatially resolved plasma parameters indicates that the advection and subsequent annihilation of the in-flowing magnetic flux determines the structure of the layer, while the ion heating could be due to the development of kinetic, current-driven instabilities.

  13. Structure of a magnetic flux annihilation layer formed by the collision of supersonic, magnetized plasma flows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Suttle, L. G.; Hare, J. D.; Lebedev, S. V.

    We present experiments characterizing the detailed structure of a current layer, generated by the collision of two counter-streaming, supersonic and magnetized aluminum plasma flows. The antiparallel magnetic fields advected by the flows are found to be mutually annihilated inside the layer, giving rise to a bifurcated current structure: two narrow current sheets running along the outside surfaces of the layer. Measurements with Thomson scattering show a fast outflow of plasma along the layer and a high ion temperature (Ti ≈ Z̄Te, with average ionization Z̄ = 7). Lastly, analysis of the spatially resolved plasma parameters indicates that the advection and subsequent annihilation of the in-flowing magnetic flux determines the structure of the layer, while the ion heating could be due to the development of kinetic, current-driven instabilities.

  14. Charon Toolkit for Parallel, Implicit Structured-Grid Computations: Functional Design

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.; Kutler, Paul (Technical Monitor)

    1997-01-01

    In a previous report the design concepts of Charon were presented. Charon is a toolkit that aids engineers in developing scientific programs for structured-grid applications to be run on MIMD parallel computers. It constitutes an augmentation of the general-purpose MPI-based message-passing layer, and provides the user with a hierarchy of tools for rapid prototyping and validation of parallel programs, and subsequent piecemeal performance tuning. Here we describe the implementation of the domain decomposition tools used for creating data distributions across sets of processors. We also present the hierarchy of parallelization tools that allows smooth translation of legacy code (or a serial design) into a parallel program. Along with the actual tool descriptions, we will present the considerations that led to the particular design choices. Many of these are motivated by the requirement that Charon must be useful within the traditional computational environments of Fortran 77 and C. Only the Fortran 77 syntax will be presented in this report.
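
The domain-decomposition step described above can be illustrated with a minimal one-dimensional distribution helper. The function name, the ghost width, and the interface are assumptions for illustration, not Charon's actual API.

```python
# Minimal sketch of the kind of data distribution a toolkit like Charon
# builds: split an N-point structured grid into near-equal contiguous
# blocks, one per processor, with a one-cell ghost overlap for stencils.

def decompose(n_points, n_procs, ghost=1):
    """Return ([(lo, hi)], [(lo, hi)]): the owned index ranges (hi
    exclusive) and the extended ranges including ghost points that
    would be exchanged with neighboring processors."""
    base, extra = divmod(n_points, n_procs)
    owned, halo = [], []
    lo = 0
    for p in range(n_procs):
        # The first `extra` processors take one extra point each.
        hi = lo + base + (1 if p < extra else 0)
        owned.append((lo, hi))
        halo.append((max(0, lo - ghost), min(n_points, hi + ghost)))
        lo = hi
    return owned, halo

owned, halo = decompose(10, 3)
print(owned)  # [(0, 4), (4, 7), (7, 10)]
print(halo)   # [(0, 5), (3, 8), (6, 10)]
```

Multi-dimensional structured grids decompose by applying the same split along each axis; the halo ranges are what a message-passing layer such as MPI would communicate between neighbors.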

  15. Durham extremely large telescope adaptive optics simulation platform.

    PubMed

    Basden, Alastair; Butterley, Timothy; Myers, Richard; Wilson, Richard

    2007-03-01

    Adaptive optics systems are essential on all large telescopes for which image quality is important. These are complex systems with many design parameters requiring optimization before good performance can be achieved. The simulation of adaptive optics systems is therefore necessary to categorize the expected performance. We describe an adaptive optics simulation platform, developed at Durham University, which can be used to simulate adaptive optics systems on the largest proposed future extremely large telescopes as well as on current systems. This platform is modular, object oriented, and has the benefit of hardware application acceleration that can be used to improve the simulation performance, essential for ensuring that the run time of a given simulation is acceptable. The simulation platform described here can be highly parallelized using parallelization techniques suited for adaptive optics simulation, while still offering the user complete control while the simulation is running. The results from the simulation of a ground layer adaptive optics system are provided as an example to demonstrate the flexibility of this simulation platform.

  16. Solid State Research, 1980:4

    DTIC Science & Technology

    1980-10-31

    and is initiated at the periphery of the device at an opening in the Si3N4 layer. Rate measurements of this process made on the GKOUSS imager using...dimensions, single-mode operation can be obtained. There is a stripe opening in the oxide film running parallel to the etched rib, which can be...seen in cross section in Fig. I-1(a). This stripe opening is the nucleation region for the epitaxial growth. Other oxide-confined waveguide

  17. PCLIPS: Parallel CLIPS

    NASA Technical Reports Server (NTRS)

    Hall, Lawrence O.; Bennett, Bonnie H.; Tello, Ivan

    1994-01-01

    A parallel version of CLIPS 5.1 has been developed to run on Intel Hypercubes. The user interface is the same as that for CLIPS, with some added commands to allow for parallel calls. A complete version of CLIPS runs on each node of the hypercube. The system has been instrumented to display the time spent in the match, recognize, and act cycles on each node. Only rule-level parallelism is supported. Parallel commands enable the assertion and retraction of facts to/from remote nodes' working memory. Parallel CLIPS was used to implement a knowledge-based command, control, communications, and intelligence (C(sup 3)I) system to demonstrate the fusion of high-level, disparate sources. We discuss the nature of the information fusion problem, our approach, and implementation. Parallel CLIPS has also been used to run several benchmark parallel knowledge bases, such as one to set up a cafeteria. Results from running Parallel CLIPS with parallel knowledge-base partitions indicate that significant speed increases, including superlinear in some cases, are possible.
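
Rule-level parallelism with remote fact assertion can be caricatured in a few lines. The Node class below is a hypothetical stand-in, not the PCLIPS implementation: each node owns a subset of the rules, and asserting a fact into every node's working memory lets each node fire only its own rules.

```python
# Sketch of rule-level parallelism: nodes mirror the working memory but
# partition the rule set, so match/fire work is divided across nodes.
# Rules are simplified to (pattern, action) pairs over string facts.

class Node:
    def __init__(self, rules):
        self.rules = rules    # this node's share of the rule set
        self.memory = set()   # working memory: plain string facts

    def assert_fact(self, fact):
        """Receive a fact asserted locally or from a remote node."""
        self.memory.add(fact)

    def run(self):
        """Fire every owned rule whose pattern matches working memory."""
        return [action for pattern, action in self.rules
                if pattern in self.memory]

nodes = [Node([("duck", "quack")]), Node([("duck", "waddle")])]
for n in nodes:              # broadcast the assertion to all nodes
    n.assert_fact("duck")
print([n.run() for n in nodes])  # [['quack'], ['waddle']]
```

Partitioning the knowledge base this way is what makes superlinear speedups possible in principle: each node matches a smaller rule set against the same facts.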

  18. Application of Multiplexed Replica Exchange Molecular Dynamics to the UNRES Force Field: Tests with alpha and alpha+beta Proteins.

    PubMed

    Czaplewski, Cezary; Kalinowski, Sebastian; Liwo, Adam; Scheraga, Harold A

    2009-03-10

    The replica exchange (RE) method is increasingly used to improve sampling in molecular dynamics (MD) simulations of biomolecular systems. Recently, we implemented the united-residue UNRES force field for mesoscopic MD. Initial results from UNRES MD simulations show that we are able to simulate folding events that take place in a microsecond or even a millisecond time scale. To speed up the search further, we applied the multiplexing replica exchange molecular dynamics (MREMD) method. The multiplexed variant (MREMD) of the RE method, developed by Rhee and Pande, differs from the original RE method in that several trajectories are run at a given temperature. Each set of trajectories run at a different temperature constitutes a layer. Exchanges are attempted not only within a single layer but also between layers. The code has been parallelized and scales up to 4000 processors. We present a comparison of canonical MD, REMD, and MREMD simulations of protein folding with the UNRES force-field. We demonstrate that the multiplexed procedure increases the power of replica exchange MD considerably and convergence of the thermodynamic quantities is achieved much faster.

  19. Application of Multiplexed Replica Exchange Molecular Dynamics to the UNRES Force Field: Tests with α and α+β Proteins

    PubMed Central

    Czaplewski, Cezary; Kalinowski, Sebastian; Liwo, Adam; Scheraga, Harold A.

    2009-01-01

    The replica exchange (RE) method is increasingly used to improve sampling in molecular dynamics (MD) simulations of biomolecular systems. Recently, we implemented the united-residue UNRES force field for mesoscopic MD. Initial results from UNRES MD simulations show that we are able to simulate folding events that take place in a microsecond or even a millisecond time scale. To speed up the search further, we applied the multiplexing replica exchange molecular dynamics (MREMD) method. The multiplexed variant (MREMD) of the RE method, developed by Rhee and Pande, differs from the original RE method in that several trajectories are run at a given temperature. Each set of trajectories run at a different temperature constitutes a layer. Exchanges are attempted not only within a single layer but also between layers. The code has been parallelized and scales up to 4000 processors. We present a comparison of canonical MD, REMD, and MREMD simulations of protein folding with the UNRES force-field. We demonstrate that the multiplexed procedure increases the power of replica exchange MD considerably and convergence of the thermodynamic quantities is achieved much faster. PMID:20161452
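
The exchange bookkeeping behind MREMD can be sketched with the standard Metropolis swap criterion. The temperatures, energies, and pairing scheme below are illustrative (units with k_B = 1), not the UNRES implementation.

```python
# Replica-exchange acceptance and one multiplexed exchange sweep.
# Swapping replicas at inverse temperatures b1 = 1/t1, b2 = 1/t2 with
# energies e1, e2 is accepted with probability
# min(1, exp((b1 - b2) * (e1 - e2))).
import math

def swap_accepted(e1, t1, e2, t2, rng):
    delta = (1.0 / t1 - 1.0 / t2) * (e1 - e2)
    return delta >= 0 or rng.random() < math.exp(delta)

def exchange_sweep(layers, temps, rng):
    """layers[i][k] is the energy of trajectory k in temperature layer i.
    Attempt a swap between trajectory k of adjacent layers i and i+1;
    within a layer all trajectories share one temperature, so swaps
    there are always accepted and change nothing thermodynamically."""
    for i in range(len(temps) - 1):
        for k in range(len(layers[i])):
            if swap_accepted(layers[i][k], temps[i],
                             layers[i + 1][k], temps[i + 1], rng):
                layers[i][k], layers[i + 1][k] = \
                    layers[i + 1][k], layers[i][k]
    return layers
```

The multiplexing is visible in the indexing: several trajectories (index k) sit in each temperature layer (index i), so there are far more exchange partners than in plain REMD with one replica per temperature.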

  20. Fortran code for SU(3) lattice gauge theory with and without MPI checkerboard parallelization

    NASA Astrophysics Data System (ADS)

    Berg, Bernd A.; Wu, Hao

    2012-10-01

    We document plain Fortran and Fortran MPI checkerboard code for Markov chain Monte Carlo simulations of pure SU(3) lattice gauge theory with the Wilson action in D dimensions. The Fortran code uses periodic boundary conditions and is suitable for pedagogical purposes and small scale simulations. For the Fortran MPI code two geometries are covered: the usual torus with periodic boundary conditions and the double-layered torus as defined in the paper. Parallel computing is performed on checkerboards of sublattices, which partition the full lattice in one, two, and so on, up to D directions (depending on the parameters set). For updating, the Cabibbo-Marinari heatbath algorithm is used. We present validations and test runs of the code. Performance is reported for a number of currently used Fortran compilers and, when applicable, MPI versions. For the parallelized code, performance is studied as a function of the number of processors. Program summary Program title: STMC2LSU3MPI Catalogue identifier: AEMJ_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEMJ_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 26666 No. of bytes in distributed program, including test data, etc.: 233126 Distribution format: tar.gz Programming language: Fortran 77 compatible with the use of Fortran 90/95 compilers, in part with MPI extensions. Computer: Any capable of compiling and executing Fortran 77 or Fortran 90/95, when needed with MPI extensions. Operating system: Red Hat Enterprise Linux Server 6.1 with OpenMPI + pgf77 11.8-0, Centos 5.3 with OpenMPI + gfortran 4.1.2, Cray XT4 with MPICH2 + pgf90 11.2-0. Has the code been vectorised or parallelized?: Yes, parallelized using MPI extensions. Number of processors used: 2 to 11664 RAM: 200 Mega bytes per process. 
Classification: 11.5. Nature of problem: Physics of pure SU(3) Quantum Field Theory (QFT). This is relevant for our understanding of Quantum Chromodynamics (QCD). It includes the glueball spectrum, topological properties, and the deconfining phase transition of pure SU(3) QFT. For instance, Relativistic Heavy Ion Collision (RHIC) experiments at the Brookhaven National Laboratory provide evidence that quarks confined in hadrons undergo at high enough temperature and pressure a transition into a Quark-Gluon Plasma (QGP). Investigations of its thermodynamics in pure SU(3) QFT are of interest. Solution method: Markov Chain Monte Carlo (MCMC) simulations of SU(3) Lattice Gauge Theory (LGT) with the Wilson action. This is a regularization of pure SU(3) QFT on a hypercubic lattice, which allows approaching the continuum SU(3) QFT by means of Finite Size Scaling (FSS) studies. Specifically, we provide updating routines for the Cabibbo-Marinari heatbath with and without checkerboard parallelization. While the first is suitable for pedagogical purposes and small-scale projects, the latter allows for efficient parallel processing. Targeting the geometry of RHIC experiments, we have implemented a Double-Layered Torus (DLT) lattice geometry, which has previously not been used in LGT MCMC simulations and enables inside and outside layers at distinct temperatures, the lower-temperature layer acting as the outside boundary for the higher-temperature layer, where the deconfinement transition takes place. Restrictions: The checkerboard partition of the lattice makes the development of measurement programs more tedious than is the case for an unpartitioned lattice. Presently, only one measurement routine for Polyakov loops is provided. Unusual features: We provide three different versions for the send/receive function of the MPI library, which work for different operating system + compiler + MPI combinations.
This involves activating the correct row in the last three rows of our latmpi.par parameter file; the underlying reason is distinct buffer conventions. Running time: For a typical run using an Intel i7 processor, it takes 1.8-6 × 10⁻⁶ seconds to update one link of the lattice, depending on the compiler used. For example, a simulation on a small 4 × 8³ DLT lattice with a statistics of 2¹¹ sweeps (i.e., updating the two lattice layers of 4 × (4 × 8³) links each 2¹¹ times) needs a total CPU time of about 2 × 4 × (4 × 8³) × 2¹¹ × 3 × 10⁻⁶ s ≈ 1.7 minutes, where 2 is the number of lattice layers, 4 the number of dimensions, 4 × 8³ the lattice size, 2¹¹ the number of update sweeps, and 3 × 10⁻⁶ s the average time to update one link variable. If the job is divided into 8 parallel processes, the real time (for negligible communication overhead) is 1.7 min / 8 ≈ 0.2 min.
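
The checkerboard idea the program summary relies on fits in a few lines: color each lattice site by the parity of its coordinate sum, so every nearest neighbor of an "even" site is "odd" and all sites of one color can be updated concurrently (or on separate MPI ranks) without touching each other. A sketch, with even lattice extents assumed since parity alternation would break across an odd periodic boundary:

```python
# Checkerboard (red-black) coloring of a periodic hypercubic lattice.
from itertools import product

def parity(site):
    """Checkerboard color of a site: 0 (even) or 1 (odd)."""
    return sum(site) % 2

def neighbors(site, dims):
    """Yield the 2*D nearest neighbors with periodic boundaries."""
    for mu in range(len(dims)):
        for step in (-1, 1):
            nb = list(site)
            nb[mu] = (nb[mu] + step) % dims[mu]
            yield tuple(nb)

dims = (4, 4, 4, 4)  # a tiny 4^4 lattice in D = 4 dimensions
# Every neighbor of every site has the opposite color, so the two
# sublattices can be updated in alternating parallel phases.
assert all(parity(nb) != parity(s)
           for s in product(*map(range, dims))
           for nb in neighbors(s, dims))
print("checkerboard property holds on", dims)
```

In the gauge-theory setting the Cabibbo-Marinari update of a link only reads the staples around it, which live on sites of the opposite color, so each half-sweep is embarrassingly parallel.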

  1. SPEEDES - A multiple-synchronization environment for parallel discrete-event simulation

    NASA Technical Reports Server (NTRS)

    Steinman, Jeff S.

    1992-01-01

    Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) is a unified parallel simulation environment. It supports multiple-synchronization protocols without requiring users to recompile their code. When a SPEEDES simulation runs on one node, all the extra parallel overhead is removed automatically at run time. When the same executable runs in parallel, the user preselects the synchronization algorithm from a list of options. SPEEDES currently runs on UNIX networks and on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. SPEEDES also supports interactive simulations. Featured in the SPEEDES environment is a new parallel synchronization approach called Breathing Time Buckets. This algorithm uses some of the conservative techniques found in Time Bucket synchronization, along with the optimism that characterizes the Time Warp approach. A mathematical model derived from first principles predicts the performance of Breathing Time Buckets. Along with the Breathing Time Buckets algorithm, this paper discusses the rules for processing events in SPEEDES, describes the implementation of various other synchronization protocols supported by SPEEDES, describes some new ones for the future, discusses interactive simulations, and then gives some performance results.
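
The event-horizon idea at the heart of Breathing Time Buckets can be sketched as follows; the data structures and two-argument interface are illustrative assumptions, not the SPEEDES implementation. Nodes process events optimistically, but only events below the horizon (the earliest timestamp of any newly generated event) are committed, since no message can arrive to invalidate them.

```python
# Event-horizon commit step: given each node's pending-event heap and
# the timestamps of events generated during the current cycle, commit
# exactly the pending events that precede the global event horizon.
import heapq

def process_bucket(node_queues, new_event_times):
    """Commit events below the event horizon; leave the rest queued."""
    horizon = min(new_event_times, default=float("inf"))
    committed = []
    for q in node_queues:
        while q and q[0] < horizon:          # safe: nothing earlier can arrive
            committed.append(heapq.heappop(q))
    return horizon, sorted(committed)

queues = [[1.0, 4.0], [2.0, 6.0]]
for q in queues:
    heapq.heapify(q)
print(process_bucket(queues, [3.5, 5.0]))  # → (3.5, [1.0, 2.0])
```

This mirrors the mix the abstract describes: events are processed optimistically like Time Warp, but commitment is gated by a conservative, globally agreed bound like Time Bucket synchronization.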

  2. Spectacular Layers Exposed in Becquerel Crater

    NASA Technical Reports Server (NTRS)

    2001-01-01

    Toward the end of its Primary Mapping Mission, the Mars Global Surveyor (MGS) Mars Orbiter Camera (MOC) acquired one of its most spectacular pictures of layered sedimentary rock exposed within the ancient crater Becquerel. Pictures such as this one from January 25, 2001, underscore the fact that you never know from one day to the next what the next MOC images will uncover. While the Primary Mission ends January 31, 2001, thousands of new pictures--revealing as-yet-unseen terrain on the red planet--may be obtained during the Extended Mission phase, scheduled to run through at least April 2002.

    The picture shown here reveals hundreds of light-toned layers in the 167 kilometers (104 miles) wide basin named for 19th Century French physicist Antoine H. Becquerel (1852-1908). These layers are interpreted to be sedimentary rocks deposited in the crater at some time in the distant past. They have since been eroded and exposed, revealing faults, dark layers between the bright layers, and a long geologic history (of unknown duration) recorded in these materials. Sets of parallel faults can be seen cutting across the layers in the left third of the image. Sunlight illuminates this scene from the top/upper right.

  3. Li0.5Al0.5Mg2(MoO4)3

    PubMed Central

    Ennajeh, Ines; Zid, Mohamed Faouzi; Driss, Ahmed

    2013-01-01

    The title compound, lithium/aluminium dimagnesium tetrakis[orthomolybdate(VI)], was prepared by a solid-state reaction route. The crystal structure is built up from MgO6 octahedra and MoO4 tetrahedra sharing corners and edges, forming two types of chains running along [100]. These chains are linked into layers parallel to (010) and finally linked by MoO4 tetrahedra into a three-dimensional framework structure with channels parallel to [001] in which lithium and aluminium cations equally occupy the same position within a distorted trigonal–bipyramidal coordination environment. The title structure is isotypic with LiMgIn(MoO4)3, with the In site becoming an Mg site and the fully occupied Li site a statistically occupied Li/Al site in the title structure. PMID:24426975

  4. Visualization and Tracking of Parallel CFD Simulations

    NASA Technical Reports Server (NTRS)

    Vaziri, Arsi; Kremenetsky, Mark

    1995-01-01

    We describe a system for interactive visualization and tracking of a 3-D unsteady computational fluid dynamics (CFD) simulation on a parallel computer. CM/AVS, a distributed, parallel implementation of a visualization environment (AVS), runs on the CM-5 parallel supercomputer. A CFD solver is run as a CM/AVS module on the CM-5. Data communication between the solver, other parallel visualization modules, and a graphics workstation, which is running AVS, is handled by CM/AVS. Partitioning of the visualization task between the CM-5 and the workstation can be done interactively in the visual programming environment provided by AVS. Flow solver parameters can also be altered by programmable interactive widgets. This system partially removes the requirement of storing large solution files at frequent time steps, a characteristic of the traditional 'simulate → store → visualize' post-processing approach.

  5. Implementation and performance of parallel Prolog interpreter

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wei, S.; Kale, L.V.; Balkrishna, R.

    1988-01-01

    In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE-OR process model, which exploits both AND and OR parallelism in logic programs. It is machine independent, as it runs on top of the chare kernel, a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark programs on parallel machines, including shared-memory systems (an Alliant FX/8, a Sequent, and a MultiMax) and a non-shared-memory system (the Intel iPSC/32 hypercube), in addition to its performance on a multiprocessor simulation system.

  6. SCaLeM: A Framework for Characterizing and Analyzing Execution Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chavarría-Miranda, Daniel; Manzano Franco, Joseph B.; Krishnamoorthy, Sriram

    2014-10-13

    As scalable parallel systems evolve towards more complex nodes with many-core architectures and larger trans-petascale and upcoming exascale deployments, there is a need to understand, characterize, and quantify the underlying execution models used on such systems. Execution models are a conceptual layer between applications and algorithms on one side and, on the other, the underlying parallel hardware and systems software on which those applications run. This paper presents the SCaLeM (Synchronization, Concurrency, Locality, Memory) framework for characterizing and analyzing execution models. SCaLeM consists of three basic elements: attributes, compositions, and the mapping of these compositions to abstract parallel systems. The fundamental Synchronization, Concurrency, Locality, and Memory attributes are used to characterize each execution model, while combinations of those attributes, in the form of compositions, describe the primitive operations of the execution model. The mapping of the execution model's primitive operations, described by compositions, to an underlying abstract parallel system can be evaluated quantitatively to determine its effectiveness. Finally, SCaLeM also enables the representation and analysis of applications in terms of execution models, for the purpose of evaluating the effectiveness of such mappings.

  7. MPI_XSTAR: MPI-based Parallelization of the XSTAR Photoionization Program

    NASA Astrophysics Data System (ADS)

    Danehkar, Ashkbiz; Nowak, Michael A.; Lee, Julia C.; Smith, Randall K.

    2018-02-01

    We describe a program for the parallel execution of multiple runs of XSTAR, a photoionization code that is used to predict the physical properties of an ionized gas from its emission and/or absorption lines. The parallelization program, called MPI_XSTAR, has been developed and implemented in the C++ language using the Message Passing Interface (MPI) protocol, a conventional standard of parallel computing. We have benchmarked parallel multiprocessing executions of XSTAR, using MPI_XSTAR, against a serial execution of XSTAR, in terms of parallelization speedup and computing-resource efficiency. Our experience indicates that the parallel execution runs significantly faster than the serial execution; however, the efficiency in terms of computing-resource usage decreases as the number of processors used in the parallel computation increases.
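    The scheme described, multiple independent XSTAR runs farmed out across MPI ranks, can be illustrated with a small Python sketch. This is a hypothetical stand-in, not MPI_XSTAR's C++ code: a parameter grid is expanded into independent runs and dealt out round-robin, one simple way a rank-level scheduler might partition the work.

```python
from itertools import product

def partition_runs(param_grid, n_ranks):
    """Assign independent XSTAR-style runs to ranks round-robin.

    param_grid: dict mapping parameter name -> list of values.
    Returns one work list per rank; each item is a parameter combination
    for one run. (Illustrative sketch, not MPI_XSTAR's scheduler.)
    """
    names = sorted(param_grid)
    combos = [dict(zip(names, values))
              for values in product(*(param_grid[n] for n in names))]
    buckets = [[] for _ in range(n_ranks)]
    for i, combo in enumerate(combos):
        buckets[i % n_ranks].append(combo)   # run i goes to rank i % n_ranks
    return buckets

# Example: a 2x3 grid (6 independent runs) spread over 4 ranks.
# Parameter names here are purely illustrative.
grid = {"column_density": [1e21, 1e22], "log_xi": [1.0, 2.0, 3.0]}
work = partition_runs(grid, 4)
```

    In an actual MPI program each rank would compute only its own bucket and launch one XSTAR invocation per entry, which is why the speedup saturates once ranks outnumber remaining runs.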

  8. Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

    PubMed

    Nadkarni, P M; Miller, P L

    1991-01-01

    A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations.

  9. Effects of the diurnal cycle in solar radiation on the tropical Indian Ocean mixed layer variability during wintertime Madden-Julian Oscillations

    NASA Astrophysics Data System (ADS)

    Li, Yuanlong; Han, Weiqing; Shinoda, Toshiaki; Wang, Chunzai; Lien, Ren-Chieh; Moum, James N.; Wang, Jih-Wang

    2013-10-01

    The effects of the solar radiation diurnal cycle on intraseasonal mixed layer variability in the tropical Indian Ocean during boreal wintertime Madden-Julian Oscillation (MJO) events are examined using the HYbrid Coordinate Ocean Model. Two parallel experiments, the main run and the experimental run, are performed for the period 2005-2011 with daily atmospheric forcing, except that an idealized hourly shortwave radiation diurnal cycle is included in the main run. The results show that the diurnal cycle of solar radiation generally warms the Indian Ocean sea surface temperature (SST) north of 10°S, particularly during the calm phase of the MJO, when sea surface wind is weak, the mixed layer is thin, and the SST diurnal cycle amplitude (dSST) is large. The diurnal cycle enhances the MJO-forced intraseasonal SST variability by about 20% in key regions like the Seychelles-Chagos Thermocline Ridge (SCTR; 55°-70°E, 12°-4°S) and the central equatorial Indian Ocean (CEIO; 65°-95°E, 3°S-3°N), primarily through nonlinear rectification. The model also reproduces well the upper-ocean variations monitored during the CINDY/DYNAMO field campaign between September and November 2011. During this period, dSST reaches 0.7°C in the CEIO region, and intraseasonal SST variability is significantly amplified. In the SCTR region, where mean easterly winds are strong during this period, diurnal SST variation and its impact on intraseasonal ocean variability are much weaker. In both regions, the diurnal cycle also has a large impact on the upward surface turbulent heat flux QT and induces diurnal variation of QT with a peak-to-peak difference of O(10 W m^-2).

  10. Run-time parallelization and scheduling of loops

    NASA Technical Reports Server (NTRS)

    Saltz, Joel H.; Mirchandaney, Ravi; Crowley, Kay

    1991-01-01

    Run-time methods are studied to automatically parallelize and schedule iterations of a do loop in certain cases where compile-time information is inadequate. The methods presented involve execution time preprocessing of the loop. At compile-time, these methods set up the framework for performing a loop dependency analysis. At run-time, wavefronts of concurrently executable loop iterations are identified. Using this wavefront information, loop iterations are reordered for increased parallelism. Symbolic transformation rules are used to produce: inspector procedures that perform execution time preprocessing, and executors or transformed versions of source code loop structures. These transformed loop structures carry out the calculations planned in the inspector procedures. Performance results are presented from experiments conducted on the Encore Multimax. These results illustrate that run-time reordering of loop indexes can have a significant impact on performance.
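    The inspector/executor split described above can be sketched in Python. This is an illustrative toy, not the paper's implementation: the inspector assigns each iteration to a wavefront based on the array locations it touches, and the executor then runs the wavefronts in order (iterations within one wavefront are mutually independent and could be dispatched in parallel).

```python
def inspector(reads, writes):
    """Execution-time preprocessing: compute a wavefront number for each
    loop iteration i that reads a[reads[i]] and writes a[writes[i]].
    Simplification (an assumption of this sketch): any two iterations
    touching a common location are treated as dependent.
    """
    last_wave_for_loc = {}          # location -> deepest wavefront touching it
    wave = [0] * len(reads)
    for i in range(len(reads)):
        deps = [last_wave_for_loc.get(loc, -1)
                for loc in (reads[i], writes[i])]
        wave[i] = max(deps) + 1     # one level below the deepest conflict
        for loc in (reads[i], writes[i]):
            last_wave_for_loc[loc] = wave[i]
    return wave

def executor(wave, body):
    """Run iterations wavefront by wavefront; within a wavefront the
    iterations are independent, so a parallel executor could run them
    concurrently."""
    for level in range(max(wave) + 1):
        for i in [j for j, w in enumerate(wave) if w == level]:
            body(i)

# Iterations 1 and 2 conflict with iteration 0 but not with each other,
# so they share a wavefront; iteration 3 must wait for both.
wave = inspector(reads=[0, 1, 0, 2], writes=[1, 2, 3, 3])
order = []
executor(wave, order.append)
```

    Because the inspector's cost is paid once, this pays off when the same loop (with the same index arrays) is executed many times, which is the amortization argument the follow-up record below also makes.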

  11. The Character and Formation of Elongated Depressions on the Upper Bulgarian Slope

    NASA Astrophysics Data System (ADS)

    Xu, Cuiling; Greinert, Jens; Haeckel, Matthias; Bialas, Jörg; Dimitrov, Lyubomir; Zhao, Guangtao

    2018-06-01

    Seafloor elongated depressions are indicators of gas seepage or slope instability. Here we report a sequence of slope-parallel elongated depressions that link to the headwalls of sediment slides on the upper slope. The depressions, about 250 m wide and several kilometers long, are areas of focused gas discharge, indicated by bubble release into the water column and methane-enriched pore waters. Sparker seismic profiles running perpendicular and parallel to the coast show gas migration pathways and trapped gas underneath these depressions as bright spots and seismic blanking. The data indicate that upward gas migration is the initial reason for the fracturing of sedimentary layers. In the top sediment, where two young stages of landslides can be detected, slope-parallel sediment weakening lengthens and deepens the surficial fractures, creating the elongated depressions in the seafloor, aided by sediment erosion due to slope-parallel water currents.

  12. Why not make a PC cluster of your own? 5. AppleSeed: A Parallel Macintosh Cluster for Scientific Computing

    NASA Astrophysics Data System (ADS)

    Decyk, Viktor K.; Dauger, Dean E.

    We have constructed a parallel cluster consisting of Apple Macintosh G4 computers running both Classic Mac OS and the Unix-based Mac OS X, and have achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. Unlike other Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. This enables us to move parallel computing from the realm of experts to the mainstream of computing.

  13. View looking SW at brick retaining wall running parallel to ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    View looking SW at brick retaining wall running parallel to Jones Street showing bricked up storage vaults - Central of Georgia Railway, Savannah Repair Shops & Terminal Facilities, Brick Storage Vaults under Jones Street, Bounded by West Broad, Jones, West Boundary & Hull Streets, Savannah, Chatham County, GA

  14. Turbulence modeling of free shear layers for high-performance aircraft

    NASA Technical Reports Server (NTRS)

    Sondak, Douglas L.

    1993-01-01

    The High Performance Aircraft (HPA) Grand Challenge of the High Performance Computing and Communications (HPCC) program involves the computation of the flow over a high-performance aircraft. A variety of free shear layers, including mixing layers over cavities, impinging jets, blown flaps, and exhaust plumes, may be encountered in such flowfields. Since these free shear layers are usually turbulent, appropriate turbulence models must be used in the computations in order to simulate these flow features accurately. The HPCC program relies heavily on parallel computers. A Navier-Stokes solver (POVERFLOW) utilizing the Baldwin-Lomax algebraic turbulence model was developed and tested on a 128-node Intel iPSC/860. Algebraic turbulence models run very fast and give good results for many flowfields. For complex flowfields such as those mentioned above, however, they are often inadequate. It was therefore deemed that a two-equation turbulence model would be required for the HPA computations. The k-epsilon two-equation turbulence model was implemented on the Intel iPSC/860. Both the Chien low-Reynolds-number model and a generalized wall-function formulation were included.

  15. How to Build an AppleSeed: A Parallel Macintosh Cluster for Numerically Intensive Computing

    NASA Astrophysics Data System (ADS)

    Decyk, V. K.; Dauger, D. E.

    We have constructed a parallel cluster consisting of a mixture of Apple Macintosh G3 and G4 computers running the Mac OS, and have achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. This enables us to move parallel computing from the realm of experts to the mainstream of computing.

  16. Creating a Parallel Version of VisIt for Microsoft Windows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Whitlock, B J; Biagas, K S; Rawson, P L

    2011-12-07

    VisIt is a popular, free, interactive parallel visualization and analysis tool for scientific data. Users can quickly generate visualizations from their data, animate them through time, manipulate them, and save the resulting images or movies for presentations. VisIt was designed from the ground up to work on many scales of computers, from modest desktops up to massively parallel clusters. VisIt is composed of a set of cooperating programs. All programs can be run locally or in client/server mode, in which some run locally and some run remotely on compute clusters. The VisIt program most able to harness today's computing power is the VisIt compute engine. The compute engine is responsible for reading simulation data from disk, processing it, and sending results or images back to the VisIt viewer program. In a parallel environment, the compute engine runs several processes that coordinate using the Message Passing Interface (MPI) library. Each MPI process reads some subset of the scientific data and filters the data in various ways to create useful visualizations. By using MPI, VisIt has been able to scale well into the thousands of processors on large computers such as dawn and graph at LLNL. The advent of multicore CPUs has made parallelism the 'new' way to achieve increasing performance. With today's computers having at least 2 cores, and in many cases 8 and beyond, it is more important than ever to deploy parallel software that can use that computing power not only on clusters but also on the desktop. We have created a parallel version of VisIt for Windows that uses Microsoft's MPI implementation (MSMPI) to process data in parallel on the Windows desktop as well as on a Windows HPC cluster running Microsoft Windows Server 2008. Initial desktop parallel support for Windows was deployed in VisIt 2.4.0. Windows HPC cluster support has been completed and will appear in the VisIt 2.5.0 release. We plan to continue supporting parallel VisIt on Windows so our users will be able to take full advantage of their multicore resources.

  17. Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

    PubMed Central

    Nadkarni, P. M.; Miller, P. L.

    1991-01-01

    A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations. PMID:1807632

  18. Lower limb joint angles and ground reaction forces in forefoot strike and rearfoot strike runners during overground downhill and uphill running.

    PubMed

    Kowalski, Erik; Li, Jing Xian

    2016-11-01

    This study investigated the normal and parallel ground reaction forces during downhill and uphill running in habitual forefoot strike and habitual rearfoot strike (RFS) runners. Fifteen habitual forefoot strike and 15 habitual RFS recreational male runners ran at 3 m/s ± 5% during level, uphill and downhill overground running on a ramp mounted at 6° and 9°. Results showed that forefoot strike runners had no visible impact peak in all running conditions, while the impact peaks only decreased during the uphill conditions in RFS runners. Active peaks decreased during the downhill conditions in forefoot strike runners while active loading rates increased during downhill conditions in RFS runners. Compared to the level condition, parallel braking peaks were larger during downhill conditions and parallel propulsive peaks were larger during uphill conditions. Combined with previous biomechanics studies, our findings suggest that forefoot strike running may be an effective strategy to reduce impacts, especially during downhill running. These findings may have further implications towards injury management and prevention.

  19. Parallel family trees for transfer matrices in the Potts model

    NASA Astrophysics Data System (ADS)

    Navarro, Cristobal A.; Canfora, Fabrizio; Hitschfeld, Nancy; Navarro, Gonzalo

    2015-02-01

    The computational cost of transfer matrix methods for the Potts model is related to the question: in how many ways can two layers of a lattice be connected? Answering the question leads to the generation of a combinatorial set of lattice configurations. This set defines the configuration space of the problem, and the smaller it is, the faster the transfer matrix can be computed. The configuration space of generic (q, v) transfer matrix methods for strips is on the order of the Catalan numbers, which grow asymptotically as O(4^m), where m is the width of the strip. Other transfer matrix methods with a smaller configuration space do exist, but they make assumptions on the temperature or the number of spin states, or restrict the structure of the lattice. In this paper we propose a parallel algorithm that uses a sub-Catalan configuration space of O(3^m) to build the generic (q, v) transfer matrix in a compressed form. The improvement is achieved by grouping the original set of Catalan configurations into a forest of family trees, in such a way that the solution to the problem is now computed by solving the root node of each family. As a result, the algorithm becomes exponentially faster than the Catalan approach while remaining highly parallel. The resulting matrix is stored in a compressed form using O(3^m × 4^m) space, making numerical evaluation and decompression faster than evaluating the matrix in its O(4^m × 4^m) uncompressed form. Experimental results for different sizes of strip lattices show that the parallel family trees (PFT) strategy indeed runs exponentially faster than the Catalan Parallel Method (CPM), especially when dealing with dense transfer matrices. In terms of parallel performance, we report strong-scaling speedups of up to 5.7× when running on an 8-core shared-memory machine and 28× for a 32-core cluster. The best balance of speedup and efficiency for the multi-core machine was achieved when using p = 4 processors, while for the cluster scenario it was in the range p ∈ [8, 10]. Because of the parallel capabilities of the algorithm, a large-scale execution of the parallel family trees strategy on a supercomputer could contribute to the study of wider strip lattices.
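    The Catalan asymptotics quoted in this abstract can be checked numerically. The sketch below (plain Python, not the paper's code) computes Catalan numbers from the closed form C_m = C(2m, m)/(m+1) and the ratio of consecutive terms, which approaches 4, consistent with O(4^m) growth.

```python
from math import comb

def catalan(m):
    """m-th Catalan number C_m = C(2m, m) / (m + 1): the size of the
    generic transfer-matrix configuration space for a strip of width m."""
    return comb(2 * m, m) // (m + 1)

# C_{m+1} / C_m = 2(2m + 1) / (m + 2) -> 4 as m grows, i.e. O(4^m);
# the family-tree grouping shrinks the working set to O(3^m) configurations.
ratios = [catalan(m + 1) / catalan(m) for m in (10, 50, 200)]
```

    Even at modest widths the gap matters: (4/3)^m means the compressed O(3^m) space is already hundreds of times smaller than the Catalan space by m ≈ 20.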

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aoki, Kenji

    A read/write head for a magnetic tape includes an elongated chip assembly and a tape running surface formed in the longitudinal direction of the chip assembly. A pair of substantially spaced parallel read/write gap lines for supporting read/write elements extend longitudinally along the tape running surface of the chip assembly. Also, at least one groove is formed on the tape running surface on both sides of each of the read/write gap lines and extends substantially parallel to the read/write gap lines.

  1. Final Scientific Report: A Scalable Development Environment for Peta-Scale Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Karbach, Carsten; Frings, Wolfgang

    2013-02-22

    This document is the final scientific report of the project DE-SC000120 (A Scalable Development Environment for Peta-Scale Computing). The objective of this project is the extension of the Parallel Tools Platform (PTP) so that it can be applied to peta-scale systems. PTP is an integrated development environment for parallel applications. It comprises code analysis, performance tuning, parallel debugging, and system monitoring. The contribution of the Juelich Supercomputing Centre (JSC) aims to provide a scalable solution for system monitoring of supercomputers. This includes the development of a new communication protocol for exchanging status data between the target remote system and the client running PTP. The communication has to work under high latency. PTP needs to be implemented robustly and should hide the complexity of the supercomputer's architecture in order to provide transparent access to various remote systems via a uniform user interface. This simplifies the porting of applications to different systems, because PTP functions as an abstraction layer between the parallel application developer and the compute resources. The common requirement for all PTP components is that they have to interact with the remote supercomputer. For example, applications are built remotely, performance tools are attached to job submissions, and their output data resides on the remote system. Status data has to be collected by evaluating the output of the remote job scheduler, and the parallel debugger needs to control an application executed on the supercomputer. The challenge is to provide this functionality for peta-scale systems in real time. The client-server architecture of the established monitoring application LLview, developed by the JSC, can be applied to PTP's system monitoring. LLview provides a well-arranged overview of the supercomputer's current status. A set of statistics, a list of running and queued jobs, and a node display mapping running jobs to their compute resources form the user display of LLview. These monitoring features have to be integrated into the development environment. Besides showing the current status, PTP's monitoring also needs to allow for submitting and canceling user jobs. Monitoring peta-scale systems especially involves presenting the large amount of status data in a useful manner. Users need to be able to select arbitrary levels of detail. The monitoring views have to provide a quick overview of the system state, but also need to allow for zooming into the specific parts of the system in which the user is interested. At present, the major batch systems running on supercomputers are PBS, TORQUE, ALPS, and LoadLeveler, which have to be supported by both the monitoring and the job-controlling components. Finally, PTP needs to be designed to be as generic as possible, so that it can be extended for future batch systems.

  2. Collagen production of osteoblasts revealed by ultra-high voltage electron microscopy.

    PubMed

    Hosaki-Takamiya, Rumiko; Hashimoto, Mana; Imai, Yuichi; Nishida, Tomoki; Yamada, Naoko; Mori, Hirotaro; Tanaka, Tomoyo; Kawanabe, Noriaki; Yamashiro, Takashi; Kamioka, Hiroshi

    2016-09-01

    In bone, collagen fibrils form a lamellar structure described by the "twisted plywood-like model." Because of this unique structure, bone can withstand various mechanical stresses. However, the formation of this structure has not been elucidated, because observing the collagen fibril production of osteoblasts is difficult with currently available methods: the formation occurs in the very limited space between the osteoblast layer and the bone matrix. In this study, we used ultra-high-voltage electron microscopy (UHVEM) to observe collagen fibril production three-dimensionally. UHVEM has a 3-MV acceleration voltage and enables the use of thicker sections. We observed collagen fibrils beneath the osteoblast cell membrane elongating toward the outside of the cell. We also observed that osteoblasts produced collagen fibrils with polarity. Using AVIZO software, we observed collagen fibrils produced by osteoblasts along the contour of the osteoblasts toward the bone matrix area. Immediately after being released from the cell, the fibrils run randomly and sparsely, but as they recede from the osteoblast, they begin to run parallel in a definite direction and become thick, and a periodic stripe appears in that area. Furthermore, we also observed membrane structures wrapped around filamentous structures inside the osteoblasts. The filamentous structures had densities similar to the collagen fibrils, as well as a columnar form and diameter. Our results suggest that the parallel, thick arrangement of the collagen fibrils may be related to the lateral movement of the osteoblasts. UHVEM is a powerful tool for observing collagen fibril production.

  3. First Applications of the New Parallel Krylov Solver for MODFLOW on a National and Global Scale

    NASA Astrophysics Data System (ADS)

    Verkaik, J.; Hughes, J. D.; Sutanudjaja, E.; van Walsum, P.

    2016-12-01

    Integrated high-resolution hydrologic models are increasingly being used for evaluating water management measures at field scale. Their drawbacks are large memory requirements and long run times. Examples of such models are The Netherlands Hydrological Instrument (NHI) model and the PCRaster Global Water Balance (PCR-GLOBWB) model. Typical simulation periods are 30-100 years with daily timesteps. The NHI model predicts water demands in periods of drought, supporting operational and long-term water-supply decisions. The NHI is a state-of-the-art coupling of several models: a 7-layer MODFLOW groundwater model (~6.5M 250-m cells), a MetaSWAP model for the unsaturated zone (a Richards emulator of ~0.5M cells), and a surface-water model (MOZART-DM). The PCR-GLOBWB model provides a grid-based representation of global terrestrial hydrology, and this work uses the version that includes a 2-layer MODFLOW groundwater model (~4.5M 10-km cells). The Parallel Krylov Solver (PKS) speeds up computation by both distributed-memory parallelization (Message Passing Interface) and shared-memory parallelization (Open Multi-Processing). PKS includes conjugate gradient, bi-conjugate gradient stabilized, and generalized minimal residual linear accelerators that use an overlapping additive Schwarz domain decomposition preconditioner. PKS can be used for both structured and unstructured grids and has been fully integrated in MODFLOW-USG using METIS partitioning and in iMODFLOW using RCB partitioning. iMODFLOW is an accelerated version of MODFLOW-2005 that is implicitly and online coupled to MetaSWAP. Results for benchmarks carried out on the Cartesius Dutch supercomputer (https://userinfo.surfsara.nl/systems/cartesius) for the PCR-GLOBWB model and on a 2x16-core Windows machine for the NHI model show speedups of up to 10-20 and 5-10, respectively.
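    As a minimal illustration of the kind of Krylov accelerator PKS provides, here is a serial conjugate-gradient sketch in plain Python. This is illustrative only, not PKS code; in the parallel setting one would (as an assumption based on the description above) split matrix rows across MPI ranks and globally reduce the dot products.

```python
def conjugate_gradient(A, b, tol=1e-12, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A, given as
    a list of rows. The two dot products per iteration are exactly the
    quantities a distributed-memory version must all-reduce across ranks.
    """
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    matvec = lambda M, v: [dot(row, v) for row in M]
    x = [0.0] * len(b)
    r = b[:]                      # residual r = b - A x, with x = 0 initially
    p = r[:]                      # search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)   # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol:          # converged: squared residual small enough
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# 2x2 SPD system [[4, 1], [1, 3]] x = [1, 2]; exact solution (1/11, 7/11).
x = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

    An additive Schwarz preconditioner, as used by PKS, would be applied to r before updating p, with each subdomain (grid partition) solving its own local problem independently; that is what makes the method attractive under domain decomposition.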

  4. Run-time parallelization and scheduling of loops

    NASA Technical Reports Server (NTRS)

    Saltz, Joel H.; Mirchandaney, Ravi; Crowley, Kay

    1990-01-01

    Run-time methods are studied to automatically parallelize and schedule iterations of a do loop in certain cases where compile-time information is inadequate. The methods presented involve execution-time preprocessing of the loop. At compile time, these methods set up the framework for performing a loop dependency analysis. At run time, wavefronts of concurrently executable loop iterations are identified. Using this wavefront information, loop iterations are reordered for increased parallelism. Symbolic transformation rules are used to produce inspector procedures that perform execution-time preprocessing, and executors, or transformed versions of source code loop structures. These transformed loop structures carry out the calculations planned in the inspector procedures. Performance results are presented from experiments conducted on the Encore Multimax. These results illustrate that run-time reordering of loop indices can have a significant impact on performance. Furthermore, the overheads associated with this type of reordering are amortized when the loop is executed several times with the same dependency structure.

  5. On a nonlinear state of the electromagnetic ion/ion cyclotron instability

    NASA Astrophysics Data System (ADS)

    Cremer, M.; Scholer, M.

    We have investigated the nonlinear properties of the electromagnetic ion/ion cyclotron instability (EMIIC) by means of hybrid simulations (macroparticle ions, massless electron fluid). The instability is driven by the relative (super-Alfvénic) streaming of two field-aligned ion beams in a low-beta plasma (beta being the ratio of ion thermal pressure to magnetic field pressure) and may be of importance in the plasma sheet boundary layer. As shown in previously reported simulations, the waves propagate obliquely to the magnetic field and heat the ions in the perpendicular direction as the relative beam velocity decreases. By running the simulation to large times, it can be shown that the large temperature anisotropy leads to the ion cyclotron (IC) instability with parallel-propagating Alfvén ion cyclotron waves. This is confirmed by numerically solving the electromagnetic dispersion relation. An application of this result to the plasma sheet boundary layer is discussed.

  6. Availability Improvement of Layer 2 Seamless Networks Using OpenFlow

    PubMed Central

    Molina, Elias; Jacob, Eduardo; Matias, Jon; Moreira, Naiara; Astarloa, Armando

    2015-01-01

    Network robustness and reliability are strongly influenced by the implementation of redundancy and its ability to react to changes. In situations where packet-loss or maximum-latency requirements are critical, replication of resources and information may be the optimal technique. To this end, the IEC 62439-3 Parallel Redundancy Protocol (PRP) provides seamless recovery in layer 2 networks by delegating redundancy management to the end nodes. In this paper, we present a combination of the Software-Defined Networking (SDN) approach and PRP topologies to establish a higher level of redundancy: through several active paths provisioned via the OpenFlow protocol, global reliability is increased and data flows are managed efficiently. Experiments with multiple failure scenarios, run on the Mininet network emulator, show improved availability and responsiveness over other traditional technologies based on a single active path. PMID:25759861

  7. Availability improvement of layer 2 seamless networks using OpenFlow.

    PubMed

    Molina, Elias; Jacob, Eduardo; Matias, Jon; Moreira, Naiara; Astarloa, Armando

    2015-01-01

    Network robustness and reliability are strongly influenced by the implementation of redundancy and its ability to react to changes. In situations where packet-loss or maximum-latency requirements are critical, replication of resources and information may be the optimal technique. To this end, the IEC 62439-3 Parallel Redundancy Protocol (PRP) provides seamless recovery in layer 2 networks by delegating redundancy management to the end nodes. In this paper, we present a combination of the Software-Defined Networking (SDN) approach and PRP topologies to establish a higher level of redundancy: through several active paths provisioned via the OpenFlow protocol, global reliability is increased and data flows are managed efficiently. Experiments with multiple failure scenarios, run on the Mininet network emulator, show improved availability and responsiveness over other traditional technologies based on a single active path.
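    The PRP mechanism both of these records build on can be sketched as receiver-side duplicate discard. This is a simplified toy, not an IEC 62439-3 implementation: real PRP keeps a bounded drop window per source rather than the unbounded set used here, but the forwarding decision is the same in spirit.

```python
class PrpDuplicateDiscard:
    """Receiver side of PRP-style seamless redundancy: every frame is sent
    over both LAN A and LAN B with a sequence number, and only the first
    copy to arrive is forwarded up the stack. (Simplified sketch.)"""

    def __init__(self):
        self.seen = set()          # (source, seqnr) pairs already forwarded

    def receive(self, source, seqnr):
        """Return True if this frame should be forwarded, False if it is
        the duplicate that arrived later on the other LAN."""
        key = (source, seqnr)
        if key in self.seen:
            return False           # late copy from the redundant path: drop
        self.seen.add(key)
        return True

rx = PrpDuplicateDiscard()
results = [rx.receive("node1", 7),   # first copy (e.g. via LAN A): forward
           rx.receive("node1", 7),   # second copy via LAN B: discard
           rx.receive("node1", 8)]   # next frame: forward
```

    Because the receiver never waits for the second copy, recovery from a single-path failure is seamless, with zero switchover time; the SDN/OpenFlow contribution of the paper extends this idea from two fixed LANs to several provisioned active paths.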

  8. A Concurrent Implementation of the Cascade-Correlation Algorithm, Using the Time Warp Operating System

    NASA Technical Reports Server (NTRS)

    Springer, P.

    1993-01-01

    This paper discusses how the Cascade-Correlation algorithm was parallelized so that it could be run under the Time Warp Operating System (TWOS). TWOS is a special-purpose operating system designed to run parallel discrete event simulations with maximum efficiency on parallel or distributed computers.
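
The candidate-selection step at the heart of Cascade-Correlation can be sketched independently of TWOS (a toy version with invented names; the real algorithm also trains the candidate weights, which is omitted here): each candidate hidden unit is scored by the magnitude of the covariance between its output and the network's residual error, and the best scorer is installed as a new hidden unit.

```python
# Toy sketch of Cascade-Correlation's candidate step (illustrative only):
# score = |covariance(candidate output, residual error)| over the training patterns.
def candidate_score(outputs, residuals):
    n = len(outputs)
    mo = sum(outputs) / n
    mr = sum(residuals) / n
    return abs(sum((o - mo) * (r - mr) for o, r in zip(outputs, residuals)))

residuals = [0.9, -0.8, 0.7, -0.6]        # current network error per pattern
candidates = {
    "unit_a": [0.1, 0.1, 0.1, 0.1],        # flat output: tracks nothing
    "unit_b": [1.0, -1.0, 1.0, -1.0],      # tracks the alternating error well
}
best = max(candidates, key=lambda k: candidate_score(candidates[k], residuals))
```

The winning candidate ("unit_b" here) is the one whose output co-varies most strongly with the remaining error, which is what makes the installed unit useful as a new feature detector.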

  9. PFLOTRAN User Manual: A Massively Parallel Reactive Flow and Transport Model for Describing Surface and Subsurface Processes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lichtner, Peter C.; Hammond, Glenn E.; Lu, Chuan

    PFLOTRAN solves a system of generally nonlinear partial differential equations describing multiphase, multicomponent, and multiscale reactive flow and transport in porous materials. The code is designed to run on massively parallel computing architectures as well as workstations and laptops (e.g., Hammond et al., 2011). Parallelization is achieved through domain decomposition using the PETSc (Portable, Extensible Toolkit for Scientific Computation) libraries as the parallelization framework (Balay et al., 1997). PFLOTRAN has been developed from the ground up for parallel scalability and has been run on up to 2^18 processor cores with problem sizes up to 2 billion degrees of freedom. Written in object-oriented Fortran 90, the code requires the latest compilers compatible with Fortran 2003; at the time of this writing this means gcc 4.7.x, Intel 12.1.x, and PGI compilers. To support problems with a large number of degrees of freedom, PFLOTRAN allows reading input data that is too large to fit into the memory allotted to a single processor core. The current limitation on the problem size PFLOTRAN can handle is the restriction of the HDF5 file format used for parallel I/O to 32-bit integers. Noting that 2^32 = 4,294,967,296, this gives an estimate of the maximum problem size that can currently be run with PFLOTRAN. Hopefully this limitation will be remedied in the near future.
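
The 32-bit HDF5 index mentioned above caps the global problem size at 2^32 unknowns. A minimal sketch (assuming a simple 1-D block decomposition, not PETSc's actual partitioner) of how cells might be divided among MPI ranks:

```python
HDF5_INDEX_LIMIT = 2**32  # 32-bit index: the problem-size ceiling noted in the text

def ownership_ranges(n_cells, n_ranks):
    """Split n_cells into contiguous per-rank blocks; the remainder goes to the first ranks."""
    base, extra = divmod(n_cells, n_ranks)
    ranges, start = [], 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

# The 2-billion-degree-of-freedom runs cited above fit under the 32-bit cap.
fits = 2_000_000_000 < HDF5_INDEX_LIMIT
parts = ownership_ranges(10, 3)  # tiny example: 10 cells over 3 ranks
```

Each rank then reads only its own slice of the (possibly memory-exceeding) input arrays, which is the collective-I/O pattern the abstract alludes to.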

  10. SU-F-SPS-09: Parallel MC Kernel Calculations for VMAT Plan Improvement

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chamberlain, S; Roswell Park Cancer Institute, Buffalo, NY; French, S

    Purpose: Adding kernels (small perturbations in leaf positions) to the existing apertures of VMAT control points may improve plan quality. We investigate the calculation of kernel doses using a parallelized Monte Carlo (MC) method. Methods: A clinical prostate VMAT DICOM plan was exported from Eclipse. An arbitrary control point and leaf were chosen, and a modified MLC file was created, corresponding to the leaf position offset by 0.5 cm. The additional dose produced by this 0.5 cm × 0.5 cm kernel was calculated using the DOSXYZnrc component module of BEAMnrc. A range of particle history counts were run (varying from 3 × 10^6 to 3 × 10^7); each job was split among 1, 10, or 100 parallel processes. A particle count of 3 × 10^6 was established as the lower bound because it provided the minimal accuracy level. Results: As expected, an increase in particle counts linearly increases run time. For the lowest particle count, the time varied from 30 hours for the single-processor run to 0.30 hours for the 100-processor run. Conclusion: Parallel processing of MC calculations in the EGS framework significantly decreases the time necessary for each kernel dose calculation. Particle counts lower than 1 × 10^6 have too large an error to output accurate dose for a Monte Carlo kernel calculation. Future work will investigate increasing the number of parallel processes and optimizing run times for multiple kernel calculations.
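
The reported timings fit an ideal strong-scaling model almost exactly; a back-of-the-envelope check (assuming perfectly linear speed-up, which the 30 h → 0.30 h figures above imply):

```python
def run_hours(histories, n_processes, hours_per_history):
    """Ideal strong scaling: total work divided evenly among processes."""
    return histories * hours_per_history / n_processes

rate = 30.0 / 3e6  # calibrated from the single-process run: 3e6 histories in 30 h
single = run_hours(3e6, 1, rate)     # the 30-hour single-processor run
hundred = run_hours(3e6, 100, rate)  # the 0.30-hour 100-processor run
tenfold = run_hours(3e7, 100, rate)  # 10x the histories: run time scales linearly
```

Monte Carlo histories are independent, so this embarrassingly parallel workload has essentially no serial fraction to spoil the scaling.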

  11. Parallel 3D Multi-Stage Simulation of a Turbofan Engine

    NASA Technical Reports Server (NTRS)

    Turner, Mark G.; Topp, David A.

    1998-01-01

    A 3D multistage simulation of each component of a modern GE Turbofan engine has been made. An axisymmetric view of this engine is presented in the document. This includes a fan, booster rig, high pressure compressor rig, high pressure turbine rig, and a low pressure turbine rig. In the near future, all components will be run in a single calculation for a solution of 49 blade rows. The simulation exploits two levels of parallelism: each blade row is run in parallel, and each blade row grid is decomposed into several domains that are run in parallel. 20 processors are used for the 4-blade-row analysis. The average passage approach developed by John Adamczyk at NASA Lewis Research Center has been further developed and parallelized as APNASA Version A. It is a Navier-Stokes solver using a 4-stage explicit Runge-Kutta time marching scheme with variable time steps and residual smoothing for convergence acceleration. It has an implicit K-E turbulence model which uses an ADI solver to factor the matrix. Between 50 and 100 explicit time steps are solved before a blade row body force is calculated and exchanged with the other blade rows; this outer iteration has been coined a "flip." Efforts have been made to make the solver linearly scalable with the number of blade rows. Enough flips are run (between 50 and 200) that the solution in the entire machine is no longer changing. The K-E equations are generally solved every other explicit time step. One of the key requirements in the development of the parallel code was to make the parallel solution exactly (bit for bit) match the serial solution. This has helped isolate many small parallel bugs and guarantee the parallelization was done correctly. The domain decomposition is done only in the axial direction, since the number of points axially is much larger than in the other two directions. The code uses MPI for message passing. The parallel speed-up of the solver portion (no I/O or body force calculation) was measured for a grid with 227 points axially.
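
The bit-for-bit requirement mentioned above constrains how partial results may be combined: floating-point addition is not associative, so combining independent partial sums generally perturbs the last bits. One way (a sketch, illustrative only; the abstract does not describe APNASA's actual mechanism) to keep a parallel reduction identical to the serial one is to accumulate the contiguous axial slabs in serial order:

```python
def serial_sum(values):
    """Reference serial accumulation order."""
    total = 0.0
    for v in values:
        total += v
    return total

def parallel_sum_bitwise(values, n_domains):
    """Accumulate contiguous axial slabs in serial order, so every floating-point
    addition happens in exactly the sequence the serial code would use."""
    bounds = [len(values) * d // n_domains for d in range(n_domains + 1)]
    total = 0.0  # in a real code this running total would be passed rank to rank
    for d in range(n_domains):
        for v in values[bounds[d]:bounds[d + 1]]:
            total += v
    return total

points = [0.1 * i for i in range(227)]  # one value per axial grid point
match = parallel_sum_bitwise(points, 5) == serial_sum(points)  # bit-for-bit equal
```

The cost of this determinism is a serialized reduction; the payoff, as the abstract notes, is that any bit-level mismatch immediately flags a parallelization bug.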

  12. Running Parallel Discrete Event Simulators on Sierra

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barnes, P. D.; Jefferson, D. R.

    2015-12-03

    In this proposal we consider porting the ROSS/Charm++ simulator and the discrete event models that run under its control so that they run on the Sierra architecture and make efficient use of the Volta GPUs.

  13. Novel molecular targets for kRAS downregulation: promoter G-quadruplexes

    DTIC Science & Technology

    2016-11-01

    conditions, and described the structure as having mixed parallel/anti-parallel loops of lengths 2:8:10 in the 5’-3’ direction. Using selective small...and anti-parallel loop directionality of lengths 4:10:8 in the 5’–3’ direction, three tetrads stacked, and involving guanines in runs B, C, E, and F...a tri-stacked structure incorporating runs B, C, E and F with intervening loops of 2, 10, and 8 bases in the 5’–3’ direction. G = black circles, C

  14. Observation of layered antiferromagnetism in self-assembled parallel NiSi nanowire arrays on Si(110) by spin-polarized scanning tunneling spectromicroscopy

    NASA Astrophysics Data System (ADS)

    Hong, Ie-Hong; Hsu, Hsin-Zan

    2018-03-01

    The layered antiferromagnetism of parallel nanowire (NW) arrays self-assembled on Si(110) has been observed at room temperature by direct imaging of both the topographies and magnetic domains using spin-polarized scanning tunneling microscopy/spectroscopy (SP-STM/STS). The topographic STM images reveal that the self-assembled unidirectional and parallel NiSi NWs grow into the Si(110) substrate along the [\bar{1}10] direction (i.e. the endotaxial growth) and exhibit multiple-layer growth. The spatially-resolved SP-STS maps show that these parallel NiSi NWs of different heights produce two opposite magnetic domains, depending on whether the layer stack of the NiSi NW has an even or odd number of layers. This layer-wise antiferromagnetic structure can be attributed to an antiferromagnetic interlayer exchange coupling between the adjacent layers in the multiple-layer NiSi NW with a B2 (CsCl-type) crystal structure. Such an endotaxial heterostructure of parallel magnetic NiSi NW arrays with a layered antiferromagnetic ordering in Si(110) provides a new and important perspective for the development of novel Si-based spintronic nanodevices.

  15. Unusual ZFC and FC magnetic behavior in thin Co multi-layered structure

    NASA Astrophysics Data System (ADS)

    Ben Dor, Oren; Yochelis, Shira; Felner, Israel; Paltiel, Yossi

    2017-04-01

    The observation of unusual magnetic phenomena in a Ni-based magnetic memory device (O. Ben-Dor et al., 2013) encouraged us to conduct systematic research on a Co-based multi-layered structure which contains an α-helix L-polyalanine (AHPA-L) organic compound. The Co thickness is held constant at 7 nm, and AHPA-L was also replaced by non-chiral 1-decanethiol organic molecules. Both organic compounds were chemisorbed on gold by a thiol group. The dc magnetic field (H) was applied parallel and perpendicular to the surface layers. The perpendicular direction is the easy magnetization axis, and along this orientation only, the zero-field-cooled (ZFC) plots exhibit a pronounced peak around 55-58 K. This peak is suppressed in the second ZFC and field-cooled (FC) runs performed shortly after the virgin ZFC one. Thus, around the peak position ZFC > FC, a phenomenon seldom observed. This peak reappears when the same material is measured six months later. This behavior also appears in layers with the non-chiral 1-decanethiol and is very similar to that obtained in sulfur-doped amorphous carbon. The peak origin and the peculiar ZFC > FC case are qualitatively explained.

  16. [Resistance to pressure of bronchial closures. Comparison of pressure resistance of manual and stapler bronchial closures depending on the angle to the cartilaginous rings].

    PubMed

    Ludwig, C; Behrend, M; Hoffarth, U; Schüttler, W; Stoelben, E

    2004-09-01

    This study aimed to determine the resistance to pressure of manual and stapled bronchial closures under ideal conditions (90 degrees to the bronchial tree) and parallel to the trachea (45 degrees). An experimental study was done on 60 explanted pig tracheae, which were alternately closed with either double-layer running sutures angled 90 degrees to the cartilaginous rings or an automatic stapling device. The closure line was placed exactly 90 degrees to the bronchial tree in 30 cases and parallel to the trachea (45 degrees) in 30. The closures were placed under pressure until air leakage was observed, and the leakage pressure was digitally recorded. A statistically significant difference existed between the two groups: mechanical sutures proved more resistant to pressure (P=0.011). Under ideal conditions, the resistance to pressure of mechanical sutures is equal to, if not better than, that of manual sutures.

  17. A Mixed-Valent Molybdenum Monophosphate with a Layer Structure: KMo3P2O14

    NASA Astrophysics Data System (ADS)

    Guesdon, A.; Borel, M. M.; Leclaire, A.; Grandin, A.; Raveau, B.

    1994-03-01

    A new mixed-valent molybdenum monophosphate with a layer structure, KMo3P2O14, has been isolated. It crystallizes in the space group P21/m with a = 8.599(2) Å, b = 6.392(2) Å, c = 10.602(1) Å, and β = 111.65(2)°. The [Mo3P2O14]∞ layers are parallel to (100) and consist of [MoPO8]∞ chains running along the b axis, in which one MoO6 octahedron alternates with one PO4 tetrahedron. In fact, four [MoPO8]∞ chains share the corners of their polyhedra and the edges of their octahedra, forming [Mo4P4O24]∞ columns which are linked through MoO5 bipyramids along the c axis. The K+ ions interleaved between these layers are surrounded by eight oxygens, forming bicapped trigonal prisms KO8. Besides the unusual trigonal bipyramids MoO5, this structure is also characterized by a tendency toward localization of the electrons, since one octahedral site is occupied by Mo(V), whereas the other octahedral site and the trigonal bipyramid are occupied by Mo(VI). The similarity of this structure with pure octahedral layer structures suggests the possibility of generating various derivatives, and of ion-exchange properties.
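
For a monoclinic cell such as this one, only β differs from 90°, so the cell volume follows as V = a·b·c·sin β; a quick check with the reported parameters (estimated standard deviations ignored):

```python
import math

# Monoclinic unit cell: V = a * b * c * sin(beta), using the reported parameters.
a, b, c = 8.599, 6.392, 10.602  # angstroms
beta = math.radians(111.65)
volume = a * b * c * math.sin(beta)  # cubic angstroms, roughly 541-542
```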

  18. Handling Big Data in Medical Imaging: Iterative Reconstruction with Large-Scale Automated Parallel Computation

    PubMed Central

    Lee, Jae H.; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T.; Seo, Youngho

    2014-01-01

    The primary goal of this project is to implement the iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximization (MLEM) as used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to-program software platform that can handle large amounts of data in parallel. GraphX is a graph analytics system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing the MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge in computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains, with the goal of eventually making it usable in a clinical setting. PMID:27081299

  19. Handling Big Data in Medical Imaging: Iterative Reconstruction with Large-Scale Automated Parallel Computation.

    PubMed

    Lee, Jae H; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T; Seo, Youngho

    2014-11-01

    The primary goal of this project is to implement the iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximization (MLEM) as used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to-program software platform that can handle large amounts of data in parallel. GraphX is a graph analytics system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing the MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge in computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains, with the goal of eventually making it usable in a clinical setting.
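
Stripped of the Spark/GraphX machinery, one MLEM iteration is a forward projection, an elementwise measurement ratio, and a sensitivity-normalized backprojection. A self-contained toy version (a sketch with a made-up 2×2 system matrix, not the paper's SPECT system):

```python
def mlem_step(A, x, y):
    """One MLEM iteration: x <- x / sens * A^T (y / (A x)), all elementwise."""
    n_rows, n_cols = len(A), len(A[0])
    proj = [sum(A[i][j] * x[j] for j in range(n_cols)) for i in range(n_rows)]
    ratio = [y[i] / proj[i] for i in range(n_rows)]             # measured / estimated
    sens = [sum(A[i][j] for i in range(n_rows)) for j in range(n_cols)]
    back = [sum(A[i][j] * ratio[i] for i in range(n_rows)) for j in range(n_cols)]
    return [x[j] * back[j] / sens[j] for j in range(n_cols)]

# Tiny noiseless system with exact solution x = [1, 2]: iterates approach it.
A = [[1.0, 0.5], [0.25, 1.0]]
x_true = [1.0, 2.0]
y = [sum(A[i][j] * x_true[j] for j in range(2)) for i in range(2)]
x = [1.0, 1.0]  # uniform initial image, as is conventional for MLEM
for _ in range(200):
    x = mlem_step(A, x, y)
```

The projection and backprojection sums are exactly the sparse matrix-vector products that GraphX distributes in the paper's implementation.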

  20. Scalable computing for evolutionary genomics.

    PubMed

    Prins, Pjotr; Belhachemi, Dominique; Möller, Steffen; Smant, Geert

    2012-01-01

    Genomic data analysis in evolutionary biology is becoming so computationally intensive that analysis of multiple hypotheses and scenarios takes too long on a single desktop computer. In this chapter, we discuss techniques for scaling computations through parallelization of calculations, after giving a quick overview of advanced programming techniques. Unfortunately, parallel programming is difficult and requires special software design. The alternative, especially attractive for legacy software, is to introduce poor man's parallelization by running whole programs in parallel as separate processes, using job schedulers. Such pipelines are often deployed on bioinformatics computer clusters. Recent advances in PC virtualization have made it possible to run a full computer operating system, with all of its installed software, on top of another operating system, inside a "box," or virtual machine (VM). Such a VM can flexibly be deployed on multiple computers, in a local network, e.g., on existing desktop PCs, and even in the Cloud, to create a "virtual" computer cluster. Many bioinformatics applications in evolutionary biology can be run in parallel, running processes in one or more VMs. Here, we show how a ready-made bioinformatics VM image, named BioNode, effectively creates a computing cluster, and pipeline, in a few steps. This allows researchers to scale up computations from their desktop, using available hardware, whenever required. BioNode is based on Debian Linux and can run on networked PCs and in the Cloud. Over 200 bioinformatics and statistical software packages, of interest to evolutionary biology, are included, such as PAML, Muscle, MAFFT, MrBayes, and BLAST. Most of these software packages are maintained through the Debian Med project. In addition, BioNode contains convenient configuration scripts for parallelizing bioinformatics software. 
Where Debian Med encourages packaging free and open source bioinformatics software through one central project, BioNode encourages creating free and open source VM images, for multiple targets, through one central project. BioNode can be deployed on Windows, OSX, Linux, and in the Cloud. Next to the downloadable BioNode images, we provide tutorials online, which empower bioinformaticians to install and run BioNode in different environments, as well as information for future initiatives, on creating and building such images.

  1. The Neuronal Organization of a Unique Cerebellar Specialization: The Valvula Cerebelli of a Mormyrid Fish

    PubMed Central

    Shi, Zhigang; Zhang, Yueping; Meek, Johannes; Qiao, Jiantian; Han, Victor Z.

    2018-01-01

    The distal valvula cerebelli is the most prominent part of the mormyrid cerebellum. It is organized in ridges of ganglionic and molecular layers, oriented perpendicular to the granular layer. We have combined intracellular recording and labelling techniques to reveal the cellular morphology of the valvula ridges in slice preparations. We have also locally ejected tracer in slices and in intact animals to examine its input fibers. The palisade dendrites and fine axon arbors of Purkinje cells are oriented in the horizontal plane of the ridge. The dendrites of basal efferent cells and large central cells are confined to the molecular layer, but are not planer. Basal efferent cell axons are thick, and join the basal bundle leaving the cerebellum. Large central cell axons are also thick, and traverse long distances in the transverse plane, with local collaterals in the ganglionic layer. Vertical cells and small central cells also have thick axons with local collaterals. The dendrites of Golgi cells are confined to the molecular layer, but their axon arbors are either confined to the granular layer or proliferate in both the granular and ganglionic layers. Dendrites of deep stellate cells are distributed in the molecular layer, with fine axon arbors in the ganglionic layer. Granule cell axons enter the molecular layer as parallel fibers without bifurcating. Climbing fibers run in the horizontal plane and terminate exclusively in the ganglionic layer. Our results confirm and extend previous studies and suggest a new concept of the circuitry of the mormyrid valvula cerebelli. PMID:18537139

  2. Hierarchical Parallelization of Gene Differential Association Analysis

    PubMed Central

    2011-01-01

    Background: Microarray gene differential expression analysis is a widely used technique that deals with high dimensional data and is computationally intensive for permutation-based procedures. Microarray gene differential association analysis is even more computationally demanding and must take advantage of multicore computing technology, which is the driving force behind increasing compute power in recent years. In this paper, we present a two-layer hierarchical parallel implementation of gene differential association analysis. It takes advantage of both fine- and coarse-grain (with granularity defined by the frequency of communication) parallelism in order to effectively leverage the non-uniform nature of parallel processing available in the cutting-edge systems of today. Results: Our results show that this hierarchical strategy matches data sharing behavior to the properties of the underlying hardware, thereby reducing the memory and bandwidth needs of the application. The resulting improved efficiency reduces computation time and allows the gene differential association analysis code to scale its execution with the number of processors. The code and biological data used in this study are downloadable from http://www.urmc.rochester.edu/biostat/people/faculty/hu.cfm. Conclusions: The performance sweet spot occurs when using a number of threads per MPI process that allows the working sets of the corresponding MPI processes running on the multicore to fit within the machine cache. Hence, we suggest that practitioners follow this principle in selecting the appropriate number of MPI processes and threads within each MPI process for their cluster configurations. We believe that the principles of this hierarchical approach to parallelization can be utilized in the parallelization of other computationally demanding kernels. PMID:21936916

  3. Hierarchical parallelization of gene differential association analysis.

    PubMed

    Needham, Mark; Hu, Rui; Dwarkadas, Sandhya; Qiu, Xing

    2011-09-21

    Microarray gene differential expression analysis is a widely used technique that deals with high dimensional data and is computationally intensive for permutation-based procedures. Microarray gene differential association analysis is even more computationally demanding and must take advantage of multicore computing technology, which is the driving force behind increasing compute power in recent years. In this paper, we present a two-layer hierarchical parallel implementation of gene differential association analysis. It takes advantage of both fine- and coarse-grain (with granularity defined by the frequency of communication) parallelism in order to effectively leverage the non-uniform nature of parallel processing available in the cutting-edge systems of today. Our results show that this hierarchical strategy matches data sharing behavior to the properties of the underlying hardware, thereby reducing the memory and bandwidth needs of the application. The resulting improved efficiency reduces computation time and allows the gene differential association analysis code to scale its execution with the number of processors. The code and biological data used in this study are downloadable from http://www.urmc.rochester.edu/biostat/people/faculty/hu.cfm. The performance sweet spot occurs when using a number of threads per MPI process that allows the working sets of the corresponding MPI processes running on the multicore to fit within the machine cache. Hence, we suggest that practitioners follow this principle in selecting the appropriate number of MPI processes and threads within each MPI process for their cluster configurations. We believe that the principles of this hierarchical approach to parallelization can be utilized in the parallelization of other computationally demanding kernels.
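
The cache-fit rule of thumb in the conclusion can be written down as a small sizing calculation (with invented machine numbers, not the paper's measured configuration): choose the largest thread count per MPI process whose combined working set still fits in the shared cache.

```python
def threads_per_process(cache_bytes, working_set_per_thread, n_cores):
    """Largest thread count whose combined working set still fits in the shared cache."""
    fit = cache_bytes // working_set_per_thread
    return max(1, min(n_cores, fit))

# Hypothetical node: 8 MiB shared cache, 1.5 MiB working set per thread, 8 cores.
threads = threads_per_process(8 * 2**20, int(1.5 * 2**20), 8)
```

On such a node one would run 5 threads per MPI process; adding a sixth would spill the aggregate working set out of cache, which is the degradation the "sweet spot" avoids.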

  4. Fatigue-induced changes in decline running.

    PubMed

    Mizrahi, J; Verbitsky, O; Isakov, E

    2001-03-01

    To study the relation between muscle fatigue during eccentric muscle contractions and the kinematics of the legs in downhill running, decline running on a treadmill was used to acquire data on shock accelerations, muscle activity, and kinematics for comparison with level running. The working hypothesis was that, in downhill running, local muscle fatigue causes morphological muscle damage, which leads to reduced attenuation of shock accelerations. Fourteen subjects ran on a treadmill above level-running anaerobic threshold speed for 30 min, in level and -4 degrees decline running. The following were monitored: metabolic fatigue by means of respiratory parameters; muscle fatigue of the quadriceps by means of elevation in myoelectric activity; and kinematic parameters including knee and ankle angles and hip vertical excursion by means of computerized videography. Data on shock transmission reported in previous studies were also used. Quadriceps fatigue develops in parallel to an increasing vertical excursion of the hip in the stance phase of running, enabled by larger dorsiflexion of the ankle rather than by increased flexion of the knee. The decrease in shock attenuation can be attributed to quadriceps muscle fatigue in parallel to increased vertical excursion of the hips.

  5. A big data geospatial analytics platform - Physical Analytics Integrated Repository and Services (PAIRS)

    NASA Astrophysics Data System (ADS)

    Hamann, H.; Jimenez Marianno, F.; Klein, L.; Albrecht, C.; Freitag, M.; Hinds, N.; Lu, S.

    2015-12-01

    A major challenge in leveraging big geospatial data sets is the ability to quickly integrate multiple data sources into physical and statistical models and to run these models in real time. A geospatial data platform called Physical Analytics Information Repository and Services (PAIRS) has been developed at the IBM TJ Watson Research Center (Yorktown Heights, NY 10598) on top of an open source hardware and software stack to manage terabytes of data. A new data interpolation and regridding scheme is implemented in which any geospatial data layer can be associated with a set of global grids whose resolution doubles for consecutive layers. Each pixel on the PAIRS grid has an index that is a combination of location and time stamp. The indexing allows quick access to data sets that are part of a global data layer, retrieving only the data of interest. PAIRS takes advantage of a parallel processing framework (Hadoop) in a cloud environment to digest, curate, and analyze the data sets while remaining robust and stable. The data are stored in a distributed NoSQL database (HBase) across multiple servers; data upload and retrieval are parallelized, with the original analytics task broken up into smaller areas/volumes, analyzed independently, and then reassembled for the original geographical area. The differentiating aspect of PAIRS is the ability to accelerate model development across large geographical regions and spatial resolutions ranging from 0.1 m up to hundreds of kilometers. System performance is benchmarked on real-time automated data ingestion and retrieval of MODIS and Landsat data layers. The data layers are curated for sensor error, verified for correctness, and analyzed statistically to detect local anomalies. 
Multi-layer queries enable PAIRS to filter different data layers based on specific conditions (e.g., analyzing the flooding risk of a property based on topography, the soil's ability to hold water, and forecasted precipitation) or to retrieve information about locations that share similar weather and vegetation patterns during extreme weather events like heat waves.
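
The doubling-resolution grid and combined location/time pixel index described above can be sketched as follows (a toy scheme with invented conventions, not IBM's actual key layout):

```python
def pairs_key(layer, lat, lon, timestamp):
    """Toy spatio-temporal pixel key (illustrative only): the cell size halves
    with each consecutive layer, and the key combines the discretized
    location with the time stamp."""
    cells_per_degree = 2 ** layer          # grid resolution doubles per layer
    row = int((lat + 90) * cells_per_degree)
    col = int((lon + 180) * cells_per_degree)
    return (layer, row, col, timestamp)

# The same point on a coarse layer and on a layer three doublings finer.
k_coarse = pairs_key(0, 40.7, -74.0, "2015-12-01")
k_fine = pairs_key(3, 40.7, -74.0, "2015-12-01")
```

Because the key is the storage index, a range scan over keys retrieves exactly the pixels of interest for one layer, region, and time window, which is the "retrieve only the data of interest" property the abstract highlights.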

  6. Scalable parallel communications

    NASA Technical Reports Server (NTRS)

    Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.

    1992-01-01

    Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulations studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines. 
In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, while other TCPs running in parallel provide high bandwidth service to a single application); and (3) coarse-grain parallelism will be able to incorporate many future improvements from related work (e.g., reduced data movement, fast TCP, fine-grain parallelism) also with near linear speed-ups.

  7. Real-world hydrologic assessment of a fully-distributed hydrological model in a parallel computing environment

    NASA Astrophysics Data System (ADS)

    Vivoni, Enrique R.; Mascaro, Giuseppe; Mniszewski, Susan; Fasel, Patricia; Springer, Everett P.; Ivanov, Valeriy Y.; Bras, Rafael L.

    2011-10-01

    A major challenge in the use of fully-distributed hydrologic models has been the lack of computational capabilities for high-resolution, long-term simulations in large river basins. In this study, we present the parallel model implementation and real-world hydrologic assessment of the Triangulated Irregular Network (TIN)-based Real-time Integrated Basin Simulator (tRIBS). Our parallelization approach is based on the decomposition of a complex watershed using the channel network as a directed graph. The resulting sub-basin partitioning divides effort among processors and handles hydrologic exchanges across boundaries. Through numerical experiments in a set of nested basins, we quantify parallel performance relative to serial runs for a range of processors, simulation complexities and lengths, and sub-basin partitioning methods, while accounting for inter-run variability on a parallel computing system. In contrast to serial simulations, the parallel model speed-up depends on the variability of hydrologic processes. Load balancing significantly improves parallel speed-up with proportionally faster runs as simulation complexity (domain resolution and channel network extent) increases. The best strategy for large river basins is to combine a balanced partitioning with an extended channel network, with potential savings through a lower TIN resolution. Based on these advances, a wider range of applications for fully-distributed hydrologic models are now possible. This is illustrated through a set of ensemble forecasts that account for precipitation uncertainty derived from a statistical downscaling model.
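
The load-balanced sub-basin partitioning that drives the speed-up can be illustrated with a greedy longest-processing-time heuristic (a sketch only; tRIBS's actual partitioner works on the channel-network graph and is not reproduced here): sub-basins are assigned, largest first, to the currently least-loaded processor.

```python
import heapq

def balance(subbasin_costs, n_procs):
    """Greedy longest-processing-time heuristic: each sub-basin goes to the
    currently least-loaded processor, largest workloads first."""
    heap = [(0.0, p) for p in range(n_procs)]  # (current load, processor id)
    heapq.heapify(heap)
    assignment = {}
    for basin, cost in sorted(subbasin_costs.items(), key=lambda kv: -kv[1]):
        load, proc = heapq.heappop(heap)
        assignment[basin] = proc
        heapq.heappush(heap, (load + cost, proc))
    return assignment

# Hypothetical per-sub-basin costs (e.g., proportional to TIN node counts).
costs = {"A": 9.0, "B": 7.0, "C": 4.0, "D": 4.0, "E": 3.0, "F": 3.0}
plan = balance(costs, 2)  # processor loads come out as 16.0 and 14.0
```

Since the slowest processor sets the wall-clock time of each step, narrowing the spread between loads translates directly into parallel speed-up.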

  8. LFRic: Building a new Unified Model

    NASA Astrophysics Data System (ADS)

    Melvin, Thomas; Mullerworth, Steve; Ford, Rupert; Maynard, Chris; Hobson, Mike

    2017-04-01

    The LFRic project, named for Lewis Fry Richardson, aims to develop a replacement for the Met Office Unified Model in order to meet the challenges which will be presented by the next generation of exascale supercomputers. This project, a collaboration between the Met Office, STFC Daresbury and the University of Manchester, builds on the earlier GungHo project, a partnership with NERC, to redesign the dynamical core. The new atmospheric model aims to retain the performance of the current ENDGame dynamical core and associated subgrid physics, while also enabling far greater scalability and the flexibility to accommodate future supercomputer architectures. Design of the model revolves around the principle of a 'separation of concerns', whereby the natural science aspects of the code can be developed without worrying about the underlying architecture, while machine-dependent optimisations can be carried out at a high level. These principles are put into practice through the development of an autogenerated Parallel Systems software layer (known as the PSy layer) using a domain-specific compiler called PSyclone. The prototype model includes a re-write of the dynamical core using a mixed finite element method, in which different function spaces are used to represent the various fields. It is able to run in parallel with MPI and OpenMP and has been tested on over 200,000 cores. In this talk an overview of both the natural science and computational science implementations of the model will be presented.

  9. Implementation of the DPM Monte Carlo code on a parallel architecture for treatment planning applications.

    PubMed

    Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J

    2004-09-01

    We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo-random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster, consisting of 800 MHz Intel Pentium III processors, shows an almost linear speedup up to 32 processors for simulating 1 x 10^8 or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors based on availability), which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors, when the problem size is increased to 8 x 10^8 histories. For a smaller number of histories (1 x 10^8) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1 x 10^8 histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture, Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster than the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy.
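The efficiency figure quoted above follows from the standard definitions of parallel speedup and efficiency. A small sketch, with hypothetical timings chosen only so the numbers land on the quoted 83.9% (the paper's actual run times are not given in this abstract):

```python
# Parallel speedup and efficiency as used in scaling studies like this one.
def speedup(t_serial, t_parallel):
    """Ratio of serial to parallel wall-clock time."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_procs):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup(t_serial, t_parallel) / n_procs

t1 = 4800.0   # hypothetical serial run time, seconds (illustrative)
t24 = 238.4   # hypothetical time on 24 processors (illustrative)
print(f"speedup    = {speedup(t1, t24):.1f}")
print(f"efficiency = {efficiency(t1, t24, 24):.1%}")
```

With these made-up timings the sketch reproduces a 20.1x speedup, i.e. the 83.9% efficiency on 24 processors reported above; communication overhead is what keeps efficiency below 100%.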

  10. Passing in Command Line Arguments and Parallel Cluster/Multicore Batching in R with batch.

    PubMed

    Hoffmann, Thomas J

    2011-03-01

    It is often useful to rerun a command line R script with some slight change in the parameters used to run it - a new set of parameters for a simulation, a different dataset to process, etc. The R package batch provides a means to easily pass multiple command line options, including vectors of values in the usual R format, into R. The same script can be set up to run things in parallel via different command line arguments. The R package batch also simplifies this parallel batching by allowing one to use R and an R-like syntax for arguments to spread a script across a cluster or local multicore/multiprocessor computer, with automated syntax for several popular cluster types. Finally, it provides a means to aggregate the results of multiple processes run on a cluster.
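The batch package itself is R; as a language-neutral sketch of the same workflow (parameter vectors passed on the command line, then fanned out over local cores), here is a rough Python analogue. All names here are illustrative, not the batch package's API:

```python
# Hypothetical Python analogue of command-line parameter batching: accept a
# vector of parameter values as arguments, then run replicates in parallel
# on a local multicore machine.
import argparse
from multiprocessing import Pool

def run_one(seed):
    """Stand-in for one simulation replicate parameterized by `seed`."""
    return seed * seed % 97

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--seeds", type=int, nargs="+", default=[1, 2, 3])
    parser.add_argument("--cores", type=int, default=2)
    args = parser.parse_args()
    with Pool(args.cores) as pool:              # spread across local cores
        results = pool.map(run_one, args.seeds)
    print(dict(zip(args.seeds, results)))       # aggregate the results
```

Invoked as, say, `python sweep.py --seeds 1 2 3 --cores 2`, each seed becomes one parallel task, mirroring the rerun-with-different-arguments pattern the package automates for R.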

  11. Comparison and Analysis of Membrane Fouling between Flocculent Sludge Membrane Bioreactor and Granular Sludge Membrane Bioreactor

    PubMed Central

    Zhi-Qiang, Chen; Jun-Wen, Li; Yi-Hong, Zhang; Xuan, Wang; Bin, Zhang

    2012-01-01

    The goal of this study is to investigate the effect of inoculating granules on reducing membrane fouling. In order to evaluate the differences in performance between flocculent sludge and aerobic granular sludge in membrane bioreactors (MBRs), two reactors were run in parallel and various parameters related to membrane fouling were measured. The results indicated that the specific resistance of the fouling layer was five times greater than that of mixed liquor sludge in the granular MBR. The floc sludge more easily formed a compact layer on the membrane surface and increased membrane resistance. Specifically, the floc sludge had a higher moisture content, extracellular polymeric substances concentration, and negative surface charge. In contrast, aerobic granules could improve structural integrity and strength, which contributed to the preferable permeate performance. Therefore, inoculating aerobic granules in an MBR presents an effective method of reducing the membrane fouling associated with floc sludge from the perspective of the morphological characteristics of microbial aggregates. PMID:22859954

  12. Crashworthiness simulations with DYNA3D

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schauer, D.A.; Hoover, C.G.; Kay, G.J.

    1996-04-01

    Current progress in parallel algorithm research and applications in vehicle crash simulation is described for the explicit, finite element algorithms in DYNA3D. Problem partitioning methods and parallel algorithms for contact at material interfaces are the two challenging algorithm research problems that are addressed. Two prototype parallel contact algorithms have been developed for treating the cases of local and arbitrary contact. Demonstration problems for local contact are crashworthiness simulations with 222 locally defined contact surfaces and a vehicle/barrier collision modeled with arbitrary contact. A simulation of crash tests conducted for a vehicle impacting a U-channel small sign post embedded in soil has been run on both the serial and parallel versions of DYNA3D. A significant reduction in computational time has been observed when running these problems on the parallel version. However, to achieve maximum efficiency, complex problems must be appropriately partitioned, especially when contact dominates the computation.

  13. On the generation of umbral flashes and running penumbral waves.

    NASA Technical Reports Server (NTRS)

    Moore, R. L.

    1973-01-01

    From a review of the observed properties of umbral flashes and running penumbral waves it is proposed that the source of these periodic phenomena is the oscillatory convection which Danielson and Savage (1968) and Savage (1969) have shown is likely to occur in the superadiabatic subphotospheric layers of sunspot umbras. Periods and growth rates are computed for oscillatory modes arising in a simple two-layer model umbra. The results suggest that umbral flashes result from disturbances produced by oscillatory convection occurring in the upper subphotospheric layer of the umbra, where the superadiabatic temperature gradient is much enhanced over that in lower layers, while running penumbral waves are due to oscillations in a layer just below this upper layer.

  14. Simulation of LHC events on a million threads

    NASA Astrophysics Data System (ADS)

    Childers, J. T.; Uram, T. D.; LeCompte, T. J.; Papka, M. E.; Benjamin, D. P.

    2015-12-01

    Demand for Grid resources is expected to double during LHC Run II as compared to Run I; the capacity of the Grid, however, will not double. The HEP community must consider how to bridge this computing gap by targeting larger compute resources and using the available compute resources as efficiently as possible. Argonne's Mira, the fifth fastest supercomputer in the world, can run roughly five times the number of parallel processes that the ATLAS experiment typically uses on the Grid. We ported Alpgen, a serial x86 code, to run as a parallel application under MPI on the Blue Gene/Q architecture. By analysis of the Alpgen code, we reduced the memory footprint to allow running 64 threads per node, utilizing the four hardware threads available per core on the PowerPC A2 processor. Event generation and unweighting, typically run as independent serial phases, are coupled together in a single job in this scenario, reducing intermediate writes to the filesystem. By these optimizations, we have successfully run LHC proton-proton physics event generation at the scale of a million threads, filling two-thirds of Mira.

  15. TOMO3D: 3-D joint refraction and reflection traveltime tomography parallel code for active-source seismic data—synthetic test

    NASA Astrophysics Data System (ADS)

    Meléndez, A.; Korenaga, J.; Sallarès, V.; Miniussi, A.; Ranero, C. R.

    2015-10-01

    We present a new 3-D traveltime tomography code (TOMO3D) for the modelling of active-source seismic data that uses the arrival times of both refracted and reflected seismic phases to derive the velocity distribution and the geometry of reflecting boundaries in the subsurface. This code is based on its popular 2-D version TOMO2D from which it inherited the methods to solve the forward and inverse problems. The traveltime calculations are done using a hybrid ray-tracing technique combining the graph and bending methods. The LSQR algorithm is used to perform the iterative regularized inversion to improve the initial velocity and depth models. In order to cope with an increased computational demand due to the incorporation of the third dimension, the forward problem solver, which takes most of the run time (~90 per cent in the test presented here), has been parallelized with a combination of multi-processing and message passing interface standards. This parallelization distributes the ray-tracing and traveltime calculations among available computational resources. The code's performance is illustrated with a realistic synthetic example, including a checkerboard anomaly and two reflectors, which simulates the geometry of a subduction zone. The code is designed to invert for a single reflector at a time. A data-driven layer-stripping strategy is proposed for cases involving multiple reflectors, and it is tested for the successive inversion of the two reflectors. Layers are bound by consecutive reflectors, and an initial velocity model for each inversion step incorporates the results from previous steps. This strategy poses simpler inversion problems at each step, allowing the recovery of strong velocity discontinuities that would otherwise be smoothed out.
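The layer-stripping strategy described above amounts to a loop in which each single-reflector inversion is seeded with the result of the previous step. A schematic sketch, where `invert_step` is a trivial placeholder rather than the TOMO3D interface:

```python
# Schematic of data-driven layer stripping: invert for one reflector at a
# time, shallow first, seeding each step with the previous step's model.
def invert_step(velocity_model, data):
    """Toy single-reflector 'inversion': extend the velocity model with the
    new layer's velocity and return the recovered reflector depth."""
    return velocity_model + [data["layer_velocity"]], data["reflector_depth"]

# Hypothetical two-reflector case (velocities in km/s, depths in km):
reflectors = [
    {"layer_velocity": 3.5, "reflector_depth": 2.0},   # shallow reflector
    {"layer_velocity": 6.8, "reflector_depth": 9.0},   # deeper reflector
]

model, depths = [1.5], []                # starting near-surface model
for data in reflectors:
    model, depth = invert_step(model, data)  # a simpler problem per step
    depths.append(depth)                     # fix this reflector, go deeper

print(model, depths)
```

The point of the structure is that each pass solves a smaller, better-constrained problem than a joint inversion for all reflectors at once, which is how the strong velocity discontinuities survive regularization.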

  16. Plasma Physics Calculations on a Parallel Macintosh Cluster

    NASA Astrophysics Data System (ADS)

    Decyk, Viktor; Dauger, Dean; Kokelaar, Pieter

    2000-03-01

    We have constructed a parallel cluster consisting of 16 Apple Macintosh G3 computers running the MacOS, and achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. For large problems where message packets are large and relatively few in number, performance of 50-150 MFlops/node is possible, depending on the problem. This is fast enough that 3D calculations can be routinely done. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. Full details are available on our web site: http://exodus.physics.ucla.edu/appleseed/.

  17. Plasma Physics Calculations on a Parallel Macintosh Cluster

    NASA Astrophysics Data System (ADS)

    Decyk, Viktor K.; Dauger, Dean E.; Kokelaar, Pieter R.

    We have constructed a parallel cluster consisting of 16 Apple Macintosh G3 computers running the MacOS, and achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. For large problems where message packets are large and relatively few in number, performance of 50-150 Mflops/node is possible, depending on the problem. This is fast enough that 3D calculations can be routinely done. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. Full details are available on our web site: http://exodus.physics.ucla.edu/appleseed/.

  18. Running accuracy analysis of a 3-RRR parallel kinematic machine considering the deformations of the links

    NASA Astrophysics Data System (ADS)

    Wang, Liping; Jiang, Yao; Li, Tiemin

    2014-09-01

    Parallel kinematic machines have drawn considerable attention and have been widely used in some special fields. However, high precision is still one of the challenges when they are used for advanced machine tools. One of the main reasons is that the kinematic chains of parallel kinematic machines are composed of elongated links that can easily suffer deformations, especially at high speeds and under heavy loads. A 3-RRR parallel kinematic machine is taken as a study object for investigating its accuracy with the consideration of the deformations of its links during the motion process. Based on the dynamic model constructed by the Newton-Euler method, all the inertia loads and constraint forces of the links are computed and their deformations are derived. Then the kinematic errors of the machine are derived with the consideration of the deformations of the links. Through further derivation, the accuracy of the machine is given in a simple explicit expression, which will be helpful to increase the calculating speed. The accuracy of this machine when following a selected circle path is simulated. The influences of magnitude of the maximum acceleration and external loads on the running accuracy of the machine are investigated. The results show that the external loads will deteriorate the accuracy of the machine tremendously when their direction coincides with the direction of the worst stiffness of the machine. The proposed method provides a solution for predicting the running accuracy of the parallel kinematic machines and can also be used in their design optimization as well as selection of suitable running parameters.

  19. Using Parallel Processing for Problem Solving.

    DTIC Science & Technology

    1979-12-01

    Activities are the basic parallel processing primitive. Different goals of the system can be pursued in parallel by placing them in separate activities. Language primitives are provided for manipulating running activities. Viewpoints are a generalization of contexts.

  20. Spatial distribution of the human enamel fracture toughness with aging.

    PubMed

    Zheng, Qinghua; Xu, Haiping; Song, Fan; Zhang, Lan; Zhou, Xuedong; Shao, Yingfeng; Huang, Dingming

    2013-10-01

    A better understanding of the fracture toughness (KIC) of human enamel and the changes induced by aging is important for the clinical treatment of cracks and fractures in teeth. We conducted microindentation tests and chemical content measurements on molar teeth from "young" (18 ≤ age ≤ 25) and "old" (55 ≤ age) patients. The KIC and the mineral contents (calcium and phosphorus) in the outer, the middle, and the inner enamel layers within the cuspal and the intercuspal regions of the crown were measured through the Vickers toughness test and Energy Dispersive X-Ray Spectroscopy (EDS), respectively. The elastic modulus used for the KIC calculation was measured through atomic force microscope (AFM)-based nanoindentation tests. In the outer enamel layer, two direction-specific values of the KIC were calculated separately (direction I, crack running parallel to the occlusal surface; direction II, perpendicular to direction I). The mean KIC of the outer enamel layer was lower than that of the internal layers (p<0.05). No other region-related differences in the mechanical properties were found in either group. In the outer enamel layer, old enamel has a lower KIC in direction II and higher mineral contents than young enamel (p<0.05). The enamel surface becomes more prone to cracks with aging, partly due to the reduction in the interprismatic organic matrix observed with the maturation of enamel. Copyright © 2013 Elsevier Ltd. All rights reserved.
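For context, Vickers indentation toughness is commonly estimated from the radial crack length with the Anstis-type relation KIC = 0.016 (E/H)^(1/2) P / c^(3/2). Whether the study above used this exact calibration is not stated in the abstract, so treat the constant 0.016 and the input values below as assumptions, not the paper's data:

```python
# Indentation fracture toughness from a Vickers test (Anstis et al. form).
# E = elastic modulus, H = hardness (both GPa), P = load (N),
# c = radial crack length from the indent center (m).
import math

def k_ic(E, H, P, c):
    """Return K_IC in MPa*sqrt(m). The 0.016 calibration constant is the
    commonly used Anstis value, assumed here for illustration."""
    k = 0.016 * math.sqrt(E / H) * P / c ** 1.5   # Pa*sqrt(m) if E,H in Pa;
    return k * 1e9 / 1e6 if False else 0.016 * math.sqrt(E / H) * P / c ** 1.5 / 1e6

# Illustrative (not measured) values in the range reported for enamel:
print(f"{k_ic(E=90.0, H=3.5, P=4.9, c=80e-6):.2f} MPa*sqrt(m)")
```

Note E/H is dimensionless, so E and H may stay in GPa; with P in newtons and c in metres the result is in Pa·sqrt(m), divided by 1e6 to give MPa·sqrt(m). The direction-specific KIC values in the study come from measuring c along the two crack directions.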

  1. Experiences using OpenMP based on Compiler Directed Software DSM on a PC Cluster

    NASA Technical Reports Server (NTRS)

    Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland

    2003-01-01

    In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.

  2. HPC in Basin Modeling: Simulating Mechanical Compaction through Vertical Effective Stress using Level Sets

    NASA Astrophysics Data System (ADS)

    McGovern, S.; Kollet, S. J.; Buerger, C. M.; Schwede, R. L.; Podlaha, O. G.

    2017-12-01

    In the context of sedimentary basins, we present a model for the simulation of the movement of a geological formation (layers) during the evolution of the basin through sedimentation and compaction processes. Assuming a single-phase saturated porous medium for the sedimentary layers, the model focuses on the tracking of the layer interfaces, through the use of the level set method, as sedimentation drives fluid flow and reduction of pore space by compaction. On the assumption of Terzaghi's effective stress concept, the coupling of the pore fluid pressure to the motion of interfaces in 1-D is presented in McGovern et al. (2017) [1]. The current work extends the spatial domain to 3-D, though we maintain the assumption of vertical effective stress to drive the compaction. The idealized geological evolution is conceptualized as the motion of interfaces between rock layers, whose paths are determined by the magnitude of a speed function in the direction normal to the evolving layer interface. The speeds normal to the interface are dependent on the change in porosity, determined through an effective stress-based compaction law, such as the exponential Athy's law. Provided with the speeds normal to the interface, the level set method uses an advection equation to evolve a potential function, whose zero level set defines the interface. Thus, the moving layer geometry influences the pore pressure distribution, which couples back to the interface speeds. The flexible construction of the speed function allows extension, in the future, to other terms to represent different physical processes, analogous to how the compaction rule represents material deformation. The 3-D model is implemented using the generic finite element method framework deal.II, which provides tools, building on p4est and interfacing to PETSc, for the massively parallel distributed solution to the model equations [2]. Experiments are being run on the Juelich Supercomputing Center's Jureca cluster.
    [1] McGovern et al. (2017). Novel basin modelling concept for simulating deformation from mechanical compaction using level sets. Computational Geosciences, SI:ECMOR XV, 1-14. [2] Bangerth et al. (2011). Algorithms and data structures for massively parallel generic adaptive finite element codes. ACM Transactions on Mathematical Software (TOMS), 38(2):14.
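The interface-tracking idea is easiest to see in one dimension: the layer interface is the zero level set of a potential function phi, advected with a prescribed normal speed F via phi_t + F|phi_x| = 0. A minimal sketch with a constant speed (a toy upwind scheme, not the deal.II implementation, where F would instead come from the porosity-dependent compaction law):

```python
# Toy 1-D level set: the interface sits at the zero crossing of phi and
# moves with normal speed F under phi_t + F*|phi_x| = 0 (upwind scheme).
n, dx = 101, 0.01
F, dt = 0.05, 0.05
phi = [i * dx - 0.30 for i in range(n)]         # zero level set at x = 0.30

for _ in range(100):                             # advance to t = 5.0
    # Upwind (backward) difference; valid since F > 0 and phi increases in x.
    grad = [(phi[i] - phi[i - 1]) / dx if i > 0 else (phi[1] - phi[0]) / dx
            for i in range(n)]
    phi = [p - dt * F * abs(g) for p, g in zip(phi, grad)]

ix = min(range(n), key=lambda i: abs(phi[i]))    # locate the zero level set
print(f"interface moved from x = 0.30 to x = {ix * dx:.2f}")
```

With F = 0.05 for 5 time units the interface travels 0.25, ending near x = 0.55; in the basin model the analogous speed is set by the change in porosity, so the geometry and pore pressure stay coupled exactly as described above.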

  3. Application of Intel Many Integrated Core (MIC) accelerators to the Pleim-Xiu land surface scheme

    NASA Astrophysics Data System (ADS)

    Huang, Melin; Huang, Bormin; Huang, Allen H.

    2015-10-01

    The land-surface model (LSM) is one physics process in the Weather Research and Forecasting (WRF) model. The LSM combines atmospheric information from the surface layer scheme, radiative forcing from the radiation scheme, and precipitation forcing from the microphysics and convective schemes with internal information on the land's state variables and land-surface properties. The LSM provides heat and moisture fluxes over land points and sea-ice points. The Pleim-Xiu (PX) scheme is one such LSM. The PX LSM features three pathways for moisture fluxes: evapotranspiration, soil evaporation, and evaporation from wet canopies. To accelerate this scheme, we employ the Intel Xeon Phi Many Integrated Core (MIC) architecture, a many-core coprocessor design well suited to efficient parallelization and vectorization. Our results show that the MIC-based optimization of this scheme running on a Xeon Phi coprocessor 7120P improves performance by 2.3x and 11.7x compared with the original code running on one CPU socket (eight cores) and on one CPU core of an Intel Xeon E5-2670, respectively.

  4. Full-f version of GENE for turbulence in open-field-line systems

    NASA Astrophysics Data System (ADS)

    Pan, Q.; Told, D.; Shi, E. L.; Hammett, G. W.; Jenko, F.

    2018-06-01

    Unique properties of plasmas in the tokamak edge, such as large amplitude fluctuations and plasma-wall interactions in the open-field-line regions, require major modifications of existing gyrokinetic codes originally designed for simulating core turbulence. To this end, the global version of the 3D2V gyrokinetic code GENE, so far employing a δf-splitting technique, is extended to simulate electrostatic turbulence in straight open-field-line systems. The major extensions are the inclusion of the velocity-space nonlinearity, the development of a conducting-sheath boundary, and the implementation of the Lenard-Bernstein collision operator. With these developments, the code can be run as a full-f code and can handle particle loss to and reflection from the wall. The extended code is applied to modeling turbulence in the Large Plasma Device (LAPD), with a reduced mass ratio and a much lower collisionality. Similar to turbulence in a tokamak scrape-off layer, LAPD turbulence involves collisions, parallel streaming, cross-field turbulent transport with steep profiles, and particle loss at the parallel boundary.

  5. Scalable load balancing for massively parallel distributed Monte Carlo particle transport

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    O'Brien, M. J.; Brantley, P. S.; Joy, K. I.

    2013-07-01

    In order to run computer simulations efficiently on massively parallel computers with hundreds of thousands or millions of processors, care must be taken that the calculation is load balanced across the processors. Examining the workload of every processor leads to an unscalable algorithm, with run time at least as large as O(N), where N is the number of processors. We present a scalable load balancing algorithm, with run time O(log(N)), that involves iterated processor-pair-wise balancing steps, ultimately leading to a globally balanced workload. We demonstrate scalability of the algorithm up to 2 million processors on the Sequoia supercomputer at Lawrence Livermore National Laboratory. (authors)
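The iterated pair-wise balancing can be illustrated with hypercube-style pairing: in round k, processor i exchanges with partner i XOR 2^k, and after log2(N) rounds every processor holds the global mean workload. This sketch simply averages numeric loads rather than migrating particles, and is an illustration of the pairing pattern, not the authors' Sequoia implementation:

```python
# Iterated processor-pair-wise balancing with hypercube pairing: N
# processors converge to the mean workload in log2(N) rounds.
import math

def balance(loads):
    n = len(loads)                      # assume n is a power of two
    for k in range(int(math.log2(n))):
        for i in range(n):
            j = i ^ (1 << k)            # partner for this round
            if i < j:                   # each pair balances once
                avg = (loads[i] + loads[j]) / 2
                loads[i] = loads[j] = avg
    return loads

loads = [80.0, 0.0, 20.0, 60.0, 40.0, 0.0, 100.0, 20.0]
print(balance(loads))   # every processor ends at the mean load, 40.0
```

Because each round touches every processor only through one partner, the total number of balancing steps per processor is log2(N), which is the source of the O(log(N)) run time claimed above, versus O(N) for any scheme that inspects all workloads centrally.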

  6. Parallel ALLSPD-3D: Speeding Up Combustor Analysis Via Parallel Processing

    NASA Technical Reports Server (NTRS)

    Fricker, David M.

    1997-01-01

    The ALLSPD-3D Computational Fluid Dynamics code for reacting flow simulation was run on a set of benchmark test cases to determine its parallel efficiency. These test cases included non-reacting and reacting flow simulations with varying numbers of processors. Also, the tests explored the effects of scaling the simulation with the number of processors in addition to distributing a constant size problem over an increasing number of processors. The test cases were run on a cluster of IBM RS/6000 Model 590 workstations with ethernet and ATM networking plus a shared memory SGI Power Challenge L workstation. The results indicate that the network capabilities significantly influence the parallel efficiency, i.e., a shared memory machine is fastest and ATM networking provides acceptable performance. The limitations of ethernet greatly hamper the rapid calculation of flows using ALLSPD-3D.

  7. PCLIPS: Parallel CLIPS

    NASA Technical Reports Server (NTRS)

    Gryphon, Coranth D.; Miller, Mark D.

    1991-01-01

    PCLIPS (Parallel CLIPS) is a set of extensions to the C Language Integrated Production System (CLIPS) expert system language. PCLIPS is intended to provide an environment for the development of more complex, extensive expert systems. Multiple CLIPS expert systems are now capable of running simultaneously on separate processors, or separate machines, thus dramatically increasing the scope of solvable tasks within the expert systems. As a tool for parallel processing, PCLIPS allows for an expert system to add to its fact-base information generated by other expert systems, thus allowing systems to assist each other in solving a complex problem. This allows individual expert systems to be more compact and efficient, and thus run faster or on smaller machines.

  8. Experiences Using OpenMP Based on Compiler Directed Software DSM on a PC Cluster

    NASA Technical Reports Server (NTRS)

    Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs (personal computers) running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS (NASA Advanced Supercomputing) Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.

  9. Network support for system initiated checkpoints

    DOEpatents

    Chen, Dong; Heidelberger, Philip

    2013-01-29

    A system, method and computer program product for supporting system initiated checkpoints in parallel computing systems. The system and method generates selective control signals to perform checkpointing of system related data in presence of messaging activity associated with a user application running at the node. The checkpointing is initiated by the system such that checkpoint data of a plurality of network nodes may be obtained even in the presence of user applications running on highly parallel computers that include ongoing user messaging activity.

  10. Three-dimensional midwater camouflage from a novel two-component photonic structure in hatchetfish skin.

    PubMed

    Rosenthal, Eric I; Holt, Amanda L; Sweeney, Alison M

    2017-05-01

    The largest habitat by volume on Earth is the oceanic midwater, which is also one of the least understood in terms of animal ecology. The organisms here exhibit a spectacular array of optical adaptations for living in a visual void that have only barely begun to be described. We describe a complex pattern of broadband scattering from the skin of Argyropelecus sp., a hatchetfish found in the mesopelagic zone of the world's oceans. Hatchetfish skin superficially resembles the unpolished side of aluminium foil, but on closer inspection contains a complex composite array of subwavelength-scale dielectric structures. The superficial layer of this array contains dielectric stacks that are rectangular in cross-section, while the deeper layer contains dielectric bundles that are elliptical in cross-section; the cells in both layers have their longest dimension running parallel to the dorsal-ventral axis of the fish. Using the finite-difference time-domain approach and photographic radiometry, we explored the structural origins of this scattering behaviour and its environmental consequences. When the fish's flank is illuminated from an arbitrary incident angle, a portion of the scattered light exits in an arc parallel to the fish's anterior-posterior axis. Simultaneously, some incident light is also scattered downwards through the complex birefringent skin structure and exits from the ventral photophores. We show that this complex scattering pattern will provide camouflage simultaneously against the horizontal radially symmetric solar radiance in this habitat, and the predatory bioluminescent searchlights that are common here. The structure also directs light incident on the flank of the fish into the downwelling, silhouette-hiding counter-illumination of the ventral photophores. © 2017 The Authors.

  11. Three-dimensional midwater camouflage from a novel two-component photonic structure in hatchetfish skin

    PubMed Central

    Rosenthal, Eric I.; Holt, Amanda L.

    2017-01-01

    The largest habitat by volume on Earth is the oceanic midwater, which is also one of the least understood in terms of animal ecology. The organisms here exhibit a spectacular array of optical adaptations for living in a visual void that have only barely begun to be described. We describe a complex pattern of broadband scattering from the skin of Argyropelecus sp., a hatchetfish found in the mesopelagic zone of the world's oceans. Hatchetfish skin superficially resembles the unpolished side of aluminium foil, but on closer inspection contains a complex composite array of subwavelength-scale dielectric structures. The superficial layer of this array contains dielectric stacks that are rectangular in cross-section, while the deeper layer contains dielectric bundles that are elliptical in cross-section; the cells in both layers have their longest dimension running parallel to the dorsal–ventral axis of the fish. Using the finite-difference time-domain approach and photographic radiometry, we explored the structural origins of this scattering behaviour and its environmental consequences. When the fish's flank is illuminated from an arbitrary incident angle, a portion of the scattered light exits in an arc parallel to the fish's anterior–posterior axis. Simultaneously, some incident light is also scattered downwards through the complex birefringent skin structure and exits from the ventral photophores. We show that this complex scattering pattern will provide camouflage simultaneously against the horizontal radially symmetric solar radiance in this habitat, and the predatory bioluminescent searchlights that are common here. The structure also directs light incident on the flank of the fish into the downwelling, silhouette-hiding counter-illumination of the ventral photophores. PMID:28468923

  12. Jet formation and equatorial superrotation in Jupiter's atmosphere: Numerical modelling using a new efficient parallel code

    NASA Astrophysics Data System (ADS)

    Rivier, Leonard Gilles

    Using an efficient parallel code solving the primitive equations of atmospheric dynamics, the jet structure of a Jupiter-like atmosphere is modeled. In the first part of this thesis, a parallel spectral code solving both the shallow water equations and the multi-level primitive equations of atmospheric dynamics is built. The implementation of this code, called BOB, is done so that it runs effectively on an inexpensive cluster of workstations. A one-dimensional decomposition and transposition method ensuring load balancing among processes is used. The Legendre transform is cache-blocked. Computing the Legendre polynomials used in the spectral method on the fly produces a lower memory footprint and enables high resolution runs on relatively small memory machines. Performance studies are done using a cluster of workstations located at the National Center for Atmospheric Research (NCAR). BOB's performance is compared to the parallel benchmark code PSTSWM and the dynamical core of NCAR's CCM3.6.6. In both cases, the comparison favors BOB. In the second part of this thesis, the primitive equation version of the code described in part I is used to study the formation of organized zonal jets and equatorial superrotation in a planetary atmosphere where the parameters are chosen to best model the upper atmosphere of Jupiter. Two levels are used in the vertical and only large scale forcing is present. The model is forced towards a baroclinically unstable flow, so that eddies are generated by baroclinic instability. We consider several types of forcing, acting on either the temperature or the momentum field. We show that only under very specific parametric conditions do zonally elongated structures form and persist, resembling the jet structure observed near the cloud-level top (1 bar) on Jupiter.
We also study the effect of an equatorial heat source, meant to be a crude representation of the effect of the deep convective planetary interior on the outer atmospheric layer. We show that such heat forcing is able to produce strong equatorial superrotating winds, one of the most striking features of the Jovian circulation.
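The "compute on the fly" strategy mentioned above can be illustrated with the standard three-term upward recurrence for associated Legendre polynomials: each value is generated when needed instead of being read from a precomputed table, trading a little arithmetic for a much smaller memory footprint. This is a minimal sketch (using the Condon-Shortley phase convention), not BOB's actual implementation:

```python
import math

def assoc_legendre(l, m, x):
    """Compute P_l^m(x) via the standard upward recurrence in l.

    Generating values on demand like this, rather than storing a full
    table of polynomials, is the idea behind a low-memory spectral
    transform; illustrative only.
    """
    # Seed: P_m^m(x) = (-1)^m (2m-1)!! (1-x^2)^{m/2}
    pmm = 1.0
    if m > 0:
        somx2 = math.sqrt((1.0 - x) * (1.0 + x))
        fact = 1.0
        for _ in range(m):
            pmm *= -fact * somx2
            fact += 2.0
    if l == m:
        return pmm
    # P_{m+1}^m(x) = x (2m+1) P_m^m(x)
    pmmp1 = x * (2 * m + 1) * pmm
    if l == m + 1:
        return pmmp1
    # Upward recurrence: (l-m) P_l^m = (2l-1) x P_{l-1}^m - (l+m-1) P_{l-2}^m
    for ll in range(m + 2, l + 1):
        pll = ((2 * ll - 1) * x * pmmp1 - (ll + m - 1) * pmm) / (ll - m)
        pmm, pmmp1 = pmmp1, pll
    return pmmp1
```

Only the two previous degrees are kept at any moment, so memory use is independent of the truncation limit.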

  13. Students' Adoption of Course-Specific Approaches to Learning in Two Parallel Courses

    ERIC Educational Resources Information Center

    Öhrstedt, Maria; Lindfors, Petra

    2016-01-01

    Research on students' adoption of course-specific approaches to learning in parallel courses is limited and inconsistent. This study investigated second-semester psychology students' levels of deep, surface and strategic approaches in two courses running in parallel within a real-life university setting. The results showed significant differences…

  14. Support for Debugging Automatically Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Hood, Robert; Jost, Gabriele

    2001-01-01

This viewgraph presentation provides information on tool support available for the automatic parallelization of computer programs. CAPTools, a support tool developed at the University of Greenwich, transforms, with user guidance, existing sequential Fortran code into parallel message-passing code. Comparison routines are then run for debugging purposes, in essence ensuring that the code transformation was accurate.

  15. Parallel algorithms for mapping pipelined and parallel computations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1988-01-01

Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm^3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm^2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.
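The flavour of these mapping problems can be seen in a toy version: assign m pipeline modules, kept in order, to n processors of a linear array so that the most heavily loaded processor is as light as possible. The sketch below solves this with straightforward dynamic programming; it is an illustration of the problem, not the paper's improved algorithm (whose lower complexity bounds are the contribution):

```python
import functools

def min_bottleneck(weights, n_procs):
    """Minimise the maximum load over a contiguous partition of
    `weights` (module costs) into `n_procs` groups."""
    m = len(weights)
    prefix = [0]
    for w in weights:
        prefix.append(prefix[-1] + w)

    @functools.lru_cache(maxsize=None)
    def best(i, k):
        # Modules i..m-1 assigned contiguously to k processors.
        if k == 1:
            return prefix[m] - prefix[i]
        # First processor takes modules i..j-1; recurse on the rest.
        return min(max(prefix[j] - prefix[i], best(j, k - 1))
                   for j in range(i + 1, m - k + 2))

    return best(0, n_procs)
```

For module costs [2, 3, 4, 5] on two processors, the best split is [2, 3] | [4, 5], giving a bottleneck load of 9.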

  16. Optimized Hypervisor Scheduler for Parallel Discrete Event Simulations on Virtual Machine Platforms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yoginath, Srikanth B; Perumalla, Kalyan S

    2013-01-01

With the advent of virtual machine (VM)-based platforms for parallel computing, it is now possible to execute parallel discrete event simulations (PDES) over multiple virtual machines, in contrast to executing in native mode directly over hardware as has traditionally been done over the past decades. While mature VM-based parallel systems now offer new, compelling benefits such as serviceability, dynamic reconfigurability and overall cost effectiveness, the runtime performance of parallel applications can be significantly affected. In particular, most VM-based platforms are optimized for general workloads, but PDES execution exhibits unique dynamics significantly different from other workloads. Here we first present results from experiments that highlight the gross deterioration of the runtime performance of VM-based PDES simulations when executed using traditional VM schedulers, quantitatively showing the poor scaling properties of the scheduler as the number of VMs is increased. The mismatch is fundamental in nature in the sense that any fairness-based VM scheduler implementation would exhibit this mismatch with PDES runs. We also present a new scheduler optimized specifically for PDES applications, and describe its design and implementation. Experimental results obtained from running PDES benchmarks (PHOLD and vehicular traffic simulations) over VMs show over an order of magnitude improvement in the run time of the PDES-optimized scheduler relative to the regular VM scheduler, with over 20× reduction in run time of simulations using up to 64 VMs. The observations and results are timely in the context of emerging systems such as cloud platforms and VM-based high performance computing installations, highlighting to the community the need for PDES-specific support, and the feasibility of significantly reducing the runtime overhead for scalable PDES on VM platforms.

  17. Parallel computing in genomic research: advances and applications

    PubMed Central

    Ocaña, Kary; de Oliveira, Daniel

    2015-01-01

Today’s genomic experiments have to process the so-called “biological big data” that is now reaching the size of terabytes and petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC) environments can be applied to reduce the total processing time and to ease the management, treatment, and analysis of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing units requires expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists to process their genomic experiments using HPC capabilities and parallelism techniques. This article brings a systematic review of the literature surveying the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that can be considered by scientists when running their genomic experiments to benefit from parallelism techniques and HPC capabilities. PMID:26604801

  18. Parallel computing in genomic research: advances and applications.

    PubMed

    Ocaña, Kary; de Oliveira, Daniel

    2015-01-01

Today's genomic experiments have to process the so-called "biological big data" that is now reaching the size of terabytes and petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC) environments can be applied to reduce the total processing time and to ease the management, treatment, and analysis of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing units requires expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists to process their genomic experiments using HPC capabilities and parallelism techniques. This article brings a systematic review of the literature surveying the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that can be considered by scientists when running their genomic experiments to benefit from parallelism techniques and HPC capabilities.

  19. Communication library for run-time visualization of distributed, asynchronous data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rowlan, J.; Wightman, B.T.

    1994-04-01

In this paper we present a method for collecting and visualizing data generated by a parallel computational simulation during run time. Data distributed across multiple processes is sent across parallel communication lines to a remote workstation, which sorts and queues the data for visualization. We have implemented our method in a set of tools called PORTAL (for Parallel aRchitecture data-TrAnsfer Library). The tools comprise generic routines for sending data from a parallel program (callable from either C or FORTRAN), a semi-parallel communication scheme currently built upon Unix sockets, and a real-time connection to the scientific visualization program AVS. Our method is most valuable when used to examine large datasets that can be efficiently generated and do not need to be stored on disk. The PORTAL source libraries, detailed documentation, and a working example can be obtained by anonymous ftp from info.mcs.anl.gov in the file portal.tar.Z in the directory pub/portal.

  20. PARLO: PArallel Run-Time Layout Optimization for Scientific Data Explorations with Heterogeneous Access Pattern

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gong, Zhenhuan; Boyuka, David; Zou, X

The size and scope of cutting-edge scientific simulations are growing much faster than the I/O and storage capabilities of their run-time environments. The growing gap is exacerbated by exploratory, data-intensive analytics, such as querying simulation data with multivariate, spatio-temporal constraints, which induce heterogeneous access patterns that stress the performance of the underlying storage system. Previous work addresses data layout and indexing techniques to improve query performance for a single access pattern, which is not sufficient for complex analytics jobs. We present PARLO, a parallel run-time layout optimization framework, to achieve multi-level data layout optimization for scientific applications at run time, before data is written to storage. The layout schemes optimize for heterogeneous access patterns with user-specified priorities. PARLO is integrated with ADIOS, a high-performance parallel I/O middleware for large-scale HPC applications, to achieve user-transparent, light-weight layout optimization for scientific datasets. It offers simple XML-based configuration for users to achieve flexible layout optimization without the need to modify or recompile application codes. Experiments show that PARLO improves performance by 2 to 26 times for queries with heterogeneous access patterns compared to state-of-the-art scientific database management systems. Compared to traditional post-processing approaches, its underlying run-time layout optimization achieves a 56% savings in processing time and a reduction in storage overhead of up to 50%. PARLO also exhibits a low run-time resource requirement, while limiting the performance impact on running applications to a reasonable level.

  1. Study of Thread Level Parallelism in a Video Encoding Application for Chip Multiprocessor Design

    NASA Astrophysics Data System (ADS)

    Debes, Eric; Kaine, Greg

    2002-11-01

In media applications there is a high level of available thread level parallelism (TLP). In this paper we study the intra TLP in a video encoder. We show that a well-distributed, highly optimized encoder running on a symmetric multiprocessor (SMP) system can run 3.2 times faster on a 4-way SMP machine than on a single processor. The multithreaded encoder running on an SMP system is then used to understand the requirements of a chip multiprocessor (CMP) architecture, which is one possible architectural direction to better exploit TLP. In the framework of this study, we use a software approach to evaluate the dataflow between processors for the video encoder running on an SMP system. An estimation of the dataflow is done with L2 cache miss event counters using the Intel® VTune™ performance analyzer. The experimental measurements are compared to theoretical results.

  2. Scalable descriptive and correlative statistics with Titan.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thompson, David C.; Pebay, Philippe Pierre

This report summarizes the existing statistical engines in VTK/Titan and presents the parallel versions thereof which have already been implemented. The ease of use of these parallel engines is illustrated by means of C++ code snippets. Furthermore, this report justifies the design of these engines with parallel scalability in mind; then, this theoretical property is verified with test runs that demonstrate optimal parallel speed-up with up to 200 processors.
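The usual basis for scalable descriptive statistics is combining partial aggregates: each processor summarizes its share of the data as a (count, mean, M2) triple, where M2 is the sum of squared deviations from the mean, and pairs of partial results are merged with a numerically stable update. A minimal Python sketch of this standard pairwise-update scheme (the report's own examples are C++ snippets; this is not Titan's code):

```python
def combine(n_a, mean_a, m2_a, n_b, mean_b, m2_b):
    """Merge two partial (count, mean, M2) aggregates.

    Variance is recovered as M2 / (n - 1). This pairwise update is
    the standard building block for parallel descriptive statistics.
    """
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2

def partial(data):
    # Per-processor pass over a local chunk (Welford's algorithm).
    n, mean, m2 = 0, 0.0, 0.0
    for x in data:
        n += 1
        d = x - mean
        mean += d / n
        m2 += d * (x - mean)
    return n, mean, m2

# Two "processors" each summarize half the data, then merge:
n, mean, m2 = combine(*partial(range(1, 6)), *partial(range(6, 11)))
```

Because `combine` is associative, partial results can be reduced in a tree across any number of processors.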

  3. Parallel simulation of tsunami inundation on a large-scale supercomputer

    NASA Astrophysics Data System (ADS)

    Oishi, Y.; Imamura, F.; Sugawara, D.

    2013-12-01

An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation; the computational power of recent massively parallel supercomputers is helpful to enable faster-than-real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), and so it is expected that very fast parallel computers will become more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on the TUNAMI-N2 model of Tohoku University, which uses a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computational load across CPUs in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points in that layer. Using the CPUs allocated to each layer, 1-D domain decomposition is then performed on each layer.
In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the finite difference calculation, (2) communication between adjacent layers for the calculations to connect each layer, and (3) global communication to obtain the time step which satisfies the CFL condition in the whole domain. A preliminary test on the K computer showed the parallel efficiency on 1024 cores was 57% relative to 64 cores. We estimate that the parallel efficiency will be considerably improved by applying a 2-D domain decomposition instead of the present 1-D domain decomposition in future work. The present parallel tsunami model was applied to the 2011 Great Tohoku tsunami. The coarsest resolution layer covers a 758 km × 1155 km region with a 405 m grid spacing. A nesting of five layers was used with the resolution ratio of 1/3 between nested layers. The finest resolution region has 5 m resolution and covers most of the coastal region of Sendai city. To complete 2 hours of simulation time, the serial (non-parallel) computation took approximately 4 days on a workstation. To complete the same simulation on 1024 cores of the K computer, it took 45 minutes which is more than two times faster than real-time. This presentation discusses the updated parallel computational performance and the efficient use of the K computer when considering the characteristics of the tsunami inundation simulation model in relation to the characteristics and capabilities of the K computer.
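Communication type (3) is a global minimum reduction: each rank computes the largest time step its own subdomain allows under the CFL condition, and the whole run advances with the smallest of these values (in MPI terms, an MPI_Allreduce with MPI_MIN). A sketch of the per-rank computation, with the reduction emulated over a list of local values; the 405 m spacing matches the coarsest layer described above, but the depths are illustrative:

```python
import math

def local_timestep(dx, h_max, g=9.81, cfl=0.9):
    # Shallow-water CFL limit: dt <= cfl * dx / sqrt(g * h_max),
    # where h_max is the maximum water depth in this rank's subdomain
    # and cfl is a safety factor below 1.
    return cfl * dx / math.sqrt(g * h_max)

# Each rank computes its local limit; the global step is the minimum
# over all ranks (the role played by MPI_Allreduce(MPI_MIN) in the
# parallel model). Emulated here with three "ranks":
local_dts = [local_timestep(405.0, h) for h in (50.0, 200.0, 4000.0)]
dt_global = min(local_dts)
```

The deepest subdomain sets the global step, which is why this reduction must be global rather than neighbour-to-neighbour.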

  4. 4-Nitro-aniline-picric acid (2/1).

    PubMed

    Li, Yan-Jun

    2009-09-30

In the title adduct, C(6)H(3)N(3)O(7)·0.5C(6)H(6)N(2)O(2), the complete 4-nitro-aniline molecule is generated by a crystallographic twofold axis, with two C atoms and two N atoms lying on the axis. The molecular components are linked into two-dimensional corrugated layers running parallel to the (001) plane by a combination of intermolecular N-H⋯O and C-H⋯O hydrogen bonds. The phenolic oxygen and two sets of nitro oxygen atoms in the picric acid were found to be disordered, with occupancies of 0.81 (2):0.19 (2), 0.55 (3):0.45 (3) and 0.77 (4):0.23 (4), respectively.

  5. ProperCAD: A portable object-oriented parallel environment for VLSI CAD

    NASA Technical Reports Server (NTRS)

    Ramkumar, Balkrishna; Banerjee, Prithviraj

    1993-01-01

Most parallel algorithms for VLSI CAD proposed to date have one important drawback: they work efficiently only on the machines that they were designed for. As a result, algorithms designed to date are dependent on the architecture for which they are developed and do not port easily to other parallel architectures. A new project under way to address this problem is described: a Portable object-oriented parallel environment for CAD algorithms (ProperCAD). The objectives of this research are (1) to develop new parallel algorithms that run in a portable object-oriented environment (the CAD algorithms are being developed on a general-purpose platform for portable parallel programming called CARM, and a truly object-oriented C++ environment specialized for CAD applications is also being developed); and (2) to design the parallel algorithms around a good sequential algorithm with a well-defined parallel-sequential interface (permitting the parallel algorithm to benefit from future developments in sequential algorithms). One CAD application that has been implemented as part of the ProperCAD project, flat VLSI circuit extraction, is described. The algorithm, its implementation, and its performance on a range of parallel machines are discussed in detail. It currently runs on an Encore Multimax, a Sequent Symmetry, Intel iPSC/2 and i860 hypercubes, an NCUBE 2 hypercube, and a network of Sun SPARC workstations. Performance data for other applications that were developed are provided, namely test pattern generation for sequential circuits, parallel logic synthesis, and standard cell placement.

  6. Development for SSV on a parallel processing system (PARAGON)

    NASA Astrophysics Data System (ADS)

    Gothard, Benny M.; Allmen, Mark; Carroll, Michael J.; Rich, Dan

    1995-12-01

A goal of the surrogate semi-autonomous vehicle (SSV) program is to have multiple vehicles navigate autonomously and cooperatively with other vehicles. This paper describes the process and tools used in porting UGV/SSV (unmanned ground vehicle) autonomous mobility and target recognition algorithms from a SISD (single instruction single data) processor architecture (i.e., a Sun SPARC workstation running C/UNIX) to a MIMD (multiple instruction multiple data) parallel processor architecture (i.e., PARAGON, a parallel set of i860 processors running C/UNIX). It discusses the gains in performance and the pitfalls of such a venture. It also examines the merits of this processor architecture (based on this conceptual prototyping effort) and programming paradigm to meet the final SSV demonstration requirements.

  7. Parallelization and automatic data distribution for nuclear reactor simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liebrock, L.M.

    1997-07-01

Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine cannot run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.

  8. Self-Scheduling Parallel Methods for Multiple Serial Codes with Application to WOPWOP

    NASA Technical Reports Server (NTRS)

    Long, Lyle N.; Brentner, Kenneth S.

    2000-01-01

This paper presents a scheme for efficiently running a large number of serial jobs on parallel computers. Two examples are given of computer programs that run relatively quickly, but often they must be run numerous times to obtain all the results needed. It is very common in science and engineering to have codes that are not massive computing challenges in themselves, but due to the number of instances that must be run, they do become large-scale computing problems. The two examples given here represent common problems in aerospace engineering: aerodynamic panel methods and aeroacoustic integral methods. The first example simply solves many systems of linear equations. This is representative of an aerodynamic panel code where someone would like to solve for numerous angles of attack. The complete code for this first example is included in the appendix so that it can be readily used by others as a template. The second example is an aeroacoustics code (WOPWOP) that solves the Ffowcs Williams-Hawkings equation to predict the far-field sound due to rotating blades. In this example, one quite often needs to compute the sound at numerous observer locations, hence parallelization is utilized to automate the noise computation for a large number of observers.
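The self-scheduling pattern described here can be sketched in a few lines: independent serial cases (angles of attack, observer locations) go into a shared work queue, and each worker pulls the next case as soon as it finishes the previous one, so uneven run times never leave a processor idle at a barrier. A minimal sketch using Python's standard library; the case function and its result are hypothetical stand-ins, not code from the paper:

```python
from concurrent.futures import ThreadPoolExecutor

def run_case(angle_of_attack):
    # Stand-in for one self-contained serial run (e.g., a panel-code
    # solve at a single angle of attack); purely illustrative.
    return 2 * 3.141592653589793 * angle_of_attack  # dummy result

angles = [0.5 * a for a in range(16)]
with ThreadPoolExecutor(max_workers=4) as ex:
    # Self-scheduling: idle workers grab the next case from the pool's
    # internal queue, keeping all workers busy until the list is done.
    results = list(ex.map(run_case, angles))
```

With MPI (as the paper's era would suggest) the same idea becomes a manager rank handing out case indices to worker ranks on request.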

  9. GPU accelerated cell-based adaptive mesh refinement on unstructured quadrilateral grid

    NASA Astrophysics Data System (ADS)

    Luo, Xisheng; Wang, Luying; Ran, Wei; Qin, Fenghua

    2016-10-01

    A GPU accelerated inviscid flow solver is developed on an unstructured quadrilateral grid in the present work. For the first time, the cell-based adaptive mesh refinement (AMR) is fully implemented on GPU for the unstructured quadrilateral grid, which greatly reduces the frequency of data exchange between GPU and CPU. Specifically, the AMR is processed with atomic operations to parallelize list operations, and null memory recycling is realized to improve the efficiency of memory utilization. It is found that results obtained by GPUs agree very well with the exact or experimental results in literature. An acceleration ratio of 4 is obtained between the parallel code running on the old GPU GT9800 and the serial code running on E3-1230 V2. With the optimization of configuring a larger L1 cache and adopting Shared Memory based atomic operations on the newer GPU C2050, an acceleration ratio of 20 is achieved. The parallelized cell-based AMR processes have achieved 2x speedup on GT9800 and 18x on Tesla C2050, which demonstrates that parallel running of the cell-based AMR method on GPU is feasible and efficient. Our results also indicate that the new development of GPU architecture benefits the fluid dynamics computing significantly.

  10. Analysis and Modeling of Parallel Photovoltaic Systems under Partial Shading Conditions

    NASA Astrophysics Data System (ADS)

    Buddala, Santhoshi Snigdha

Since the industrial revolution, fossil fuels like petroleum, coal, oil, and natural gas and other non-renewable energy sources have been used as the primary energy source. The consumption of fossil fuels releases various harmful gases into the atmosphere as byproducts which are hazardous in nature, tend to deplete the protective atmospheric layers, and affect the overall environmental balance. Fossil fuels are also finite resources of energy, and the rapid depletion of these sources has prompted the need to investigate alternate sources of energy, called renewable energy. One such promising source of renewable energy is solar/photovoltaic energy. This work focuses on investigating a new solar array architecture with solar cells connected in a parallel configuration. While retaining the structural simplicity of the parallel architecture, a theoretical small-signal model of the solar cell is proposed and modeled to analyze the variations in the module parameters when subjected to partial shading conditions. Simulations were run in SPICE to validate the model implemented in Matlab. The voltage limitations of the proposed architecture are addressed by adopting a simple dc-dc boost converter, and the performance of the architecture is evaluated in terms of efficiency by comparing it with traditional architectures. SPICE simulations are used to compare the architectures and identify the best one in terms of power conversion efficiency under partial shading conditions.

  11. Scalable Domain Decomposed Monte Carlo Particle Transport

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    O'Brien, Matthew Joseph

    2013-12-05

    In this dissertation, we present the parallel algorithms necessary to run domain decomposed Monte Carlo particle transport on large numbers of processors (millions of processors). Previous algorithms were not scalable, and the parallel overhead became more computationally costly than the numerical simulation.

  12. PyPele Rewritten To Use MPI

    NASA Technical Reports Server (NTRS)

    Hockney, George; Lee, Seungwon

    2008-01-01

A computer program known as PyPele, originally written as a Python-language extension module of a C++ language program, has been rewritten in pure Python. The original version of PyPele dispatches and coordinates parallel-processing tasks on cluster computers and provides a conceptual framework for spacecraft-mission-design and -analysis software tools to run in an embarrassingly parallel mode. The original version of PyPele uses SSH (Secure Shell, a set of standards and an associated network protocol for establishing a secure channel between a local and a remote computer) to coordinate parallel processing. Instead of SSH, the present Python version of PyPele uses the Message Passing Interface (MPI) [an unofficial de facto standard language-independent application programming interface for message passing on a parallel computer] while keeping the same user interface. The use of MPI instead of SSH and the preservation of the original PyPele user interface make it possible for parallel application programs written previously for the original version of PyPele to run on MPI-based cluster computers. As a result, engineers using the previously written application programs can take advantage of embarrassing parallelism without the need to rewrite those programs.

  13. Static analysis of the hull plate using the finite element method

    NASA Astrophysics Data System (ADS)

    Ion, A.

    2015-11-01

This paper aims at presenting the static analysis for two levels of a container ship's construction: the first level is at the girder / hull plate and the second level is conducted at the entire strength hull of the vessel. This article will describe the work for the static analysis of a hull plate. We shall use the software package ANSYS Mechanical 14.5. The program is run on a computer with four Intel Xeon X5260 processors at 3.33 GHz and 32 GB of installed memory. In terms of software, the shared-memory parallel version of ANSYS refers to running ANSYS across multiple cores on an SMP system. The distributed-memory parallel version of ANSYS (Distributed ANSYS) refers to running ANSYS across multiple processors on SMP systems or DMP systems.

  14. Massively parallel quantum computer simulator

    NASA Astrophysics Data System (ADS)

    De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.

    2007-01-01

We describe portable software to simulate universal quantum computers on massively parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, an SGI Altix 3700 and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as a benchmark for testing high-end parallel computers.
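The core kernel such simulators parallelize is the application of a gate to a state vector of 2^n amplitudes: a single-qubit gate mixes pairs of amplitudes that differ only in the target qubit's bit. A serial Python sketch of that kernel (illustrative only, not the paper's implementation, which distributes the amplitude array across processors):

```python
import math

def apply_single_qubit_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target` of a state vector.

    `state` holds 2**n amplitudes; amplitudes whose indices differ
    only in bit `target` are mixed by the 2x2 matrix `gate`.
    """
    stride = 1 << target
    for i in range(0, len(state), stride << 1):
        for j in range(i, i + stride):
            a, b = state[j], state[j + stride]
            state[j] = gate[0][0] * a + gate[0][1] * b
            state[j + stride] = gate[1][0] * a + gate[1][1] * b
    return state

h = 1.0 / math.sqrt(2.0)
H = ((h, h), (h, -h))        # Hadamard gate
psi = [0.0] * 4              # two-qubit state |00>
psi[0] = 1.0
psi = apply_single_qubit_gate(psi, H, 0)   # superposition on qubit 0
```

Memory, not arithmetic, is the binding constraint: 36 qubits already require 2^36 complex amplitudes, which is why the paper's runs need up to 1 TB of distributed memory.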

  15. A comparison of five benchmarks

    NASA Technical Reports Server (NTRS)

    Huss, Janice E.; Pennline, James A.

    1987-01-01

Five benchmark programs were obtained and run on the NASA Lewis CRAY X-MP/24. A comparison was made between the programs' codes and between the methods for calculating performance figures. Several multitasking jobs were run to gain experience in how parallel performance is measured.

  16. DIALIGN P: fast pair-wise and multiple sequence alignment using parallel processors.

    PubMed

    Schmollinger, Martin; Nieselt, Kay; Kaufmann, Michael; Morgenstern, Burkhard

    2004-09-09

Parallel computing is frequently used to speed up computationally expensive tasks in bioinformatics. Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristic that splits sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. By distributing sub-routines to multiple processors, the running time of DIALIGN can be crucially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
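Strategy (a) works because the pairwise stage is embarrassingly parallel: every pair of input sequences is an independent task, so the pairs can simply be farmed out to a worker pool and the results collected in any order. A minimal sketch of that dispatch pattern; the scoring function is a hypothetical stand-in (DIALIGN's real scoring is segment-based, not a per-column match count):

```python
from itertools import combinations
from concurrent.futures import ThreadPoolExecutor

def pairwise_score(pair):
    # Hypothetical stand-in for one pairwise alignment task:
    # counts matching positions; real DIALIGN scores gap-free
    # segment pairs instead.
    a, b = pair
    return sum(x == y for x, y in zip(a, b))

seqs = ["GATTACA", "GACTATA", "GATTAGA", "CATTACA"]
pairs = list(combinations(seqs, 2))   # 6 independent tasks for 4 sequences
with ThreadPoolExecutor(max_workers=3) as ex:
    # Each pair is aligned independently; distribution order cannot
    # change the per-pair results, hence the output is unaffected.
    scores = dict(zip(pairs, ex.map(pairwise_score, pairs)))
```

For N sequences there are N(N-1)/2 such tasks, which is why this stage dominates the CPU time and parallelizes so well.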

  17. Scalable Domain Decomposed Monte Carlo Particle Transport

    NASA Astrophysics Data System (ADS)

    O'Brien, Matthew Joseph

    In this dissertation, we present the parallel algorithms necessary to run domain decomposed Monte Carlo particle transport on large numbers of processors (millions of processors). Previous algorithms were not scalable, and the parallel overhead became more computationally costly than the numerical simulation. The main algorithms we consider are:

    • Domain decomposition of constructive solid geometry: enables extremely large calculations in which the background geometry is too large to fit in the memory of a single computational node.
    • Load balancing: keeps the workload per processor as even as possible so the calculation runs efficiently.
    • Global particle find: if particles are on the wrong processor, globally resolves their locations to the correct processor based on particle coordinate and background domain.
    • Visualizing constructive solid geometry, sourcing particles, deciding that particle streaming communication is complete, and spatial redecomposition.

    These algorithms are some of the most important parallel algorithms required for domain decomposed Monte Carlo particle transport. We demonstrate that our previous algorithms were not scalable, prove that our new algorithms are scalable, and run some of the algorithms on up to 2 million MPI processes on the Sequoia supercomputer.
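
    The "global particle find" step can be illustrated with a toy 1-D decomposition: each rank owns a coordinate interval, and a misplaced particle is routed to whichever rank owns its coordinate. This is a simplified single-process sketch, not the dissertation's MPI implementation:

```python
# Toy "global particle find" on a 1-D domain decomposition: each rank owns
# an interval [lo, hi), and a particle is routed to whichever rank owns its
# coordinate. Real transport codes do this over MPI; here ranks are indices.
import bisect

def build_domains(xmin: float, xmax: float, nranks: int) -> list:
    """Boundaries of nranks equal-width intervals covering [xmin, xmax)."""
    width = (xmax - xmin) / nranks
    return [xmin + i * width for i in range(nranks + 1)]

def owner_rank(x: float, bounds: list) -> int:
    """Rank owning coordinate x, found by binary search on the boundaries."""
    return bisect.bisect_right(bounds, x) - 1

bounds = build_domains(0.0, 100.0, 4)   # ranks own [0,25), [25,50), ...
lost_particles = [12.0, 60.0, 99.9]     # coordinates on the "wrong" rank
routed = [(owner_rank(x, bounds), x) for x in lost_particles]
print(routed)   # [(0, 12.0), (2, 60.0), (3, 99.9)]
```

    The scalable version of this lookup is what lets the real code resolve particle ownership without any rank holding the full geometry.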

  18. Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP)

    NASA Astrophysics Data System (ADS)

    Calafiura, Paolo; Leggett, Charles; Seuster, Rolf; Tsulaia, Vakhtang; Van Gemmeren, Peter

    2015-12-01

    AthenaMP is a multi-process version of the ATLAS reconstruction, simulation and data analysis framework Athena. By leveraging Linux fork and copy-on-write mechanisms, it allows for sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted at optimizing the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service), which allows AthenaMP to run inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, the various strategies it implements for scheduling workload to worker processes (for example, the Shared Event Queue and the Shared Distributor of Event Tokens), and the usage of AthenaMP across the diverse ATLAS event-processing workloads on various computing resources: Grid, opportunistic resources and HPC.
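
    The Shared Event Queue scheduling idea can be sketched with forked worker processes pulling event numbers from a single shared queue until a sentinel appears. On Linux, fork plus copy-on-write lets the workers share the parent's read-only pages at no extra memory cost. This is an illustration of the scheduling pattern only, not AthenaMP's API:

```python
# Shared-event-queue sketch: forked workers pull event numbers from one
# queue until a sentinel arrives. Fork + copy-on-write means the workers
# share the parent's read-only memory (e.g. geometry) for free.
from multiprocessing import Process, Queue

def worker(wid: int, events: Queue, results: Queue) -> None:
    while True:
        evt = events.get()
        if evt is None:                       # sentinel: queue drained
            break
        results.put((wid, evt, evt * evt))    # stand-in for reconstruction

if __name__ == "__main__":
    events, results = Queue(), Queue()
    for e in range(8):
        events.put(e)
    n_workers = 2
    for _ in range(n_workers):
        events.put(None)                      # one sentinel per worker
    procs = [Process(target=worker, args=(w, events, results))
             for w in range(n_workers)]
    for p in procs: p.start()
    for p in procs: p.join()
    print(sorted(results.get() for _ in range(8)))
```

    Pulling from a shared queue gives automatic load balancing: a worker stuck on a slow event simply takes fewer events overall.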

  19. Morphology of the core fibrous layer of the cetacean tail fluke.

    PubMed

    Gough, William T; Fish, Frank E; Wainwright, Dylan K; Bart-Smith, Hilary

    2018-06-01

    The cetacean tail fluke blades are not supported by any vertebral elements. Instead, the majority of the blades are composed of a densely packed collagenous fiber matrix known as the core layer. Fluke blades from six species of odontocete cetaceans were examined to compare the morphology and orientation of fibers at different locations along the spanwise and chordwise fluke blade axes. The general fiber morphology was consistent with a three-dimensional structure composed of two-dimensional sheets of fibers aligned tightly in a laminated configuration along the spanwise axis. The laminated configuration of the fluke blades helps to maintain spanwise rigidity while allowing partial flexibility during swimming. When viewing the chordwise sectional face at the leading edge and mid-chord regions, fibers displayed a crossing pattern. This configuration relates to bending and structural support of the fluke blade. The trailing edge core was found to have parallel fibers arranged more dorso-ventrally. The fiber morphology of the fluke blades was dorso-ventrally symmetrical and similar in all species except the pygmy sperm whale (Kogia breviceps), which was found to have additional core layer fiber bundles running along the span of the fluke blade. These additional fibers may increase stiffness of the structure by resisting tension along their long spanwise axis. © 2018 Wiley Periodicals, Inc.

  20. Megavolt parallel potentials arising from double-layer streams in the Earth's outer radiation belt.

    PubMed

    Mozer, F S; Bale, S D; Bonnell, J W; Chaston, C C; Roth, I; Wygant, J

    2013-12-06

    Huge numbers of double layers carrying electric fields parallel to the local magnetic field line have been observed on the Van Allen Probes in connection with in situ relativistic electron acceleration in the Earth's outer radiation belt. For one case with adequate high-time-resolution data, 7000 double layers were observed in an interval of 1 min to produce a 230,000 V net parallel potential drop crossing the spacecraft. Lower resolution data show that this event lasted for 6 min and that more than 1,000,000 volts of net parallel potential crossed the spacecraft during this time. A double layer traverses the length of a magnetic field line in about 15 s, and the orbital motion of the spacecraft perpendicular to the magnetic field was about 700 km during this 6 min interval. Thus, the instantaneous parallel potential along a single magnetic field line was on the order of tens of kilovolts. Electrons on the field line might experience many such potential steps in their lifetimes to accelerate them to energies where they serve as the seed population for relativistic acceleration by coherent, large-amplitude whistler mode waves. Because the double-layer speed of 3100 km/s is on the order of the electron acoustic speed (and not the ion acoustic speed) of a 25 eV plasma, the double layers may result from a new electron acoustic mode. Acceleration mechanisms involving double layers may also be important in the radiation belts of other planets such as Jupiter, Saturn, Uranus, and Neptune, in the solar corona during flares, and in astrophysical objects.
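
    The closing claim is easy to check numerically: the thermal speed of 25 eV electrons is close to the observed 3100 km/s double-layer speed, while the ion acoustic scale is two orders of magnitude slower. Textbook formulas with rounded constants, not data from the paper:

```python
# Check of the closing argument: the ~3100 km/s double-layer speed matches
# the electron (not ion) acoustic/thermal scale of a 25 eV plasma.
import math

eV = 1.602e-19        # J per electronvolt
m_e = 9.109e-31       # electron mass, kg
m_p = 1.673e-27       # proton mass, kg (for the ion comparison)

Te = 25 * eV
v_electron = math.sqrt(2 * Te / m_e) / 1e3   # km/s, electron thermal speed
v_ion = math.sqrt(Te / m_p) / 1e3            # km/s, ion acoustic scale (Te >> Ti)

print(f"electron scale ~ {v_electron:.0f} km/s")   # ~2965 km/s, near 3100 km/s
print(f"ion acoustic  ~ {v_ion:.0f} km/s")         # ~49 km/s, far too slow
```

    The mismatch of roughly a factor of 60 between the two scales is what rules out an ion acoustic interpretation of the observed double-layer speed.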

  1. FOLDER: A numerical tool to simulate the development of structures in layered media

    NASA Astrophysics Data System (ADS)

    Adamuszek, Marta; Dabrowski, Marcin; Schmid, Daniel W.

    2015-04-01

    FOLDER is a numerical toolbox for modelling deformation in layered media during layer-parallel shortening or extension in two dimensions. FOLDER builds on MILAMIN [1], a finite element method based mechanical solver, with a range of utilities included from the MUTILS package [2]. The numerical mesh is generated using the Triangle software [3]. The toolbox includes features that allow for: 1) designing complex structures such as multi-layer stacks, 2) accurately simulating large-strain deformation of linear and non-linear viscous materials, and 3) post-processing of various physical fields such as velocity (total and perturbing), rate of deformation, finite strain, stress, deviatoric stress, pressure, and apparent viscosity. FOLDER is designed to ensure maximum flexibility to configure model geometry, define material parameters, specify ranges of numerical parameters in simulations, and choose plotting options. FOLDER is an open-source MATLAB application and comes with a user-friendly graphical interface. The toolbox additionally comprises an educational application that illustrates various analytical solutions of growth rates calculated for the cases of folding and necking of a single layer with interfaces perturbed with a single sinusoidal waveform. We further derive two novel analytical expressions for the growth rate in the cases of folding and necking of a linear viscous layer embedded in a linear viscous medium of finite thickness. We use FOLDER to test the accuracy of single-layer folding simulations using various 1) spatial and temporal resolutions, 2) time integration schemes, and 3) iterative algorithms for non-linear materials. The accuracy of the numerical results is quantified by 1) comparing them to the analytical solution, if available, or 2) running convergence tests.
    As a result, we provide a map of the optimal choice of grid size, time step, and number of iterations needed to keep the results of the numerical simulations below a given error for a given time integration scheme. We also demonstrate that the Euler and Leapfrog time integration schemes are not recommended for any practical use. Finally, the capabilities of the toolbox are illustrated with two examples: 1) shortening of a synthetic multi-layer sequence and 2) extension of a folded quartz vein embedded in phyllite from Sprague Upper Reservoir (an example discussed by Sherwin and Chapple [4]). The latter example demonstrates that FOLDER can be successfully used for reverse modelling and mechanical restoration.

    [1] Dabrowski, M., Krotkiewski, M., and Schmid, D. W., 2008. MILAMIN: MATLAB-based finite element method solver for large problems. Geochemistry Geophysics Geosystems, vol. 9.
    [2] Krotkiewski, M. and Dabrowski, M., 2010. Parallel symmetric sparse matrix-vector product on scalar multi-core CPUs. Parallel Computing, 36(4):181-198.
    [3] Shewchuk, J. R., 1996. Triangle: Engineering a 2D Quality Mesh Generator and Delaunay Triangulator. In: Applied Computational Geometry: Towards Geometric Engineering (Ming C. Lin and Dinesh Manocha, editors), Vol. 1148 of Lecture Notes in Computer Science, pp. 203-222. Springer-Verlag, Berlin.
    [4] Sherwin, J. A. and Chapple, W. M., 1968. Wavelengths of single layer folds: a comparison between theory and observation. American Journal of Science, 266(3), 167-179.

  2. Parallel computing for automated model calibration

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Burke, John S.; Danielson, Gary R.; Schulz, Douglas A.

    2002-07-29

    Natural resources model calibration is a significant burden on computing and staff resources in modeling efforts. Most assessments must consider multiple calibration objectives (for example, magnitude and timing of stream flow peak). An automated calibration process that allows real-time updating of data/models, allowing scientists to focus effort on improving models, is needed. We are in the process of building a fully featured multi-objective calibration tool capable of processing multiple models cheaply and efficiently using null cycle computing. Our parallel processing and calibration software routines have been written generically, but our focus has been on natural resources model calibration. So far, the natural resources models have been friendly to parallel calibration efforts in that they require no inter-process communication, need only a small amount of input data, and output only a small amount of statistical information for each calibration run. A typical auto-calibration run might involve running a model 10,000 times with a variety of input parameters and summary statistical output. In the past, model calibration has been done against individual models for each data set. The individual model runs are relatively fast, ranging from seconds to minutes. The process was run on a single computer using a simple iterative process. We have completed two auto-calibration prototypes and are currently designing a more feature-rich tool. Our prototypes have focused on running the calibration in a distributed computing, cross-platform environment. They allow incorporation of "smart" calibration parameter generation (using artificial intelligence processing techniques). Null cycle computing similar to SETI@home has also been a focus of our efforts. This paper details the design of the latest prototype and discusses our plans for the next revision of the software.
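
    The calibration loop described above is embarrassingly parallel: each candidate parameter set is an independent model run returning a small summary statistic. A minimal sketch with a toy stand-in model, not any real natural-resources code:

```python
# Embarrassingly parallel calibration sketch: every candidate parameter set
# is an independent model run that returns a summary error statistic.
# run_model is a toy stand-in, not a real hydrological model.
from multiprocessing import Pool

OBSERVED_PEAK = 42.0   # hypothetical observed stream-flow peak

def run_model(params):
    a, b = params
    simulated_peak = a * 10 + b                  # placeholder "model"
    return params, abs(simulated_peak - OBSERVED_PEAK)

if __name__ == "__main__":
    candidates = [(a, b) for a in range(1, 6) for b in range(10)]
    with Pool() as pool:
        results = pool.map(run_model, candidates)   # independent runs
    best_params, best_err = min(results, key=lambda r: r[1])
    print("best:", best_params, "error:", best_err)  # best: (4, 2) error: 0.0
```

    Because each run needs only its parameters and returns only a statistic, the same pattern scales from one multi-core box to a SETI@home-style null-cycle pool of machines.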

  3. Parallelization of the Flow Field Dependent Variation Scheme for Solving the Triple Shock/Boundary Layer Interaction Problem

    NASA Technical Reports Server (NTRS)

    Schunk, Richard Gregory; Chung, T. J.

    2001-01-01

    A parallelized version of the Flowfield Dependent Variation (FDV) Method is developed to analyze a problem of current research interest, the flowfield resulting from a triple shock/boundary layer interaction. Such flowfields are often encountered in the inlets of high-speed air-breathing vehicles, including the NASA Hyper-X research vehicle. In order to resolve the complex shock structure and to provide adequate resolution for boundary layer computations of the convective heat transfer from surfaces inside the inlet, models containing over 500,000 nodes are needed. Efficient parallelization of the computation is essential to achieving results in a timely manner. Results from a parallelization scheme based upon multi-threading, as implemented on multiple-processor supercomputers and workstations, are presented.

  4. Using the Parallel Computing Toolbox with MATLAB on the Peregrine System |

    Science.gov Websites

    parallel pool took %g seconds.\n', toc)

    % "single program multiple data"
    spmd
        fprintf('Worker %d says Hello World!\n', labindex)
    end

    delete(gcp); % close the parallel pool
    exit

    To run the script on a compute node, create the file helloWorld.sub:

    #!/bin/bash
    #PBS -l walltime=05:00
    #PBS -l nodes=1
    #PBS -N

  5. Adaptive Grid Refinement for Atmospheric Boundary Layer Simulations

    NASA Astrophysics Data System (ADS)

    van Hooft, Antoon; van Heerwaarden, Chiel; Popinet, Stephane; van der linden, Steven; de Roode, Stephan; van de Wiel, Bas

    2017-04-01

    We validate and benchmark an adaptive mesh refinement (AMR) algorithm for numerical simulations of the atmospheric boundary layer (ABL). The AMR technique aims to distribute the computational resources efficiently over a domain by refining and coarsening the numerical grid locally and in time. This can be beneficial for studying cases in which length scales vary significantly in time and space. We present the results for a case describing the growth and decay of a convective boundary layer. The AMR results are benchmarked against two runs using a fixed, finely meshed grid: first with the same numerical formulation as the AMR code, and second with a code dedicated to ABL studies. Compared to the fixed and isotropic grid runs, the AMR algorithm can coarsen and refine the grid such that accurate results are obtained whilst using only a fraction of the grid cells. Performance-wise, the AMR run was cheaper than the fixed and isotropic grid run with a similar numerical formulation. However, for this specific case, the dedicated code outperformed both aforementioned runs.

  6. Increasing airport capacity with modified IFR approach procedures for close-spaced parallel runways

    DOT National Transportation Integrated Search

    2001-01-01

    Because of wake turbulence considerations, current instrument approach procedures treat close-spaced (i.e., less than 2,500 feet apart) parallel runways as a single runway. This restriction is designed to assure safety for all aircraft types u...

  7. Parallel Computation of the Regional Ocean Modeling System (ROMS)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, P; Song, Y T; Chao, Y

    2005-04-05

    The Regional Ocean Modeling System (ROMS) is a regional ocean general circulation modeling system solving the free-surface, hydrostatic, primitive equations over varying topography. It is free software distributed world-wide for studying both complex coastal ocean problems and the basin-to-global scale ocean circulation. The original ROMS code could only be run on shared-memory systems. With the increasing need to simulate larger model domains with finer resolutions and on a variety of computer platforms, there is a need in the ocean-modeling community for a ROMS code that can be run on any parallel computer ranging from tens to hundreds of processors. Recently, we have explored parallelization for ROMS using the MPI programming model. In this paper, an efficient parallelization strategy for such a large-scale scientific software package, based on an existing shared-memory computing model, is presented. In addition, scientific applications and data-performance issues on a couple of SGI systems, including Columbia, the world's third-fastest supercomputer, are discussed.

  8. ParallelStructure: An R Package to Distribute Parallel Runs of the Population Genetics Program STRUCTURE on Multi-Core Computers

    PubMed Central

    Besnier, Francois; Glover, Kevin A.

    2013-01-01

    This software package provides an R-based framework to make use of multi-core computers when running analyses in the population genetics program STRUCTURE. It is especially addressed to those users of STRUCTURE dealing with numerous and repeated data analyses, who could take advantage of an efficient script to automatically distribute STRUCTURE jobs among multiple processors. It also provides functions to divide analyses among combinations of populations within a single data set without the need to manually produce multiple projects, as is currently the case in STRUCTURE. The package consists of two main functions, MPI_structure() and parallel_structure(), as well as an example data file. We compared the performance in computing time for this example data on two computer architectures and showed that the use of the present functions can result in several-fold improvements in terms of computation time. ParallelStructure is freely available at https://r-forge.r-project.org/projects/parallstructure/. PMID:23923012

  9. Decomposition method for fast computation of gigapixel-sized Fresnel holograms on a graphics processing unit cluster.

    PubMed

    Jackin, Boaz Jessie; Watanabe, Shinpei; Ootsu, Kanemitsu; Ohkawa, Takeshi; Yokota, Takashi; Hayasaki, Yoshio; Yatagai, Toyohiko; Baba, Takanobu

    2018-04-20

    A parallel computation method for large-size Fresnel computer-generated holograms (CGHs) is reported. The method was introduced by us in an earlier report as a technique for calculating Fourier CGHs from 2D object data. In this paper we extend the method to compute Fresnel CGHs from 3D object data. The scale of the computation problem is also expanded to 2 gigapixels, making it closer to real application requirements. The significant feature of the reported method is its ability to avoid communication overhead and thereby fully utilize the computing power of parallel devices. The method exhibits three layers of parallelism that suit small- to large-scale parallel computing machines. Simulation and optical experiments were conducted to demonstrate the workability and to evaluate the efficiency of the proposed technique. A twofold improvement in computation speed has been achieved compared to the conventional method on a 16-node cluster (one GPU per node) utilizing only one layer of parallelism. A 20-fold improvement in computation speed has been estimated utilizing two layers of parallelism on a very large-scale parallel machine with 16 nodes, where each node has 16 GPUs.

  10. 4-Nitro­aniline–picric acid (2/1)

    PubMed Central

    Li, Yan-jun

    2009-01-01

    In the title adduct, C6H3N3O7·0.5C6H6N2O2, the complete 4-nitroaniline molecule is generated by a crystallographic twofold axis with two C atoms and two N atoms lying on the axis. The molecular components are linked into two-dimensional corrugated layers running parallel to the (001) plane by a combination of intermolecular N—H⋯O and C—H⋯O hydrogen bonds. The phenolic oxygen atom and two sets of nitro oxygen atoms in the picric acid were found to be disordered with occupancies of 0.81 (2):0.19 (2), 0.55 (3):0.45 (3) and 0.77 (4):0.23 (4), respectively. PMID:21578004

  11. Fast parallel algorithm for slicing STL based on pipeline

    NASA Astrophysics Data System (ADS)

    Ma, Xulong; Lin, Feng; Yao, Bo

    2016-05-01

    In the field of additive manufacturing, current research on data processing mainly focuses on the slicing of large STL files or complicated CAD models. To improve efficiency and reduce slicing time, a parallel algorithm has great advantages. However, traditional algorithms cannot make full use of multi-core CPU hardware resources. In this paper, a fast parallel algorithm is presented to speed up data processing. A pipeline mode is adopted to design the parallel algorithm, and the complexity of the pipeline algorithm is analyzed theoretically. To evaluate the performance of the new algorithm, the effects of thread count and layer count are investigated in a series of experiments. The experimental results show that the thread count and layer count are two significant factors in the speedup ratio. The trend of speedup versus thread count shows a positive relationship that agrees well with Amdahl's law, and the trend of speedup versus layer count also shows a positive relationship, in agreement with Gustafson's law. The new algorithm uses topological information to compute contours in parallel. Another parallel algorithm, based on data parallelism, is used in the experiments to show that the pipeline parallel mode is more efficient. Compared with the serial slicing algorithm, the new pipeline parallel algorithm makes full use of multi-core CPU hardware and accelerates the slicing process; compared with the data-parallel slicing algorithm, it achieves a much higher speedup ratio and efficiency. A concluding case study demonstrates the performance of the new parallel algorithm.
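
    The pipeline idea, one stage producing layers while the next consumes them, can be sketched with two threads and a bounded queue. The slicing "work" here is faked; a real slicer would intersect STL triangles with each layer plane:

```python
# Two-stage pipeline sketch: stage 1 emits slicing planes while stage 2
# consumes them, so the stages overlap in time. A real slicer would
# intersect STL triangles with each plane; the "work" here is faked.
import queue
import threading

def producer(layers, q):
    for z in layers:
        q.put(z)                      # stage 1: one slicing plane per layer
    q.put(None)                       # sentinel ends the pipeline

def consumer(q, out):
    while (z := q.get()) is not None:
        out.append(("contour", z))    # stage 2: build contours for plane z

layers = [i * 0.2 for i in range(5)]
q, out = queue.Queue(maxsize=2), []   # small buffer couples the stages
t1 = threading.Thread(target=producer, args=(layers, q))
t2 = threading.Thread(target=consumer, args=(q, out))
t1.start(); t2.start()
t1.join(); t2.join()
print(len(out), "layers sliced")      # 5 layers sliced
```

    With more layers than stages, throughput approaches one layer per slowest-stage interval, which is why the paper sees speedup grow with layer count in line with Gustafson's law.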

  12. Magnetic behaviour of multisegmented FeCoCu/Cu electrodeposited nanowires

    NASA Astrophysics Data System (ADS)

    Núñez, A.; Pérez, L.; Abuín, M.; Araujo, J. P.; Proenca, M. P.

    2017-04-01

    Understanding the magnetic behaviour of multisegmented nanowires (NWs) is a major key for the application of such structures in future devices. In this work, magnetic/non-magnetic arrays of FeCoCu/Cu multilayered NWs electrodeposited in nanoporous alumina templates are studied. Contrary to most reports on multilayered NWs, the magnetic layer thickness was kept constant (30 nm) and only the non-magnetic layer thickness was changed (0 to 80 nm). This allowed us to tune the interwire and intrawire interactions between the magnetic layers in the NW array, creating a three-dimensional (3D) magnetic system without the need to change the template characteristics. Magnetic hysteresis loops, measured with the applied field parallel and perpendicular to the NWs' long axis, showed the effect of the non-magnetic Cu layer on the overall magnetic properties of the NW arrays. In particular, introducing Cu layers along the magnetic NW axis creates domain wall nucleation sites that facilitate the magnetization reversal of the wires, as seen by the decrease in the parallel coercivity and the reduction of the perpendicular saturation field. By further increasing the Cu layer thickness, the interactions between the magnetic segments, both along the NW axis and between neighbouring NWs, decrease, thus raising the parallel coercivity and the perpendicular saturation field again. This work shows how one can easily tune the parallel and perpendicular magnetic properties of a 3D magnetic layer system by adjusting the non-magnetic layer thickness.

  13. A Theoretical Study of Cold Air Damming.

    NASA Astrophysics Data System (ADS)

    Xu, Qin

    1990-12-01

    The dynamics of cold air damming are examined analytically with a two-layer steady state model. The upper layer is a warm and saturated cross-mountain (easterly or southeasterly onshore) flow. The lower layer is a cold mountain-parallel (northerly) jet trapped on the windward (eastern) side of the mountain. The interface between the two layers represents a coastal front: a sloping inversion layer coupling the trapped cold dome with the warm onshore flow above through pressure continuity. An analytical expression is obtained for the inviscid upper-layer flow with hydrostatic and moist adiabatic approximations. Blackadar's PBL parameterization of eddy viscosity is used in the lower-layer equations. Solutions for the mountain-parallel jet and its associated secondary transverse circulation are obtained by expanding asymptotically upon a small parameter proportional to the square root of the inertial aspect ratio, the ratio between the mountain height and the radius of inertial oscillation. The geometric shape of the sloping interface is solved numerically from a differential-integral equation derived from the pressure continuity condition imposed at the interface. The observed flow structures and force balances of cold air damming events are reproduced qualitatively by the model. In the cold dome the mountain-parallel jet is controlled by the competition between the mountain-parallel pressure gradient and friction: the jet is stronger with smoother surfaces, higher mountains, and faster mountain-normal geostrophic winds. In the mountain-normal direction the vertically averaged force balance in the cold dome is nearly geostrophic and controls the geometric shape of the cold dome.
The basic mountain-normal pressure gradient generated in the cold dome by the negative buoyancy distribution tends to flatten the sloping interface and expand the cold dome upstream against the mountain-normal pressure gradient (produced by the upper-layer onshore wind) and Coriolis force (induced by the lower-layer mountain-parallel jet). It is found that the interface slope increases and the cold dome shrinks as the Froude number and/or upstream mountain-parallel geostrophic wind increase, or as the Rossby number, upper-layer depth, and/or surface roughness length decrease, and vice versa. The cold dome will either vanish or not be in a steady state if the Froude number is large enough or the roughness length gets too small. The theoretical findings are explained physically based on detailed analyses of the force balance along the inversion interface.

  14. Drug innovation, price controls, and parallel trade.

    PubMed

    Matteucci, Giorgio; Reverberi, Pierfrancesco

    2016-12-21

    We study the long-run welfare effects of parallel trade (PT) in pharmaceuticals. We develop a two-country model of PT with endogenous quality, where the pharmaceutical firm negotiates the price of the drug with the government in the foreign country. We show that, even though the foreign government does not consider global R&D costs, (the threat of) PT improves the quality of the drug as long as the foreign consumers' valuation of quality is high enough. We find that the firm's short-run profit may be higher when PT is allowed. Nonetheless, this is neither necessary nor sufficient for improving drug quality in the long run. We also show that improving drug quality is a sufficient condition for PT to increase global welfare. Finally, we show that, when PT is allowed, drug quality may be higher with than without price controls.

  15. Learning and Parallelization Boost Constraint Search

    ERIC Educational Resources Information Center

    Yun, Xi

    2013-01-01

    Constraint satisfaction problems are a powerful way to abstract and represent academic and real-world problems from both artificial intelligence and operations research. A constraint satisfaction problem is typically addressed by a sequential constraint solver running on a single processor. Rather than construct a new, parallel solver, this work…

  16. The Automated Instrumentation and Monitoring System (AIMS) reference manual

    NASA Technical Reports Server (NTRS)

    Yan, Jerry; Hontalas, Philip; Listgarten, Sherry

    1993-01-01

    Whether a researcher is designing the 'next parallel programming paradigm,' another 'scalable multiprocessor,' or investigating resource allocation algorithms for multiprocessors, a facility that enables parallel program execution to be captured and displayed is invaluable. Careful analysis of execution traces can help computer designers and software architects to uncover system behavior and to take advantage of specific application characteristics and hardware features. A software tool kit that facilitates performance evaluation of parallel applications on multiprocessors is described. The Automated Instrumentation and Monitoring System (AIMS) has four major software components: a source code instrumentor, which automatically inserts active event recorders into the program's source code before compilation; a run-time performance-monitoring library, which collects performance data; a trace file animation and analysis tool kit, which reconstructs program execution from the trace file; and a trace post-processor, which compensates for data collection overhead. Besides being used as a prototype for developing new techniques for instrumenting, monitoring, and visualizing parallel program execution, AIMS is also being incorporated into the run-time environments of various hardware test beds to evaluate their impact on user productivity. Currently, AIMS instrumentors accept FORTRAN and C parallel programs written for Intel's NX operating system on the iPSC family of multicomputers. A run-time performance-monitoring library for the iPSC/860 is included in this release. We plan to release monitors for other platforms (such as PVM and TMC's CM-5) in the near future. Performance data collected can be graphically displayed on workstations (e.g., Sun SPARC and SGI) supporting X Windows (in particular, X11R5 and Motif 1.1.3).

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dritz, K.W.; Boyle, J.M.

    This paper addresses the problem of measuring and analyzing the performance of fine-grained parallel programs running on shared-memory multiprocessors. Such processors use locking (either directly in the application program, or indirectly in a subroutine library or the operating system) to serialize accesses to global variables. Given sufficiently high rates of locking, the chief factor preventing linear speedup (besides lack of adequate inherent parallelism in the application) is lock contention: the blocking of processes that are trying to acquire a lock currently held by another process. We show how a high-resolution, low-overhead clock may be used to measure both lock contention and lack of parallel work. Several ways of presenting the results are covered, culminating in a method for calculating, in a single multiprocessing run, both the speedup actually achieved and the speedup lost to contention for each lock and to lack of parallel work. The speedup losses are reported in the same units, "processor-equivalents," as the speedup achieved. Both are obtained without having to perform the usual one-process comparison run. We also chronicle a variety of experiments motivated by actual results obtained with our measurement method. The insights into program performance that we gained from these experiments helped us to refine the parts of our programs concerned with communication and synchronization. Ultimately these improvements reduced lock contention to a negligible amount and yielded nearly linear speedup in applications not limited by lack of parallel work. We describe two generally applicable strategies ("code motion out of critical regions" and "critical-region fissioning") for reducing lock contention, and one ("lock/variable fusion") applicable only on certain architectures.
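
    The "processor-equivalents" bookkeeping can be illustrated with invented numbers: dividing each lock's total blocked time (and the idle time from lack of parallel work) by the wall-clock run time expresses every loss in the same units as the achieved speedup, and the pieces sum to the processor count. A simplified sketch of the accounting, not the paper's measurement code:

```python
# "Processor-equivalents" bookkeeping with invented numbers: divide the time
# blocked on each lock (and idle for lack of work) by the wall-clock time so
# losses and achieved speedup share the same units and sum to P.
P = 8                       # processors in the run
T = 100.0                   # wall-clock seconds
blocked = {"queue_lock": 120.0, "stats_lock": 40.0}  # blocked secs per lock
idle_no_work = 60.0         # processor-seconds with no parallel work

lost = {name: t / T for name, t in blocked.items()}  # processor-equivalents
lost["no_parallel_work"] = idle_no_work / T
speedup = P - sum(lost.values())    # whatever is not lost was achieved

print({k: round(v, 2) for k, v in lost.items()})
print(f"achieved speedup ~ {speedup:.1f} of {P}")    # ~5.8 of 8
```

    Because everything is derived from timestamps gathered during the parallel run itself, no separate one-process baseline run is needed, which is the point the abstract makes.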

  18. Visualization of Octree Adaptive Mesh Refinement (AMR) in Astrophysical Simulations

    NASA Astrophysics Data System (ADS)

    Labadens, M.; Chapon, D.; Pomaréde, D.; Teyssier, R.

    2012-09-01

    Computer simulations are important in current cosmological research. These simulations run in parallel on thousands of processors and produce huge amounts of data. Adaptive mesh refinement is used to reduce the computing cost while keeping good numerical accuracy in regions of interest. RAMSES is a cosmological code developed by the Commissariat à l'énergie atomique et aux énergies alternatives (English: Atomic Energy and Alternative Energies Commission) which uses octree adaptive mesh refinement. Compared to grid-based AMR, octree AMR has the advantage of fitting the adaptive resolution of the grid very precisely to the local problem complexity. However, this specific octree data type needs dedicated software to be visualized, as generic visualization tools work on Cartesian grid data types. This is why the PYMSES software has also been developed by our team. It relies on the Python scripting language to ensure modular and easy access for exploring these specific data. In order to take advantage of the High Performance Computer which runs the RAMSES simulation, it also uses MPI and multiprocessing to run some parallel code. We present our PYMSES software in more detail, with some performance benchmarks. PYMSES currently has two visualization techniques which work directly on the AMR: the first is a splatting technique, and the second is a custom ray-tracing technique. Both have their own advantages and drawbacks. We have also compared two parallel programming techniques: the Python multiprocessing library versus the use of MPI. The load balancing strategy has to be smartly defined in order to achieve a good speedup in our computation. Results obtained with this software are illustrated in the context of a massive, 9000-processor parallel simulation of a Milky Way-like galaxy.

  19. Can parallel use of different running shoes decrease running-related injury risk?

    PubMed

    Malisoux, L; Ramesh, J; Mann, R; Seil, R; Urhausen, A; Theisen, D

    2015-02-01

    The aim of this study was to determine if runners who use concomitantly different pairs of running shoes are at a lower risk of running-related injury (RRI). Recreational runners (n = 264) participated in this 22-week prospective follow-up and reported all information about their running session characteristics, other sport participation and injuries on a dedicated Internet platform. A RRI was defined as a physical pain or complaint located at the lower limbs or lower back region, sustained during or as a result of running practice and impeding planned running activity for at least 1 day. One-third of the participants (n = 87) experienced at least one RRI during the observation period. The adjusted Cox regression analysis revealed that the parallel use of more than one pair of running shoes was a protective factor [hazard ratio (HR) = 0.614; 95% confidence interval (CI) = 0.389-0.969], while previous injury was a risk factor (HR = 1.722; 95%CI = 1.114-2.661). Additionally, increased mean session distance (km; HR = 0.795; 95%CI = 0.725-0.872) and increased weekly volume of other sports (h/week; HR = 0.848; 95%CI = 0.732-0.982) were associated with lower RRI risk. Multiple shoe use and participation in other sports are strategies potentially leading to a variation of the load applied to the musculoskeletal system. They could be advised to recreational runners to prevent RRI. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  20. Implementing a Parallel Image Edge Detection Algorithm Based on the Otsu-Canny Operator on the Hadoop Platform.

    PubMed

    Cao, Jianfang; Chen, Lichao; Wang, Min; Tian, Yun

    2018-01-01

    The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images. Overall, our approach speeds up the system by approximately 3.4 times when processing large-scale datasets, which demonstrates its obvious superiority. The proposed algorithm demonstrates both better edge detection performance and improved time performance.
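
    As a concrete illustration of the Otsu step, the sketch below (plain Python over a 256-bin grayscale histogram; the 0.5 low/high ratio is an assumption, not taken from the paper) computes the threshold that maximizes between-class variance and derives a Canny-style threshold pair from it:

```python
def otsu_threshold(hist):
    # Return the bin t maximizing between-class variance for a 256-bin
    # grayscale histogram -- the criterion used to set the Canny high
    # threshold automatically instead of picking it by hand.
    total = sum(hist)
    total_mass = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, 0.0
    w0 = 0       # pixel count below (and including) the candidate threshold
    sum0 = 0.0   # intensity mass of that class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (total_mass - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def dual_thresholds(hist, ratio=0.5):
    # Canny needs a (low, high) pair; deriving low as a fixed fraction of
    # the Otsu high threshold is an illustrative convention, not the paper's.
    high = otsu_threshold(hist)
    return ratio * high, high
```

    On a clearly bimodal histogram the returned threshold falls at the edge of the lower mode, separating the two pixel classes.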

  1. Architectures for reasoning in parallel

    NASA Technical Reports Server (NTRS)

    Hall, Lawrence O.

    1989-01-01

    The research conducted has dealt with rule-based expert systems. The algorithms that may lead to effective parallelization of them were investigated. Both the forward- and backward-chained control paradigms were investigated in the course of this work. The best computer architecture for the developed and investigated algorithms has also been researched. Two experimental vehicles were developed to facilitate this research: Backpac, a parallel backward-chained rule-based reasoning system, and Datapac, a parallel forward-chained rule-based reasoning system. Both systems have been written in Multilisp, a version of Lisp which contains the parallel construct, future. Applying future to a function causes that function to be evaluated as a task running in parallel with the spawning task. Additionally, Backpac and Datapac have been run on several disparate parallel processors: an Encore Multimax with 10 processors, the Concert Multiprocessor with 64 processors, and a 32-processor BBN GP1000. Both the Concert and the GP1000 are switch-based machines; the Multimax has all its processors hung off a common bus. All are shared-memory machines, but they differ in how the memory is shared and where the shared memory resides. The main results of the investigations come from experiments on the 10-processor Encore and on the Concert with partitions of 32 or fewer processors. Additionally, experiments have been run with a stripped-down version of EMYCIN.
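
    Multilisp's future construct has close analogues in modern languages. The sketch below (Python; fire_rule is a hypothetical stand-in for evaluating one rule) shows the same pattern: submitting a call spawns a task that runs in parallel with the caller, and requesting the result ("touching" the future) blocks until the value is ready:

```python
from concurrent.futures import ThreadPoolExecutor

# A rough Python analogue of Multilisp's (future ...): executor.submit
# spawns a task parallel to the caller, and .result() is the "touch"
# that blocks until the value is available.
executor = ThreadPoolExecutor(max_workers=4)

def fire_rule(fact):
    # Hypothetical stand-in for evaluating one rule of the expert system.
    return fact * 2

futures = [executor.submit(fire_rule, f) for f in [1, 2, 3]]
results = [fut.result() for fut in futures]   # touch each future
executor.shutdown()
print(results)   # → [2, 4, 6]
```

    In Backpac and Datapac the spawned tasks would be rule evaluations proceeding down independent branches of the inference tree.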

  2. The Tera Multithreaded Architecture and Unstructured Meshes

    NASA Technical Reports Server (NTRS)

    Bokhari, Shahid H.; Mavriplis, Dimitri J.

    1998-01-01

    The Tera Multithreaded Architecture (MTA) is a new parallel supercomputer currently being installed at San Diego Supercomputing Center (SDSC). This machine has an architecture quite different from contemporary parallel machines. The computational processor is a custom design and the machine uses hardware to support very fine grained multithreading. The main memory is shared, hardware randomized and flat. These features make the machine highly suited to the execution of unstructured mesh problems, which are difficult to parallelize on other architectures. We report the results of a study carried out during July-August 1998 to evaluate the execution of EUL3D, a code that solves the Euler equations on an unstructured mesh, on the 2 processor Tera MTA at SDSC. Our investigation shows that parallelization of an unstructured code is extremely easy on the Tera. We were able to get an existing parallel code (designed for a shared memory machine), running on the Tera by changing only the compiler directives. Furthermore, a serial version of this code was compiled to run in parallel on the Tera by judicious use of directives to invoke the "full/empty" tag bits of the machine to obtain synchronization. This version achieves 212 and 406 Mflop/s on one and two processors respectively, and requires no attention to partitioning or placement of data issues that would be of paramount importance in other parallel architectures.

  3. Parallel Ray Tracing Using the Message Passing Interface

    DTIC Science & Technology

    2007-09-01

    Ray-tracing software is available for lens design and for general optical systems modeling. It tends to be designed to run on a single processor and can be very … — Cameron, Senior Member, IEEE. Keywords: National Aeronautics and Space Administration (NASA), optical ray tracing, parallel computing, parallel processing, prime numbers, ray tracing

  4. PISCES: An environment for parallel scientific computation

    NASA Technical Reports Server (NTRS)

    Pratt, T. W.

    1985-01-01

    The parallel implementation of scientific computing environment (PISCES) is a project to provide high-level programming environments for parallel MIMD computers. Pisces 1, the first of these environments, is a FORTRAN 77 based environment which runs under the UNIX operating system. The Pisces 1 user programs in Pisces FORTRAN, an extension of FORTRAN 77 for parallel processing. The major emphasis in the Pisces 1 design is in providing a carefully specified virtual machine that defines the run-time environment within which Pisces FORTRAN programs are executed. Each implementation then provides the same virtual machine, regardless of differences in the underlying architecture. The design is intended to be portable to a variety of architectures. Currently Pisces 1 is implemented on a network of Apollo workstations and on a DEC VAX uniprocessor via simulation of the task level parallelism. An implementation for the Flexible Computing Corp. FLEX/32 is under construction. An introduction to the Pisces 1 virtual computer and the FORTRAN 77 extensions is presented. An example of an algorithm for the iterative solution of a system of equations is given. The most notable features of the design are the provision for several granularities of parallelism in programs and the provision of a window mechanism for distributed access to large arrays of data.

  5. Framework for Parallel Preprocessing of Microarray Data Using Hadoop

    PubMed Central

    2018-01-01

    Nowadays, microarray technology has become one of the popular ways to study gene expression and the diagnosis of disease. The National Center for Biotechnology Information (NCBI) hosts public databases containing large volumes of biological data that need to be preprocessed, since they carry high levels of noise and bias. Robust Multiarray Average (RMA) is one of the standard and popular methods utilized to preprocess the data and remove the noise. Most preprocessing algorithms are time-consuming and not able to handle a large number of datasets with thousands of experiments. Parallel processing can be used to address these issues. Hadoop is a well-known distributed file system framework that provides a parallel environment in which to run the experiment. In this research, for the first time, the capability of Hadoop and the statistical power of R have been leveraged to parallelize the available preprocessing algorithm called RMA to efficiently process microarray data. The experiment has been run on a cluster containing 5 nodes, each with 16 cores and 16 GB of memory. It compares the efficiency and performance of parallelized RMA using Hadoop with parallelized RMA using the affyPara package as well as sequential RMA. The results show that the speed-up rate of the proposed approach outperforms both the sequential approach and the affyPara approach. PMID:29796018

  6. A real-time MPEG software decoder using a portable message-passing library

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kwong, Man Kam; Tang, P.T. Peter; Lin, Biquan

    1995-12-31

    We present a real-time MPEG software decoder that uses message-passing libraries such as MPL, p4 and MPI. The parallel MPEG decoder currently runs on the IBM SP system but can be easily ported to other parallel machines. This paper discusses our parallel MPEG decoding algorithm as well as the parallel programming environment under which it runs. Several technical issues are discussed, including balancing of decoding speed, memory limitations, I/O capacities, and optimization of MPEG decoding components. This project shows that a real-time portable software MPEG decoder is feasible on a general-purpose parallel machine.

  7. The fracture properties and mechanical design of human fingernails.

    PubMed

    Farren, L; Shayler, S; Ennos, A R

    2004-02-01

    Fingernails are a characteristic feature of primates, and are composed of three layers of the fibrous composite keratin. This study examined the structure and fracture properties of human fingernails to determine how they resist bending forces while preventing fractures running longitudinally into the nail bed. Nail clippings were first torn manually to examine the preferred crack direction. Next, scissor cutting tests were carried out to compare the fracture toughness of central and outer areas in both the transverse and longitudinal direction. The fracture toughness of each of the three isolated layers was also measured in this way to determine their relative contributions to the toughness. Finally, the structure was examined by carrying out scanning electron microscopy of free fracture surfaces and polarized light microscopy of nail sections. When nails were torn, cracks were always diverted transversely, parallel to the free edge of the nail. Cutting tests showed that this occurred because the energy to cut nails transversely, at approximately 3 kJ m(-2), was about half that needed (approx. 6 kJ m(-2)) to cut them longitudinally. This anisotropy was imparted by the thick intermediate layer, which comprises long, narrow cells that are oriented transversely; the energy needed to cut this layer transversely was only a quarter of that needed to cut it longitudinally. In contrast the tile-like cells in the thinner dorsal and ventral layers showed isotropic behaviour. They probably act to increase the nail's bending strength, and as they wrap around the edge of the nail, they also help prevent cracks from forming. These results cast light on the mechanical behaviour and care of fingernails.

  8. Ensemble Smoother implemented in parallel for groundwater problems applications

    NASA Astrophysics Data System (ADS)

    Leyva, E.; Herrera, G. S.; de la Cruz, L. M.

    2013-05-01

    Data assimilation is a process that links forecasting models and measurements, using the benefits of both sources. The Ensemble Kalman Filter (EnKF) is a sequential data-assimilation method that was designed to address two of the main problems related to the use of the Extended Kalman Filter (EKF) with nonlinear models in large state spaces, i.e., the need for a closure approximation and the massive computational requirements associated with the storage and subsequent integration of the error covariance matrix. The EnKF has gained popularity because of its simple conceptual formulation and relative ease of implementation. It has been used successfully in various applications of meteorology and oceanography and, more recently, in petroleum engineering and hydrogeology. The Ensemble Smoother (ES) is a method similar to the EnKF; it was proposed by Van Leeuwen and Evensen (1996). Herrera (1998) proposed a version of the ES, which we call the Ensemble Smoother of Herrera (ESH) to distinguish it from the former. It was introduced for space-time optimization of groundwater monitoring networks. In recent years, this method has been used for data assimilation and parameter estimation in groundwater flow and transport models. The ES method uses Monte Carlo simulation, which consists of generating repeated realizations of the random variable considered, using a flow and transport model. However, a large number of model runs is often required for the moments of the variable to converge; therefore, depending on the complexity of the problem, a serial computer may require many hours of continuous use to apply the ES. For this reason, the process must be parallelized in order to complete it in a reasonable time. In this work we present the results of a parallelization strategy to reduce the execution time for a high number of realizations. The software GWQMonitor by Herrera (1998) implements all the algorithms required for the ESH in Fortran 90. We developed a script in Python using mpi4py in order to execute GWQMonitor in parallel through the MPI library. Our approach is to calculate the initial inputs for each realization and run groups of these realizations on separate processors. The only modification to GWQMonitor was the final calculation of the covariance matrix. This strategy was applied to the study of a simplified single-layer aquifer in a rectangular domain. We show the speedup and efficiency for different numbers of processors.
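
    The realization-level parallelism described above can be sketched without the actual GWQMonitor code. In the toy below (Python standard library; multiprocessing stands in for the mpi4py/MPI setup, and the "model run" is a placeholder, not the real flow-and-transport model), independent realizations run on separate processors and are then combined:

```python
from multiprocessing import Pool
import random

def run_realization(seed):
    # Placeholder for one GWQMonitor flow-and-transport run: each
    # realization draws its own random inputs from its seed and returns
    # a small "head field" (three values here, purely illustrative).
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(3)]

def ensemble_mean(realizations):
    # First moment of the Monte Carlo ensemble, component by component.
    n = len(realizations)
    return [sum(r[i] for r in realizations) / n
            for i in range(len(realizations[0]))]

if __name__ == "__main__":
    # Run groups of realizations on separate processors, then combine:
    with Pool(2) as pool:
        ens = pool.map(run_realization, range(100))
    print(ensemble_mean(ens))
```

    Seeding each realization independently is what makes the parallel run reproducible and equivalent to the serial one; the covariance calculation would then be done once on the gathered ensemble, as in the modified GWQMonitor.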

  9. 5. Aerial view of turnpike path running through center of ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    5. Aerial view of turnpike path running through center of photograph along row of trees. 1917 realignment visible along left edge of photograph along edge of forest. Modernized alignment resumes at top right of photograph. View looking north. - Orange Turnpike, Parallel to new Orange Turnpike, Monroe, Orange County, NY

  10. Light trapping architecture for photovoltaic and photodetector applications

    DOEpatents

    Forrest, Stephen R.; Lunt, Richard R.; Slootsky, Michael

    2016-08-09

    Disclosed are photovoltaic device structures which trap admitted light and recycle it through the contained photosensitive materials to maximize photoabsorption. For example, there is disclosed a photosensitive optoelectronic device comprising: a first reflective layer comprising a thermoplastic resin; a second reflective layer substantially parallel to the first reflective layer; a first transparent electrode layer on at least one of the first and second reflective layers; and a photosensitive region adjacent to the first electrode, wherein the first transparent electrode layer is substantially parallel to the first reflective layer and adjacent to the photosensitive region, and wherein the device has an exterior face transverse to the planes of the reflective layers, where the exterior face has an aperture for admission of incident radiation to the interior of the device.

  11. Accelerating the Gillespie Exact Stochastic Simulation Algorithm using hybrid parallel execution on graphics processing units.

    PubMed

    Komarov, Ivan; D'Souza, Roshan M

    2012-01-01

    The Gillespie Stochastic Simulation Algorithm (GSSA) and its variants are cornerstone techniques to simulate reaction kinetics in situations where the concentration of the reactant is too low to allow deterministic techniques such as differential equations. The inherent limitations of the GSSA include the time required for executing a single run and the need for multiple runs for parameter sweep exercises due to the stochastic nature of the simulation. Even very efficient variants of GSSA are prohibitively expensive to compute and perform parameter sweeps. Here we present a novel variant of the exact GSSA that is amenable to acceleration by using graphics processing units (GPUs). We parallelize the execution of a single realization across threads in a warp (fine-grained parallelism). A warp is a collection of threads that are executed synchronously on a single multi-processor. Warps executing in parallel on different multi-processors (coarse-grained parallelism) simultaneously generate multiple trajectories. Novel data-structures and algorithms reduce memory traffic, which is the bottleneck in computing the GSSA. Our benchmarks show an 8×-120× performance gain over various state-of-the-art serial algorithms when simulating different types of models.
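
    For reference, a minimal serial implementation of the exact (direct-method) GSSA looks like the sketch below (Python; the decay reaction and rate constant are illustrative, not from the paper). The GPU variant described in the abstract parallelizes exactly this loop, one realization per warp:

```python
import math
import random

def gillespie_direct(x, propensities, stoich, t_end, rng):
    # Exact direct-method SSA: sample the waiting time from an exponential
    # with total rate a0, then pick reaction j with probability a_j / a0.
    t, times, states = 0.0, [0.0], [list(x)]
    while t < t_end:
        a = [p(x) for p in propensities]
        a0 = sum(a)
        if a0 == 0.0:
            break                                  # no reaction can fire
        t += -math.log(rng.random()) / a0          # exponential waiting time
        u, j, acc = rng.random() * a0, 0, a[0]
        while acc < u:                             # roulette-wheel selection
            j += 1
            acc += a[j]
        x = [xi + d for xi, d in zip(x, stoich[j])]
        times.append(t)
        states.append(list(x))
    return times, states

# Illustrative model: irreversible decay A -> 0 with propensity 0.5 * A.
rng = random.Random(42)
times, states = gillespie_direct([50], [lambda s: 0.5 * s[0]], [[-1]], 100.0, rng)
```

    Each call produces one stochastic trajectory; parameter sweeps need many such runs, which is what makes the coarse-grained GPU parallelism worthwhile.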

  12. Parallel distributed, reciprocal Monte Carlo radiation in coupled, large eddy combustion simulations

    NASA Astrophysics Data System (ADS)

    Hunsaker, Isaac L.

    Radiation is the dominant mode of heat transfer in high temperature combustion environments. Radiative heat transfer affects the gas and particle phases, including all the associated combustion chemistry. The radiative properties are in turn affected by the turbulent flow field. This bi-directional coupling of radiation turbulence interactions poses a major challenge in creating parallel-capable, high-fidelity combustion simulations. In this work, a new model was developed in which reciprocal monte carlo radiation was coupled with a turbulent, large-eddy simulation combustion model. A technique wherein domain patches are stitched together was implemented to allow for scalable parallelism. The combustion model runs in parallel on a decomposed domain. The radiation model runs in parallel on a recomposed domain. The recomposed domain is stored on each processor after information sharing of the decomposed domain is handled via the message passing interface. Verification and validation testing of the new radiation model were favorable. Strong scaling analyses were performed on the Ember cluster and the Titan cluster for the CPU-radiation model and GPU-radiation model, respectively. The model demonstrated strong scaling to over 1,700 and 16,000 processing cores on Ember and Titan, respectively.

  13. Efficient Helicopter Aerodynamic and Aeroacoustic Predictions on Parallel Computers

    NASA Technical Reports Server (NTRS)

    Wissink, Andrew M.; Lyrintzis, Anastasios S.; Strawn, Roger C.; Oliker, Leonid; Biswas, Rupak

    1996-01-01

    This paper presents parallel implementations of two codes used in a combined CFD/Kirchhoff methodology to predict the aerodynamic and aeroacoustic properties of helicopters. The rotorcraft Navier-Stokes code, TURNS, computes the aerodynamic flowfield near the helicopter blades and the Kirchhoff acoustics code computes the noise in the far field, using the TURNS solution as input. The overall parallel strategy adds MPI message passing calls to the existing serial codes to allow for communication between processors. As a result, the total code modifications required for parallel execution are relatively small. The biggest bottleneck in running the TURNS code in parallel comes from the LU-SGS algorithm that solves the implicit system of equations. We use a new hybrid domain decomposition implementation of LU-SGS to obtain good parallel performance on the SP-2. TURNS demonstrates excellent parallel speedups for quasi-steady and unsteady three-dimensional calculations of a helicopter blade in forward flight. The execution rate attained by the code on 114 processors is six times faster than the same cases run on one processor of the Cray C-90. The parallel Kirchhoff code also shows excellent parallel speedups and fast execution rates. As a performance demonstration, unsteady acoustic pressures are computed at 1886 far-field observer locations for a sample acoustics problem. The calculation requires over two hundred hours of CPU time on one C-90 processor but takes only a few hours on 80 processors of the SP-2. The resultant far-field acoustic field is analyzed with state-of-the-art audio and video rendering of the propagating acoustic signals.

  14. Unsteady boundary-layer injection

    NASA Technical Reports Server (NTRS)

    Telionis, D. P.; Jones, G. S.

    1981-01-01

    The boundary-layer equations for two-dimensional incompressible flow are integrated numerically for the flow over a flat plate and a Howarth body. Injection is introduced either impulsively or periodically along a narrow strip. Results indicate that injection perpendicular to the wall is transmitted instantly across the boundary layer and has little effect on the velocity profile parallel to the wall. The effect is a little more noticeable for flows with adverse pressure gradients. Injection parallel to the wall results in fuller velocity profiles, and parallel, oscillatory injection appears to influence the mean flow. The amplitude of oscillation decreases with distance from the injection strip, but further downstream it increases again in a manner reminiscent of an unstable process.

  15. Population annealing with weighted averages: A Monte Carlo method for rough free-energy landscapes

    NASA Astrophysics Data System (ADS)

    Machta, J.

    2010-08-01

    The population annealing algorithm introduced by Hukushima and Iba is described. Population annealing combines simulated annealing and Boltzmann weighted differential reproduction within a population of replicas to sample equilibrium states. Population annealing gives direct access to the free energy. It is shown that unbiased measurements of observables can be obtained by weighted averages over many runs with weight factors related to the free-energy estimate from the run. Population annealing is well suited to parallelization and may be a useful alternative to parallel tempering for systems with rough free-energy landscapes such as spin glasses. The method is demonstrated for spin glasses.
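
    The weighted-average step the abstract describes is small enough to show directly. In the sketch below (Python; runs are weighted by exp(lnZ_r), i.e., by the free-energy estimate F_r = -lnZ_r/β, with a log-sum-exp shift for numerical stability; the variable names are ours):

```python
import math

def weighted_over_runs(observables, lnZ_estimates):
    # Combine observable estimates from independent population-annealing
    # runs, weighting run r by its partition-function estimate exp(lnZ_r).
    # Subtracting the max lnZ is the standard log-sum-exp trick so the
    # exponentials never overflow.
    m = max(lnZ_estimates)
    weights = [math.exp(l - m) for l in lnZ_estimates]
    total = sum(weights)
    return sum(w * o for w, o in zip(weights, observables)) / total

# Two runs agreeing on lnZ contribute equally; a run with a much larger
# lnZ estimate (lower free energy) dominates the average:
print(weighted_over_runs([1.0, 3.0], [5.0, 5.0]))   # → 2.0
```

    This weighting is what removes the bias from run-to-run fluctuations: runs that got trapped in poor regions of a rough free-energy landscape receive exponentially small weight.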

  16. Local rollback for fault-tolerance in parallel computing systems

    DOEpatents

    Blumrich, Matthias A [Yorktown Heights, NY; Chen, Dong [Yorktown Heights, NY; Gara, Alan [Yorktown Heights, NY; Giampapa, Mark E [Yorktown Heights, NY; Heidelberger, Philip [Yorktown Heights, NY; Ohmacht, Martin [Yorktown Heights, NY; Steinmacher-Burow, Burkhard [Boeblingen, DE; Sugavanam, Krishnan [Yorktown Heights, NY

    2012-01-24

    A control logic device performs a local rollback in a parallel super computing system. The super computing system includes at least one cache memory device. The control logic device determines a local rollback interval. The control logic device runs at least one instruction in the local rollback interval. The control logic device evaluates whether an unrecoverable condition occurs while running the at least one instruction during the local rollback interval. The control logic device checks whether an error occurs during the local rollback. The control logic device restarts the local rollback interval if the error occurs and the unrecoverable condition does not occur during the local rollback interval.

  17. The seasonal-cycle climate model

    NASA Technical Reports Server (NTRS)

    Marx, L.; Randall, D. A.

    1981-01-01

    The seasonal cycle run, which will become the control run for the comparison with runs utilizing codes and parameterizations developed by outside investigators, is discussed. The climate model currently exists in two parallel versions: one running on the Amdahl and the other running on the CYBER 203. These two versions are as nearly identical as machine capability and the requirement for high-speed performance allow. Developmental changes are made on the Amdahl/CMS version for ease of testing and rapidity of turnaround. The changes are subsequently incorporated into the CYBER 203 version, using vectorization techniques where speed improvement can be realized. The 400-day seasonal cycle run serves as a control run for both medium- and long-range climate forecasts and sensitivity studies.

  18. A three-dimensional spectral algorithm for simulations of transition and turbulence

    NASA Technical Reports Server (NTRS)

    Zang, T. A.; Hussaini, M. Y.

    1985-01-01

    A spectral algorithm for simulating three-dimensional, incompressible, parallel shear flows is described. It applies to the channel, to the parallel boundary layer, and to other shear flows with one wall-bounded direction and two periodic directions. Representative applications to the channel and to the heated boundary layer are presented.
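
    The core operation of such a spectral method in the periodic directions is Fourier differentiation: transform, multiply mode k by ik, transform back. A minimal sketch (plain Python with a naive O(N²) DFT rather than an FFT, for clarity):

```python
import cmath
import math

def spectral_derivative(f_vals):
    # Differentiate a periodic function sampled at N points on [0, 2*pi):
    # forward DFT, multiply mode k by i*k using symmetric wavenumbers
    # -N/2 .. N/2-1, then inverse DFT. A naive O(N^2) transform is used
    # here for clarity; a real solver would use an FFT.
    n = len(f_vals)
    F = [sum(f_vals[j] * cmath.exp(-2j * math.pi * k * j / n)
             for j in range(n)) / n for k in range(n)]
    def wavenumber(k):
        return k if k < n // 2 else k - n
    dF = [1j * wavenumber(k) * F[k] for k in range(n)]
    return [sum(dF[k] * cmath.exp(2j * math.pi * k * j / n)
                for k in range(n)).real for j in range(n)]

# d/dx sin(x) = cos(x), recovered to near machine precision on 16 points:
xs = [2 * math.pi * j / 16 for j in range(16)]
df = spectral_derivative([math.sin(x) for x in xs])
```

    For smooth periodic data this converges spectrally (faster than any power of N), which is the accuracy advantage such algorithms exploit in the two periodic directions; the wall-bounded direction needs a different basis, such as Chebyshev polynomials.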

  19. Parallel Event Analysis Under Unix

    NASA Astrophysics Data System (ADS)

    Looney, S.; Nilsson, B. S.; Oest, T.; Pettersson, T.; Ranjard, F.; Thibonnier, J.-P.

    The ALEPH experiment at LEP, the CERN CN division and Digital Equipment Corp. have, in a joint project, developed a parallel event analysis system. The parallel physics code is identical to ALEPH's standard analysis code, ALPHA, only the organisation of input/output is changed. The user may switch between sequential and parallel processing by simply changing one input "card". The initial implementation runs on an 8-node DEC 3000/400 farm, using the PVM software, and exhibits a near-perfect speed-up linearity, reducing the turn-around time by a factor of 8.

  20. RAMA: A file system for massively parallel computers

    NASA Technical Reports Server (NTRS)

    Miller, Ethan L.; Katz, Randy H.

    1993-01-01

    This paper describes a file system design for massively parallel computers which makes very efficient use of a few disks per processor. This overcomes the traditional I/O bottleneck of massively parallel machines by storing the data on disks within the high-speed interconnection network. In addition, the file system, called RAMA, requires little inter-node synchronization, removing another common bottleneck in parallel processor file systems. Support for a large tertiary storage system can easily be integrated into the file system; in fact, RAMA runs most efficiently when tertiary storage is used.
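
    The low-synchronization property comes from hashed data placement, which can be sketched in a few lines (Python; the hash function and key format here are our assumptions, not RAMA's actual scheme): any node can compute a block's disk location independently, so no central metadata server or inter-node lookup is needed.

```python
import hashlib

def place_block(file_id, block_num, num_disks):
    # Map a (file, block) pair to a disk by hashing, so placement is
    # computable by any node with no central lookup. The specific hash
    # (SHA-256 here) is illustrative, not RAMA's actual function.
    key = f"{file_id}:{block_num}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % num_disks

# Any node can independently compute where block 7 of file 42 lives:
disk = place_block(42, 7, 64)
```

    A good hash also spreads consecutive blocks of one file across many disks, which is what lets the file system absorb parallel I/O from thousands of processors.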

  1. Generating unstructured nuclear reactor core meshes in parallel

    DOE PAGES

    Jain, Rajeev; Tautges, Timothy J.

    2014-10-24

    Recent advances in supercomputers and parallel solver techniques have enabled users to run large simulation problems using millions of processors. Techniques for multiphysics nuclear reactor core simulations are under active development in several countries. Most of these techniques require large unstructured meshes that can be hard to generate on standalone desktop computers because of high memory requirements, limited processing power, and other complexities. We have previously reported on a hierarchical lattice-based approach for generating reactor core meshes. Here, we describe efforts to exploit coarse-grained parallelism during the reactor assembly and reactor core mesh generation processes. We highlight several reactor core examples including a very high temperature reactor, a full-core model of the MONJU reactor, a ¼ pressurized water reactor core, the fast reactor Experimental Breeder Reactor-II core with a XX09 assembly, and an advanced breeder test reactor core. The times required to generate large mesh models, along with the speedups obtained from running these problems in parallel, are reported. A graphical user interface to the tools described here has also been developed.

  2. Roofline model toolkit: A practical tool for architectural and program analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lo, Yu Jung; Williams, Samuel; Van Straalen, Brian

    We present preliminary results of the Roofline Toolkit for multicore, manycore, and accelerated architectures. This paper focuses on the processor architecture characterization engine, a collection of portable instrumented microbenchmarks implemented with the Message Passing Interface (MPI) and OpenMP, used to express thread-level parallelism. These benchmarks are specialized to quantify the behavior of different architectural features. Compared to previous work on performance characterization, these microbenchmarks focus on capturing the performance of each level of the memory hierarchy, along with thread-level parallelism, instruction-level parallelism, and explicit SIMD parallelism, measured in the context of the compilers and run-time environments. We also measure sustained PCIe throughput with four GPU memory management mechanisms. By combining results from the architecture characterization with the Roofline model based solely on architectural specifications, this work offers insights for performance prediction of current and future architectures and their software systems. To that end, we instrument three applications and plot their resultant performance on the corresponding Roofline model when run on a Blue Gene/Q architecture.
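
    The Roofline model itself reduces to one line: attainable performance is the minimum of the compute ceiling and the memory-bandwidth ceiling scaled by arithmetic intensity. A sketch with hypothetical machine numbers:

```python
def roofline(peak_gflops, peak_gbs, arith_intensity):
    # Attainable GFLOP/s under the Roofline model: the lesser of the
    # compute ceiling and the bandwidth ceiling times arithmetic
    # intensity (flops per byte moved from memory).
    return min(peak_gflops, peak_gbs * arith_intensity)

# Hypothetical machine: 200 GFLOP/s peak, 50 GB/s DRAM bandwidth.
# A STREAM-like kernel (AI ~ 0.1 flop/byte) is bandwidth bound:
print(roofline(200.0, 50.0, 0.1))    # → 5.0
# A dense-matrix kernel (AI ~ 10 flop/byte) hits the compute ceiling:
print(roofline(200.0, 50.0, 10.0))   # → 200.0
```

    The toolkit's microbenchmarks measure the two ceilings empirically, one per level of the memory hierarchy, rather than taking them from vendor specifications.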

  3. Parallelization of a hydrological model using the message passing interface

    USGS Publications Warehouse

    Wu, Yiping; Li, Tiejian; Sun, Liqun; Chen, Ji

    2013-01-01

    With increasing knowledge about natural processes, hydrological models such as the Soil and Water Assessment Tool (SWAT) are becoming larger and more complex, with increasing computation time. Additionally, other procedures such as model calibration, which may require thousands of model iterations, can increase running time and thus further hinder rapid modeling and analysis. Using the widely-applied SWAT as an example, this study demonstrates how to parallelize a serial hydrological model in a Windows® environment using a parallel programming technology—Message Passing Interface (MPI). With a case study, we derived the optimal values for the two parameters (the number of processes and the corresponding percentage of work to be distributed to the master process) of the parallel SWAT (P-SWAT) on an ordinary personal computer and a workstation. Our study indicates that model execution time can be reduced by 42%–70% (or a speedup of 1.74–3.36) using multiple processes (two to five) with a proper task-distribution scheme between the master and slave processes. Although the computation time cost becomes lower with an increasing number of processes (from two to five), this enhancement diminishes due to the accompanying increase in demand for message-passing procedures between the master and all slave processes. Our case study demonstrates that P-SWAT with a five-process run may reach the maximum speedup, and the performance can be quite stable (fairly independent of project size). Overall, P-SWAT can help reduce the computation time substantially for an individual model run, manual and automatic calibration procedures, and optimization of best management practices. In particular, the parallelization method we used and the scheme for deriving the optimal parameters in this study can be valuable and easily applied to other hydrological or environmental models.
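
    The two parameters the study tunes (the number of processes, and the fraction of work kept by the master) can be explored with a toy cost model (Python; the overhead constant and the linear communication term are our assumptions, not fitted to SWAT):

```python
def predicted_runtime(serial_time, nprocs, master_frac, overhead_per_proc=0.02):
    # Toy cost model: the master keeps master_frac of the work, the
    # (nprocs - 1) slaves share the rest, and message-passing overhead
    # grows linearly with the number of processes. All constants are
    # illustrative, not measured from P-SWAT.
    slave_frac = (1.0 - master_frac) / (nprocs - 1)
    compute = serial_time * max(master_frac, slave_frac)
    comm = serial_time * overhead_per_proc * nprocs
    return compute + comm

def best_master_fraction(serial_time, nprocs, grid=101):
    # Scan candidate splits; the optimum balances master and slave loads.
    fracs = [i / (grid - 1) for i in range(grid)]
    return min(fracs, key=lambda p: predicted_runtime(serial_time, nprocs, p))

# With 5 processes, the balanced split gives the master 1/5 of the work:
print(best_master_fraction(100.0, 5))   # → 0.2
```

    In this model the communication term growing with nprocs is what caps the useful process count, mirroring the paper's observation that gains diminish beyond about five processes.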

  4. A Simulation of the Front End Signal Digitization for the ATLAS Muon Spectrometer thin RPC trigger upgrade project

    NASA Astrophysics Data System (ADS)

    Meng, Xiangting; Chapman, John; Levin, Daniel; Dai, Tiesheng; Zhu, Junjie; Zhou, Bing; Um Atlas Group Team

    2016-03-01

    The ATLAS Muon Spectrometer Phase-I (and Phase-II) upgrade includes the BIS78 muon trigger detector project: two sets of eight very thin Resistive Plate Chambers (tRPCs) combined with small Monitored Drift Tube (MDT) chambers in the pseudorapidity region 1 < |η| < 1.3. The tRPCs will comprise triplet readout layers in each of the eta and azimuthal phi coordinates, with about 400 readout strips per layer. The anticipated hit rate is 100-200 kHz per strip. Digitization of the strip signals will be done by 32-channel CERN HPTDC chips. The HPTDC is a highly configurable ASIC designed by the CERN Microelectronics group. It can work in both trigger and trigger-less modes, and can be read out in parallel or serially. For Phase-I operation, a stringent latency requirement of 43 bunch crossings (1075 ns) is imposed. The latency budget for the front-end digitization must be kept to a minimum, ideally less than 350 ns. We conducted detailed HPTDC latency simulations using the behavioral Verilog code from the CERN group. We will report the results of these simulations, run for the anticipated detector operating environment and for various HPTDC configurations.
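    The latency figures quoted are consistent with the LHC's 25 ns bunch-crossing period; a trivial arithmetic check (the 350 ns front-end target is taken directly from the abstract):

```python
# LHC bunch crossings are 25 ns apart, so a 43-bunch-crossing latency
# budget corresponds to 1075 ns, of which the front-end digitization
# should ideally consume less than 350 ns.
BX_NS = 25
latency_budget_ns = 43 * BX_NS
front_end_target_ns = 350
remaining_ns = latency_budget_ns - front_end_target_ns  # left for the rest of the chain
```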

  5. Modeling and investigation of the channeling phenomenon in downdraft stratified gasifiers.

    PubMed

    Allesina, Giulio; Pedrazzi, Simone; Tartarini, Paolo

    2013-10-01

    Downdraft stratified gasifiers seem to be the reactors most influenced by loading conditions. Moreover, the larger the reactor, the higher the likelihood of encountering a channeling phenomenon. This high sensitivity is due to the limited thickness and superficial placement of the flaming pyrolysis layer, coupled with the necessity of keeping all the zones parallel for correct operation of this kind of gasifier. This study was aimed at modeling and investigating the channeling phenomenon generated by loading-condition variations in a 250-kWe nominal power gasification power plant. The experimental campaign showed great variations in most of the plant outputs. These phenomena were modeled with two modified mathematical models taken from the literature. The results of the models confirmed the capability of this approach to predict the channeling phenomenon and its dependence on the loading method. Copyright © 2013 Elsevier Ltd. All rights reserved.

  6. Implementing a Parallel Image Edge Detection Algorithm Based on the Otsu-Canny Operator on the Hadoop Platform

    PubMed Central

    Wang, Min; Tian, Yun

    2018-01-01

    The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images. Overall, our approach speeds up processing by approximately 3.4 times on large-scale datasets, which demonstrates its clear superiority. The proposed algorithm thus demonstrates both better edge detection performance and improved time performance. PMID:29861711
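    The Otsu step that replaces the Canny operator's manually chosen thresholds can be sketched in pure Python. This is a minimal single-threshold version; the paper's dual-threshold derivation and MapReduce packaging are not reproduced here:

```python
# Minimal single-threshold Otsu: pick the threshold that maximizes the
# between-class variance of a 256-bin grayscale histogram.
def otsu_threshold(hist):
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_b = 0.0   # running sum of intensity*count in the background class
    w_b = 0       # running pixel count in the background class
    best_t, best_var = 0, -1.0
    for t, h in enumerate(hist):
        w_b += h
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * h
        m_b = sum_b / w_b              # background mean
        m_f = (sum_all - sum_b) / w_f  # foreground mean
        var = w_b * w_f * (m_b - m_f) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# bimodal histogram: dark cluster at 40-59, bright cluster at 190-209
hist = [0] * 256
for i in range(40, 60):
    hist[i] = 100
for i in range(190, 210):
    hist[i] = 100
t = otsu_threshold(hist)   # lands between the two clusters
```

In a MapReduce setting, per-image histograms would be computed in the map phase and a threshold like this derived per image before edge tracing.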

  7. Turbidity Currents With Equilibrium Basal Driving Layers: A Mechanism for Long Runout

    NASA Astrophysics Data System (ADS)

    Luchi, R.; Balachandar, S.; Seminara, G.; Parker, G.

    2018-02-01

    Turbidity currents run out over 100 km in lakes and reservoirs, and over 1,000 km in the ocean. They do so without dissipating themselves via excess entrainment of ambient water. Existing layer-averaged formulations cannot capture this. We use a numerical model to describe the temporal evolution of a turbidity current toward steady state under the condition of zero net sediment flux at the bed. The flow partitions itself into two layers. The lower "driving layer" approaches an invariant flow thickness, velocity profile, and suspended sediment concentration profile that sequesters nearly all of the suspended sediment. This layer can continue indefinitely at steady state over a constant bed slope. The upper "driven layer" contains a small fraction of the suspended sediment. The separation of the flow into these two layers likely allows the driving layer to run out long distances.

  8. Digital tomosynthesis mammography using a parallel maximum-likelihood reconstruction method

    NASA Astrophysics Data System (ADS)

    Wu, Tao; Zhang, Juemin; Moore, Richard; Rafferty, Elizabeth; Kopans, Daniel; Meleis, Waleed; Kaeli, David

    2004-05-01

    A parallel reconstruction method, based on an iterative maximum likelihood (ML) algorithm, is developed to provide fast reconstruction for digital tomosynthesis mammography. Tomosynthesis mammography acquires 11 low-dose projections of a breast by moving an x-ray tube over a 50° angular range. In parallel reconstruction, each projection is divided into multiple segments along the chest-to-nipple direction. Using the 11 projections, segments located at the same distance from the chest wall are combined to compute a partial reconstruction of the total breast volume. The shape of the partial reconstruction forms a thin slab, angled toward the x-ray source at a projection angle of 0°. The reconstruction of the total breast volume is obtained by merging the partial reconstructions. The overlap region between neighboring partial reconstructions and neighboring projection segments is utilized to compensate for the incomplete data at the boundary locations present in the partial reconstructions. A serial execution of the reconstruction is compared to a parallel implementation, using clinical data. The serial code was run on a PC with a single Pentium IV 2.2 GHz CPU. The parallel implementation was developed using MPI and run on a 64-node Linux cluster using 800 MHz Itanium CPUs. The serial reconstruction for a medium-sized breast (5 cm thickness, 11 cm chest-to-nipple distance) takes 115 minutes, while a parallel implementation takes only 3.5 minutes. The reconstruction time for a larger breast using a serial implementation takes 187 minutes, while a parallel implementation takes 6.5 minutes. No significant differences were observed between the reconstructions produced by the serial and parallel implementations.
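    The iterative ML reconstruction at the core of the method can be illustrated on a toy problem. This is a pure-Python MLEM update for a 2-pixel, 2-ray system; the clinical code's system model, 11-projection geometry, and MPI segment partitioning are not reproduced:

```python
# Toy MLEM (maximum-likelihood expectation-maximization) iteration:
#   x_j <- x_j / (sum_i A_ij) * sum_i A_ij * y_i / (A x)_i
A = [[1.0, 1.0],   # ray 0 passes through both pixels
     [1.0, 0.0]]   # ray 1 passes through pixel 0 only
y = [5.0, 2.0]     # measured projections of the true image [2, 3]

x = [1.0, 1.0]     # uniform nonnegative starting image
col_sum = [sum(A[i][j] for i in range(2)) for j in range(2)]
for _ in range(2000):
    fwd = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]   # forward project
    ratio = [y[i] / fwd[i] for i in range(2)]                         # measured / modeled
    x = [x[j] * sum(A[i][j] * ratio[i] for i in range(2)) / col_sum[j]
         for j in range(2)]                                           # multiplicative update
```

With consistent data the iterates converge to the true image; in the paper's scheme, such updates would run independently on per-segment partial volumes before merging.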

  9. Embedded cluster metal-polymeric micro interface and process for producing the same

    DOEpatents

    Menezes, Marlon E.; Birnbaum, Howard K.; Robertson, Ian M.

    2002-01-29

    A micro interface between a polymeric layer and a metal layer includes isolated clusters of metal partially embedded in the polymeric layer. The exposed portion of the clusters is smaller than embedded portions, so that a cross section, taken parallel to the interface, of an exposed portion of an individual cluster is smaller than a cross section, taken parallel to the interface, of an embedded portion of the individual cluster. At least half, but not all of the height of a preferred spherical cluster is embedded. The metal layer is completed by a continuous layer of metal bonded to the exposed portions of the discontinuous clusters. The micro interface is formed by heating a polymeric layer to a temperature, near its glass transition temperature, sufficient to allow penetration of the layer by metal clusters, after isolated clusters have been deposited on the layer at lower temperatures. The layer is recooled after embedding, and a continuous metal layer is deposited upon the polymeric layer to bond with the discontinuous metal clusters.

  10. Implementation of a 3D mixing layer code on parallel computers

    NASA Technical Reports Server (NTRS)

    Roe, K.; Thakur, R.; Dang, T.; Bogucz, E.

    1995-01-01

    This paper summarizes our progress and experience in the development of a Computational Fluid Dynamics code on parallel computers to simulate three-dimensional, spatially-developing mixing layers. In this initial study, the three-dimensional, time-dependent Euler equations are solved using a finite-volume explicit time-marching algorithm. The code was first programmed in Fortran 77 for sequential computers and then converted for use on parallel computers using the conventional message-passing technique; we have not yet been able to compile the code with the present version of HPF compilers.

  11. Running increases neurogenesis without retinoic acid receptor activation in the adult mouse dentate gyrus.

    PubMed

    Aberg, Elin; Perlmann, Thomas; Olson, Lars; Brené, Stefan

    2008-01-01

    Both vitamin A deficiency and high doses of retinoids can result in learning and memory impairments, depression as well as decreases in cell proliferation, neurogenesis and cell survival. Physical activity enhances hippocampal neurogenesis and can also exert an antidepressant effect. Here we elucidate a putative link between running, retinoid signaling, and neurogenesis in hippocampus. Adult transgenic reporter mice designed to detect ligand-activated retinoic acid receptors (RAR) or retinoid X receptors (RXR) were used to localize the distribution of activated RAR or RXR at the single-cell level in the brain. Two months of voluntary wheel-running induced an increase in hippocampal neurogenesis as indicated by an almost two-fold increase in doublecortin-immunoreactive cells. Running activity was correlated with neurogenesis. Under basal conditions a distinct pattern of RAR-activated cells was detected in the granule cell layer of the dentate gyrus (DG), thalamus, and cerebral cortex layers 3-4 and to a lesser extent in hippocampal pyramidal cell layers CA1-CA3. Running did not change the number of RAR-activated cells in the DG. There was no correlation between running and RAR activation or between RAR activation and neurogenesis in the DG of hippocampus. Only a few scattered activated retinoid X receptors were found in the DG under basal conditions and after wheel-running, but RXR was detected in other areas such as in the hilus region of hippocampus and in layer VI of cortex cerebri. RAR agonists affect mood in humans and reduce neurogenesis, learning and memory in animal models. In our study, long-term running increased neurogenesis but did not alter RAR ligand activation in the DG in individually housed mice. Thus, our data suggest that the effects of exercise on neurogenesis and other plasticity changes in the hippocampal formation are mediated by mechanisms that do not involve retinoid receptor activation. (c) 2008 Wiley-Liss, Inc.

  12. Non-volatile memory for checkpoint storage

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blumrich, Matthias A.; Chen, Dong; Cipolla, Thomas M.

    A system, method and computer program product for supporting system initiated checkpoints in high performance parallel computing systems and storing of checkpoint data to a non-volatile memory storage device. The system and method generates selective control signals to perform checkpointing of system related data in presence of messaging activity associated with a user application running at the node. The checkpointing is initiated by the system such that checkpoint data of a plurality of network nodes may be obtained even in the presence of user applications running on highly parallel computers that include ongoing user messaging activity. In one embodiment, the non-volatile memory is a pluggable flash memory card.
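    The write-then-commit pattern behind checkpointing to non-volatile storage can be sketched generically in Python. This is an illustrative sketch with a file standing in for the flash card; none of these names come from the patent:

```python
import json, os, tempfile

def write_checkpoint(state, path):
    """Write checkpoint data, then atomically commit it with a rename,
    so a crash mid-write never leaves a truncated checkpoint behind."""
    d = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=d)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())   # push data toward stable storage
    os.replace(tmp, path)      # atomic commit of the new checkpoint

def read_checkpoint(path):
    """Reload the last committed checkpoint."""
    with open(path) as f:
        return json.load(f)
```

The atomic-rename commit is what gives a checkpoint its all-or-nothing property: readers only ever see the previous complete checkpoint or the new complete one.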

  13. Multitasking for flows about multiple body configurations using the chimera grid scheme

    NASA Technical Reports Server (NTRS)

    Dougherty, F. C.; Morgan, R. L.

    1987-01-01

    The multitasking of a finite-difference scheme using multiple overset meshes is described. In this chimera, or multiple overset mesh approach, a multiple body configuration is mapped using a major grid about the main component of the configuration, with minor overset meshes used to map each additional component. This type of code is well suited to multitasking. Both steady and unsteady two dimensional computations are run on parallel processors on a CRAY-X/MP 48, usually with one mesh per processor. Flow field results are compared with single processor results to demonstrate the feasibility of running multiple mesh codes on parallel processors and to show the increase in efficiency.
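    The data exchange at the heart of an overset (chimera) scheme, interpolating solution values from a donor mesh into the fringe points of an overlapping mesh, can be sketched in 1D. This is illustrative only; the actual chimera scheme is a 2D finite-difference method with one mesh per processor:

```python
def donor_interpolate(donor_x, donor_u, x):
    """Linear interpolation of a donor-mesh solution onto a point x of an
    overlapping mesh (x must lie inside the donor mesh)."""
    for k in range(len(donor_x) - 1):
        if donor_x[k] <= x <= donor_x[k + 1]:
            w = (x - donor_x[k]) / (donor_x[k + 1] - donor_x[k])
            return (1 - w) * donor_u[k] + w * donor_u[k + 1]
    raise ValueError("point outside donor mesh")

# major mesh carries u(x) = 2x; a minor-mesh fringe point sits at x = 0.35
major_x = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
major_u = [2 * x for x in major_x]
fringe_value = donor_interpolate(major_x, major_u, 0.35)
```

In the parallel setting, each processor advances its own mesh and only these fringe values are exchanged between processors each step.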

  14. Design considerations for parallel graphics libraries

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas W.

    1994-01-01

    Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.
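    One core operation such a parallel rendering library must provide is image composition: merging per-processor partial renders by depth. A minimal sketch of the generic sort-last idea, not PGL's actual API:

```python
def z_composite(frames):
    """Merge per-processor (color, depth) buffers: at each pixel the
    fragment with the smallest depth wins (sort-last compositing)."""
    n = len(frames[0][0])
    color = [None] * n
    depth = [float("inf")] * n
    for colors, depths in frames:
        for i in range(n):
            if depths[i] < depth[i]:
                depth[i] = depths[i]
                color[i] = colors[i]
    return color, depth

# two processors rendered fragments of a 3-pixel image ("bg" = nothing drawn)
proc0 = (["red", "red", "bg"], [1.0, 5.0, float("inf")])
proc1 = (["blue", "bg", "blue"], [2.0, float("inf"), 3.0])
color, depth = z_composite([proc0, proc1])
```

On a distributed-memory machine this merge is itself parallelized (e.g. by exchanging image tiles), which is one of the design issues such libraries must address.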

  15. Density-based parallel skin lesion border detection with webCL

    PubMed Central

    2015-01-01

    Background Dermoscopy is a highly effective and noninvasive imaging technique used in the diagnosis of melanoma and other pigmented skin lesions. Many aspects of the lesion under consideration are defined in relation to the lesion border. This makes border detection one of the most important steps in dermoscopic image analysis. In current practice, dermatologists often delineate borders through a hand-drawn representation based upon visual inspection. Due to the subjective nature of this technique, intra- and inter-observer variations are common. Because of this, the automated assessment of lesion borders in dermoscopic images has become an important area of study. Methods A fast density-based skin lesion border detection method has been implemented in parallel with a new technology called WebCL. WebCL utilizes client-side computing capabilities to use available hardware resources such as multiple cores and GPUs. The resulting WebCL-parallel density-based border detection method runs efficiently from internet browsers. Results Previous research indicates that some of the highest accuracy rates can be achieved using density-based clustering techniques for skin lesion border detection. While these algorithms do have unfavorable time complexities, this effect can be mitigated when they are implemented in parallel. In this study, a density-based clustering technique for skin lesion border detection is parallelized and redesigned to run very efficiently on heterogeneous platforms (e.g. tablets, smartphones, multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units) by transforming the technique into a series of independent concurrent operations. Heterogeneous computing is adopted to support accessibility, portability and multi-device use in clinical settings. For this, we used WebCL, an emerging technology that enables an HTML5 Web browser to execute code in parallel on heterogeneous platforms.
We describe WebCL and our parallel algorithm design. In addition, we tested the parallel code on 100 dermoscopy images and measured the execution speedups with respect to the serial version. Results indicate that the parallel (WebCL) and serial versions of the density-based lesion border detection method generate the same accuracy rates for the 100 dermoscopy images, with a mean border error of 6.94%, mean recall of 76.66%, and mean precision of 99.29%. Moreover, the WebCL version's speedup factor for lesion border detection on the 100 dermoscopy images averages around 491.2. Conclusions Given the large number of high-resolution dermoscopy images encountered in a typical clinical setting, and the critical importance of early detection and diagnosis of melanoma before metastasis, the importance of fast processing of dermoscopy images becomes obvious. In this paper, we introduce WebCL and its use for biomedical image processing applications. WebCL is a JavaScript binding of OpenCL, which takes advantage of GPU computing from a web browser. The WebCL-parallel version of density-based skin lesion border detection introduced in this study can therefore supplement expert dermatologists and aid them in the early diagnosis of skin lesions. While WebCL is currently an emerging technology, full adoption of WebCL into the HTML5 standard would allow this implementation to run on a very large set of hardware and software systems. WebCL takes full advantage of parallel computational resources, including multi-core CPUs and GPUs on a local machine, and allows compiled code to run directly from the Web browser. PMID:26423836
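    The density step that dominates the runtime, counting each pixel's neighbors within a radius, is what makes the method embarrassingly parallel: every point's neighborhood query is independent, which is exactly what WebCL/OpenCL work items exploit. A pure-Python sketch of that per-point kernel (illustrative; not the paper's actual kernel code):

```python
def density(points, idx, eps):
    """Number of points within distance eps of points[idx]; one such
    query per point is the independent unit of parallel work."""
    px, py = points[idx]
    return sum(1 for (qx, qy) in points
               if (qx - px) ** 2 + (qy - py) ** 2 <= eps ** 2)

def core_points(points, eps, min_pts):
    # each loop iteration is independent -> maps directly onto work items
    return [i for i in range(len(points))
            if density(points, i, eps) >= min_pts]

pts = [(0, 0), (0, 1), (1, 0), (10, 10)]     # three dense points, one outlier
cores = core_points(pts, eps=1.5, min_pts=3)
```

Transforming the serial clustering into such independent concurrent queries is the redesign the abstract refers to.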

  16. Density-based parallel skin lesion border detection with webCL.

    PubMed

    Lemon, James; Kockara, Sinan; Halic, Tansel; Mete, Mutlu

    2015-01-01

    Dermoscopy is a highly effective and noninvasive imaging technique used in the diagnosis of melanoma and other pigmented skin lesions. Many aspects of the lesion under consideration are defined in relation to the lesion border. This makes border detection one of the most important steps in dermoscopic image analysis. In current practice, dermatologists often delineate borders through a hand-drawn representation based upon visual inspection. Due to the subjective nature of this technique, intra- and inter-observer variations are common. Because of this, the automated assessment of lesion borders in dermoscopic images has become an important area of study. A fast density-based skin lesion border detection method has been implemented in parallel with a new technology called WebCL. WebCL utilizes client-side computing capabilities to use available hardware resources such as multiple cores and GPUs. The resulting WebCL-parallel density-based border detection method runs efficiently from internet browsers. Previous research indicates that some of the highest accuracy rates can be achieved using density-based clustering techniques for skin lesion border detection. While these algorithms do have unfavorable time complexities, this effect can be mitigated when they are implemented in parallel. In this study, a density-based clustering technique for skin lesion border detection is parallelized and redesigned to run very efficiently on heterogeneous platforms (e.g. tablets, smartphones, multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units) by transforming the technique into a series of independent concurrent operations. Heterogeneous computing is adopted to support accessibility, portability and multi-device use in clinical settings. For this, we used WebCL, an emerging technology that enables an HTML5 Web browser to execute code in parallel on heterogeneous platforms. We describe WebCL and our parallel algorithm design. 
In addition, we tested the parallel code on 100 dermoscopy images and measured the execution speedups with respect to the serial version. Results indicate that the parallel (WebCL) and serial versions of the density-based lesion border detection method generate the same accuracy rates for the 100 dermoscopy images, with a mean border error of 6.94%, mean recall of 76.66%, and mean precision of 99.29%. Moreover, the WebCL version's speedup factor for lesion border detection on the 100 dermoscopy images averages around 491.2. Given the large number of high-resolution dermoscopy images encountered in a typical clinical setting, and the critical importance of early detection and diagnosis of melanoma before metastasis, the importance of fast processing of dermoscopy images becomes obvious. In this paper, we introduce WebCL and its use for biomedical image processing applications. WebCL is a JavaScript binding of OpenCL, which takes advantage of GPU computing from a web browser. The WebCL-parallel version of density-based skin lesion border detection introduced in this study can therefore supplement expert dermatologists and aid them in the early diagnosis of skin lesions. While WebCL is currently an emerging technology, full adoption of WebCL into the HTML5 standard would allow this implementation to run on a very large set of hardware and software systems. WebCL takes full advantage of parallel computational resources, including multi-core CPUs and GPUs on a local machine, and allows compiled code to run directly from the Web browser.

  17. Dynamical calculations for RHEED intensity oscillations

    NASA Astrophysics Data System (ADS)

    Daniluk, Andrzej

    2005-03-01

    A practical computing algorithm working in real time has been developed for calculating the reflection high-energy electron diffraction from the molecular beam epitaxy growing surface. The calculations are based on the use of a dynamical diffraction theory in which the electrons are taken to be diffracted by a potential, which is periodic in the dimension perpendicular to the surface. The results of the calculations are presented in the form of rocking curves to illustrate how the diffracted beam intensities depend on the glancing angle of the incident beam.
    Program summary
    Title of program: RHEED
    Catalogue identifier: ADUY
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADUY
    Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
    Computer for which the program is designed and others on which it has been tested: Pentium-based PC
    Operating systems or monitors under which the program has been tested: Windows 9x, XP, NT, Linux
    Programming language used: Borland C++
    Memory required to execute with typical data: more than 1 MB
    Number of bits in a word: 64 bits
    Number of processors used: 1
    Distribution format: tar.gz
    Number of lines in distributed program, including test data, etc.: 982
    Number of bytes in distributed program, including test data, etc.: 126 051
    Nature of physical problem: Reflection high-energy electron diffraction (RHEED) is a very useful technique for studying growth and surface analysis of thin epitaxial structures prepared by molecular beam epitaxy (MBE). Nowadays, RHEED is used in many laboratories all over the world where researchers deal with the growth of materials by MBE. The RHEED technique can reveal, almost instantaneously, changes either in the coverage of the sample surface by adsorbates or in the surface structure of a thin film. In most cases the interpretation of experimental results is based on the use of dynamical diffraction approaches.
    Such approaches are said to be quite useful in qualitative and quantitative analysis of RHEED experimental data.
    Method of solution: RHEED intensities are calculated within the framework of the general matrix formulation of Peng and Whelan [Surf. Sci. Lett. 238 (1990) L446] under the one-beam condition. The dynamical diffraction calculations presented in this paper utilize the systematic reflection case in RHEED, in which the atomic potential in the planes parallel to the surface is projected onto the surface normal, so that the results are insensitive to the atomic arrangement in the layers parallel to the surface. This model is a systematic approximation for calculating dynamical RHEED intensities, and only a layer coverage factor for the nth layer was taken into account in calculating the interaction potential between the fast electron and that layer.
    Typical running time: The typical running time is machine and user-parameter dependent.
    Unusual features of the program: The program is presented in the form of a basic unit RHEED.cpp and should be compiled using C++ compilers, including C++ Builder and g++.
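    A much simpler kinematic layer-coverage model (not the dynamical theory RHEED.cpp implements) still shows why layer-by-layer growth produces intensity oscillations; the coverage θ below plays the same role as the program's layer coverage factor:

```python
# Kinematic toy model (NOT the dynamical theory used by RHEED.cpp):
# a partially filled layer of coverage theta scatters with an extra
# phase phi relative to the completed layer underneath, giving
#   I(theta) = |(1 - theta) + theta * exp(i*phi)|^2.
import cmath

def intensity(theta, phi):
    amp = (1 - theta) + theta * cmath.exp(1j * phi)
    return abs(amp) ** 2

# at the anti-Bragg condition (phi = pi) the intensity dips to zero at
# half coverage and recovers when the layer completes: one oscillation
# per monolayer grown.
phi = cmath.pi
curve = [intensity(t / 10, phi) for t in range(11)]
```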

  18. 1. Aerial view of turnpike path running diagonally up from ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    1. Aerial view of turnpike path running diagonally up from lower left (present-day Orange Turnpike alignment) and continuing on towards upper right through the tree clump in the center of the bare spot on the landscape, and on through the trees. View looking south. - Orange Turnpike, Parallel to new Orange Turnpike, Monroe, Orange County, NY

  19. 27 CFR 9.212 - Leona Valley.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... approximately 0.25 mile to its intersection with a trail and the 3,800-foot elevation line, T6N, R13W; then (9... (21) Proceed north and then generally southeast along the 3,600-foot elevation line that runs parallel... elevation line that runs north of the San Andreas Rift Zone to its intersection with the section 16 east...

  20. 27 CFR 9.212 - Leona Valley.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... approximately 0.25 mile to its intersection with a trail and the 3,800-foot elevation line, T6N, R13W; then (9... (21) Proceed north and then generally southeast along the 3,600-foot elevation line that runs parallel... elevation line that runs north of the San Andreas Rift Zone to its intersection with the section 16 east...

  1. TIMEDELN: A programme for the detection and parametrization of overlapping resonances using the time-delay method

    NASA Astrophysics Data System (ADS)

    Little, Duncan A.; Tennyson, Jonathan; Plummer, Martin; Noble, Clifford J.; Sunderland, Andrew G.

    2017-06-01

    TIMEDELN implements the time-delay method of determining resonance parameters from the characteristic Lorentzian form displayed by the largest eigenvalues of the time-delay matrix. TIMEDELN constructs the time-delay matrix from input K-matrices and analyses its eigenvalues. This new version implements multi-resonance fitting and may be run serially or as a high performance parallel code with three levels of parallelism. TIMEDELN takes K-matrices from a scattering calculation, either read from a file or calculated on a dynamically adjusted grid, and calculates the time-delay matrix. This is then diagonalized, with the largest eigenvalue representing the longest time-delay experienced by the scattering particle. A resonance shows up as a characteristic Lorentzian form in the time-delay: the programme searches the time-delay eigenvalues for maxima and traces resonances when they pass through different eigenvalues, separating overlapping resonances. It also performs the fitting of the calculated data to the Lorentzian form and outputs resonance positions and widths. Any remaining overlapping resonances can be fitted jointly. The branching ratios of decay into the open channels can also be found. The programme may be run serially or in parallel with three levels of parallelism. The parallel code modules are abstracted from the main physics code and can be used independently.
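    An isolated resonance gives the time-delay eigenvalue a Lorentzian profile, τ(E) = Γ / ((E − E_r)² + (Γ/2)²), which peaks at E_r with full width at half maximum Γ. A sketch of extracting position and width from a sampled curve (a simple peak/half-maximum estimate, not TIMEDELN's actual fitting procedure):

```python
def lorentzian_delay(E, E_r, gamma):
    """Time-delay profile of an isolated resonance (hbar = 1)."""
    return gamma / ((E - E_r) ** 2 + (gamma / 2) ** 2)

def fit_resonance(energies, delays):
    """Estimate (position, width) from a sampled time-delay curve:
    the peak gives E_r; the full width at half maximum gives Gamma."""
    peak = max(range(len(delays)), key=lambda i: delays[i])
    half = delays[peak] / 2
    above = [energies[i] for i in range(len(delays)) if delays[i] >= half]
    return energies[peak], above[-1] - above[0]

# synthetic time-delay data for a resonance at E_r = 1.5, width 0.1
E_r, gamma = 1.50, 0.10
es = [i * 0.001 for i in range(1000, 2001)]          # grid from 1.0 to 2.0
taus = [lorentzian_delay(e, E_r, gamma) for e in es]
pos, width = fit_resonance(es, taus)
```

Overlapping resonances, which this estimate cannot separate, are what motivate the joint multi-resonance fitting the programme performs.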

  2. Operating System Abstraction Layer (OSAL)

    NASA Technical Reports Server (NTRS)

    Yanchik, Nicholas J.

    2007-01-01

    This viewgraph presentation reviews the concept of the Operating System Abstraction Layer (OSAL) and its benefits. The OSAL is a small layer of software that allows programs to run on many different operating systems and hardware platforms. It runs independently of the underlying OS and hardware, and it is self-contained. The benefits of the OSAL are that it removes dependencies on any one operating system and promotes portable, reusable flight software. It allows Core Flight Software (FSW) to be built for multiple processors and operating systems. The presentation discusses the functionality, covers the various OSAL releases, and describes the specifications.
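    The pattern OSAL embodies, one fixed API with per-platform backends chosen in a single place, can be sketched in a few lines. All names below are hypothetical illustrations; OSAL itself is a C API for flight software, not Python:

```python
import sys, time

class PosixBackend:
    """Hypothetical backend for desktop/POSIX hosts."""
    name = "posix"
    def ticks_ms(self):
        return int(time.monotonic() * 1000)

class RtosBackend:
    """Hypothetical stand-in for an RTOS (e.g. VxWorks) backend."""
    name = "rtos"
    def ticks_ms(self):
        return int(time.monotonic() * 1000)

def select_backend(platform=None):
    """Application code calls only the abstraction API; this binding is
    the single platform-dependent decision."""
    platform = platform or sys.platform
    return RtosBackend() if platform == "vxworks" else PosixBackend()

osal = select_backend()
t0 = osal.ticks_ms()   # portable call, backend-specific implementation
```

Because the flight software only ever sees the abstract interface, the same application source can be rebuilt for each supported OS and processor.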

  3. An asymptotic induced numerical method for the convection-diffusion-reaction equation

    NASA Technical Reports Server (NTRS)

    Scroggs, Jeffrey S.; Sorensen, Danny C.

    1988-01-01

    A parallel algorithm for the efficient solution of a time-dependent convection-diffusion-reaction equation with a small parameter on the diffusion term is presented. The method is based on a domain decomposition that is dictated by singular perturbation analysis. The analysis is used to determine regions where certain reduced equations may be solved in place of the full equation. Parallelism is evident at two levels. Domain decomposition provides parallelism at the highest level, and within each domain there is ample opportunity to exploit parallelism. Run-time results demonstrate the viability of the method.
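    The singular-perturbation idea, solving a cheap reduced equation except in thin layers where diffusion matters, can be seen on the classic model problem ε u″ + u′ = 0, u(0) = 0, u(1) = 1 (an illustrative stand-in, not the paper's system). The reduced equation u′ = 0 with u(1) = 1 gives the outer solution u ≡ 1, valid everywhere except a boundary layer of width O(ε) at x = 0, where the correction −e^(−x/ε) is added:

```python
import math

eps = 0.01   # small diffusion parameter

def exact(x):
    # exact solution of eps*u'' + u' = 0, u(0)=0, u(1)=1
    return (1 - math.exp(-x / eps)) / (1 - math.exp(-1 / eps))

def outer(x):
    return 1.0   # reduced equation u' = 0 with u(1) = 1

def matched(x):
    return 1.0 - math.exp(-x / eps)   # outer + boundary-layer correction

# outer solution alone fails inside the layer; the matched approximation
# is uniformly accurate on [0, 1]
err_outer_layer = abs(outer(0.0) - exact(0.0))
err_matched = max(abs(matched(i / 100) - exact(i / 100)) for i in range(101))
```

This is the structure the domain decomposition exploits: the full equation is only needed in the layer regions, and the subdomains can be handled in parallel.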

  4. Implementations of BLAST for parallel computers.

    PubMed

    Jülich, A

    1995-02-01

    The BLAST sequence comparison programs have been ported to a variety of parallel computers: the shared-memory machine Cray Y-MP 8/864 and the distributed-memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited to parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799-residue protein query sequence and the protein database PIR were used.
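    The database-partitioning strategy such ports typically use, splitting the sequence database across workers, searching each partition independently, and merging the per-partition best hits, can be mimicked in a few lines. The shared-k-mer score below is a deliberately crude stand-in for BLAST's actual statistics:

```python
def kmer_score(query, subject, k=3):
    """Crude similarity stand-in: count of subject k-mers that also
    occur in the query (NOT BLAST's alignment score)."""
    qk = {query[i:i + k] for i in range(len(query) - k + 1)}
    return sum(1 for i in range(len(subject) - k + 1) if subject[i:i + k] in qk)

def search_partition(query, partition):
    return [(name, kmer_score(query, seq)) for name, seq in partition]

def parallel_search(query, database, n_workers, top=2):
    # each partition could be searched by a separate process; searched
    # serially here, the merge yields identical results either way
    parts = [database[i::n_workers] for i in range(n_workers)]
    hits = [h for p in parts for h in search_partition(query, p)]
    return sorted(hits, key=lambda h: -h[1])[:top]

db = [("s1", "ACGTACGT"), ("s2", "TTTTTTTT"), ("s3", "ACGTTTTT")]
best = parallel_search("ACGTACGA", db, n_workers=2)
```

Because partitions are searched independently and only small hit lists are merged, this scheme scales well for a moderate number of processors, which matches the observation above.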

  5. A C++ Thread Package for Concurrent and Parallel Programming

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jie Chen; William Watson

    1999-11-01

    Recently, thread libraries have become a common entity on various operating systems such as Unix, Windows NT and VxWorks. These thread libraries offer significant performance enhancement by allowing applications to use multiple threads running either concurrently or in parallel on multiprocessors. However, the incompatibilities between native libraries introduce challenges for those who wish to develop portable applications.

  6. drPACS: A Simple UNIX Execution Pipeline

    NASA Astrophysics Data System (ADS)

    Teuben, P.

    2011-07-01

    We describe a very simple yet flexible and effective pipeliner for UNIX commands. It creates a Makefile to define a set of serially dependent commands. The commands in the pipeline share a common set of parameters by which they can communicate. Commands must follow a simple convention to retrieve and store parameters. Pipeline parameters can optionally be made persistent across multiple runs of the pipeline. Tools were added to simplify running a large series of pipelines, which can then also be run in parallel.
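    The convention described, serially dependent steps communicating through a shared and optionally persistent parameter set, is easy to model. This is a hypothetical sketch of the idea only; drPACS actually generates a Makefile and runs UNIX commands:

```python
import json, os, tempfile

class Pipeline:
    """Steps run serially, in order, and communicate via a shared
    parameter dict that is persisted so later runs resume with the
    same state."""
    def __init__(self, param_file):
        self.param_file = param_file
        self.params = {}
        if os.path.exists(param_file):           # pick up persisted parameters
            with open(param_file) as f:
                self.params = json.load(f)
        self.steps = []

    def add(self, func):
        self.steps.append(func)

    def run(self):
        for step in self.steps:
            step(self.params)                    # each step reads/writes parameters
        with open(self.param_file, "w") as f:
            json.dump(self.params, f)            # persist across pipeline runs

pfile = os.path.join(tempfile.mkdtemp(), "params.json")
p = Pipeline(pfile)
p.add(lambda prm: prm.update(raw="data.fits"))                       # hypothetical step 1
p.add(lambda prm: prm.update(calibrated=prm["raw"] + ".cal"))        # step 2 uses step 1's output
p.run()
```

Many such pipelines, each with its own parameter file, can then be launched independently, which is what makes running a large series of them in parallel straightforward.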

  7. Muscle architecture of the elongated nose in the Asian elephant (Elephas maximus).

    PubMed

    Endo, H; Hayashi, Y; Komiya, T; Narushima, E; Sasaki, M

    2001-05-01

    The architecture of the M. caninus in the elongated nose was examined in the Asian elephant (Elephas maximus). The following complicated musculature of the M. caninus was observed in the proximal and distal regions of the nose: (1) Proximal region: In the superficial layer, longitudinal bundles are found in the dorsal part and obliquely-oriented ones in the ventral part. In the middle layer, some bundles run ventro-distally, while others run longitudinally. The deep layer consists of a complicated architecture of many bundles: some muscle bundles run medio-laterally, while others extend proximo-distally in this space. (2) Distal region: In the dorsal part of the M. caninus, the bundles run in the deep-superficial direction, while in the ventral part the bundles are longitudinally arranged. The bundles run in the lateral direction near the septum of the nasal conduits. The N. facialis and N. infraorbitalis send many branches into the lateral area of the M. caninus in the trunk. This muscle architecture of multi-oriented bundles, together with the well-developed innervation to them, suggests that they enable the elongated nose to act as a refined manipulator in the Asian elephant.

  8. Two-axis magnetic field sensor

    NASA Technical Reports Server (NTRS)

    Smith, Carl H. (Inventor); Nordman, Catherine A. (Inventor); Jander, Albrecht (Inventor); Qian, Zhenghong (Inventor)

    2006-01-01

    A ferromagnetic thin-film based magnetic field sensor with first and second sensitive-direction sensing structures, each having a nonmagnetic intermediate layer with two major surfaces on opposite sides thereof: a magnetization reference layer on one and an anisotropic ferromagnetic material sensing layer on the other, the sensing layer having a length in a selected direction and a smaller width perpendicular thereto and parallel to the relatively fixed magnetization direction. The relatively fixed magnetization direction of the magnetization reference layer in each structure is oriented substantially parallel to the substrate but substantially perpendicular to that of the other structure. An annealing process is used to form the desired magnetization directions.

  9. Method for resource control in parallel environments using program organization and run-time support

    NASA Technical Reports Server (NTRS)

    Ekanadham, Kattamuri (Inventor); Moreira, Jose Eduardo (Inventor); Naik, Vijay Krishnarao (Inventor)

    2001-01-01

    A system and method for dynamic scheduling and allocation of resources to parallel applications during the course of their execution. By establishing well-defined interactions between an executing job and the parallel system, the system and method support dynamic reconfiguration of processor partitions, dynamic distribution and redistribution of data, communication among cooperating applications, and various other monitoring actions. The interactions occur only at specific points in the execution of the program where the aforementioned operations can be performed efficiently.

  10. Method for resource control in parallel environments using program organization and run-time support

    NASA Technical Reports Server (NTRS)

    Ekanadham, Kattamuri (Inventor); Moreira, Jose Eduardo (Inventor); Naik, Vijay Krishnarao (Inventor)

    1999-01-01

    A system and method for dynamic scheduling and allocation of resources to parallel applications during the course of their execution. By establishing well-defined interactions between an executing job and the parallel system, the system and method support dynamic reconfiguration of processor partitions, dynamic distribution and redistribution of data, communication among cooperating applications, and various other monitoring actions. The interactions occur only at specific points in the execution of the program where the aforementioned operations can be performed efficiently.

  11. Steady Boundary Layer Disturbances Created By Two-Dimensional Surface Ripples

    NASA Astrophysics Data System (ADS)

    Kuester, Matthew

    2017-11-01

    Multiple experiments have shown that surface roughness can enhance the growth of Tollmien-Schlichting (T-S) waves in a laminar boundary layer. One of the common observations from these studies is a "wall displacement" effect, where the boundary layer profile shape remains relatively unchanged, but the origin of the profile pushes away from the wall. The objective of this work is to calculate the steady velocity field (including this wall displacement) of a laminar boundary layer over a surface with small, 2D surface ripples. The velocity field is a combination of a Blasius boundary layer and multiple disturbance modes, calculated using the linearized Navier-Stokes equations. The method of multiple scales is used to include non-parallel boundary layer effects of O(Rδ^-1); the non-parallel terms are necessary because a wall displacement is mathematically inconsistent with a parallel boundary layer assumption. This technique is used to calculate the steady velocity field over ripples of varying height and wavelength, including cases where a separation bubble forms on the leeward side of the ripple. In future work, the steady velocity field will be the input for stability calculations, which will quantify the growth of T-S waves over rough surfaces. The author would like to acknowledge the support of the Kevin T. Crofton Aerospace & Ocean Engineering Department at Virginia Tech.
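The base flow referred to above is the Blasius boundary layer. As a minimal illustration of that base flow only (not the author's multiple-scales calculation), the Blasius similarity equation f''' + (1/2) f f'' = 0, with f(0) = f'(0) = 0 and f' -> 1 far from the wall, can be solved by shooting on the wall shear f''(0):

```python
def fprime_far(s, eta_max=10.0, h=0.01):
    """RK4-integrate f''' = -0.5*f*f'' from eta=0 with f=f'=0, f''=s;
    return f' at eta_max (should approach 1 for the correct s)."""
    f, fp, fpp = 0.0, 0.0, s
    rhs = lambda f, fp, fpp: (fp, fpp, -0.5 * f * fpp)
    for _ in range(int(eta_max / h)):
        k1 = rhs(f, fp, fpp)
        k2 = rhs(f + 0.5*h*k1[0], fp + 0.5*h*k1[1], fpp + 0.5*h*k1[2])
        k3 = rhs(f + 0.5*h*k2[0], fp + 0.5*h*k2[1], fpp + 0.5*h*k2[2])
        k4 = rhs(f + h*k3[0], fp + h*k3[1], fpp + h*k3[2])
        f   += h/6 * (k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
        fp  += h/6 * (k1[1] + 2*k2[1] + 2*k3[1] + k4[1])
        fpp += h/6 * (k1[2] + 2*k2[2] + 2*k3[2] + k4[2])
    return fp

# Bisect on the wall shear s = f''(0) so that f' -> 1 in the far field;
# f'(eta_max) increases monotonically with s, so bisection is safe.
lo, hi = 0.1, 1.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if fprime_far(mid) < 1.0:
        lo = mid
    else:
        hi = mid
wall_shear = 0.5 * (lo + hi)   # classical Blasius value, about 0.332
```

The converged wall shear recovers the well-known Blasius constant f''(0) ≈ 0.332, confirming the profile that the ripple-induced disturbance modes are superposed on.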

  12. Interactions among Radiation, Convection, and Large-Scale Dynamics in a General Circulation Model.

    NASA Astrophysics Data System (ADS)

    Randall, David A.; Harshvardhan; Dazlich, Donald A.; Corsetti, Thomas G.

    1989-07-01

    We have analyzed the effects of radiatively active clouds on the climate simulated by the UCLA/GLA GCM, with particular attention to the effects of the upper tropospheric stratiform clouds associated with deep cumulus convection, and the interactions of these clouds with convection and the large-scale circulation. Several numerical experiments have been performed to investigate the mechanisms through which the clouds influence the large-scale circulation. In the 'NODETLQ' experiment, no liquid water or ice was detrained from cumulus clouds into the environment; all of the condensate was rained out. Upper level supersaturation cloudiness was drastically reduced, the atmosphere dried, and tropical outgoing longwave radiation increased. In the 'NOANVIL' experiment, the radiative effects of the optically thick upper-level cloud sheets associated with deep cumulus convection were neglected. The land surface received more solar radiation in regions of convection, leading to enhanced surface fluxes and a dramatic increase in precipitation. In the 'NOCRF' experiment, the longwave atmospheric cloud radiative forcing (ACRF) was omitted, paralleling the recent experiment of Slingo and Slingo. The results suggest that the ACRF enhances deep penetrative convection and precipitation, while suppressing shallow convection. They also indicate that the ACRF warms and moistens the tropical troposphere. The results of this experiment are somewhat ambiguous, however; for example, the ACRF suppresses precipitation in some parts of the tropics, and enhances it in others. To isolate the effects of the ACRF in a simpler setting, we have analyzed the climate of an ocean-covered Earth, which we call Seaworld. The key simplicities of Seaworld are the fixed boundary temperature with no land points, the lack of mountains, and the zonal uniformity of the boundary conditions. Results are presented from two Seaworld simulations.
The first includes a full suite of physical parameterizations, while the second omits all radiative effects of the clouds. The differences between the two runs are, therefore, entirely due to the direct and indirect effects of the ACRF. Results show that the ACRF in the cloudy run accurately represents the radiative heating perturbation relative to the cloud-free run. The cloudy run is warmer in the middle troposphere, contains much more precipitable water, and has about 15% more globally averaged precipitation. There is a double tropical rain band in the cloud-free run, and a single, more intense tropical rain band in the cloudy run. The cloud-free run produces relatively weak but frequent cumulus convection, while the cloudy run produces relatively intense but infrequent convection. The mean meridional circulation transports nearly twice as much mass in the cloudy run. The increased tropical rising motion in the cloudy run leads to a deeper boundary layer and also to more moisture in the troposphere above the boundary layer. This accounts for the increased precipitable water content of the atmosphere. The clouds lead to an increase in the intensity of the tropical easterlies, and cause the midlatitude westerly jets to shift equatorward. Taken together, our results show that upper tropospheric clouds associated with moist convection, whose importance has recently been emphasized in observational studies, play a very complex and powerful role in determining the model results. This points to a need to develop more realistic parameterizations of these clouds.

  13. Laboratory duplication of comb layering in the Rhum pluton. [igneous rocks with comb layered texture

    NASA Technical Reports Server (NTRS)

    Donaldson, C. H.

    1977-01-01

    A description is provided of the texture of harrisite comb layers, taking into account the results of crystallization experiments at controlled cooling rates, which have reproduced the textural change from 'cumulate' to comb-layered harrisite. Melted samples of harrisite were used in the dynamic crystallization experiments considered. The differentiation of a cooling rate run with respect to olivine grain size and shape is shown and three possible origins of hopper olivine in differentiated crystallization runs are considered. It is found that olivine nucleation occurred throughout cooling, except for the incubation period during early cooling. The elongate combed olivines in harrisite apparently grew as the magma locally supercooled to at least 30 C. It is suggested that the branching crystals in most comb layers, including comb-layered harrisite, probably grew along thermal gradients.

  14. Parallel Signal Processing and System Simulation using aCe

    NASA Technical Reports Server (NTRS)

    Dorband, John E.; Aburdene, Maurice F.

    2003-01-01

    Recently, networked and cluster computation have become very popular for both signal processing and system simulation. The aCe language is ideally suited for parallel signal processing applications and system simulation since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, this new C-based parallel language for architecture-adaptive programming allows programmers to implement algorithms and system simulation applications on parallel architectures by providing them with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we will focus on some fundamental features of aCe and present a signal processing application (FFT).

  15. Lg Attenuation Anisotropy Across the Western US

    NASA Astrophysics Data System (ADS)

    Phillips, W. S.; Rowe, C. A.; Stead, R. J.; Begnaud, M. L.

    2017-12-01

    The USArray has allowed us to map seismic attenuation of local and regional phases to unprecedented spatial extent and resolution. Following standard mantle Pn velocity anisotropy methods, we have incorporated azimuthal anisotropy into our tomographic inversion of high-frequency Lg amplitudes. The Lg is a crustal shear phase made up of many trapped modes, thus results can be considered to be crustal averages. Azimuthal anisotropy reduces residual variance by just over 10% for 1.5-3 Hz Lg. We observe a median anisotropic variation of 12%, and a high of 50% in the Salton Trough. Low attenuation (high-Q) directions run parallel to topographic fabric and major strike slip faults in tectonically active areas, and often run parallel to mantle shear wave splitting directions in stable regions. Tradeoffs are of concern, and synthetic tests show that elongated attenuation anomalies will produce anisotropy artifacts, but of factors 2-3 times lower than observations. In particular, the strength of a long, narrow high-Q anomaly will trade off with high-Q directions parallel to the long axis, while an elongated low-Q anomaly will trade off with high-Q directions perpendicular to the long axis. We observe an elongated low-Q anomaly associated with the Walker Lane; however, observed high-Q directions run parallel to the long axis of this anomaly, opposite to the tradeoff effect, supporting the anisotropic observation, and implying that the effect may be underestimated. Further, we observe an elongated high-Q anomaly associated with the Great Valley and Sierra Nevada that runs across the long axis, again opposite to the tradeoff effect. This study was performed using waveforms, event locations and phase picks made available by IRIS, NEIC and ANF, and processing was done using semi-automated means, thus this is a technique that can be applied quickly to study crustal anisotropy over large areas when appropriate station density is available.

  16. Multilayer insulation blanket, fabricating apparatus and method

    DOEpatents

    Gonczy, John D.; Niemann, Ralph C.; Boroski, William N.

    1992-01-01

    An improved multilayer insulation blanket for insulating cryogenic structures operating at very low temperatures is disclosed. An apparatus and method for fabricating the improved blanket are also disclosed. In the improved blanket, each successive layer of insulating material is greater in length and width than the preceding layer so as to accommodate thermal contraction of the layers closest to the cryogenic structure. The fabricating apparatus has a rotatable cylindrical mandrel having an outer surface of fixed radius that is substantially arcuate, preferably convex, in cross-section. The method of fabricating the improved blanket comprises (a) winding a continuous sheet of thermally reflective material around the circumference of the mandrel to form multiple layers, (b) binding the layers along two lines substantially parallel to the edges of the circumference of the mandrel, (c) cutting the layers along a line parallel to the axle of the mandrel, and (d) removing the bound layers from the mandrel.

  17. Method of fabricating a multilayer insulation blanket

    DOEpatents

    Gonczy, John D.; Niemann, Ralph C.; Boroski, William N.

    1993-01-01

    An improved multilayer insulation blanket for insulating cryogenic structures operating at very low temperatures is disclosed. An apparatus and method for fabricating the improved blanket are also disclosed. In the improved blanket, each successive layer of insulating material is greater in length and width than the preceding layer so as to accommodate thermal contraction of the layers closest to the cryogenic structure. The fabricating apparatus has a rotatable cylindrical mandrel having an outer surface of fixed radius that is substantially arcuate, preferably convex, in cross-section. The method of fabricating the improved blanket comprises (a) winding a continuous sheet of thermally reflective material around the circumference of the mandrel to form multiple layers, (b) binding the layers along two lines substantially parallel to the edges of the circumference of the mandrel, (c) cutting the layers along a line parallel to the axle of the mandrel, and (d) removing the bound layers from the mandrel.

  18. Method of fabricating a multilayer insulation blanket

    DOEpatents

    Gonczy, J.D.; Niemann, R.C.; Boroski, W.N.

    1993-07-06

    An improved multilayer insulation blanket for insulating cryogenic structures operating at very low temperatures is disclosed. An apparatus and method for fabricating the improved blanket are also disclosed. In the improved blanket, each successive layer of insulating material is greater in length and width than the preceding layer so as to accommodate thermal contraction of the layers closest to the cryogenic structure. The fabricating apparatus has a rotatable cylindrical mandrel having an outer surface of fixed radius that is substantially arcuate, preferably convex, in cross-section. The method of fabricating the improved blanket comprises (a) winding a continuous sheet of thermally reflective material around the circumference of the mandrel to form multiple layers, (b) binding the layers along two lines substantially parallel to the edges of the circumference of the mandrel, (c) cutting the layers along a line parallel to the axle of the mandrel, and (d) removing the bound layers from the mandrel.

  19. Multilayer insulation blanket, fabricating apparatus and method

    DOEpatents

    Gonczy, J.D.; Niemann, R.C.; Boroski, W.N.

    1992-09-01

    An improved multilayer insulation blanket for insulating cryogenic structures operating at very low temperatures is disclosed. An apparatus and method for fabricating the improved blanket are also disclosed. In the improved blanket, each successive layer of insulating material is greater in length and width than the preceding layer so as to accommodate thermal contraction of the layers closest to the cryogenic structure. The fabricating apparatus has a rotatable cylindrical mandrel having an outer surface of fixed radius that is substantially arcuate, preferably convex, in cross-section. The method of fabricating the improved blanket comprises (a) winding a continuous sheet of thermally reflective material around the circumference of the mandrel to form multiple layers, (b) binding the layers along two lines substantially parallel to the edges of the circumference of the mandrel, (c) cutting the layers along a line parallel to the axle of the mandrel, and (d) removing the bound layers from the mandrel. 7 figs.

  20. Studying an Eulerian Computer Model on Different High-performance Computer Platforms and Some Applications

    NASA Astrophysics Data System (ADS)

    Georgiev, K.; Zlatev, Z.

    2010-11-01

    The Danish Eulerian Model (DEM) is an Eulerian model for studying the transport of air pollutants on a large scale. Originally, the model was developed at the National Environmental Research Institute of Denmark. The model's computational domain covers Europe and some neighbouring parts of the Atlantic Ocean, Asia and Africa. If the DEM is to be applied using fine grids, its discretization leads to a huge computational problem, which implies that such a model can only be run on high-performance computer architectures. The implementation and tuning of such a complex large-scale model on each different computer is a non-trivial task. Here we present comparison results from running this model on different kinds of vector computers (CRAY C92A, Fujitsu, etc.), parallel computers with distributed memory (IBM SP, CRAY T3E, Beowulf clusters, Macintosh G4 clusters, etc.), parallel computers with shared memory (SGI Origin, SUN, etc.), and parallel computers with two levels of parallelism (IBM SMP, IBM BlueGene/P, clusters of multiprocessor nodes, etc.). The main idea in the parallel version of DEM is a domain partitioning approach. We discuss the effective use of the cache and hierarchical memories of modern computers, as well as the performance, speed-ups and efficiency achieved. The parallel code of DEM, created using the MPI standard library, appears to be highly portable and shows good efficiency and scalability on different kinds of vector and parallel computers. Some important applications of the computer model output are briefly presented.
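The domain partitioning idea can be sketched in miniature (an assumed illustration, not the DEM code): a 1-D periodic advection step is split into chunks, each of which needs only one ghost cell from its left neighbour, and the partitioned update reproduces the serial result exactly.

```python
def serial_step(u, c=0.5):
    """One first-order upwind advection step on a periodic 1-D grid."""
    n = len(u)
    return [u[i] - c * (u[i] - u[i - 1]) for i in range(n)]

def partitioned_step(u, nparts, c=0.5):
    """Same update, computed chunk-by-chunk as a domain decomposition:
    each chunk receives one ghost cell from its left neighbour."""
    n = len(u)
    size = n // nparts
    chunks = [u[p * size:(p + 1) * size] for p in range(nparts)]
    out = []
    for p, chunk in enumerate(chunks):
        ghost = chunks[(p - 1) % nparts][-1]   # left neighbour's last cell
        padded = [ghost] + chunk
        out.extend(padded[i] - c * (padded[i] - padded[i - 1])
                   for i in range(1, len(padded)))
    return out

u0 = [float(i % 8) for i in range(32)]
assert partitioned_step(u0, 4) == serial_step(u0)   # decomposition is exact
```

In a real MPI run each chunk lives on a different process and the ghost cell arrives via a halo exchange; the arithmetic per cell is unchanged, which is why the decomposition scales.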

  1. Thread-Level Parallelization and Optimization of NWChem for the Intel MIC Architecture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shan, Hongzhang; Williams, Samuel; Jong, Wibe de

    In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism coupled with a reduced memory capacity demand an altogether different approach. In this paper we explore augmenting two NWChem modules, triples correction of the CCSD(T) and Fock matrix construction, with OpenMP in order that they might run efficiently on future manycore architectures. As the next NERSC machine will be a self-hosted Intel MIC (Xeon Phi) based supercomputer, we leverage an existing MIC testbed at NERSC to evaluate our experiments. In order to proxy the fact that future MIC machines will not have a host processor, we run all of our experiments in native mode. We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient in attaining high performance, significant effort was required to safely and efficiently thread the TEXAS integral package when constructing the Fock matrix. Ultimately, our new MPI+OpenMP hybrid implementations attain up to 65x better performance for the triples part of the CCSD(T) due in large part to the fact that the limited on-card memory limits the existing MPI implementation to a single process per card. Additionally, we obtain up to 1.6x better performance on Fock matrix constructions when compared with the best MPI implementations running multiple processes per card.

  2. Thread-level parallelization and optimization of NWChem for the Intel MIC architecture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shan, Hongzhang; Williams, Samuel; de Jong, Wibe

    In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism coupled with a reduced memory capacity demand an altogether different approach. In this paper we explore augmenting two NWChem modules, triples correction of the CCSD(T) and Fock matrix construction, with OpenMP in order that they might run efficiently on future manycore architectures. As the next NERSC machine will be a self-hosted Intel MIC (Xeon Phi) based supercomputer, we leverage an existing MIC testbed at NERSC to evaluate our experiments. In order to proxy the fact that future MIC machines will not have a host processor, we run all of our experiments in native mode. We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient in attaining high performance, significant effort was required to safely and efficiently thread the TEXAS integral package when constructing the Fock matrix. Ultimately, our new MPI+OpenMP hybrid implementations attain up to 65× better performance for the triples part of the CCSD(T) due in large part to the fact that the limited on-card memory limits the existing MPI implementation to a single process per card. Additionally, we obtain up to 1.6× better performance on Fock matrix constructions when compared with the best MPI implementations running multiple processes per card.

  3. Collisionless slow shocks in magnetotail reconnection

    NASA Astrophysics Data System (ADS)

    Cremer, Michael; Scholer, Manfred

    The kinetic structure of collisionless slow shocks in the magnetotail is studied by solving the Riemann problem of the collapse of a current sheet with a normal magnetic field component using 2-D hybrid simulations. The collapse results in a current layer with a hot isotropic distribution and backstreaming ions in a boundary layer. The lobe plasma outside and within the boundary layer exhibits a large perpendicular to parallel temperature anisotropy. Waves in both regions propagate parallel to the magnetic field. In a second experiment a spatially limited high density beam is injected into a low beta background plasma and the subsequent wave excitation is studied. A model for slow shocks bounding the reconnection layer in the magnetotail is proposed where backstreaming ions first excite obliquely propagating waves by the electromagnetic ion/ion cyclotron instability, which lead to perpendicular heating. The T⊥/T∥ temperature anisotropy subsequently excites parallel propagating Alfvén ion cyclotron waves, which are convected into the slow shock and are refracted in the downstream region.

  4. An S3-3 search for confined regions of large parallel electric fields

    NASA Astrophysics Data System (ADS)

    Boehm, M. H.; Mozer, F. S.

    1981-06-01

    Several hundred S3-3 satellite passes through perpendicular shocks are searched for evidence of large, mostly parallel electric fields (several hundred millivolts per meter, total potential of several kilovolts) in the auroral zone magnetosphere at altitudes of several thousand kilometers. The search criteria are that one or more E-field data points have a parallel component E sub z greater than 350 mV/m in general, or 100 mV/m for data within 10 seconds of a perpendicular shock, since double layers might be likely in such regions. Only a few marginally convincing examples of such electric fields are found, none of which fits a double layer model well. From statistics done with the most unbiased part of the data set, upper limits are obtained on the number and size of double layers occurring in the auroral zone magnetosphere, and it is concluded that double layers most probably cannot be responsible for the production of diffuse aurora or inverted-V events.

  5. VAC: Versatile Advection Code

    NASA Astrophysics Data System (ADS)

    Tóth, Gábor; Keppens, Rony

    2012-07-01

    The Versatile Advection Code (VAC) is a freely available general hydrodynamic and magnetohydrodynamic simulation software that works in 1, 2 or 3 dimensions on Cartesian and logically Cartesian grids. VAC runs on any Unix/Linux system with a Fortran 90 (or 77) compiler and Perl interpreter. VAC can run on parallel machines using either the Message Passing Interface (MPI) library or a High Performance Fortran (HPF) compiler.

  6. Large-scale trench-normal mantle flow beneath central South America

    NASA Astrophysics Data System (ADS)

    Reiss, M. C.; Rümpker, G.; Wölbern, I.

    2018-01-01

    We investigate the anisotropic properties of the fore-arc region of the central Andean margin between 17-25°S by analyzing shear-wave splitting from teleseismic and local earthquakes from the Nazca slab. With partly over ten years of recording time, the data set is uniquely suited to address the long-standing debate about the mantle flow field at the South American margin and in particular whether the flow field beneath the slab is parallel or perpendicular to the trench. Our measurements suggest two anisotropic layers located within the crust and mantle beneath the stations, respectively. The teleseismic measurements show a moderate change of fast polarizations from North to South along the trench, ranging from parallel to subparallel to the absolute plate motion, and are oriented mostly perpendicular to the trench. Shear-wave splitting measurements from local earthquakes show fast polarizations roughly aligned trench-parallel but exhibit short-scale variations which are indicative of a relatively shallow origin. Comparisons between fast polarization directions from local earthquakes and the strike of the local fault systems yield a good agreement. To infer the parameters of the lower anisotropic layer we employ an inversion of the teleseismic waveforms based on two-layer models, where the anisotropy of the upper (crustal) layer is constrained by the results from the local splitting. The waveform inversion yields a mantle layer that is best characterized by a fast axis parallel to the absolute plate motion, which is more or less perpendicular to the trench. This orientation is likely caused by a combination of the fossil crystallographic preferred orientation of olivine within the slab and entrained mantle flow beneath the slab. The anisotropy within the crust of the overriding continental plate is explained by the shape-preferred orientation of micro-cracks in relation to local fault zones which are oriented parallel to the overall strike of the Andean range.
Our results do not provide any evidence for a significant contribution of trench-parallel mantle flow beneath the subducting slab.

  7. Decision tables and rule engines in organ allocation systems for optimal transparency and flexibility.

    PubMed

    Schaafsma, Murk; van der Deijl, Wilfred; Smits, Jacqueline M; Rahmel, Axel O; de Vries Robbé, Pieter F; Hoitsma, Andries J

    2011-05-01

    Organ allocation systems have become complex and difficult to comprehend. We introduced decision tables to specify the rules of allocation systems for different organs. A rule engine with decision tables as input was tested for the Kidney Allocation System (ETKAS). We compared this rule engine with the currently used ETKAS by running 11,000 historical match runs and by running the rule engine in parallel with the ETKAS on our allocation system. Decision tables were easy to implement and successful in verifying correctness, completeness, and consistency. The outcomes of the 11,000 historical matches in the rule engine and the ETKAS were exactly the same. Running the rule engine simultaneously in parallel and in real time with the ETKAS also produced no differences. Specifying organ allocation rules in decision tables is already a great step forward in enhancing the clarity of the systems. Yet, using these tables as rule engine input for matches optimizes the flexibility, simplicity and clarity of the whole process, from specification to the performed matches, and in addition this new method allows well controlled simulations. © 2011 The Authors. Transplant International © 2011 European Society for Organ Transplantation.
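The decision-table approach can be sketched generically in Python (hypothetical rules for illustration only, not the actual ETKAS criteria): each row pairs a set of conditions with an outcome, the engine returns the outcome of the first row whose conditions all hold, and because the rules are data rather than code, completeness can be checked mechanically.

```python
# Each row: (conditions, outcome). A row matches a case when every
# condition key/value in the row holds for that case.
RULES = [
    ({"blood_match": True,  "urgent": True},  "offer_first"),
    ({"blood_match": True,  "urgent": False}, "offer_by_waiting_time"),
    ({"blood_match": False},                  "no_offer"),
]

def decide(case, rules=RULES):
    """Return the outcome of the first matching row; None if no row matches."""
    for conditions, outcome in rules:
        if all(case.get(k) == v for k, v in conditions.items()):
            return outcome
    return None

def complete(rules, cases):
    """Completeness check: every possible case must match some row."""
    return all(decide(c, rules) is not None for c in cases)

# Enumerate all condition combinations to verify the table is complete.
cases = [{"blood_match": b, "urgent": u} for b in (True, False)
                                         for u in (True, False)]
```

Running the same case set through two engines (or through an engine and the legacy implementation, as the authors did with 11,000 historical matches) then reduces to comparing outcome lists.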

  8. Streaming data analytics via message passing with application to graph algorithms

    DOE PAGES

    Plimpton, Steven J.; Shead, Tim

    2014-05-06

    The need to process streaming data, which arrives continuously at high volume in real time, arises in a variety of contexts including data produced by experiments, collections of environmental or network sensors, and running simulations. Streaming data can also be formulated as queries or transactions which operate on a large dynamic data store, e.g. a distributed database. We describe a lightweight, portable framework named PHISH which enables a set of independent processes to compute on a stream of data in a distributed-memory parallel manner. Datums are routed between processes in patterns defined by the application. PHISH can run on top of either message-passing via MPI or sockets via ZMQ. The former means streaming computations can be run on any parallel machine which supports MPI; the latter allows them to run on a heterogeneous, geographically dispersed network of machines. We illustrate how PHISH can support streaming MapReduce operations, and describe streaming versions of three algorithms for large, sparse graph analytics: triangle enumeration, subgraph isomorphism matching, and connected component finding. Lastly, we provide benchmark timings for MPI versus socket performance of several kernel operations useful in streaming algorithms.
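Of the three graph algorithms listed, connected component finding has the most compact streaming core: when edges arrive one at a time, an incremental union-find maintains the components seen so far. A minimal single-process sketch (the paper distributes this work across cooperating PHISH processes):

```python
class UnionFind:
    """Incremental connected components over a stream of edges."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

    def components(self):
        """Number of components among all vertices seen so far."""
        return len({self.find(x) for x in self.parent})

uf = UnionFind()
# Edges arriving one datum at a time, as on a PHISH input port.
for edge in [(1, 2), (2, 3), (4, 5), (3, 1)]:
    uf.union(*edge)
```

Each edge is processed in near-constant amortized time and no edge needs to be stored, which is what makes the algorithm suitable for an unbounded stream.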

  9. Gorgonum Chaos

    NASA Technical Reports Server (NTRS)

    2002-01-01

    (Released 08 April 2002) This image shows the cratered highlands of Terra Sirenum in the southern hemisphere. Near the center of the image, running from left to right, one can see long parallel to semi-parallel fractures or troughs called graben. Mars Global Surveyor initially discovered gullies on the south-facing walls of these fractures. This image is located at 38°S, 174°W (186°E).

  10. Long-range interactions and parallel scalability in molecular simulations

    NASA Astrophysics Data System (ADS)

    Patra, Michael; Hyvönen, Marja T.; Falck, Emma; Sabouri-Ghomi, Mohsen; Vattulainen, Ilpo; Karttunen, Mikko

    2007-01-01

    Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modeling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5) for single-processor and parallel performance up to 8 nodes. We have also tested the scalability on four different networks: Infiniband, GigaBit Ethernet, Fast Ethernet, and a nearly uniform memory architecture in which communication between CPUs is possible by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of sizes 128, 512 and 2048 lipid molecules were used as the test systems representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, both for serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.

  11. Dynamic Load Balancing for Grid Partitioning on a SP-2 Multiprocessor: A Framework

    NASA Technical Reports Server (NTRS)

    Sohn, Andrew; Simon, Horst; Lasinski, T. A. (Technical Monitor)

    1994-01-01

    Computational requirements of full scale computational fluid dynamics change as computation progresses on a parallel machine. The change in computational intensity causes workload imbalance of processors, which in turn requires a large amount of data movement at runtime. If parallel CFD is to be successful on a parallel or massively parallel machine, balancing of the runtime load is indispensable. Here a framework is presented for dynamic load balancing for CFD applications, called Jove. One processor is designated as a decision maker Jove while others are assigned to computational fluid dynamics. Processors running CFD send flags to Jove in a predetermined number of iterations to initiate load balancing. Jove starts working on load balancing while other processors continue working with the current data and load distribution. Jove goes through several steps to decide if the new data should be taken, including preliminary evaluation, partition, processor reassignment, cost evaluation, and decision. Jove running on a single IBM SP2 node has been completely implemented. Preliminary experimental results show that the Jove approach to dynamic load balancing can be effective for full scale grid partitioning on the target machine IBM SP2.
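The decision protocol described in this abstract (workers flag the coordinator, which weighs the benefit of a new partition against the cost of taking it) can be caricatured in a few lines. This is a hypothetical sketch; the function name and cost model are ours, not Jove's:

```python
# Sketch of a Jove-style decision step: workers report per-iteration loads,
# and a coordinator decides whether a repartition is worth its data-movement cost.

def should_rebalance(loads, move_cost, iters_remaining):
    """Accept a new partition if the projected time saved exceeds its cost."""
    worst = max(loads)                     # slowest processor sets the pace
    ideal = sum(loads) / len(loads)        # perfectly balanced per-iteration load
    projected_saving = (worst - ideal) * iters_remaining
    return projected_saving > move_cost

# Four workers, one heavily loaded; with many iterations left, repartition pays off.
assert should_rebalance([10, 10, 10, 30], move_cost=50, iters_remaining=100)
# Near the end of the run, the same imbalance is not worth the data movement.
assert not should_rebalance([10, 10, 10, 30], move_cost=50, iters_remaining=2)
```

The key property mirrored here is that the decision is made off the critical path: in Jove, the CFD processors keep computing with the old distribution while the coordinator evaluates.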

  13. Parallel design of JPEG-LS encoder on graphics processing units

    NASA Astrophysics Data System (ADS)

    Duan, Hao; Fang, Yong; Huang, Bormin

    2012-01-01

    With recent technical advances in graphics processing units (GPUs), GPUs have outperformed CPUs in terms of compute capability and memory bandwidth. Many successful GPU applications to high performance computing have been reported. JPEG-LS is an ISO/IEC standard for lossless image compression which utilizes adaptive context modeling and run-length coding to improve compression ratio. However, adaptive context modeling causes data dependency among adjacent pixels and the run-length coding has to be performed in a sequential way. Hence, using JPEG-LS to compress large-volume hyperspectral image data is quite time-consuming. We implement an efficient parallel JPEG-LS encoder for lossless hyperspectral compression on an NVIDIA GPU using the compute unified device architecture (CUDA) programming technology. We use the block parallel strategy, as well as such CUDA techniques as coalesced global memory access, parallel prefix sum, and asynchronous data transfer. We also show the relation between GPU speedup and AVIRIS block size, as well as the relation between compression ratio and AVIRIS block size. When AVIRIS images are divided into blocks, each with 64×64 pixels, we obtain the best GPU performance, a 26.3x speedup over the original CPU code.
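Of the CUDA techniques named above, the parallel prefix sum is the simplest to illustrate. Below is a pure-Python sketch of the classic work-efficient (Blelloch) exclusive scan, written sequentially here; on a GPU, each inner loop becomes one parallel step over threads:

```python
def exclusive_scan(a):
    """Work-efficient (Blelloch) exclusive prefix sum on a power-of-two-length list."""
    x = list(a)
    n = len(x)
    # Up-sweep (reduce): build partial sums along a binary tree.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            x[i] += x[i - d]
        d *= 2
    # Down-sweep: convert the partial sums into an exclusive scan.
    x[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):
            x[i - d], x[i] = x[i], x[i] + x[i - d]
        d //= 2
    return x

print(exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))  # [0, 3, 4, 11, 11, 15, 16, 22]
```

Both sweeps touch O(n) elements in total, which is why this variant (rather than the naive O(n log n) scan) is the standard building block in GPU compression kernels.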

  14. A comparison of energetic ions in the plasma depletion layer and the quasi-parallel magnetosheath

    NASA Technical Reports Server (NTRS)

    Fuselier, Stephen A.

    1994-01-01

    Energetic ion spectra measured by the Active Magnetospheric Particle Tracer Explorers/Charge Composition Explorer (AMPTE/CCE) downstream from the Earth's quasi-parallel bow shock (in the quasi-parallel magnetosheath) and in the plasma depletion layer are compared. In the latter region, energetic ions are from a single source, leakage of magnetospheric ions across the magnetopause and into the plasma depletion layer. In the former region, both the magnetospheric source and shock acceleration of the thermal solar wind population at the quasi-parallel shock can contribute to the energetic ion spectra. The relative strengths of these two energetic ion sources are determined through the comparison of spectra from the two regions. It is found that magnetospheric leakage can provide an upper limit of 35% of the total energetic H(+) population in the quasi-parallel magnetosheath near the magnetopause in the energy range from approximately 10 to approximately 80 keV/e and substantially less than this limit for the energetic He(2+) population. The rest of the energetic H(+) population and nearly all of the energetic He(2+) population are accelerated out of the thermal solar wind population through shock acceleration processes. By comparing the energetic and thermal He(2+) and H(+) populations in the quasi-parallel magnetosheath, it is found that the quasi-parallel bow shock is 2 to 3 times more efficient at accelerating He(2+) than H(+). This result is consistent with previous estimates from shock acceleration theory and simulations.

  15. New neurons generated from running are broadly recruited into neuronal activation associated with three different hippocampus-involved tasks

    PubMed Central

    Clark, Peter J.; Bhattacharya, Tushar K.; Miller, Daniel S.; Kohman, Rachel A.; DeYoung, Erin K.; Rhodes, Justin S.

    2012-01-01

    Running increases the formation of new neurons in the adult rodent hippocampus. However, the function of new neurons generated from running is currently unknown. One hypothesis is that new neurons from running contribute to enhanced cognitive function by increasing plasticity in the adult hippocampus. An alternative hypothesis is that new neurons generated from running incorporate into experience-specific hippocampal networks that only become active during running. The purpose of this experiment was to determine if new neurons generated from running are selectively activated by running, or can become recruited into granule cell activity occurring during performance on other behavioral tasks that engage the hippocampus. Therefore, the activation of new 5–6 week neurons was detected using BrdU, NeuN, and Zif268 triple-label immunohistochemistry in cohorts of female running and sedentary adult C57BL/6J mice following participation in one of three different tasks: the Morris water maze, novel environment exploration, or wheel running. Results showed that running and sedentary mice displayed a nearly equivalent proportion of new neurons that expressed Zif268 following each task. Since running approximately doubled the number of new neurons, the results demonstrated that running mice had a greater number of new neurons recruited into the Zif268 induction in the granule cell layer following each task than sedentary mice. The results suggest that new neurons incorporated into hippocampal circuitry from running are not just activated by wheel running itself, but rather become broadly recruited into granule cell layer activity during distinct behavioral experiences. PMID:22467337

  16. A Parallel Multiclassification Algorithm for Big Data Using an Extreme Learning Machine.

    PubMed

    Duan, Mingxing; Li, Kenli; Liao, Xiangke; Li, Keqin

    2018-06-01

    As data sets become larger and more complicated, an extreme learning machine (ELM) that runs in a traditional serial environment cannot realize its ability to be fast and effective. Although a parallel ELM (PELM) based on MapReduce to process large-scale data shows more efficient learning speed than identical ELM algorithms in a serial environment, some operations, such as intermediate results stored on disks and multiple copies for each task, are indispensable, and these operations create a large amount of extra overhead and degrade the learning speed and efficiency of the PELMs. In this paper, an efficient ELM based on the Spark framework (SELM), which includes three parallel subalgorithms, is proposed for big data classification. By partitioning the corresponding data sets reasonably, the hidden layer output matrix calculation algorithm and the matrix decomposition algorithms perform most of the computations locally. At the same time, they retain the intermediate results in distributed memory and cache the diagonal matrix as broadcast variables instead of several copies for each task to reduce a large amount of the costs, and these actions strengthen the learning ability of the SELM. Finally, we implement our SELM algorithm to classify large data sets. Extensive experiments have been conducted to validate the effectiveness of the proposed algorithms. As shown, our SELM achieves a speedup on a cluster with ten nodes, and the speedup continues to grow as the cluster is scaled to 15, 20, 25, 30, and 35 nodes.

  17. Magnetospheric Multiscale Satellites Observations of Parallel Electric Fields Associated with Magnetic Reconnection

    NASA Technical Reports Server (NTRS)

    Ergun, R. E.; Goodrich, K. A.; Wilder, F. D.; Holmes, J. C.; Stawarz, J. E.; Eriksson, S.; Sturner, A. P.; Malaspina, D. M.; Usanova, M. E.; Torbert, R. B.; hide

    2016-01-01

    We report observations from the Magnetospheric Multiscale satellites of parallel electric fields (E∥) associated with magnetic reconnection in the subsolar region of the Earth's magnetopause. E∥ events near the electron diffusion region have amplitudes on the order of 100 millivolts per meter, which are significantly larger than those predicted for an antiparallel reconnection electric field. This Letter addresses specific types of E∥ events, which appear as large-amplitude, near unipolar spikes that are associated with tangled, reconnected magnetic fields. These E∥ events are primarily in or near a current layer near the separatrix and are interpreted to be double layers that may be responsible for secondary reconnection in tangled magnetic fields or flux ropes. These results speak to the three-dimensional nature of magnetopause reconnection and indicate that magnetopause reconnection may often be patchy and/or drive turbulence along the separatrix that results in flux ropes and/or tangled magnetic fields.

  18. Characterizing parallel file-access patterns on a large-scale multiprocessor

    NASA Technical Reports Server (NTRS)

    Purakayastha, A.; Ellis, Carla; Kotz, David; Nieuwejaar, Nils; Best, Michael L.

    1995-01-01

    High-performance parallel file systems are needed to satisfy tremendous I/O requirements of parallel scientific applications. The design of such high-performance parallel file systems depends on a comprehensive understanding of the expected workload, but so far there have been very few usage studies of multiprocessor file systems. This paper is part of the CHARISMA project, which intends to fill this void by measuring real file-system workloads on various production parallel machines. In particular, we present results from the CM-5 at the National Center for Supercomputing Applications. Our results are unique because we collect information about nearly every individual I/O request from the mix of jobs running on the machine. Analysis of the traces leads to various recommendations for parallel file-system design.

  19. 50 GFlops molecular dynamics on the Connection Machine 5

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lomdahl, P.S.; Tamayo, P.; Groenbech-Jensen, N.

    1993-12-31

    The authors present timings and performance numbers for a new short range three dimensional (3D) molecular dynamics (MD) code, SPaSM, on the Connection Machine-5 (CM-5). They demonstrate that runs with more than 10^8 particles are now possible on massively parallel MIMD computers. To the best of their knowledge this is at least an order of magnitude more particles than what has previously been reported. Typical production runs show sustained performance (including communication) in the range of 47-50 GFlops on a 1024 node CM-5 with vector units (VUs). The speed of the code scales linearly with the number of processors and with the number of particles, and shows 95% parallel efficiency in the speedup.

  20. Implementation of the force decomposition machine for molecular dynamics simulations.

    PubMed

    Borštnik, Urban; Miller, Benjamin T; Brooks, Bernard R; Janežič, Dušanka

    2012-09-01

    We present the design and implementation of the force decomposition machine (FDM), a cluster of personal computers (PCs) that is tailored to running molecular dynamics (MD) simulations using the distributed diagonal force decomposition (DDFD) parallelization method. The cluster interconnect architecture is optimized for the communication pattern of the DDFD method. Our implementation of the FDM relies on standard commodity components even for networking. Although the cluster is meant for DDFD MD simulations, it remains general enough for other parallel computations. An analysis of several MD simulation runs on both the FDM and a standard PC cluster demonstrates that the FDM's interconnect architecture provides greater performance than a more general cluster interconnect. Copyright © 2012 Elsevier Inc. All rights reserved.

  1. Grace: A cross-platform micromagnetic simulator on graphics processing units

    NASA Astrophysics Data System (ADS)

    Zhu, Ru

    2015-12-01

    A micromagnetic simulator running on graphics processing units (GPUs) is presented. Different from GPU implementations of other research groups, which predominantly run on NVidia's CUDA platform, this simulator is developed with C++ Accelerated Massive Parallelism (C++ AMP) and is hardware platform independent. It runs on GPUs from vendors including NVidia, AMD and Intel, and achieves a significant performance boost compared to previous central processing unit (CPU) simulators, up to two orders of magnitude. The simulator paves the way for running large micromagnetic simulations on both high-end workstations with dedicated graphics cards and low-end personal computers with integrated graphics cards, and is freely available to download.

  2. Message-passing-interface-based parallel FDTD investigation on the EM scattering from a 1-D rough sea surface using uniaxial perfectly matched layer absorbing boundary.

    PubMed

    Li, J; Guo, L-X; Zeng, H; Han, X-B

    2009-06-01

    A message-passing-interface (MPI)-based parallel finite-difference time-domain (FDTD) algorithm for the electromagnetic scattering from a 1-D randomly rough sea surface is presented. The uniaxial perfectly matched layer (UPML) medium is adopted for truncation of FDTD lattices, in which the finite-difference equations can be used for the total computation domain by properly choosing the uniaxial parameters. This makes the parallel FDTD algorithm easier to implement. The parallel performance with different numbers of processors is illustrated for one sea surface realization, and the computation time of the parallel FDTD algorithm is dramatically reduced compared to a single-process implementation. Finally, some numerical results are shown, including the backscattering characteristics of the sea surface for different polarizations and the bistatic scattering from a sea surface at large incident angles and high wind speeds.

  3. The light wave flow effect in a plane-parallel layer with a quasi-zero refractive index under the action of bounded light beams

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gadomsky, O. N., E-mail: gadomsky@mail.ru; Shchukarev, I. A., E-mail: blacxpress@gmail.com

    2016-08-15

    It is shown that external optical radiation in the 450–1200 nm range can be efficiently transformed under the action of bounded light beams to a surface wave that propagates along the external and internal boundaries of a plane-parallel layer with a quasi-zero refractive index. Reflection regimes with complex and real angles of refraction in the layer are considered. The layer with a quasi-zero refractive index in this boundary problem is located on a highly reflective metal substrate; it is shown that the uniform low reflection of light is achieved in the wavelength range under study.

  4. Parameters that affect parallel processing for computational electromagnetic simulation codes on high performance computing clusters

    NASA Astrophysics Data System (ADS)

    Moon, Hongsik

    What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited from the increased computing power in the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared using benchmark software, and the metric was FLoating-point Operations Per Second (FLOPS), which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore systems? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPS and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. 
This code was developed to address the needs of the changing computer hardware platforms in order to provide fast, accurate and efficient solutions to large, complex electromagnetic problems. The research in this dissertation proves that the performance of parallel code is intimately related to the configuration of the computer hardware and can be maximized for different hardware platforms. To benchmark and optimize the performance of parallel CEM software, a variety of large, complex projects are created and executed on a variety of computer platforms. The computer platforms used in this research are detailed in this dissertation. The projects run as benchmarks are also described in detail and results are presented. The parameters that affect parallel CEM software on High Performance Computing Clusters (HPCC) are investigated. This research demonstrates methods to maximize the performance of parallel CEM software code.

  5. Communications oriented programming of parallel iterative solutions of sparse linear systems

    NASA Technical Reports Server (NTRS)

    Patrick, M. L.; Pratt, T. W.

    1986-01-01

    Parallel algorithms are developed for a class of scientific computational problems by partitioning the problems into smaller problems which may be solved concurrently. The effectiveness of the resulting parallel solutions is determined by the amount and frequency of communication and synchronization and the extent to which communication can be overlapped with computation. Three different parallel algorithms for solving the same class of problems are presented, and their effectiveness is analyzed from this point of view. The algorithms are programmed using a new programming environment. Run-time statistics and experience obtained from the execution of these programs assist in measuring the effectiveness of these algorithms.
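As a concrete instance of the partitioning idea, the sketch below (hypothetical, not from the paper) shows a Jacobi sweep: every component is updated only from the previous iterate, so disjoint blocks of components can be assigned to different processors, with communication needed just once per sweep:

```python
def jacobi_step(A, b, x):
    """One Jacobi sweep. Each component depends only on the previous iterate x,
    so blocks of components are independent and could be updated concurrently."""
    n = len(b)
    return [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)]

# Diagonally dominant test system, so Jacobi converges from any starting point.
A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]
x = [0.0, 0.0, 0.0]
for _ in range(50):
    x = jacobi_step(A, b, x)
# Converges to the exact solution [1, 1, 1].
assert all(abs(v - 1.0) < 1e-8 for v in x)
```

The trade-off the abstract analyzes shows up directly here: each sweep needs the full previous iterate, so the frequency and volume of that exchange, and how much of it overlaps with computation, governs parallel effectiveness.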

  6. Node Resource Manager: A Distributed Computing Software Framework Used for Solving Geophysical Problems

    NASA Astrophysics Data System (ADS)

    Lawry, B. J.; Encarnacao, A.; Hipp, J. R.; Chang, M.; Young, C. J.

    2011-12-01

    With the rapid growth of multi-core computing hardware, it is now possible for scientific researchers to run complex, computationally intensive software on affordable, in-house commodity hardware. Multi-core CPUs (Central Processing Units) and GPUs (Graphics Processing Units) are now commonplace in desktops and servers. Developers today have access to extremely powerful hardware that enables the execution of software that could previously only be run on expensive, massively-parallel systems. It is no longer cost-prohibitive for an institution to build a parallel computing cluster consisting of commodity multi-core servers. In recent years, our research team has developed a distributed, multi-core computing system and used it to construct global 3D earth models using seismic tomography. Traditionally, computational limitations forced certain assumptions and shortcuts in the calculation of tomographic models; however, with the recent rapid growth in computational hardware including faster CPUs, increased RAM, and the development of multi-core computers, we are now able to perform seismic tomography, 3D ray tracing and seismic event location using distributed parallel algorithms running on commodity hardware, thereby eliminating the need for many of these shortcuts. We describe Node Resource Manager (NRM), a system we developed that leverages the capabilities of a parallel computing cluster. NRM is a software-based parallel computing management framework that works in tandem with the Java Parallel Processing Framework (JPPF, http://www.jppf.org/), a third party library that provides a flexible and innovative way to take advantage of modern multi-core hardware. NRM enables multiple applications to use and share a common set of networked computers, regardless of their hardware platform or operating system. 
Using NRM, algorithms can be parallelized to run on multiple processing cores of a distributed computing cluster of servers and desktops, which results in a dramatic speedup in execution time. NRM is sufficiently generic to support applications in any domain, as long as the application is parallelizable (i.e., can be subdivided into multiple individual processing tasks). At present, NRM has been effective in decreasing the overall runtime of several algorithms: 1) the generation of a global 3D model of the compressional velocity distribution in the Earth using tomographic inversion, 2) the calculation of the model resolution matrix, model covariance matrix, and travel time uncertainty for the aforementioned velocity model, and 3) the correlation of waveforms with archival data on a massive scale for seismic event detection. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
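The master/worker pattern that NRM layers on top of JPPF can be imitated on one machine with Python's standard library. A hypothetical sketch of farming independent tasks out to a worker pool (the task body is a stand-in, not NRM code):

```python
from concurrent.futures import ThreadPoolExecutor

def process_task(task_id):
    """Stand-in for one independent unit of work, e.g. correlating one waveform
    against archival data; here it just does some deterministic arithmetic."""
    return task_id, sum(i * i for i in range(1000))

# Farm the independent tasks out to a pool of workers; because no task depends
# on another, the problem is "parallelizable" in exactly the sense NRM requires.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(process_task, range(8)))

assert len(results) == 8
assert all(v == 332833500 for v in results.values())
```

NRM's contribution beyond this single-machine picture is dispatching such tasks across a heterogeneous set of networked computers and letting multiple applications share the same pool.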

  7. Distributed run of a one-dimensional model in a regional application using SOAP-based web services

    NASA Astrophysics Data System (ADS)

    Smiatek, Gerhard

    This article describes the setup of a distributed computing system in Perl. It facilitates the parallel run of a one-dimensional environmental model on a number of simple network PC hosts. The system uses Simple Object Access Protocol (SOAP) driven web services offering the model run on remote hosts and a multi-thread environment distributing the work and accessing the web services. Its application is demonstrated in a regional run of a process-oriented biogenic emission model for the area of Germany. Within a network consisting of up to seven web services implemented on Linux and MS-Windows hosts, a performance increase of approximately 400% has been reached compared to a model run on the fastest single host.

  8. Nanoscale lamellar photoconductor hybrids and methods of making same

    DOEpatents

    Stupp, Samuel I; Goldberger, Josh; Sofos, Marina

    2013-02-05

    An article of manufacture and methods of making same. In one embodiment, the article of manufacture has a plurality of zinc oxide layers substantially in parallel, wherein each zinc oxide layer has a thickness d₁, and a plurality of organic molecule layers substantially in parallel, wherein each organic molecule layer has a thickness d₂ and a plurality of molecules with a functional group that is bindable to zinc ions, wherein for every pair of neighboring zinc oxide layers, one of the plurality of organic molecule layers is positioned in between the pair of neighboring zinc oxide layers to allow the functional groups of the plurality of organic molecules to bind to zinc ions in the neighboring zinc oxide layers to form a lamellar hybrid structure with a geometric periodicity d₁+d₂, and wherein d₁ and d₂ satisfy the relationship d₁ ≤ d₂ ≤ 3d₁.

  9. Scalable and balanced dynamic hybrid data assimilation

    NASA Astrophysics Data System (ADS)

    Kauranne, Tuomo; Amour, Idrissa; Gunia, Martin; Kallio, Kari; Lepistö, Ahti; Koponen, Sampsa

    2017-04-01

    Scalability of complex weather forecasting suites is dependent on the technical tools available for implementing highly parallel computational kernels, but to an equally large extent also on the dependence patterns between various components of the suite, such as observation processing, data assimilation and the forecast model. Scalability is a particular challenge for 4D variational assimilation methods that necessarily couple the forecast model into the assimilation process and subject this combination to an inherently serial quasi-Newton minimization process. Ensemble based assimilation methods are naturally more parallel, but large models force ensemble sizes to be small and that results in poor assimilation accuracy, somewhat akin to shooting with a shotgun in a million-dimensional space. The Variational Ensemble Kalman Filter (VEnKF) is an ensemble method that can attain the accuracy of 4D variational data assimilation with a small ensemble size. It achieves this by processing a Gaussian approximation of the current error covariance distribution, instead of a set of ensemble members, analogously to the Extended Kalman Filter EKF. Ensemble members are re-sampled every time a new set of observations is processed from a new approximation of that Gaussian distribution which makes VEnKF a dynamic assimilation method. After this a smoothing step is applied that turns VEnKF into a dynamic Variational Ensemble Kalman Smoother VEnKS. In this smoothing step, the same process is iterated with frequent re-sampling of the ensemble but now using past iterations as surrogate observations until the end result is a smooth and balanced model trajectory. In principle, VEnKF could suffer from similar scalability issues as 4D-Var. 
However, this can be avoided by isolating the forecast model completely from the minimization process by implementing the latter as a wrapper code whose only link to the model is launching many totally independent model runs, each of which is itself parallel. The only bottleneck in the process is the gathering and scattering of initial and final model state snapshots before and after the parallel runs, which requires a very efficient, low-latency communication network. However, the volume of data communicated is small, and the intervening minimization steps are only 3D-Var, so their computational load is negligible compared with the fully parallel model runs. We present example results of the scalable VEnKF with the 4D lake and shallow-sea model COHERENS, simultaneously assimilating continuous in situ measurements at a single point and infrequent satellite images that cover a whole lake.

  10. Implementing Shared Memory Parallelism in MCBEND

    NASA Astrophysics Data System (ADS)

    Bird, Adam; Long, David; Dobson, Geoff

    2017-09-01

    MCBEND is a general purpose radiation transport Monte Carlo code from AMEC Foster Wheeler's ANSWERS® Software Service. MCBEND is well established in the UK shielding community for radiation shielding and dosimetry assessments. The existing MCBEND parallel capability effectively involves running the same calculation on many processors. This works very well except when the memory requirements of a model restrict the number of instances of a calculation that will fit on a machine. To more effectively utilise parallel hardware, OpenMP has been used to implement shared memory parallelism in MCBEND. This paper describes the reasoning behind the choice of OpenMP, notes some of the challenges of multi-threading an established code such as MCBEND and assesses the performance of the parallel method implemented in MCBEND.

  11. Resolutions of the Coulomb operator: VIII. Parallel implementation using the modern programming language X10.

    PubMed

    Limpanuparb, Taweetham; Milthorpe, Josh; Rendell, Alistair P

    2014-10-30

    Use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. By using X10, a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner including use of both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of integral calculation using X10's work stealing runtime, and report performance results for long-range HF energy calculations of a large molecule with a high-quality basis set, running on up to 1024 cores of a high performance cluster machine. Copyright © 2014 Wiley Periodicals, Inc.

  12. New NAS Parallel Benchmarks Results

    NASA Technical Reports Server (NTRS)

    Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)

    1997-01-01

    NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.

  13. Porting LAMMPS to GPUs.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown, William Michael; Plimpton, Steven James; Wang, Peng

    2010-03-01

    LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. LAMMPS has potentials for soft materials (biomolecules, polymers), solid-state materials (metals, semiconductors), and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale. LAMMPS runs on single processors or in parallel using message-passing techniques and a spatial decomposition of the simulation domain. The code is designed to be easy to modify or extend with new functionality.
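
The spatial decomposition mentioned in this entry can be illustrated with a minimal sketch, assuming a box sliced into equal slabs along one axis so each rank owns the particles inside its slab (the function name and particle data are invented; LAMMPS's actual decomposition is a 3D brick grid with ghost-atom exchange between neighbors):

```python
# Minimal 1D spatial-decomposition sketch: map each particle to the
# rank whose slab of the box contains its x coordinate.
def assign_to_subdomains(positions, box_length, n_ranks):
    slab = box_length / n_ranks
    owners = {}
    for i, (x, y, z) in enumerate(positions):
        rank = min(int(x // slab), n_ranks - 1)  # clamp boundary particles
        owners.setdefault(rank, []).append(i)
    return owners

particles = [(0.5, 1.0, 1.0), (3.9, 0.2, 0.2), (7.5, 2.0, 2.0)]
print(assign_to_subdomains(particles, 8.0, 4))  # {0: [0], 1: [1], 3: [2]}
```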

  14. MBE growth of highly reproducible VCSELs

    NASA Astrophysics Data System (ADS)

    Houng, Y. M.; Tan, M. R. T.

    1997-05-01

    Advances in the design of heterojunction devices have placed stringent demands on the epitaxial material technologies required to fabricate these structures. The demand for tighter tolerances and more complex device structures has resulted in a situation where acceptable growth yields will be realized only if epitaxial growth is directly monitored and controlled in real time. We report the growth of 980- and 850-nm vertical cavity surface emitting lasers (VCSELs) by gas-source molecular beam epitaxy (GSMBE), in which a pyrometric interferometry technique is used for in situ monitoring and feedback control of layer thickness to obtain the highly reproducible distributed Bragg reflectors (DBRs) required for VCSEL structures. This technique uses an optical pyrometer to measure emissivity oscillations of the growing epi-layer surface. The growing layer thickness can then be related to the emissivity oscillation signals. When the layer reaches the desired thickness, the growth of the subsequent layer is initiated. By measuring and controlling layer thickness in real time throughout the entire growth cycle of the structure, the Fabry-Perot resonance at the desired wavelength is reproducibly obtained. The run-to-run variation of the Fabry-Perot wavelength of VCSEL structures is < ±0.4%. Using this technique, the group III fluxes can also be calibrated and corrected for flux drifts, so we are able to control the gain peak of the active region with a run-to-run variation of less than 0.3%. Surface emitting laser diodes were fabricated and operated CW at room temperature. CW threshold currents of 3 and 5 mA were measured at room temperature for the 980- and 850-nm lasers, respectively. Output powers higher than 25 mW for 980-nm and 12 mW for 850-nm devices were obtained.
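
The thickness-from-oscillations relationship described in this entry follows from thin-film interference: one full emissivity oscillation corresponds to a growth of λ/(2n) in layer thickness. A minimal sketch, with assumed (not the paper's) wavelength and refractive-index values:

```python
# Illustrative only: convert a count of emissivity oscillation periods
# to grown thickness via d = periods * lambda / (2 * n). The pyrometer
# wavelength and refractive index below are assumptions, not the
# paper's parameters.
def thickness_from_oscillations(periods, wavelength_nm, refractive_index):
    return periods * wavelength_nm / (2.0 * refractive_index)

# e.g. 3 full oscillations at an assumed 940 nm wavelength, n ~ 3.5
print(thickness_from_oscillations(3, 940.0, 3.5))  # ~402.86 nm
```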

  15. Microstructure anisotropy and its effect on mechanical properties of reduced activation ferritic/martensitic steel fabricated by selective laser melting

    NASA Astrophysics Data System (ADS)

    Huang, Bo; Zhai, Yutao; Liu, Shaojun; Mao, Xiaodong

    2018-03-01

    Selective laser melting (SLM) is a promising way to fabricate complex reduced activation ferritic/martensitic steel components. The microstructure of SLM-built China low activation martensitic (CLAM) steel plates was observed and analyzed. Hardness, Charpy impact, and tensile testing of specimens in different orientations were performed at room temperature. The results showed that the differences in mechanical properties were related to the anisotropy of the microstructure. The planar unmelted porosity at the interfaces of adjacent layers induced an opening/tensile failure mode when tensile samples parallel to the build direction were tested, whereas samples perpendicular to the build direction fractured in shear mode, with the grains sheared at a slant angle. Moreover, the impact absorbed energy (IAE) of all impact specimens was significantly lower than that of wrought CLAM steel, and the IAE of samples perpendicular to the build direction was higher than that of samples parallel to the build direction. The impact fracture surfaces revealed that loading parallel to the build layers caused laminated tearing among the layers, while loading perpendicular to the layers induced intergranular fracture across the layers.

  16. Distributed computing feasibility in a non-dedicated homogeneous distributed system

    NASA Technical Reports Server (NTRS)

    Leutenegger, Scott T.; Sun, Xian-He

    1993-01-01

    The low cost and availability of clusters of workstations have led researchers to re-explore distributed computing using independent workstations. This approach may provide better cost/performance than tightly coupled multiprocessors. In practice, this approach often utilizes wasted cycles to run parallel jobs. The feasibility of such a non-dedicated parallel processing environment, assuming workstation processes have preemptive priority over parallel tasks, is addressed. An analytical model is developed to predict parallel job response times. Our model provides insight into how significantly workstation owner interference degrades parallel program performance. A new term, task ratio, which relates the parallel task demand to the mean service demand of nonparallel workstation processes, is introduced. We propose that the task ratio is a useful metric for determining how large the demand of a parallel application must be in order to make efficient use of a non-dedicated distributed system.
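
The task ratio introduced in this entry is simply a ratio of service demands. A trivial sketch, with illustrative numbers and an invented function name:

```python
# Sketch of the "task ratio" metric: the service demand of one parallel
# task relative to the mean service demand of the workstation owner's
# own (higher-priority) processes. Example values are illustrative.
def task_ratio(parallel_task_demand, mean_owner_demand):
    return parallel_task_demand / mean_owner_demand

# A large ratio means each parallel task dwarfs typical owner activity,
# so owner interference is amortized and idle cycles are worth using.
print(task_ratio(parallel_task_demand=50.0, mean_owner_demand=2.0))  # 25.0
```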

  17. Report to the High Order Language Working Group (HOLWG)

    DTIC Science & Technology

    1977-01-14

    as running, runnable, suspended or dormant, may be synchronized by semaphore variables, may be scheduled using clock and duration data types and mpy...Recursive and non-recursive routines G6. Parallel processes, synchronization, critical regions G7. User defined parameterized exception handling G8...typed and lacks extensibility, parallel processing, synchronization and real-time features. Overall Evaluation IBM strongly recommended PL/I as a

  18. Evaluating SPLASH-2 Applications Using MapReduce

    NASA Astrophysics Data System (ADS)

    Zhu, Shengkai; Xiao, Zhiwei; Chen, Haibo; Chen, Rong; Zhang, Weihua; Zang, Binyu

    MapReduce has become prevalent for running data-parallel applications. By hiding non-functional concerns such as parallelism, fault tolerance, and load balancing from programmers, MapReduce significantly simplifies the programming of large clusters. Owing to these features, researchers have also explored the use of MapReduce in other application domains, such as machine learning, textual retrieval, and statistical translation, among others.
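
The map/group/reduce pipeline this entry describes can be shown in a few lines. This single-process word-count sketch mimics the programming model only; it has none of the distribution, fault tolerance, or load balancing a real MapReduce framework provides:

```python
from collections import defaultdict

# Minimal MapReduce model: map emits key/value pairs, the framework
# groups them by key (shuffle), and reduce folds each group.
def map_phase(docs):
    for doc in docs:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_phase(groups):
    return {k: sum(vs) for k, vs in groups.items()}

counts = reduce_phase(shuffle(map_phase(["a b a", "b c"])))
print(counts)  # {'a': 2, 'b': 2, 'c': 1}
```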

  19. Automatic Adaptation of Tunable Distributed Applications

    DTIC Science & Technology

    2001-01-01

    size, weight, and battery life, with a single CPU, less memory, smaller hard disk, and lower bandwidth network connectivity. The power of PDAs is...wireless, and bluetooth [32] facilities; thus achieving different rates of data transmission. 1 With the trend of “write once, run everywhere...applications, a single component can execute on multiple processors (or machines) in parallel. These parallel applications, written in a specialized language

  20. Simulation framework for intelligent transportation systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ewing, T.; Doss, E.; Hanebutte, U.

    1996-10-01

    A simulation framework has been developed for a large-scale, comprehensive, scalable simulation of an Intelligent Transportation System (ITS). The simulator is designed for running on parallel computers and distributed (networked) computer systems, but can run on standalone workstations for smaller simulations. The simulator currently models instrumented smart vehicles with in-vehicle navigation units capable of optimal route planning, and Traffic Management Centers (TMC). The TMC has probe vehicle tracking capabilities (displaying the position and attributes of instrumented vehicles), and can provide two-way interaction with traffic to provide advisories and link times. Both the in-vehicle navigation module and the TMC feature detailed graphical user interfaces to support human-factors studies. Realistic modeling of variations in the posted driving speed is based on human-factors studies that take into consideration weather, road conditions, driver personality and behavior, and vehicle type. The prototype has been developed on a distributed system of networked UNIX computers but is designed to run on parallel computers, such as ANL's IBM SP-2, for large-scale problems. A novel feature of the approach is that vehicles are represented by autonomous computer processes which exchange messages with other processes. The vehicles have a behavior model which governs route selection and driving behavior, and can react to external traffic events much like real vehicles. With this approach, the simulation is scalable to take advantage of emerging massively parallel processor (MPP) systems.

  1. AMITIS: A 3D GPU-Based Hybrid-PIC Model for Space and Plasma Physics

    NASA Astrophysics Data System (ADS)

    Fatemi, Shahab; Poppe, Andrew R.; Delory, Gregory T.; Farrell, William M.

    2017-05-01

    We have developed, for the first time, an advanced modeling infrastructure in space simulations (AMITIS) with an embedded three-dimensional self-consistent grid-based hybrid model of plasma (kinetic ions and fluid electrons) that runs entirely on graphics processing units (GPUs). The model uses NVIDIA GPUs and their associated parallel computing platform, CUDA, developed for general purpose processing on GPUs. The model uses a single CPU-GPU pair, where the CPU transfers data between the system and GPU memory, executes CUDA kernels, and writes simulation outputs on the disk. All computations, including moving particles, calculating macroscopic properties of particles on a grid, and solving hybrid model equations are processed on a single GPU. We explain various computing kernels within AMITIS and compare their performance with an already existing well-tested hybrid model of plasma that runs in parallel using multi-CPU platforms. We show that AMITIS runs ∼10 times faster than the parallel CPU-based hybrid model. We also introduce an implicit solver for computation of Faraday’s Equation, resulting in an explicit-implicit scheme for the hybrid model equation. We show that the proposed scheme is stable and accurate. We examine the AMITIS energy conservation and show that the energy is conserved with an error < 0.2% after 500,000 timesteps, even when a very low number of particles per cell is used.

  2. Simulating three dimensional wave run-up over breakwaters covered by antifer units

    NASA Astrophysics Data System (ADS)

    Najafi-Jilani, A.; Niri, M. Zakiri; Naderi, Nader

    2014-06-01

    The paper presents a numerical analysis of wave run-up over rubble-mound breakwaters covered by antifer units, using a technique integrating Computer-Aided Design (CAD) and Computational Fluid Dynamics (CFD) software. Direct application of the Navier-Stokes equations within the armour blocks is used to provide a more reliable approach to simulating wave run-up over breakwaters. A well-tested Reynolds-averaged Navier-Stokes (RANS) Volume of Fluid (VOF) code (Flow-3D) was adopted for the CFD computations. The computed results were compared with experimental data to check the validity of the model. Numerical results showed that the direct three-dimensional (3D) simulation method can deliver accurate results for wave run-up over rubble-mound breakwaters. The results also showed that the placement pattern of the antifer units had a great impact on wave run-up: changing the placement pattern from regular to double-pyramid reduced wave run-up by approximately 30%. Analysis was done to investigate the influences of surface roughness, energy dissipation in the pores of the armour layer, and reduced wave run-up due to inflow into the armour and stone layers.

  3. Reduction Characteristics of FM-Band Cross-Talks between Two Parallel Signal Traces on Printed Circuit Boards for Vehicles

    NASA Astrophysics Data System (ADS)

    Maeno, Tsuyoshi; Ueyama, Hiroya; Iida, Michihira; Fujiwara, Osamu

    It is well known that electromagnetic disturbances in vehicle-mounted radios are mainly caused by conducted noise currents flowing through wiring harnesses from vehicle-mounted printed circuit boards (PCBs) with common ground patterns with slits. To suppress noise current outflows from PCBs of this kind, we previously measured noise current outflows from simple two-layer PCBs having two parallel signal traces and different ground patterns with/without slits, which revealed that making slits with open ends on the ground patterns, in parallel with the traces, can reduce the conducted noise currents. In the present study, using FDTD simulation, we investigated the reduction characteristics of FM-band cross-talk noise levels between two parallel signal traces for eighteen PCBs having different ground patterns with/without slits parallel to the traces and dielectric layers of different thicknesses. As a result, we found that the cross-talk reduction effect due to slits is 3.6-5.3 dB, while the cross-talk between signal traces is reduced in inverse proportion to the square of the dielectric-layer thickness and in proportion to the square of the trace interval, which can be explained quantitatively by inductive coupling theory.
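
The reported geometric scaling can be sketched as a relative-level formula. This is one reading of the abstract's trend (cross-talk level growing with the square of the dielectric thickness and falling with the square of the trace interval), normalized to assumed reference dimensions, not the paper's fitted model:

```python
# One reading of the reported inductive-coupling trend: cross-talk
# level scales with (dielectric thickness)^2 and 1 / (trace interval)^2.
# The reference dimensions are an assumed normalization, not data.
def relative_crosstalk(thickness, interval, ref_thickness=1.0, ref_interval=1.0):
    return (thickness / ref_thickness) ** 2 * (ref_interval / interval) ** 2

print(relative_crosstalk(2.0, 1.0))  # 4.0 -> doubling thickness quadruples the level
print(relative_crosstalk(1.0, 2.0))  # 0.25 -> doubling the interval quarters it
```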

  4. Genetic Parallel Programming: design and implementation.

    PubMed

    Cheang, Sin Man; Leung, Kwong Sak; Lee, Kin Hong

    2006-01-01

    This paper presents a novel Genetic Parallel Programming (GPP) paradigm for evolving parallel programs running on a Multi-Arithmetic-Logic-Unit (Multi-ALU) Processor (MAP). The MAP is a Multiple Instruction-streams, Multiple Data-streams (MIMD), general-purpose register machine that can be implemented on modern Very Large-Scale Integrated Circuits (VLSIs) in order to evaluate genetic programs at high speed. For human programmers, writing parallel programs is more difficult than writing sequential programs. However, experimental results show that GPP evolves parallel programs with less computational effort than that of their sequential counterparts. It creates a new approach to evolving a feasible problem solution in parallel program form and then serializes it into a sequential program if required. The effectiveness and efficiency of GPP are investigated using a suite of 14 well-studied benchmark problems. Experimental results show that GPP speeds up evolution substantially.

  5. Parallel-In-Time For Moving Meshes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Falgout, R. D.; Manteuffel, T. A.; Southworth, B.

    2016-02-04

    With steadily growing computational resources available, scientists must develop effective ways to utilize the increased resources. High performance, highly parallel software has become a standard. However, until recent years parallelism has focused primarily on the spatial domain. When solving a space-time partial differential equation (PDE), this leads to a sequential bottleneck in the temporal dimension, particularly when taking a large number of time steps. The XBraid parallel-in-time library was developed as a practical way to add temporal parallelism to existing sequential codes with only minor modifications. In this work, a rezoning-type moving mesh is applied to a diffusion problem and formulated in a parallel-in-time framework. Tests and scaling studies are run using XBraid and demonstrate excellent results for the simple model problem considered herein.
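
The parallel-in-time idea can be demonstrated with a classic two-level parareal iteration. XBraid itself uses multigrid reduction in time; this sketch is a simpler relative, applied to the assumed test problem y' = -y rather than the paper's moving-mesh diffusion problem:

```python
import math

# Parareal sketch: a cheap serial coarse propagator predicts, expensive
# fine propagators (parallelizable across time slices) correct.
def coarse(y, dt):                 # one implicit Euler step
    return y / (1.0 + dt)

def fine(y, dt, substeps=100):     # many small implicit Euler steps
    h = dt / substeps
    for _ in range(substeps):
        y = y / (1.0 + h)
    return y

def parareal(y0, t_end, slices=8, iters=5):
    dt = t_end / slices
    u = [y0]
    for n in range(slices):        # serial coarse prediction
        u.append(coarse(u[-1], dt))
    for _ in range(iters):
        f = [fine(u[n], dt) for n in range(slices)]  # parallelizable loop
        new = [y0]
        for n in range(slices):    # serial coarse correction sweep
            new.append(coarse(new[-1], dt) + f[n] - coarse(u[n], dt))
        u = new
    return u[-1]

print(abs(parareal(1.0, 1.0) - math.exp(-1.0)))  # small error vs exact decay
```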

  6. Research in Parallel Algorithms and Software for Computational Aerosciences

    NASA Technical Reports Server (NTRS)

    Domel, Neal D.

    1996-01-01

    Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.

  8. Hypersonic Boundary Layer Instability Over a Corner

    NASA Technical Reports Server (NTRS)

    Balakumar, Ponnampalam; Zhao, Hong-Wu; McClinton, Charles (Technical Monitor)

    2001-01-01

    A boundary-layer transition study over a compression corner was conducted under hypersonic flow conditions. Due to the discontinuities in the boundary-layer flow, the full Navier-Stokes equations were solved to simulate the development of disturbances in the boundary layer. A linear stability analysis and the PSE method were used to obtain the initial disturbance for parallel and non-parallel flow, respectively. A 2-D code was developed to solve the full Navier-Stokes equations using a WENO (weighted essentially non-oscillatory) scheme. The numerical results show the evolution of the most amplified linear disturbance in supersonic and hypersonic flow over a compression ramp. The nonlinear computations also determined the minimal amplitudes necessary to cause transition at a designed location.

  9. Neoclassical, semi-collisional tearing mode theory in an axisymmetric torus

    NASA Astrophysics Data System (ADS)

    Connor, J. W.; Hastie, R. J.; Helander, P.

    2017-12-01

    A set of layer equations for determining the stability of semi-collisional tearing modes in an axisymmetric torus, incorporating neoclassical physics in the small ion Larmor radius limit, is provided. These can be used as an inner-layer module for inclusion in numerical codes that asymptotically match the layer to toroidal calculations of the tearing mode stability index, $\Delta'$. They are more complete than in earlier work and comprise equations for the perturbed electron density and temperature, the ion temperature, Ampère's law and the vorticity equation, amounting to a twelfth-order set of radial differential equations. While the toroidal geometry is kept quite general when treating the classical and Pfirsch-Schlüter transport, parallel bootstrap current and semi-collisional physics, it is assumed that the fraction of trapped particles is small for the banana regime contribution. This is to justify the use of a model collision term when acting on the localised (in velocity space) solutions that remain after the Spitzer solutions have been exploited to account for the bulk of the passing distributions. In this respect, unlike standard neoclassical transport theory, the calculation involves the second Spitzer solution connected with a parallel temperature gradient, because this stability problem involves parallel temperature gradients that cannot occur in equilibrium toroidal transport theory. Furthermore, a calculation of the linearised neoclassical radial transport of toroidal momentum for general geometry is required to complete the vorticity equation. The solutions of the resulting set of equations do not match properly to the ideal magnetohydrodynamic (MHD) equations at large distances from the layer, and a further, intermediate layer involving ion corrections to the electrical conductivity and ion parallel thermal transport is invoked to achieve this matching and allow one to correctly calculate the layer $\Delta'$.

  10. Finding Tropical Cyclones on a Cloud Computing Cluster: Using Parallel Virtualization for Large-Scale Climate Simulation Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hasenkamp, Daren; Sim, Alexander; Wehner, Michael

    Extensive computing power has been used to tackle issues such as climate change, fusion energy, and other pressing scientific challenges. These computations produce a tremendous amount of data; however, many of the data analysis programs currently run on only a single processor. In this work, we explore the possibility of using the emerging cloud computing platform to parallelize such sequential data analysis tasks. As a proof of concept, we wrap a program for analyzing trends of tropical cyclones in a set of virtual machines (VMs). This approach allows the user to keep their familiar data analysis environment in the VMs, while we provide the coordination and data transfer services to ensure the necessary input and output are directed to the desired locations. This work extensively exercises the networking capability of cloud computing systems and has revealed a number of weaknesses in current cloud system software. In our tests, we are able to scale the parallel data analysis job to a modest number of VMs and achieve a speedup comparable to running the same analysis task using MPI. However, compared to MPI-based parallelization, the cloud-based approach has a number of advantages. The cloud-based approach is more flexible because the VMs can capture arbitrary software dependencies without requiring the user to rewrite their programs. The cloud-based approach is also more resilient to failure: as long as a single VM is running it can make progress, whereas an MPI analysis job fails as soon as one node fails. In short, this initial work demonstrates that a cloud computing system is a viable platform for distributed scientific data analyses traditionally conducted on dedicated supercomputing systems.

  11. SequenceL: Automated Parallel Algorithms Derived from CSP-NT Computational Laws

    NASA Technical Reports Server (NTRS)

    Cooke, Daniel; Rushton, Nelson

    2013-01-01

    With the introduction of new parallel architectures like the Cell and multicore chips from IBM, Intel, AMD, and ARM, as well as the petascale processing available for high-end computing, a larger number of programmers will need to write parallel codes. Adding the parallel control structure to the sequence, selection, and iterative control constructs increases the complexity of code development, which often results in increased development costs and decreased reliability. SequenceL is a high-level programming language, that is, a programming language that is closer to a human's way of thinking than to a machine's. Historically, high-level languages have resulted in decreased development costs and increased reliability, at the expense of performance. In recent applications at JSC and in industry, SequenceL has demonstrated the usual advantages of high-level programming in terms of low cost and high reliability. SequenceL programs, however, have run at speeds typically comparable with, and in many cases faster than, their counterparts written in C and C++ when run on single-core processors. Moreover, SequenceL is able to generate parallel executables automatically for multicore hardware, gaining parallel speedups without any extra effort from the programmer beyond what is required to write the sequential/single-core code. A SequenceL-to-C++ translator has been developed that automatically renders readable multithreaded C++ from a combination of a SequenceL program and sample data input. The SequenceL language is based on two fundamental computational laws, Consume-Simplify-Produce (CSP) and Normalize-Transpose (NT), which enable it to automate the creation of parallel algorithms from high-level code that has no annotations of parallelism whatsoever.
In our anecdotal experience, SequenceL development has been in every case less costly than development of the same algorithm in sequential (that is, single-core, single process) C or C++, and an order of magnitude less costly than development of comparable parallel code. Moreover, SequenceL not only automatically parallelizes the code, but since it is based on CSP-NT, it is provably race free, thus eliminating the largest quality challenge the parallelized software developer faces.
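
The Normalize-Transpose law mentioned above is what lets SequenceL map scalar operations over sequence arguments without annotations, and each mapped element is an independent unit of parallel work. A rough Python imitation of that behavior (an assumption about the semantics for illustration, not SequenceL's implementation):

```python
# Rough sketch of Normalize-Transpose (NT): applying a scalar function
# to sequence arguments maps it elementwise; scalar arguments are
# broadcast (normalized) to match. Recursion handles nesting.
def nt(f, *args):
    if not any(isinstance(a, list) for a in args):
        return f(*args)
    n = max(len(a) for a in args if isinstance(a, list))
    norm = [a if isinstance(a, list) else [a] * n for a in args]  # normalize
    return [nt(f, *row) for row in zip(*norm)]                    # transpose + map

add = lambda x, y: x + y
print(nt(add, [1, 2, 3], 10))     # [11, 12, 13]
print(nt(add, [1, 2], [10, 20]))  # [11, 22]
```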

  12. Fault gouge rheology under confined, high-velocity conditions

    NASA Astrophysics Data System (ADS)

    Reches, Z.; Madden, A. S.; Chen, X.

    2012-12-01

    We recently developed the experimental capability to investigate the shear properties of fine-grain gouge under confined conditions and high velocity. The experimental system includes a rotary apparatus that can apply large displacements of tens of meters, slip velocities of 0.001-2.0 m/s, and normal stress of 35 MPa (Reches and Lockner, 2010). The key new component is a Confined ROtary Cell (CROC) that can shear a gouge layer either dry or under pore pressure. The pore pressure is controlled by two syringe pumps. CROC includes a ring-shaped gouge chamber of 62.5 mm inner diameter, 81.25 mm outer diameter, and up to 3 mm gouge-sample thickness. The lower, rotating part of CROC contains the sample chamber, and the upper, stationary part includes the loading hollow cylinder and settings for temperature and dilation measurements and pore-pressure control. Each side of the gouge chamber has two pairs of industrial, spring-energized, self-lubricating teflon-graphite seals, built for particle media, that can work at temperatures up to 250 °C. The space between each of the two sets of seals is pressurized by nitrogen. This design generates 'zero differential pressure' on the inner seal (which is in contact with the gouge powder) and prevents gouge leaks. For the preliminary dry experiments, we used ~2.0 mm thick layers of room-dry kaolinite powder. Total displacements were on the order of meters and normal stresses up to 4 MPa. The initial shear was accommodated by multiple internal slip surfaces within the kaolinite layer, arranged as oriented Riedel shear structures. Later, the shear localized within a thin, plate-parallel Y-surface. The kaolinite layer compacted at a quasi-asymptotic rate and displayed a steady-state friction coefficient of ~0.5 with no clear dependence on slip velocity up to 0.15 m/s. Further experiments with loose quartz sand (grain size ~125 micron) included both dry runs and pore-pressure (distilled water) controlled runs.
The sand was pressurized through a porous metal (Mott) plug. Comparison with effective-stress calculations indicates the same friction coefficient of ~1.0 for the sand layer under both dry and pressurized conditions. Both the kaolinite and quartz sand experiments developed localized shear zones that were examined at the nano- and micro-scales with AFM, SEM, and TEM. These zones displayed reduced grain sizes and cementation by local agglomeration. [Figure: kaolinite grains sheared in a CROC experiment; scale bar = 1 micron.]

  13. The Automated Instrumentation and Monitoring System (AIMS): Design and Architecture. 3.2

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Schmidt, Melisa; Schulbach, Cathy; Bailey, David (Technical Monitor)

    1997-01-01

    Whether a researcher is designing the 'next parallel programming paradigm', another 'scalable multiprocessor' or investigating resource allocation algorithms for multiprocessors, a facility that enables parallel program execution to be captured and displayed is invaluable. Careful analysis of such information can help computer and software architects to capture, and therefore exploit, behavioral variations among/within various parallel programs to take advantage of specific hardware characteristics. A software tool-set that facilitates performance evaluation of parallel applications on multiprocessors has been put together at NASA Ames Research Center under the sponsorship of NASA's High Performance Computing and Communications Program over the past five years. The Automated Instrumentation and Monitoring System (AIMS) has three major software components: a source code instrumentor, which automatically inserts active event recorders into program source code before compilation; a run-time performance monitoring library, which collects performance data; and a visualization tool-set, which reconstructs program execution based on the data collected. Besides being used as a prototype for developing new techniques for instrumenting, monitoring and presenting parallel program execution, AIMS is also being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Currently, the execution of FORTRAN and C programs on the Intel Paragon and PALM workstations can be automatically instrumented and monitored. Performance data thus collected can be displayed graphically on various workstations. The process of performance tuning with AIMS will be illustrated using various NAS Parallel Benchmarks. This report includes a description of the internal architecture of AIMS and a listing of the source code.
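
The enter/exit event recording that an instrumentor like AIMS's inserts can be imitated with a toy decorator. The event tuple format here is invented for illustration and is not AIMS's trace format:

```python
import time, functools

# Toy event recorder: wrapped functions append timestamped enter/exit
# events to a trace, which a post-processor could replay to
# reconstruct execution.
TRACE = []

def instrument(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        TRACE.append(("enter", fn.__name__, time.perf_counter()))
        try:
            return fn(*args, **kwargs)
        finally:
            TRACE.append(("exit", fn.__name__, time.perf_counter()))
    return wrapper

@instrument
def work(n):
    return sum(range(n))

work(1000)
print([e[:2] for e in TRACE])  # [('enter', 'work'), ('exit', 'work')]
```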

  14. Periodic dielectric structure for production of photonic band gap and devices incorporating the same

    DOEpatents

    Ho, Kai-Ming; Chan, Che-Ting; Soukoulis, Costas

    1994-08-02

    A periodic dielectric structure which is capable of producing a photonic band gap and which is capable of practical construction. The periodic structure is formed of a plurality of layers, each layer being formed of a plurality of rods separated by a given spacing. The material of the rods contrasts with the material between the rods to have a refractive index contrast of at least two. The rods in each layer are arranged with their axes parallel and at a given spacing. Adjacent layers are rotated by 90°, such that the axes of the rods in any given layer are perpendicular to the axes in its neighbor. Alternating layers (that is, successive layers of rods having their axes parallel, such as the first and third layers) are offset such that the rods of one are about at the midpoint between the rods of the other. A four-layer periodicity is thus produced, and successive layers are stacked to form a three-dimensional structure which exhibits a photonic band gap. By virtue of forming the device in layers of elongate members, it is found that the device is susceptible of practical construction.
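
The four-layer stacking rule this patent describes (rotate 90° each layer, shift every other parallel layer by half the rod spacing) can be sketched as a generator of rod directions and positions; the function name and parameters are illustrative:

```python
# Sketch of the "woodpile" stacking: rod axes alternate x/y each layer,
# and every other layer of the same orientation is shifted by half the
# spacing, giving a four-layer period.
def woodpile_layer(k, spacing, n_rods):
    axis = "x" if k % 2 == 0 else "y"                   # rotate 90 deg per layer
    offset = (spacing / 2.0) if (k // 2) % 2 else 0.0   # shift alternating pairs
    positions = [offset + i * spacing for i in range(n_rods)]
    return axis, positions

for k in range(4):
    print(k, woodpile_layer(k, 1.0, 3))
# layer 0: x rods at 0.0, 1.0, 2.0
# layer 2: x rods at 0.5, 1.5, 2.5 (midpoints between layer 0's rods)
```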

  15. Evolution of Kelvin-Helmholtz instability at Venus in the presence of the parallel magnetic field

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lu, H. Y.; Key Laboratory of Planetary Sciences, Chinese Academy of Sciences, Nanjing 210008; Cao, J. B.

    2015-06-15

    Two-dimensional MHD simulations were performed to study the evolution of the Kelvin-Helmholtz (KH) instability at the Venusian ionopause in response to the strong flow shear in the presence of an in-plane magnetic field parallel to the flow direction. The physical behavior of the KH instability as well as the triggering and occurrence conditions for highly rolled-up vortices are characterized through several physical parameters, including the Alfvén Mach number on the upper side of the layer, the density ratio, and the ratio of parallel magnetic fields between the two sides of the layer. Using these parameters, the simulations show that both the high density ratio and the parallel magnetic field component across the boundary layer play a role in stabilizing the instability. In the high density ratio case, the amount of total magnetic energy in the final quasi-steady state is much greater than that in the initial state, which is clearly different from the case with low density ratio. We particularly investigate the nonlinear development of the case that has a high density ratio and a uniform magnetic field. Before the instability saturation, a single magnetic island is formed and evolves into two quasi-steady islands in the non-linear phase. A quasi-steady pattern eventually forms and is embedded within a uniform magnetic field and a broadened boundary layer. The estimation of loss rates of ions from Venus indicates that the stabilizing effect of the parallel magnetic field component on the KH instability becomes strong in the case of high density ratio.

  16. High-pressure synthesis and crystal structures of the strontium oxogallates Sr2Ga2O5 and Sr5Ga6O14

    NASA Astrophysics Data System (ADS)

    Kahlenberg, Volker; Goettgens, Valerie; Mair, Philipp; Schmidmair, Daniela

    2015-08-01

    High-pressure synthesis experiments in a piston-cylinder apparatus at 1.5 GPa/3.0 GPa and 1000 °C resulted in the formation of single crystals of Sr2Ga2O5 and Sr5Ga6O14, respectively. The structures of both compounds have been solved from single-crystal diffraction data sets using direct methods. The first compound is orthorhombic with space group type Pbca (a=10.0021(4) Å, b=9.601(4) Å, c=10.6700(4) Å, V=1024.6(4) Å3, Mr=394.68 u, Z=8, Dx=5.12 g/cm3) and belongs to the group of single layer gallates. Individual sheets are parallel to (0 0 1) and can be built from the condensation of unbranched vierer single chains running along [0 1 0]. The layers are characterized by the presence of four- and strongly elliptical eight-membered rings of corner-connected tetrahedra in UUDD and UUUUDDDD conformation. Strontium atoms are sandwiched between the tetrahedral layers for charge compensation and are coordinated by six and seven oxygen ligands, respectively. Sr2Ga2O5 is isotypic with several other double sulfides and selenides. To the best of our knowledge, it is the first example of an oxide with this structure type. From a structural point of view, Sr5Ga6O14 is a phyllogallate as well. The crystal structure adopts the monoclinic space group P21/c (a=8.1426(3) Å, b=8.1803(3) Å, c=10.8755(4) Å, β=91.970(4)°, V=723.98(5) Å3, Mr=1080.42 u, Z=2, Dx=4.96 g/cm3). Individual sheets extend along (0 0 1). Basic building units are unbranched dreier single chains parallel to [1 0 0]. The layers contain tertiary (Q3) and quaternary (Q4) connected [GaO4]-tetrahedra in the ratio 2:1, resulting in a Ga:O ratio of 3:7 and the formation of exclusively five-membered rings. Linkage between adjacent tetrahedral sheets is provided by three symmetrically independent strontium ions which are surrounded by six to eight oxygen atoms. The layers in Sr5Ga6O14 are similar to those observed in the melilite structure type.
Crystallochemical relationships between the present phases and other known compounds are discussed in detail.

  17. Language Classification using N-grams Accelerated by FPGA-based Bloom Filters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jacob, A; Gokhale, M

    N-gram (n-character sequences in text documents) counting is a well-established technique used in classifying the language of text in a document. In this paper, n-gram processing is accelerated through the use of reconfigurable hardware on the XtremeData XD1000 system. Our design employs parallelism at multiple levels, with parallel Bloom filters accessing on-chip RAM, parallel language classifiers, and parallel document processing. In contrast to another hardware implementation (the HAIL algorithm) that uses off-chip SRAM for lookup, our highly scalable implementation uses only on-chip memory blocks. Our implementation of end-to-end language classification runs at 85x the speed of comparable software and 1.45x that of the competing hardware design.
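    A toy version of the classification scheme, with one Bloom filter per language over character n-grams, can be sketched in Python; the filter size, hash count, and training samples are illustrative, and the real design evaluates the filters in parallel FPGA logic rather than in software.

    ```python
    import hashlib

    class BloomFilter:
        """Minimal Bloom filter: k hash probes into an m-bit array."""
        def __init__(self, m=4096, k=3):
            self.m, self.k, self.bits = m, k, bytearray(m)
        def _hashes(self, item):
            for i in range(self.k):
                h = hashlib.sha256(f"{i}:{item}".encode()).digest()
                yield int.from_bytes(h[:8], "big") % self.m
        def add(self, item):
            for h in self._hashes(item):
                self.bits[h] = 1
        def __contains__(self, item):
            return all(self.bits[h] for h in self._hashes(item))

    def ngrams(text, n=3):
        return [text[i:i + n] for i in range(len(text) - n + 1)]

    # one filter per language, loaded with n-grams from tiny training samples
    filters = {}
    for lang, sample in {"en": "the quick brown fox",
                         "de": "der schnelle braune fuchs"}.items():
        bf = BloomFilter()
        for g in ngrams(sample):
            bf.add(g)
        filters[lang] = bf

    def classify(text):
        # score = number of the document's n-grams found in each filter
        scores = {lang: sum(g in bf for g in ngrams(text))
                  for lang, bf in filters.items()}
        return max(scores, key=scores.get)
    ```

    Because membership tests are independent, they map naturally onto parallel lookups in on-chip RAM.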

  18. Rapid planar chromatographic analysis of 25 water-soluble dyes used as food additives.

    PubMed

    Morlock, Gertrud E; Oellig, Claudia

    2009-01-01

    A rapid planar chromatographic method for identification and quantification of 25 water-soluble dyes in food was developed. In a horizontal developing chamber, the chromatographic separation on silica gel 60F254 high-performance thin-layer chromatography plates took 12 min for 40 runs in parallel, using 8 mL ethyl acetate-methanol-water-acetic acid (65 + 23 + 11 + 1, v/v/v/v) mobile phase up to a migration distance of 50 mm. However, the total analysis time, inclusive of application and evaluation, took 60 min for 40 runs. Thus, the overall time/run can be calculated as 1.5 min with a solvent consumption of 200 µL. A sample throughput of 1000 runs/8 h day can be reached by switching between the working stations (application, development, and evaluation) in a 20 min interval, which triples the analysis throughput. Densitometry was performed by absorption measurement using the multiwavelength scan mode in the UV and visible ranges. Repeatabilities [relative standard deviation (RSD), 4 determinations] at the first or second calibration level showed precisions of mostly ≤ 2.7%, ranging between 0.2 and 5.2%. Correlation coefficient values (R ≥ 0.9987) and RSD values (≤ 4.2%) of the calibration curves were highly satisfactory using classical quantification. However, digital evaluation of the plate image was also used for quantification, which resulted in RSD values of the calibration curves of mostly ≤ 3.0%, except for two ≤ 6.0%. The method was applied for the analysis of some energy drinks and bakery ink formulations, directly applied after dilution. By recording of absorbance spectra in the visible range, the identities of the dyes found in the samples were ascertained by comparison with the respective standard bands (correlation coefficients ≥ 0.9996). If necessary for confirmation, online mass spectra were recorded within a minute.
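    The throughput figures quoted above follow from simple arithmetic, sketched here; the 960 runs/day computed from a 20 min station-switching interval is consistent with the roughly 1000 runs/8 h day stated in the abstract.

    ```python
    # throughput arithmetic from the abstract: 40 parallel runs per plate
    runs_per_plate = 40
    total_analysis_min = 60        # application + development + evaluation
    solvent_ml = 8.0

    time_per_run_min = total_analysis_min / runs_per_plate   # 1.5 min/run
    solvent_per_run_ul = solvent_ml * 1000 / runs_per_plate  # 200 uL/run

    # switching between the three work stations every 20 min keeps all of
    # them busy, so one plate (40 runs) finishes every 20 min
    plates_per_8h_day = 8 * 60 // 20
    runs_per_day = plates_per_8h_day * runs_per_plate        # ~1000 quoted
    ```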

  19. Surrogate Reservoir Model

    NASA Astrophysics Data System (ADS)

    Mohaghegh, Shahab

    2010-05-01

    Surrogate Reservoir Model (SRM) is a new solution for fast-track, comprehensive reservoir analysis (solving both direct and inverse problems) using existing reservoir simulation models. SRM is defined as a replica of the full field reservoir simulation model that runs and provides accurate results in real-time (one simulation run takes only a fraction of a second). SRM mimics the capabilities of a full field model with high accuracy. Reservoir simulation is the industry standard for reservoir management. It is used in all phases of field development in the oil and gas industry. The routine of simulation studies calls for integration of static and dynamic measurements into the reservoir model. Full field reservoir simulation models have become the major source of information for analysis, prediction and decision making. Large prolific fields usually go through several versions (updates) of their model. Each new version usually is a major improvement over the previous version. The updated model includes the latest available information incorporated along with adjustments that usually are the result of single-well or multi-well history matching. As the number of reservoir layers (thickness of the formations) increases, the number of cells representing the model approaches several millions. As the reservoir models grow in size, so does the time that is required for each run. Schemes such as grid computing and parallel processing help to a certain degree but do not provide the required speed for tasks such as: field development strategies using comprehensive reservoir analysis, solving the inverse problem for injection/production optimization, quantifying uncertainties associated with the geological model and real-time optimization and decision making. These types of analyses require hundreds or thousands of runs. 
Furthermore, with the new push for smart fields in the oil/gas industry that is a natural growth of smart completion and smart wells, the need for real-time reservoir modeling becomes more pronounced. SRM is developed using the state of the art in neural computing and fuzzy pattern recognition to address the ever-growing need in the oil and gas industry to perform accurate, but high-speed simulation and modeling. Unlike conventional geo-statistical approaches (response surfaces, proxy models …) that require hundreds of simulation runs for development, SRM is developed with only a few (10 to 30) simulation runs. SRM can be developed regularly (as new versions of the full field model become available) off-line and can be put online for real-time processing to guide important decisions. SRM has proven its value in the field. An SRM was developed for a giant oil field in the Middle East. The model included about one million grid blocks with more than 165 horizontal wells and took ten hours for a single run on 12 parallel CPUs. Using only 10 simulation runs, an SRM was developed that was able to accurately mimic the behavior of the reservoir simulation model. Performing a comprehensive reservoir analysis that included making millions of SRM runs, wells in the field were divided into five clusters. It was predicted that wells in clusters one and two are the best candidates for rate relaxation with minimal, long-term water production, while wells in clusters four and five are susceptible to high water cuts. Two and a half years and 20 wells later, rate relaxation results from the field proved that all the predictions made by the SRM analysis were correct. While incremental oil production increased in all wells (wells in cluster 1 produced the most, followed by wells in clusters 2, 3, …) the percent change in average monthly water cut for wells in each cluster clearly demonstrated the analytic power of SRM. 
As it was correctly predicted, wells in clusters 1 and 2 actually experienced a reduction in water cut, while a substantial increase in water cut was observed in wells classified into clusters 4 and 5. Performing these analyses would have been impossible using the original full field simulation model.
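    The SRM idea, training a fast approximate model on a handful of expensive simulator runs, can be illustrated with a toy surrogate. Inverse-distance weighting here stands in for the neuro-fuzzy model actually used, and the "simulator" is a cheap analytic stand-in.

    ```python
    import math

    class Surrogate:
        """Toy stand-in for an SRM: learn from a handful of (input, output)
        pairs from the full simulator, then answer queries instantly."""
        def __init__(self, samples):
            self.samples = samples  # list of ((x1, x2), y)
        def predict(self, x):
            num = den = 0.0
            for xs, y in self.samples:
                d = math.dist(x, xs)
                if d == 0:
                    return y            # exact training point
                w = 1.0 / d ** 2        # inverse-distance weight
                num += w * y
                den += w
            return num / den

    def expensive_simulator(x):
        # stand-in for a ten-hour full-field run
        return x[0] ** 2 + x[1]

    # roughly a dozen "simulation runs" to train, as the abstract describes
    train = [((i / 3.0, j / 3.0), expensive_simulator((i / 3.0, j / 3.0)))
             for i in range(4) for j in range(3)]
    srm = Surrogate(train)
    ```

    Once trained, `srm.predict` is cheap enough to call millions of times for comprehensive analysis.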

  20. A Parallel Saturation Algorithm on Shared Memory Architectures

    NASA Technical Reports Server (NTRS)

    Ezekiel, Jonathan; Siminiceanu

    2007-01-01

    Symbolic state-space generators are notoriously hard to parallelize. However, the Saturation algorithm implemented in the SMART verification tool differs from other sequential symbolic state-space generators in that it exploits the locality of firing events in asynchronous system models. This paper explores whether event locality can be utilized to efficiently parallelize Saturation on shared-memory architectures. Conceptually, we propose to parallelize the firing of events within a decision diagram node, which is technically realized via a thread pool. We discuss the challenges involved in our parallel design and conduct experimental studies on its prototypical implementation. On a dual-processor dual-core PC, our studies show speed-ups for several example models, e.g., of up to 50% for a Kanban model, when compared to running our algorithm only on a single core.
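    The core idea, firing the events that affect one decision-diagram node concurrently via a thread pool, can be sketched as follows; the set-of-integers state representation and the single-pass union are simplifications of the real fixpoint iteration over decision diagrams.

    ```python
    from concurrent.futures import ThreadPoolExecutor

    def fire_event(state_set, event):
        """Stand-in for firing one local event on a node: here an 'event'
        just maps each state to a successor state."""
        return {event(s) for s in state_set}

    def saturate_node(state_set, events, workers=4):
        """Fire all events affecting one node in parallel via a thread
        pool, then union the results (the real algorithm iterates to a
        fixpoint rather than making a single pass)."""
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = pool.map(lambda e: fire_event(state_set, e), events)
        out = set(state_set)
        for r in results:
            out |= r
        return out

    states = {0, 1, 2}
    events = [lambda s: s + 1, lambda s: s * 2]
    reached = saturate_node(states, events)
    ```

    Event locality is what makes this safe: each firing touches only local state, so the firings are independent tasks.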

  1. Sierra Structural Dynamics User's Notes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Reese, Garth M.

    2015-10-19

    Sierra/SD provides a massively parallel implementation of structural dynamics finite element analysis, required for high fidelity, validated models used in modal, vibration, static and shock analysis of weapons systems. This document provides a users guide to the input for Sierra/SD. Details of input specifications for the different solution types, output options, element types and parameters are included. The appendices contain detailed examples, and instructions for running the software on parallel platforms.

  2. Sierra/SD User's Notes.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Munday, Lynn Brendon; Day, David M.; Bunting, Gregory

    Sierra/SD provides a massively parallel implementation of structural dynamics finite element analysis, required for high fidelity, validated models used in modal, vibration, static and shock analysis of weapons systems. This document provides a users guide to the input for Sierra/SD. Details of input specifications for the different solution types, output options, element types and parameters are included. The appendices contain detailed examples, and instructions for running the software on parallel platforms.

  3. LLMapReduce: Multi-Lingual Map-Reduce for Supercomputing Environments

    DTIC Science & Technology

    2015-11-20

    1990s. Popularized by Google [36] and Apache Hadoop [37], map-reduce has become a staple technology of the ever-growing big data community... The map-reduce parallel programming model has become extremely popular in the big data community. Many big data... ...to big data users running on a supercomputer. LLMapReduce dramatically simplifies map-reduce programming by providing simple parallel programming
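    The map-reduce model that LLMapReduce exposes can be illustrated with the classic word-count example; the map and reduce phases below run serially to keep the sketch self-contained, where LLMapReduce would launch each mapper as an independent scheduler job on the supercomputer.

    ```python
    from collections import Counter
    from functools import reduce

    def mapper(document):
        """Map phase: emit per-word counts for one document."""
        return Counter(document.split())

    def reducer(a, b):
        """Reduce phase: merge two partial count tables."""
        a.update(b)
        return a

    docs = ["big data on a supercomputer",
            "map reduce on big data"]

    # map each document, then fold the partial results together
    counts = reduce(reducer, map(mapper, docs), Counter())
    ```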

  4. Advanced Numerical Techniques of Performance Evaluation. Volume 1

    DTIC Science & Technology

    1990-06-01

    system scheduling thread. The scheduling thread then runs any other ready thread that can be found. A thread can only sleep or switch out on itself... Polychronopoulos and D.J. Kuck. Guided Self-Scheduling: A Practical Scheduling Scheme for Parallel Supercomputers. IEEE Transactions on Computers C...

  5. StrAuto: automation and parallelization of STRUCTURE analysis.

    PubMed

    Chhatre, Vikram E; Emerson, Kevin J

    2017-03-24

    Population structure inference using the software STRUCTURE has become an integral part of population genetic studies covering a broad spectrum of taxa including humans. The ever-expanding size of genetic data sets poses computational challenges for this analysis. Although at least one tool currently implements parallel computing to reduce the computational overload of this analysis, it does not fully automate the use of replicate STRUCTURE analysis runs required for downstream inference of optimal K. There is a pressing need for a tool that can deploy population structure analysis on high performance computing clusters. We present an updated version of the popular Python program StrAuto, to streamline population structure analysis using parallel computing. StrAuto implements a pipeline that combines STRUCTURE analysis with the Evanno ΔK analysis and visualization of results using STRUCTURE HARVESTER. Using benchmarking tests, we demonstrate that StrAuto significantly reduces the computational time needed to perform iterative STRUCTURE analysis by distributing runs over two or more processors. StrAuto is the first tool to integrate STRUCTURE analysis with post-processing using a pipeline approach in addition to implementing parallel computation, a setup ideal for deployment on computing clusters. StrAuto is distributed under the GNU GPL (General Public License) and available to download from http://strauto.popgen.org.
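    The replicate-distribution idea can be sketched as follows. `structure_run` is a hypothetical stand-in for launching one STRUCTURE run at a given K, and a thread pool substitutes for the separate cluster jobs StrAuto actually dispatches.

    ```python
    from concurrent.futures import ThreadPoolExecutor

    def structure_run(k, rep):
        """Stand-in for one STRUCTURE run at population count K,
        replicate rep; returns (K, replicate, mock log-likelihood).
        StrAuto launches these as independent processes on a cluster."""
        return (k, rep, -100.0 * k + rep)  # placeholder score

    def run_all(k_values, replicates, workers=4):
        """Distribute all (K, replicate) combinations over the pool."""
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(structure_run, k, r)
                       for k in k_values for r in range(replicates)]
            return [f.result() for f in futures]

    # e.g. K = 1..3 with two replicates each: six independent runs
    results = run_all(range(1, 4), replicates=2)
    ```

    Downstream, the per-K replicate scores would feed the Evanno ΔK calculation.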

  6. Electron pressure balance in the SOL through the transition to detachment

    DOE PAGES

    McLean, A. G.; Leonard, A. W.; Makowski, M. A.; ...

    2015-02-07

    Upgrades to core and divertor Thomson scattering (DTS) diagnostics at DIII-D have provided measurements of electron pressure profiles in the lower divertor from attached to fully-detached divertor plasma conditions. Detailed, multistep sequences of discharges with increasing line-averaged density were run at several levels of Pinj. Strike point sweeping allowed 2D divertor characterization using DTS optimized to measure Te down to 0.5 eV. The ionization front at the onset of detachment is found to move upwards in a controlled manner consistent with the indication that scrape-off layer parallel power flux is converted from conducted to convective heat transport. Measurements of ne, Te and pe in the divertor versus Lparallel demonstrate a rapid transition from Te ≥ 15 eV to ≤ 3 eV occurring both at the outer strike point and upstream of the X-point. Furthermore, these observations provide a strong benchmark for ongoing modeling of divertor detachment for existing and future tokamak devices.

  7. Electron pressure balance in the SOL through the transition to detachment

    NASA Astrophysics Data System (ADS)

    McLean, A. G.; Leonard, A. W.; Makowski, M. A.; Groth, M.; Allen, S. L.; Boedo, J. A.; Bray, B. D.; Briesemeister, A. R.; Carlstrom, T. N.; Eldon, D.; Fenstermacher, M. E.; Hill, D. N.; Lasnier, C. J.; Liu, C.; Osborne, T. H.; Petrie, T. W.; Soukhanovskii, V. A.; Stangeby, P. C.; Tsui, C.; Unterberg, E. A.; Watkins, J. G.

    2015-08-01

    Upgrades to core and divertor Thomson scattering (DTS) diagnostics at DIII-D have provided measurements of electron pressure profiles in the lower divertor from attached- to fully-detached divertor plasma conditions. Detailed, multistep sequences of discharges with increasing line-averaged density were run at several levels of Pinj. Strike point sweeping allowed 2D divertor characterization using DTS optimized to measure Te down to 0.5 eV. The ionization front at the onset of detachment is found to move upwards in a controlled manner consistent with the indication that scrape-off layer parallel power flux is converted from conducted to convective heat transport. Measurements of ne, Te and pe in the divertor versus Lparallel demonstrate a rapid transition from Te ⩾ 15 eV to ⩽3 eV occurring both at the outer strike point and upstream of the X-point. These observations provide a strong benchmark for ongoing modeling of divertor detachment for existing and future tokamak devices.

  8. Near-surface coherent structures explored by large eddy simulation of entire tropical cyclones.

    PubMed

    Ito, Junshi; Oizumi, Tsutao; Niino, Hiroshi

    2017-06-19

    Taking advantage of the huge computational power of a massive parallel supercomputer (K-supercomputer), this study conducts large eddy simulations of entire tropical cyclones by employing a numerical weather prediction model, and explores near-surface coherent structures. The maximum of the near-surface wind changes little from that simulated based on coarse-resolution runs. Three kinds of coherent structures appeared inside the boundary layer. The first is a Type-A roll, which is caused by an inflection-point instability of the radial flow and prevails outside the radius of maximum wind. The second is a Type-B roll that also appears to be caused by an inflection-point instability but of both radial and tangential winds. Its roll axis is almost orthogonal to the Type-A roll. The third is a Type-C roll, which occurs inside the radius of maximum wind and only near the surface. It transports horizontal momentum in an up-gradient sense and causes the largest gusts.

  9. ARC-1989-AC89-7046

    NASA Image and Video Library

    1989-08-25

    P-34764 Voyager 2 obtained this high resolution color image of Neptune's large satellite Triton during its close flyby. Approximately a dozen individual images were combined to produce this comprehensive view of the Neptune-facing hemisphere of Triton. Fine detail is provided by high resolution, clear-filter images, with color information added from lower resolution frames. The large south polar cap at the bottom of the image is highly reflective and slightly pink in color, and may consist of a slowly evaporating layer of nitrogen ice deposited during the previous winter. From the ragged edge of the polar cap northward, the satellite's face is generally darker and redder in color. This coloring may be produced by the action of ultraviolet light and magnetospheric radiation upon methane in the atmosphere and surface. Running across this darker region, approximately parallel to the edge of the polar cap, is a band of brighter white material that is almost bluish in color. The underlying topography in this bright band is similar, however, to that in the darker, redder regions surrounding it.

  10. GEOS Atmospheric Model: Challenges at Exascale

    NASA Technical Reports Server (NTRS)

    Putman, William M.; Suarez, Max J.

    2017-01-01

    The Goddard Earth Observing System (GEOS) model at NASA's Global Modeling and Assimilation Office (GMAO) is used to simulate the multi-scale variability of the Earth's weather and climate, and is used primarily to assimilate conventional and satellite-based observations for weather forecasting and reanalysis. In addition, assimilations coupled to an ocean model are used for longer-term forecasting (e.g., El Niño) on seasonal to interannual time scales. The GMAO's research activities, including system development, focus on numerous time and space scales, as detailed on the GMAO website, where they are tabbed under five major themes: Weather Analysis and Prediction; Seasonal-Decadal Analysis and Prediction; Reanalysis; Global Mesoscale Modeling; and Observing System Science. A brief description of the GEOS systems can also be found at the GMAO website. GEOS executes as a collection of earth system components connected through the Earth System Modeling Framework (ESMF). The ESMF layer is supplemented with the MAPL (Modeling, Analysis, and Prediction Layer) software toolkit developed at the GMAO, which facilitates the organization of the computational components into a hierarchical architecture. GEOS systems run in parallel using a horizontal decomposition of the Earth's sphere into processing elements (PEs). Communication between PEs is primarily through a message passing framework, using the message passing interface (MPI), and through explicit use of node-level shared memory access via the SHMEM (Symmetric Hierarchical Memory access) protocol. Production GEOS weather prediction systems currently run at 12.5-kilometer horizontal resolution with 72 vertical levels decomposed into PEs associated with 5,400 MPI processes. Research GEOS systems run at resolutions as fine as 1.5 kilometers globally using as many as 30,000 MPI processes. 
Looking forward, these systems can be expected to see a 2 times increase in horizontal resolution every two to three years, as well as less frequent increases in vertical resolution. Coupling these resolution changes with increases in complexity, the computational demands on the GEOS production and research systems should easily increase 100-fold over the next five years. Currently, our 12.5 kilometer weather prediction system narrowly meets the time-to-solution demands of a near-real-time production system. Work is now in progress to take advantage of a hybrid MPI-OpenMP parallelism strategy, in an attempt to achieve a modest two-fold speed-up to accommodate an immediate demand due to increased scientific complexity and an increase in vertical resolution. Meeting demands for 10- to 100-fold increases or more, however, would require a detailed exploration of the computational profile of GEOS, as well as targeted solutions using more advanced high-performance computing technologies. Increased computing demands of 100-fold will be required within five years based on anticipated changes in the GEOS production systems; increases of 1000-fold can be anticipated over the next ten years.
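    The horizontal decomposition into processing elements described above can be sketched as a block partition of a 2-D grid; the 2880 x 1441 grid and 90 x 60 PE layout below are hypothetical numbers chosen only to yield the 5,400 MPI processes mentioned, not GEOS's actual cubed-sphere layout.

    ```python
    def decompose(nx, ny, px, py):
        """Split an nx-by-ny horizontal grid into px-by-py processing
        elements, returning each PE's (i0, i1, j0, j1) index ranges."""
        def bounds(n, p):
            # distribute n points over p ranks as evenly as possible
            base, extra = divmod(n, p)
            out, start = [], 0
            for r in range(p):
                size = base + (1 if r < extra else 0)
                out.append((start, start + size))
                start += size
            return out
        xb, yb = bounds(nx, px), bounds(ny, py)
        return [(i0, i1, j0, j1) for (j0, j1) in yb for (i0, i1) in xb]

    # hypothetical 5,400-rank layout: 90 x 60 PEs over a 2880 x 1441 grid
    pes = decompose(2880, 1441, 90, 60)
    ```

    Each MPI rank would then time-step its own subdomain, exchanging halo rows with its neighbors.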

  11. A Connectionist Simulation of Attention and Vector Comparison: The Need for Serial Processing in Parallel Hardware

    DTIC Science & Technology

    1991-01-01

    visual and three-layer connectionist network, in that the input layer of memory processing is serial, and is likely to represent each module is... Selective attention gates visual processing in the extrastriate cortex. Science, 229:782-784. Treisman, A.M. (1985). Preattentive... AD-A242 225 A CONNECTIONIST SIMULATION OF ATTENTION AND VECTOR COMPARISON: THE NEED FOR SERIAL PROCESSING IN PARALLEL HARDWARE Technical Report AIP

  12. Large-scale trench-perpendicular mantle flow beneath northern Chile

    NASA Astrophysics Data System (ADS)

    Reiss, M. C.; Rumpker, G.; Woelbern, I.

    2017-12-01

    We investigate the anisotropic properties of the forearc region of the central Andean margin by analyzing shear-wave splitting from teleseismic and local earthquakes from the Nazca slab. The data stem from the Integrated Plate boundary Observatory Chile (IPOC) located in northern Chile, covering an approximately 120 km wide coastal strip between 17°-25° S with an average station spacing of 60 km. With partly over ten years of data, this data set is uniquely suited to address the long-standing debate about the mantle flow field at the South American margin and in particular whether the flow field beneath the slab is parallel or perpendicular to the trench. Our measurements yield two distinct anisotropic layers. The teleseismic measurements show a change of fast polarization directions from North to South along the trench, ranging from parallel to subparallel to the absolute plate motion and, given the geometry of absolute plate motion and strike of the trench, mostly perpendicular to the trench. Shear-wave splitting from local earthquakes shows fast polarizations roughly aligned trench-parallel but exhibiting short-scale variations which are indicative of a relatively shallow source. Comparisons between fast polarization directions and the strike of the local fault systems yield good agreement. We use forward modelling to test the influence of the upper layer on the teleseismic measurements. We show that the observed variations of teleseismic measurements along the trench are caused by the anisotropy in the upper layer. Accordingly, the mantle layer is best characterized by an anisotropic fast axis parallel to the absolute plate motion, which is roughly trench-perpendicular. This anisotropy is likely caused by a combination of crystallographic preferred orientation of the mantle mineral olivine as fossilized anisotropy in the slab and entrained flow beneath the slab. 
We interpret the upper anisotropic layer to be confined to the crust of the overriding continental plate. This is explained by the shape-preferred orientation of micro-cracks in relation to local fault zones which are oriented parallel to the overall strike of the Andean range. Our results do not provide any evidence for a significant contribution of trench-parallel mantle flow beneath the subducting slab to the measurements.

  13. A Concept for Run-Time Support of the Chapel Language

    NASA Technical Reports Server (NTRS)

    James, Mark

    2006-01-01

    A document presents a concept for run-time implementation of other concepts embodied in the Chapel programming language. (Now undergoing development, Chapel is intended to become a standard language for parallel computing that would surpass older such languages both in computational performance and in the efficiency with which pre-existing code can be reused and new code written.) The aforementioned other concepts are those of distributions, domains, allocations, and access, as defined in a separate document called "A Semantic Framework for Domains and Distributions in Chapel" and linked to a language specification defined in another separate document called "Chapel Specification 0.3." The concept presented in the instant report is the recognition that a data domain that was invented for Chapel offers a novel approach to distributing and processing data in a massively parallel environment. The concept is offered as a starting point for development of working descriptions of functions and data structures that would be necessary to implement interfaces to a compiler for transforming the aforementioned other concepts from their representations in Chapel source code to their run-time implementations.
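    The distribution concept at the heart of this report can be illustrated with a Chapel-style block distribution of a 1-D domain across locales (compute nodes); this Python sketch is an analogy for exposition, not Chapel's actual run-time interface.

    ```python
    def block_distribute(domain_size, num_locales):
        """Sketch of a block distribution: map each index of a 1-D domain
        to the locale that owns it, splitting the domain into contiguous
        chunks of near-equal size."""
        base, extra = divmod(domain_size, num_locales)
        owner = []
        for loc in range(num_locales):
            count = base + (1 if loc < extra else 0)
            owner.extend([loc] * count)
        return owner

    # a 10-element domain spread over 4 locales
    owners = block_distribute(10, 4)
    ```

    In Chapel proper, the compiler and run-time use such a mapping so that `forall` loops over the domain execute on the locale owning each index.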

  14. User's and test case manual for FEMATS

    NASA Technical Reports Server (NTRS)

    Chatterjee, Arindam; Volakis, John; Nurnberger, Mike; Natzke, John

    1995-01-01

    The FEMATS program incorporates first-order edge-based finite elements and vector absorbing boundary conditions into the scattered field formulation for computation of the scattering from three-dimensional geometries. The code has been validated extensively for a large class of geometries containing inhomogeneities and satisfying transition conditions. For geometries that are too large for the workstation environment, the FEMATS code has been optimized to run on various supercomputers. Currently, FEMATS has been configured to run on the HP 9000 workstation, vectorized for the Cray Y-MP, and parallelized to run on the Kendall Square Research (KSR) architecture and the Intel Paragon.

  15. Message Passing on GPUs

    NASA Astrophysics Data System (ADS)

    Stuart, J. A.

    2011-12-01

    This paper explores the challenges in implementing a message passing interface usable on systems with data-parallel processors, and more specifically GPUs. As a case study, we design and implement the "DCGN" API on NVIDIA GPUs that is similar to MPI and allows full access to the underlying architecture. We introduce the notion of data-parallel thread-groups as a way to map resources to MPI ranks. We use a method that also allows the data-parallel processors to run autonomously from user-written CPU code. In order to facilitate communication, we use a sleep-based polling system to store and retrieve messages. Unlike previous systems, our method provides both performance and flexibility. By running a test suite of applications with different communication requirements, we find that a tolerable amount of overhead is incurred, somewhere between one and five percent depending on the application, and we indicate the locations where this overhead accumulates. We conclude that with innovations in chipsets and drivers, this overhead will be mitigated and provide similar performance to typical CPU-based MPI implementations while providing fully-dynamic communication.
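    The sleep-based polling scheme used to store and retrieve messages can be sketched with per-rank mailboxes; the `send`/`recv` names and the in-process queues are illustrative assumptions, not the actual DCGN interface, which moves data between CPU and GPU memory.

    ```python
    import queue
    import threading
    import time

    # one mailbox per "rank"
    mailboxes = {0: queue.Queue(), 1: queue.Queue()}

    def send(dest, payload):
        mailboxes[dest].put(payload)

    def recv(rank, poll_interval=0.001):
        """Sleep-based polling: check for a message, sleep briefly, and
        retry, instead of blocking inside the transport layer."""
        while True:
            try:
                return mailboxes[rank].get_nowait()
            except queue.Empty:
                time.sleep(poll_interval)

    def worker():
        # stand-in for a data-parallel thread-group acting as rank 1
        send(0, "hello from rank 1")

    t = threading.Thread(target=worker)
    t.start()
    msg = recv(0)
    t.join()
    ```

    The poll interval trades latency against the overhead the paper measures at one to five percent.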

  16. Online measurement for geometrical parameters of wheel set based on structure light and CUDA parallel processing

    NASA Astrophysics Data System (ADS)

    Wu, Kaihua; Shao, Zhencheng; Chen, Nian; Wang, Wenjie

    2018-01-01

    The wearing degree of the wheel-set tread is one of the main factors that influence the safety and stability of a running train. The geometrical parameters mainly include flange thickness and flange height. Line-structured laser light was projected onto the wheel tread surface, and the geometrical parameters can be deduced from the profile image. An online image acquisition system was designed based on asynchronous reset of the CCD and a CUDA parallel processing unit. Image acquisition was performed in hardware-interrupt mode. A high-efficiency parallel segmentation algorithm based on CUDA was proposed. The algorithm first divides the image into smaller squares, and extracts the squares belonging to the target by a fusion of the k-means and STING clustering image segmentation algorithms. Segmentation time is less than 0.97 ms. A considerable acceleration ratio compared with serial CPU calculation was obtained, which greatly improves the real-time image processing capacity. When a wheel set is running at limited speed, the system, placed along the railway line, can measure the geometrical parameters automatically. The maximum measuring speed is 120 km/h.
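    The first step of the segmentation algorithm, dividing the image into smaller squares so tiles can be processed in parallel, can be sketched as follows; the tile size and image are illustrative, and the real implementation runs this on CUDA rather than in Python.

    ```python
    def split_into_squares(image, size):
        """Divide a 2-D image (list of rows) into size-by-size tiles,
        the tiling step that precedes per-tile clustering."""
        h, w = len(image), len(image[0])
        tiles = []
        for y in range(0, h, size):
            for x in range(0, w, size):
                tiles.append([row[x:x + size] for row in image[y:y + size]])
        return tiles

    # toy 8x8 "image" split into 4x4 tiles -> four independent work items
    image = [[0] * 8 for _ in range(8)]
    tiles = split_into_squares(image, 4)
    ```

    On the GPU each tile maps naturally onto one thread block, which is where the sub-millisecond segmentation time comes from.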

  17. Partitioning in parallel processing of production systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Oflazer, K.

    1987-01-01

    This thesis presents research on certain issues related to parallel processing of production systems. It first presents a parallel production system interpreter that has been implemented on a four-processor multiprocessor. This parallel interpreter is based on Forgy's OPS5 interpreter and exploits production-level parallelism in production systems. Runs on the multiprocessor system indicate that it is possible to obtain a speed-up of around 1.7 in the match computation for certain production systems when productions are split into three sets that are processed in parallel. The next issue addressed is that of partitioning a set of rules to processors in a parallel interpreter with production-level parallelism, and the extent of additional improvement in performance. The partitioning problem is formulated and an algorithm for approximate solutions is presented. The thesis next presents a parallel processing scheme for OPS5 production systems that allows some redundancy in the match computation. This redundancy enables the processing of a production to be divided into units of medium granularity, each of which can be processed in parallel. Subsequently, a parallel processor architecture for implementing the parallel processing algorithm is presented.

  18. Parallel transformation of K-SVD solar image denoising algorithm

    NASA Astrophysics Data System (ADS)

    Liang, Youwen; Tian, Yu; Li, Mei

    2017-02-01

    The images obtained by observing the sun through a large telescope always suffer from noise due to the low SNR. The K-SVD denoising algorithm can effectively remove Gaussian white noise. Training dictionaries for sparse representations is a time-consuming task, due to the large size of the data involved and the complexity of the training algorithms. In this paper, OpenMP parallel programming is used to transform the serial algorithm into a parallel version. A data-parallelism model is used to transform the algorithm; the biggest change is that multiple atoms, rather than one, are updated simultaneously. The denoising effect and acceleration performance were tested after completion of the parallel algorithm. The speedup of the program is 13.563 when using 16 cores. This parallel version can fully utilize multi-core CPU hardware resources, greatly reduces running time, and is easily ported to multi-core platforms.

  19. Parallelization of the FLAPW method

    NASA Astrophysics Data System (ADS)

    Canning, A.; Mannstadt, W.; Freeman, A. J.

    2000-08-01

    The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining structural, electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work, we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel supercomputer.

  20. Parallel programming with Easy Java Simulations

    NASA Astrophysics Data System (ADS)

    Esquembre, F.; Christian, W.; Belloni, M.

    2018-01-01

    Nearly all of today's processors are multicore, and ideally programming and algorithm development utilizing the entire processor should be introduced early in the computational physics curriculum. Parallel programming is often not introduced because it requires a new programming environment and uses constructs that are unfamiliar to many teachers. We describe how we decrease the barrier to parallel programming by using a Java-based programming environment to treat problems in the usual undergraduate curriculum. We use the Easy Java Simulations programming and authoring tool to create the program's graphical user interface together with objects based on those developed by Kaminsky [Building Parallel Programs (Course Technology, Boston, 2010)] to handle common parallel programming tasks. Shared-memory parallel implementations of physics problems, such as time evolution of the Schrödinger equation, are available as source code and as ready-to-run programs from the AAPT-ComPADRE digital library.

  1. Parallel evolution of image processing tools for multispectral imagery

    NASA Astrophysics Data System (ADS)

    Harvey, Neal R.; Brumby, Steven P.; Perkins, Simon J.; Porter, Reid B.; Theiler, James P.; Young, Aaron C.; Szymanski, John J.; Bloch, Jeffrey J.

    2000-11-01

    We describe the implementation and performance of a parallel, hybrid evolutionary-algorithm-based system, which optimizes image processing tools for feature-finding tasks in multi-spectral imagery (MSI) data sets. Our system uses an integrated spatio-spectral approach and is capable of combining suitably-registered data from different sensors. We investigate the speed-up obtained by parallelization of the evolutionary process via multiple processors (a workstation cluster) and develop a model for prediction of run-times for different numbers of processors. We demonstrate our system on Landsat Thematic Mapper MSI, covering the recent Cerro Grande fire at Los Alamos, NM, USA.
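
    The abstract does not give the run-time prediction model itself, so as a hedged illustration, here is the generic Amdahl-style form such models often take: a fixed serial fraction that cannot be sped up plus a perfectly divisible parallel part (function and parameter names are assumptions, not the authors'):

```python
def predicted_runtime(t1, serial_fraction, p):
    """Amdahl-style model: of a single-processor run time t1, the
    serial fraction is irreducible; the rest divides across p workers."""
    return t1 * (serial_fraction + (1.0 - serial_fraction) / p)

def predicted_speedup(serial_fraction, p):
    """Predicted speedup on p processors under the same model."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)
```

Fitting serial_fraction to a few measured runs then lets one extrapolate run time to processor counts that were not measured, which is the role such a model plays in the study above.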

  2. Methods for operating parallel computing systems employing sequenced communications

    DOEpatents

    Benner, R.E.; Gustafson, J.L.; Montry, G.R.

    1999-08-10

    A parallel computing system and method are disclosed having improved performance where a program is concurrently run on a plurality of nodes for reducing total processing time, each node having a processor, a memory, and a predetermined number of communication channels connected to the node and independently connected directly to other nodes. The present invention improves performance of the parallel computing system by providing a system which can provide efficient communication between the processors and between the system and input and output devices. A method is also disclosed which can locate defective nodes within the computing system. 15 figs.

  3. Methods for operating parallel computing systems employing sequenced communications

    DOEpatents

    Benner, Robert E.; Gustafson, John L.; Montry, Gary R.

    1999-01-01

    A parallel computing system and method having improved performance where a program is concurrently run on a plurality of nodes for reducing total processing time, each node having a processor, a memory, and a predetermined number of communication channels connected to the node and independently connected directly to other nodes. The present invention improves performance of the parallel computing system by providing a system which can provide efficient communication between the processors and between the system and input and output devices. A method is also disclosed which can locate defective nodes within the computing system.

  4. Parallel Gaussian elimination of a block tridiagonal matrix using multiple microcomputers

    NASA Technical Reports Server (NTRS)

    Blech, Richard A.

    1989-01-01

    The solution of a block tridiagonal matrix using parallel processing is demonstrated. The multiprocessor system on which results were obtained and the software environment used to program that system are described. Theoretical partitioning and resource allocation for the Gaussian elimination method used to solve the matrix are discussed. The results obtained from running 1-, 2-, and 3-processor versions of the block tridiagonal solver are presented. The PASCAL source code for these solvers is given in the appendix, and may be transportable to other shared-memory parallel processors provided that the synchronization routines are reproduced on the target system.
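
    As a serial, scalar stand-in for the block algorithm (each block shrunk to a 1x1 "matrix"), the classic Thomas elimination for a tridiagonal system looks like the sketch below; a block version replaces each scalar operation with a small dense matrix operation, and a parallel version like the one above partitions the rows among processors (names hypothetical):

```python
def thomas(sub, diag, sup, rhs):
    """Serial Thomas elimination for a scalar tridiagonal system.
    sub[0] and sup[-1] are unused and should be zero."""
    n = len(rhs)
    cp = [0.0] * n              # modified superdiagonal
    dp = [0.0] * n              # modified right-hand side
    cp[0] = sup[0] / diag[0]
    dp[0] = rhs[0] / diag[0]
    # Forward elimination sweep.
    for i in range(1, n):
        m = diag[i] - sub[i] * cp[i - 1]
        cp[i] = sup[i] / m
        dp[i] = (rhs[i] - sub[i] * dp[i - 1]) / m
    # Back substitution sweep.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

The two sweeps are inherently sequential in the row index, which is exactly why partitioning and synchronization between processors (the subject of the report) require care.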

  5. Magnetostatic effects on switching in small magnetic tunnel junctions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bapna, Mukund; Piotrowski, Stephan K.; Oberdick, Samuel D.

    Perpendicular CoFeB/MgO/CoFeB magnetic tunnel junctions with diameters under 100 nm are investigated by conductive atomic force microscopy. Minor loops of the tunnel magnetoresistance as a function of applied magnetic field reveal the hysteresis of the soft layer and an offset due to the magnetostatic field of the hard layer. Within the hysteretic region, telegraph noise is observed in the tunnel current. Simulations show that in this range, the net magnetic field in the soft layer is spatially inhomogeneous, and that antiparallel-to-parallel switching tends to start near the edge, while parallel-to-antiparallel reversal favors nucleation in the interior of the soft layer. As the diameter of the tunnel junction is decreased, the average magnitude of the magnetostatic field increases, but the spatial inhomogeneity across the soft layer is reduced.

  6. The Geomorphological Evolution of a Landscape in a Tectonically Active Region: the Sennwald Landslide

    NASA Astrophysics Data System (ADS)

    Aksay, Selçuk; Ivy-Ochs, Susan; Hippe, Kristina; Graemiger, Lorenz; Vockenhuber, Christof

    2016-04-01

    The Säntis nappe is a fold-and-thrust structure in eastern Switzerland consisting of numerous tectonic discontinuities that make its rocks vulnerable to failure. The Sennwald landslide is one such event, which occurred due to the failure of Lower Cretaceous Helvetic limestones. This study reveals the surface exposure age of the event in relation to the geological and tectonic setting, the earthquake frequency of the Central Alps, and regional-scale climate/weather influence. Our study comprises detailed mapping of landform features, thin-section analysis of landslide boulder lithologies, landslide volume estimation, numerical DAN-3D run-out modelling, and the spatial and temporal relationships of the event. In the Sennwald landslide, 92 million m3 of limestone detached from the south-eastern wall of the Säntis nappe and slid with a maximum travel distance of ~4'500 m and a "fahrboeschung" angle of 15° along the SE-dipping sliding plane, almost parallel to the orientation of the bedding plane. Numerical run-out modelling results match the extent and thickness of landslide deposits as observed in the field. The original bedrock stratigraphy was preserved: geologically, the top layer in the bedrock package travelled the farthest, and the bottom layer came to rest closest to the release bedrock wall during the landslide. Velocities of up to 90 m/s were obtained from the numerical run-out modelling. Total Cl and 36Cl were determined at the ETH AMS facility with isotope dilution methods defined in the literature (Ivy-Ochs et al., 2004). Surface exposure ages of landslide deposits in the accumulation area were determined from twelve boulders. The distribution of limestone boulders in the accumulation area, the exposure ages, and the numerical run-out modelling support the hypothesis that the Sennwald landslide was a single catastrophic event. The event is likely to have been triggered by at least light-to-moderate earthquakes (Mw=4.0-6.0).
    Historical records and the earthquake activity of the last 40 years show that this region is still tectonically active (Mosar, 1999), with numerous earthquakes. The exposure ages imply that the rock failure occurred during the middle Holocene, a period of increased neotectonic activity in the Eastern Alps suggested by Prager et al. (2007). This time period also coincides with a notably wet climate, which has been suggested as an important trigger for landslides of this age across the Alps (Zerathe et al., 2014).

  7. Fast Acceleration of 2D Wave Propagation Simulations Using Modern Computational Accelerators

    PubMed Central

    Wang, Wei; Xu, Lifan; Cavazos, John; Huang, Howie H.; Kay, Matthew

    2014-01-01

    Recent developments in modern computational accelerators like Graphics Processing Units (GPUs) and coprocessors provide great opportunities for making scientific applications run faster than ever before. However, efficient parallelization of scientific code using new programming tools like CUDA requires a high level of expertise that is not available to many scientists. This, plus the fact that parallelized code is usually not portable to different architectures, creates major challenges for exploiting the full capabilities of modern computational accelerators. In this work, we sought to overcome these challenges by studying how to achieve both automated parallelization using OpenACC and enhanced portability using OpenCL. We applied our parallelization schemes using GPUs as well as an Intel Many Integrated Core (MIC) coprocessor to reduce the run time of wave propagation simulations. We used a well-established 2D cardiac action potential model as a specific case study. To the best of our knowledge, we are the first to study auto-parallelization of 2D cardiac wave propagation simulations using OpenACC. Our results identify several approaches that provide substantial speedups. The OpenACC-generated GPU code achieved a substantial speedup over the sequential implementation and required the addition of only a few OpenACC pragmas to the code. An OpenCL implementation provided speedups on GPUs over both the sequential implementation and a parallelized OpenMP implementation. An implementation of OpenMP on the Intel MIC coprocessor provided speedups with only a few code changes to the sequential implementation. We highlight that OpenACC provides an automatic, efficient, and portable approach to achieve parallelization of 2D cardiac wave simulations on GPUs.
Our approach of using OpenACC, OpenCL, and OpenMP to parallelize this particular model on modern computational accelerators should be applicable to other computational models of wave propagation in multi-dimensional media. PMID:24497950
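
    The kernels being parallelized here are explicit stencil updates. A minimal serial Python analogue of one such update (a generic 5-point diffusion step, not the authors' cardiac model; parameter values hypothetical) shows the loop nest that OpenACC or OpenMP pragmas would annotate in the corresponding C code:

```python
def diffusion_step(u, alpha=0.1):
    """One explicit 5-point-stencil update on grid u (list of lists).
    In the C versions discussed above, this i/j loop nest is the region
    an OpenACC 'parallel loop' or OpenMP 'parallel for' pragma targets,
    since every interior cell can be updated independently."""
    h, w = len(u), len(u[0])
    new = [row[:] for row in u]        # boundary values carried over
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            lap = (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1]
                   - 4.0 * u[i][j])
            new[i][j] = u[i][j] + alpha * lap
    return new
```

Because each output cell reads only the previous time level, the iterations carry no dependence on one another, which is what makes auto-parallelization of such kernels tractable.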

  8. Some observations on rutherfordine

    USGS Publications Warehouse

    Clark, Joan R.; Christ, C.L.

    1956-01-01

    The optical properties of rutherfordine, UO2CO3, previously determined on microscopic crystals, have been redetermined on considerably larger crystals; and the relations among the indices of refraction, the morphology, and the crystal structure have been examined. Rutherfordine is orthorhombic, biaxial positive, with α = 1.715, β = 1.730, γ = 1.795, 2V = 53° (calc.); X = b, Y = c (elongation), Z = a. The crystal structure of UO2CO3 consists of layers of carbonate groups parallel to (010) with linear (O-U-O) ions normal to the layers. The indices β and γ correspond to vibration directions parallel to the layers; the unexpectedly large difference in value between β and γ is ascribed to the optical anisotropy of the uranium-oxygen bonding in the layer. Indexed X-ray powder data are given.

  9. Modeling and Characterization of Capacitive Elements With Tissue as Dielectric Material for Wireless Powering of Neural Implants.

    PubMed

    Erfani, Reza; Marefat, Fatemeh; Sodagar, Amir M; Mohseni, Pedram

    2018-05-01

    This paper reports on the modeling and characterization of capacitive elements with tissue as the dielectric material, representing the core building block of a capacitive link for wireless power transfer to neural implants. Each capacitive element consists of two parallel plates that are aligned around the tissue layer and incorporate a grounded, guarded, capacitive pad to mitigate the adverse effect of stray capacitances and shield the plates from external interfering electric fields. The plates are also coated with a biocompatible, insulating coating layer on the inner side of each plate in contact with the tissue. A comprehensive circuit model is presented that accounts for the effect of the coating layers and is validated by measurements of the equivalent capacitance as well as impedance magnitude/phase of the parallel plates over a wide frequency range of 1 kHz-10 MHz. Using insulating coating layers of Parylene-C and Parylene-N deposited on two sets of parallel plates with different sizes and shapes of the guarded pad, our modeling and characterization results accurately capture the effect of the thickness and electrical properties of the coating layers on the behavior of the capacitive elements over frequency and with different tissues.

  10. Using the GeoFEST Faulted Region Simulation System

    NASA Technical Reports Server (NTRS)

    Parker, Jay W.; Lyzenga, Gregory A.; Donnellan, Andrea; Judd, Michele A.; Norton, Charles D.; Baker, Teresa; Tisdale, Edwin R.; Li, Peggy

    2004-01-01

    GeoFEST (the Geophysical Finite Element Simulation Tool) simulates stress evolution, fault slip, and plastic/elastic processes in realistic materials, and so is suitable for earthquake cycle studies in regions such as Southern California. Many new capabilities and means of access for GeoFEST are now supported. New abilities include MPI-based cluster parallel computing using automatic PYRAMID/Parmetis-based mesh partitioning, automatic mesh generation for layered media with rectangular faults, and results visualization that is integrated with remote sensing data. The parallel GeoFEST application has been successfully run on over a half-dozen computers, including Intel Xeon clusters, Itanium II and Altix machines, and the Apple G5 cluster. It is not separately optimized for different machines, but relies on good domain partitioning for load balance and low communication, and careful writing of the parallel diagonally preconditioned conjugate gradient solver to keep communication overhead low. Demonstrated thousand-step solutions for over a million finite elements on 64 processors require under three hours, and scaling tests show high efficiency when using more than (order of) 4000 elements per processor. The source code and documentation for GeoFEST are available at no cost from Open Channel Foundation. In addition, GeoFEST may be used through a browser-based portal environment available to approved users. That environment includes semi-automated geometry creation and mesh generation tools, GeoFEST, and RIVA-based visualization tools that include the ability to generate a flyover animation showing deformations and topography. Work is in progress to support simulation of a region with several faults using 16 million elements, using a strain energy metric to adapt the mesh to faithfully represent the solution in a region of widely varying strain.
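
    The diagonally preconditioned conjugate gradient solver mentioned above can be sketched in a few lines. This is a generic serial toy on a small dense matrix, not GeoFEST's distributed sparse implementation; the diagonal (Jacobi) preconditioner is attractive in parallel codes precisely because applying it needs no communication:

```python
def jacobi_pcg(A, b, tol=1e-10, max_iter=200):
    """Diagonally (Jacobi) preconditioned conjugate gradient for a
    symmetric positive-definite matrix A, given as a dense
    list-of-lists. Returns an approximate solution of A x = b."""
    n = len(b)
    x = [0.0] * n
    matvec = lambda v: [sum(A[i][j] * v[j] for j in range(n))
                        for i in range(n)]
    r = [b[i] - y for i, y in enumerate(matvec(x))]
    z = [r[i] / A[i][i] for i in range(n)]      # apply M^-1 = diag(A)^-1
    p = z[:]
    rz = sum(r[i] * z[i] for i in range(n))
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rz / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        z = [r[i] / A[i][i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x
```

In the parallel setting, only the matrix-vector product and the dot products require communication; the vector updates and the diagonal preconditioner are purely local, which keeps the communication overhead low as the abstract notes.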

  11. Generic accelerated sequence alignment in SeqAn using vectorization and multi-threading.

    PubMed

    Rahn, René; Budach, Stefan; Costanza, Pascal; Ehrhardt, Marcel; Hancox, Jonny; Reinert, Knut

    2018-05-03

    Pairwise sequence alignment is undoubtedly a central tool in many bioinformatics analyses. In this paper, we present a generically accelerated module for pairwise sequence alignments applicable to a broad range of applications. In our module, we unified the standard dynamic programming kernel used for pairwise sequence alignments and extended it with a generalized inter-sequence vectorization layout, such that many alignments can be computed simultaneously by exploiting SIMD (Single Instruction Multiple Data) instructions of modern processors. We then extended the module by adding two layers of thread-level parallelization, where we (a) distribute many independent alignments on multiple threads and (b) inherently parallelize a single alignment computation using a work-stealing approach producing a dynamic wavefront progressing along the minor diagonal. We evaluated our alignment vectorization and parallelization on different processors and use cases, including the newest Intel® Xeon® (Skylake) and Intel® Xeon Phi™ (KNL) processors. The instruction set AVX512-BW (Byte and Word), available on Skylake processors, can genuinely improve the performance of vectorized alignments. We could run single alignments 1600 times faster on the Xeon Phi™ and 1400 times faster on the Xeon® than executing them with our previous sequential alignment module. The module is programmed in C++ using the SeqAn (Reinert et al., 2017) library and distributed with version 2.4 under the BSD license. We support SSE4, AVX2, and AVX512 instructions and included UME::SIMD, a SIMD-instruction wrapper library, to extend our module for further instruction sets. We thoroughly test all alignment components with all major C++ compilers on various platforms. Contact: rene.rahn@fu-berlin.de.
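
    The inter-sequence vectorization layout can be illustrated without SIMD intrinsics: compute the dynamic-programming recurrence cell by cell, with a batch of sequence pairs as the innermost dimension, each pair standing in for one SIMD lane. The pure-Python sketch below (equal-length pairs, simplified linear-gap scoring; not SeqAn's actual kernel) returns global-alignment scores for the whole batch at once:

```python
def batched_nw(pairs, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch scores for a batch of equal-length sequence
    pairs, computed with the batch as the innermost dimension -- the
    'inter-sequence' layout that maps one pair onto each SIMD lane."""
    n = len(pairs[0][0])
    m = len(pairs[0][1])
    bsz = len(pairs)
    # DP row i-1; each cell holds one score per pair in the batch.
    prev = [[j * gap] * bsz for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [[i * gap] * bsz]
        for j in range(1, m + 1):
            cell = []
            for k, (a, b) in enumerate(pairs):   # one 'SIMD lane' each
                sub = match if a[i - 1] == b[j - 1] else mismatch
                cell.append(max(prev[j - 1][k] + sub,
                                prev[j][k] + gap,
                                cur[j - 1][k] + gap))
            cur.append(cell)
        prev = cur
    return prev[m]                                # final score per pair
```

With real SIMD, the innermost k-loop becomes a handful of vector instructions, so all lanes advance through the same (i, j) cells in lockstep, which is why this layout vectorizes so cleanly.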

  12. Approaches in highly parameterized inversion - GENIE, a general model-independent TCP/IP run manager

    USGS Publications Warehouse

    Muffels, Christopher T.; Schreuder, Willem A.; Doherty, John E.; Karanovic, Marinko; Tonkin, Matthew J.; Hunt, Randall J.; Welter, David E.

    2012-01-01

    GENIE is a model-independent suite of programs that can be used to generally distribute, manage, and execute multiple model runs via the TCP/IP infrastructure. The suite consists of a file distribution interface, a run manager, a run executer, and a routine that can be compiled as part of a program and used to exchange model runs with the run manager. Because communication is via a standard protocol (TCP/IP), any computer connected to the Internet can serve in any of the capacities offered by this suite. Model independence is consistent with the existing template and instruction file protocols of the widely used PEST parameter estimation program. This report describes (1) the problem addressed; (2) the approach used by GENIE to queue, distribute, and retrieve model runs; and (3) user instructions, classes, and functions developed. It also includes (4) an example to illustrate the linking of GENIE with Parallel PEST using the interface routine.
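
    The queue-distribute-retrieve pattern in (2) can be sketched with worker threads standing in for remote run executers (the real GENIE suite moves runs between machines over TCP/IP sockets and PEST template/instruction files; all names below are hypothetical):

```python
import queue
import threading

def run_manager(jobs, num_workers, model):
    """Queue-based sketch of a manager distributing model runs to
    workers and collecting results by job id. Worker threads stand in
    for the remote run executers of the real suite."""
    todo = queue.Queue()
    results = {}
    lock = threading.Lock()
    for job_id, params in enumerate(jobs):
        todo.put((job_id, params))

    def worker():
        while True:
            try:
                job_id, params = todo.get_nowait()
            except queue.Empty:
                return                       # no runs left to claim
            out = model(params)              # perform the "model run"
            with lock:
                results[job_id] = out

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Return results in job order, regardless of completion order.
    return [results[i] for i in range(len(jobs))]
```

Keying results by job id is the essential point: runs may finish out of order on heterogeneous machines, but the calling program (e.g., a parameter estimator) still receives them in the order it requested.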

  13. [A design of simple ventilator control system based on LabVIEW].

    PubMed

    Pei, Baoqing; Xu, Shengwei; Li, Hui; Li, Deyu; Pei, Yidong; He, Haixing

    2011-01-01

    This paper describes a ventilator control system designed to control proportional valves and motors. LabVIEW was used to control these components; to design, validate, and evaluate algorithms; and to establish a hardware-in-the-loop platform. The system has two hierarchical layers: the upper layer runs non-real-time programs and the lower layer runs real-time programs. The two layers communicate over TCP/IP. The program is divided into several modules, which can be expanded and maintained easily, and the results of the prototype design can be carried over seamlessly to embedded products. Overall, this system is useful when employing OEM products.

  14. Distributed intelligence for supervisory control

    NASA Technical Reports Server (NTRS)

    Wolfe, W. J.; Raney, S. D.

    1987-01-01

    Supervisory control systems must deal with various types of intelligence distributed throughout the layers of control. Typical layers are real-time servo control, off-line planning and reasoning subsystems and finally, the human operator. Design methodologies must account for the fact that the majority of the intelligence will reside with the human operator. Hierarchical decompositions and feedback loops as conceptual building blocks that provide a common ground for man-machine interaction are discussed. Examples of types of parallelism and parallel implementation on several classes of computer architecture are also discussed.

  15. Parallel heater system for subsurface formations

    DOEpatents

    Harris, Christopher Kelvin [Houston, TX; Karanikas, John Michael [Houston, TX; Nguyen, Scott Vinh [Houston, TX

    2011-10-25

    A heating system for a subsurface formation is disclosed. The system includes a plurality of substantially horizontally oriented or inclined heater sections located in a hydrocarbon containing layer in the formation. At least a portion of two of the heater sections are substantially parallel to each other. The ends of at least two of the heater sections in the layer are electrically coupled to a substantially horizontal, or inclined, electrical conductor oriented substantially perpendicular to the ends of the at least two heater sections.

  16. Local search to improve coordinate-based task mapping

    DOE PAGES

    Balzuweit, Evan; Bunde, David P.; Leung, Vitus J.; ...

    2015-10-31

    We present a local search strategy to improve the coordinate-based mapping of a parallel job's tasks to the MPI ranks of its parallel allocation in order to reduce network congestion and the job's communication time. The goal is to reduce the number of network hops between communicating pairs of ranks. Our target is applications with a nearest-neighbor stencil communication pattern running on mesh systems with non-contiguous processor allocation, such as Cray XE and XK systems. Utilizing the miniGhost mini-app, which models the shock physics application CTH, we demonstrate that our strategy reduces application running time while also reducing its runtime variability. We further show that mapping quality can vary based on the selected allocation algorithm, even between allocation algorithms of similar apparent quality.
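
    A greatly simplified version of such a local search can be written as a greedy pairwise-swap loop over a rank-to-node mapping on a 2D mesh, keeping any swap that lowers the total Manhattan hop count between communicating ranks (a toy stand-in for the paper's strategy; names hypothetical):

```python
def total_hops(mapping, comm_pairs, coords):
    """Sum of Manhattan hop distances over communicating rank pairs.
    mapping[rank] is a node index; coords[node] is its (x, y) mesh spot."""
    return sum(abs(coords[mapping[a]][0] - coords[mapping[b]][0]) +
               abs(coords[mapping[a]][1] - coords[mapping[b]][1])
               for a, b in comm_pairs)

def local_search(mapping, comm_pairs, coords):
    """Greedy swap-based local search: try exchanging the nodes assigned
    to two ranks and keep any swap that lowers the total hop count;
    repeat until no swap helps."""
    mapping = mapping[:]                      # do not mutate the input
    improved = True
    while improved:
        improved = False
        for a in range(len(mapping)):
            for b in range(a + 1, len(mapping)):
                cost = total_hops(mapping, comm_pairs, coords)
                mapping[a], mapping[b] = mapping[b], mapping[a]
                if total_hops(mapping, comm_pairs, coords) < cost:
                    improved = True           # keep the better mapping
                else:
                    mapping[a], mapping[b] = mapping[b], mapping[a]
    return mapping
```

This converges to a local optimum of the hop-count objective; the published strategy differs in its candidate moves and cost model but shares the same improve-until-stuck structure.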

  17. The instant sequencing task: Toward constraint-checking a complex spacecraft command sequence interactively

    NASA Technical Reports Server (NTRS)

    Horvath, Joan C.; Alkalaj, Leon J.; Schneider, Karl M.; Amador, Arthur V.; Spitale, Joseph N.

    1993-01-01

    Robotic spacecraft are controlled by sets of commands called 'sequences.' These sequences must be checked against mission constraints. Making our existing constraint checking program faster would enable new capabilities in our uplink process. Therefore, we are rewriting this program to run on a parallel computer. To do so, we had to determine how to run constraint-checking algorithms in parallel and create a new method of specifying spacecraft models and constraints. This new specification gives us a means of representing flight systems and their predicted response to commands which could be used in a variety of applications throughout the command process, particularly during anomaly or high-activity operations. This commonality could reduce operations cost and risk for future complex missions. Lessons learned in applying some parts of this system to the TOPEX/Poseidon mission will be described.

  18. Obtaining identical results with double precision global accuracy on different numbers of processors in parallel particle Monte Carlo simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cleveland, Mathew A., E-mail: cleveland7@llnl.gov; Brunner, Thomas A.; Gentile, Nicholas A.

    2013-10-15

    We describe and compare different approaches for achieving numerical reproducibility in photon Monte Carlo simulations. Reproducibility is desirable for code verification, testing, and debugging. Parallelism creates a unique problem for achieving reproducibility in Monte Carlo simulations because it changes the order in which values are summed. This is a numerical problem because double precision arithmetic is not associative. Parallel Monte Carlo simulations, both domain-replicated and domain-decomposed, will run their particles in a different order during different runs of the same simulation because of the non-reproducibility of communication between processors. In addition, runs of the same simulation using different domain decompositions will also result in particles being simulated in a different order. In [1], a way of eliminating non-associative accumulations using integer tallies was described. This approach successfully achieves reproducibility at the cost of lost accuracy by rounding double precision numbers to fewer significant digits. This integer approach, and other extended- and reduced-precision reproducibility techniques, are described and compared in this work. Increased precision alone is not enough to ensure reproducibility of photon Monte Carlo simulations. Non-arbitrary-precision approaches require a varying degree of rounding to achieve reproducibility. For the problems investigated in this work, double precision global accuracy was achievable by using 100 bits of precision or greater on all unordered sums, which were subsequently rounded to double precision at the end of every time-step.
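
    The core numerical issue and the integer-tally remedy are easy to demonstrate: floating-point addition is order-dependent, while fixed-point integer tallies are not (a sketch of the idea described in [1], not the production code; the scaling factor is a hypothetical choice):

```python
def float_sum(values):
    """Naive left-to-right double precision sum; the result depends on
    the order of the addends because rounding occurs at each step."""
    s = 0.0
    for v in values:
        s += v
    return s

def tally_sum(values, bits=20):
    """Reproducible sum: round every addend to a fixed-point integer
    first. Integer addition is associative, so any summation order
    (any processor count or domain decomposition) gives one answer,
    at the cost of the precision lost in the initial rounding."""
    scale = 1 << bits
    return sum(round(v * scale) for v in values) / scale
```

The test below shows two orderings of the same three addends giving different double precision sums but identical tally sums, which is precisely the reproducibility-versus-accuracy trade-off the abstract discusses.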

  19. Linux Kernel Co-Scheduling and Bulk Synchronous Parallelism

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jones, Terry R

    2012-01-01

    This paper describes a kernel scheduling algorithm that is based on coscheduling principles and that is intended for parallel applications running on 1000 cores or more. Experimental results for a Linux implementation on a Cray XT5 machine are presented. The results indicate that Linux is a suitable operating system for this new scheduling scheme, and that this design provides a dramatic improvement in scaling performance for synchronizing collective operations at scale.

  20. Parallel deterministic neutronics with AMR in 3D

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Clouse, C.; Ferguson, J.; Hendrickson, C.

    1997-12-31

    AMTRAN, a three-dimensional Sn neutronics code with adaptive mesh refinement (AMR), has been parallelized over spatial domains and energy groups and runs on the Meiko CS-2 with MPI message passing. Block-refined AMR is used with linear finite element representations for the fluxes, which allows for a straightforward interpretation of fluxes at block interfaces with zoning differences. The load balancing algorithm assumes 8 spatial domains, which minimizes idle time among processors.

  1. A GPU Parallelization of the Absolute Nodal Coordinate Formulation for Applications in Flexible Multibody Dynamics

    DTIC Science & Technology

    2012-02-17

    to be solved. Disclaimer: Reference herein to any specific commercial company, product, process, or service by trade name, trademark... data processing rather than data caching and control flow. To make use of this computational power, NVIDIA introduced a general-purpose parallel... GPU implementations were run on an Intel Nehalem Xeon E5520 2.26 GHz processor with an NVIDIA Tesla C2070 graphics card for varying numbers of

  2. Computer-Aided Parallelizer and Optimizer

    NASA Technical Reports Server (NTRS)

    Jin, Haoqiang

    2011-01-01

    The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.

  3. A parallel algorithm for the two-dimensional time fractional diffusion equation with implicit difference method.

    PubMed

    Gong, Chunye; Bao, Weimin; Tang, Guojian; Jiang, Yuewen; Liu, Jie

    2014-01-01

    It is very time consuming to solve fractional differential equations. The computational complexity of solving the two-dimensional time-fractional diffusion equation (2D-TFDE) with an iterative implicit finite difference method is O(MxMyN²). In this paper, we present a parallel algorithm for the 2D-TFDE and give an in-depth discussion of this algorithm. A task-distribution model and a data layout with virtual boundaries are designed for this parallel algorithm. The experimental results show that the parallel algorithm compares well with the exact solution. The parallel algorithm on a single Intel Xeon X5540 CPU runs 3.16-4.17 times faster than the serial algorithm on a single CPU core. The parallel efficiency of 81 processes is up to 88.24% compared with 9 processes on a distributed-memory cluster system. We believe that parallel computing will become a basic method for computationally intensive fractional applications in the near future.
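    The task-distribution idea can be made concrete with a small sketch: the 2D grid is split into contiguous row blocks, and each block is padded with one "virtual boundary" (ghost) row per interior side to hold neighbor data between iterations. This is an illustrative reconstruction; the function names and layout are assumptions, not the paper's code.

```python
# Hypothetical sketch of row-block task distribution with virtual boundaries
# (ghost rows); illustrative only, not the authors' implementation.

def partition_rows(n_rows, n_procs):
    """Assign contiguous row blocks (start, stop) as evenly as possible."""
    base, extra = divmod(n_rows, n_procs)
    blocks, start = [], 0
    for p in range(n_procs):
        stop = start + base + (1 if p < extra else 0)
        blocks.append((start, stop))
        start = stop
    return blocks

def with_ghost_rows(block, n_rows):
    """Extend a block by one virtual-boundary row on each interior side."""
    start, stop = block
    return (max(start - 1, 0), min(stop + 1, n_rows))

blocks = partition_rows(10, 3)                      # [(0, 4), (4, 7), (7, 10)]
halos = [with_ghost_rows(b, 10) for b in blocks]    # [(0, 5), (3, 8), (6, 10)]
```

    Each process then solves its padded block per time step and exchanges only the ghost rows with its two neighbors, which is what keeps communication volume low relative to computation.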

  4. Simulation of ground-water flow to assess geohydrologic factors and their effect on source-water areas for bedrock wells in Connecticut

    USGS Publications Warehouse

    Starn, J. Jeffrey; Stone, Janet Radway

    2005-01-01

    Generic ground-water-flow simulation models show that geohydrologic factors (fracture types, fracture geometry, and surficial materials) affect the size, shape, and location of source-water areas for bedrock wells. In this study, conducted by the U.S. Geological Survey in cooperation with the Connecticut Department of Public Health, ground-water flow was simulated to bedrock wells in three settings (on hilltops and hillsides with no surficial aquifer, in a narrow valley with a surficial aquifer, and in a broad valley with a surficial aquifer) to show how different combinations of geohydrologic factors in different topographic settings affect the dimensions and locations of source-water areas in Connecticut. Three principal types of fractures are present in bedrock in Connecticut: (1) layer-parallel fractures, which developed as partings along bedding in sedimentary rock and compositional layering or foliation in metamorphic rock (dips of these fractures can be gentle or steep); (2) unroofing joints, which developed as strain-release fractures parallel to the land surface as overlying rock was removed by erosion through geologic time; and (3) cross fractures and joints, which developed as a result of tectonically generated stresses that produced typically near-vertical or steeply dipping fractures. Fracture geometry is defined primarily by the presence or absence of layering in the rock unit, and, if layered, by the angle of dip in the layering. Where layered rocks dip steeply, layer-parallel fracturing generally is dominant; unroofing joints also are typically well developed. Where layered rocks dip gently, layer-parallel fracturing also is dominant, and connections among these fractures are provided only by the cross fractures. In gently dipping rocks, unroofing joints generally do not form as a separate fracture set; instead, strain release from unroofing has occurred along gently dipping layer-parallel fractures, enhancing their aperture. 
In nonlayered and variably layered rocks, layer-parallel fracturing is absent or poorly developed; fracturing is dominated by well-developed subhorizontal unroofing joints and steeply dipping, tectonically generated fractures and (or) cooling joints. Cross fractures (or cooling joints) in nonlayered and variably layered rocks have more random orientations than in layered rocks. Overall, nonlayered or variably layered rocks do not have a strongly developed fracture direction. Generic ground-water-flow simulation models showed that fracture geometry and other geohydrologic factors affect the dimensions and locations of source-water areas for bedrock wells. In general, source-water areas to wells reflect the direction of ground-water flow, which mimics the land-surface topography. Source-water areas to wells in a hilltop setting were not affected greatly by simulated fracture zones, except for an extensive vertical fracture zone. Source-water areas to wells in a hillside setting were not affected greatly by simulated fracture zones, except for the combination of a subhorizontal fracture zone and low bedrock vertical hydraulic conductivity, as might be the case where an extensive subhorizontal fracture zone is not connected or is poorly connected to the surface through vertical fractures. Source-water areas to wells in a narrow valley setting reflect complex ground-water-flow paths. The typical flow path originates in the uplands and passes through either till or bedrock into the surficial aquifer, although only a small area of the surficial aquifer actually contributes water to the well. Source-water areas in uplands can include substantial areas on both sides of a river. Source-water areas for wells in this setting are affected mainly by the rate of ground-water recharge and by the degree of anisotropy. Source-water areas to wells in a broad valley setting (bedrock with a low angle of dip) are affected greatly by fracture properties. 
The effect of a given fracture is to channel the

  5. Parallel programming of industrial applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Heroux, M; Koniges, A; Simon, H

    1998-07-21

    In the introductory material, we overview the typical MPP environment for real application computing and the special tools available, such as parallel debuggers and performance analyzers. Next, we draw from a series of real application codes and discuss the specific challenges and problems that are encountered in parallelizing these individual applications. The application areas drawn from include biomedical sciences, materials processing and design, plasma and fluid dynamics, and others. We show how it was possible to get a particular application to run efficiently and what steps were necessary. Finally we end with a summary of the lessons learned from these applications and predictions for the future of industrial parallel computing. This tutorial is based on material from a forthcoming book entitled: "Industrial Strength Parallel Computing" to be published by Morgan Kaufmann Publishers (ISBN 1-55860-54).

  6. Simplified Parallel Domain Traversal

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Erickson III, David J

    2011-01-01

    Many data-intensive scientific analysis techniques require global domain traversal, which over the years has been a bottleneck for efficient parallelization across distributed-memory architectures. Inspired by MapReduce and other simplified parallel programming approaches, we have designed DStep, a flexible system that greatly simplifies efficient parallelization of domain traversal techniques at scale. In order to deliver both simplicity to users as well as scalability on HPC platforms, we introduce a novel two-tiered communication architecture for managing and exploiting asynchronous communication loads. We also integrate our design with advanced parallel I/O techniques that operate directly on native simulation output. We demonstrate DStep by performing teleconnection analysis across ensemble runs of terascale atmospheric CO2 and climate data, and we show scalability results on up to 65,536 IBM BlueGene/P cores.

  7. An efficient parallel algorithm for matrix-vector multiplication

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hendrickson, B.; Leland, R.; Plimpton, S.

    The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high-performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/√p + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in the well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.
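    The communication bound above comes from laying the matrix out on a √p x √p processor grid, where each processor owns an n/√p square block and partial results are summed across each processor row. A minimal serial simulation of that decomposition (illustrative only, not the authors' hypercube code) looks like this:

```python
import numpy as np

# Simulate y = A @ x on a q x q processor grid: each "processor" (i, j)
# multiplies its local block by its vector segment; the += models the
# sum-reduction across each processor row. Illustrative sketch only.

def block_matvec(A, x, q):
    n = A.shape[0]
    b = n // q                      # block size; assumes q divides n
    y = np.zeros(n)
    for i in range(q):              # processor row
        for j in range(q):          # processor column
            local = A[i*b:(i+1)*b, j*b:(j+1)*b] @ x[j*b:(j+1)*b]
            y[i*b:(i+1)*b] += local # row-wise reduction
    return y

A = np.arange(16.0).reshape(4, 4)
x = np.ones(4)
assert np.allclose(block_matvec(A, x, 2), A @ x)
```

    Each processor touches only O(n/√p) vector entries, and the row reduction costs O(log p) steps on a hypercube, matching the stated bound.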

  8. Parallel consistent labeling algorithms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Samal, A.; Henderson, T.

    Mackworth and Freuder have analyzed the time complexity of several constraint satisfaction algorithms. Mohr and Henderson have given new algorithms, AC-4 and PC-3, for arc and path consistency, respectively, and have shown that the arc consistency algorithm is optimal in time complexity and of the same order space complexity as the earlier algorithms. In this paper, they give parallel algorithms for solving node and arc consistency. They show that any parallel algorithm for enforcing arc consistency in the worst case must have O(na) sequential steps, where n is the number of nodes and a is the number of labels per node. They give several parallel algorithms to do arc consistency. It is also shown that they all have optimal time complexity. The results of running the parallel algorithms on a BBN Butterfly multiprocessor are also presented.
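    For context, the sequential computation being parallelized is arc consistency: repeatedly delete labels that have no support in a neighboring domain until a fixed point is reached. The sketch below is a minimal AC-3-style version (the parallel algorithms in the paper distribute this arc queue across processors); it is illustrative, not the paper's code.

```python
from collections import deque

# Minimal sequential arc-consistency (AC-3 style) sketch; illustrative only.

def ac3(domains, constraints):
    """domains: {var: set of labels}; constraints: {(x, y): predicate}."""
    arcs = {(x, y) for (x, y) in constraints} | {(y, x) for (x, y) in constraints}
    queue = deque(arcs)

    def allowed(x, y, vx, vy):
        if (x, y) in constraints:
            return constraints[(x, y)](vx, vy)
        return constraints[(y, x)](vy, vx)

    while queue:
        x, y = queue.popleft()
        # Remove labels of x with no supporting label in y's domain.
        revised = {vx for vx in domains[x]
                   if not any(allowed(x, y, vx, vy) for vy in domains[y])}
        if revised:
            domains[x] -= revised
            # Re-examine arcs pointing at x (except the one from y).
            queue.extend((z, x) for (z, w) in arcs if w == x and z != y)
    return domains

# Example: enforce x < y over domains {1, 2, 3}.
doms = ac3({"x": {1, 2, 3}, "y": {1, 2, 3}},
           {("x", "y"): lambda a, b: a < b})
# doms == {"x": {1, 2}, "y": {2, 3}}
```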

  9. Early Cretaceous Ductile Deformation of Marbles from the Western Hills of Beijing, North China Craton

    NASA Astrophysics Data System (ADS)

    Feng, H.; Liu, J.

    2017-12-01

    During the Early Cretaceous tectonic lithosphere extension, the pre-Mesozoic rocks of the Western Hills in the central part of the North China Craton underwent weak metamorphism but intense shear deformation. The prominent features of the deformation structures are the coexisting layer-parallel shear zones and intrafolial folds, and the along-strike thickness variations of the marble layers of the highly sheared Mesoproterozoic Jing'eryu Formation. Platy marbles are well developed in the thinner layers, while intrafolial folds are often observed in the thicker layers. Most folds are tight recumbent folds, and their axial planes are parallel to the foliations and layerings of the marbles. The folds are A-type folds, with hinges always parallel to the stretching lineations, which are consistently oriented at 130°-310° throughout the entire area. SPO and microstructural analyses of the sheared marbles suggest that the thicker layers deformed homogeneously, while strain localization can be distinguished in the thinner layers. Calcite twin morphology and CPO analysis indicate that the deformation of marbles from both the thinner and thicker layers occurred at temperatures of 300 to 500°C. The above analysis suggests that marbles in the thicker layers experienced a progressive sequence of thermodynamic events: 1) regional metamorphism; 2) early ductile deformation dominated by relatively higher-temperature conditions, during which the mineral grains became only weakly elongated and oriented and the calcite grains deformed mainly by mechanical twinning; and 3) late superimposition of relatively lower-temperature deformation and recrystallization, which overprinted the early deformation and left the calcite finely granulated, elongated, and oriented by dynamic recrystallization along with the other grains. Marbles from the thinner layers, however, experienced a similar but distinct sequence of thermodynamic events, i.e., 
regional metamorphism, early ductile deformation, and weak superimposition by subsequent deformation, which caused the development of the strain localization. It is also shown that the intensity of the progressive superimposed deformation contributed to the thinning and thickening of the marble layers.

  10. Suppressing correlations in massively parallel simulations of lattice models

    NASA Astrophysics Data System (ADS)

    Kelling, Jeffrey; Ódor, Géza; Gemming, Sibylle

    2017-11-01

    For lattice Monte Carlo simulations parallelization is crucial to make studies of large systems and long simulation times feasible, while sequential simulations remain the gold standard for correlation-free dynamics. Here, various domain decomposition schemes are compared, concluding with one which delivers virtually correlation-free simulations on GPUs. Extensive simulations of the octahedron model for 2 + 1 dimensional Kardar-Parisi-Zhang surface growth, which is very sensitive to correlation in the site-selection dynamics, were performed to show self-consistency of the parallel runs and agreement with the sequential algorithm. We present a GPU implementation providing a speedup of about 30× over a parallel CPU implementation on a single socket and at least 180× with respect to the sequential reference.
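    The basic tension the paper addresses can be illustrated with a toy domain decomposition: cut the lattice into blocks and, in each sweep, activate only alternating blocks, so concurrent workers can never update adjacent sites; alternating the active parity between sweeps is what suppresses the decomposition-induced correlations. This is a hypothetical 1D sketch, not the paper's GPU scheme.

```python
import random

# Toy 1D domain decomposition: in one sweep only even- (or only odd-)
# numbered blocks are active, so parallel workers would never touch
# neighboring blocks. Illustrative only; the site update is a dummy flip.

def sweep(lattice, block, parity, rng):
    n = len(lattice)
    nblocks = n // block
    for b in range(parity, nblocks, 2):       # alternating active blocks
        for _ in range(block):
            i = b * block + rng.randrange(block)
            lattice[i] ^= 1                   # toy site update (spin flip)

rng = random.Random(0)
lat = [0] * 16
sweep(lat, 4, 0, rng)   # even blocks (sites 0-3, 8-11) active
sweep(lat, 4, 1, rng)   # odd blocks active in the next half-sweep
```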

  11. Architecture-Adaptive Computing Environment: A Tool for Teaching Parallel Programming

    NASA Technical Reports Server (NTRS)

    Dorband, John E.; Aburdene, Maurice F.

    2002-01-01

    Recently, networked and cluster computation have become very popular. This paper is an introduction to a new C-based parallel language for architecture-adaptive programming, aCe C. The primary purpose of aCe (Architecture-adaptive Computing Environment) is to encourage programmers to implement applications on parallel architectures by providing them the assurance that future architectures will be able to run their applications with a minimum of modification. A secondary purpose is to encourage computer architects to develop new types of architectures by providing an easily implemented software development environment and a library of test applications. This new language should be an ideal tool to teach parallel programming. In this paper, we will focus on some fundamental features of aCe C.

  12. Removal of suspended solids and turbidity from marble processing wastewaters by electrocoagulation: comparison of electrode materials and electrode connection systems.

    PubMed

    Solak, Murat; Kiliç, Mehmet; Hüseyin, Yazici; Sencan, Aziz

    2009-12-15

    In this study, the removal of suspended solids (SS) and turbidity from marble processing wastewaters by an electrocoagulation (EC) process was investigated using aluminium (Al) and iron (Fe) electrodes run in serial and parallel connection systems. To remove these pollutants from the marble processing wastewater, an EC reactor including monopolar electrodes (Al/Fe) in parallel and serial connection systems was utilized. The effects of operating parameters such as pH, current density, and electrolysis time on SS and turbidity removal were determined and optimized. The EC process with monopolar Al electrodes in parallel and serial connections, carried out at the optimum conditions (pH 9, current density approximately 15 A/m(2), and electrolysis time 2 min), resulted in 100% SS removal. Removal efficiencies of the EC process for SS with monopolar Fe electrodes in parallel and serial connection were found to be 99.86% and 99.94%, respectively. Optimum parameters for monopolar Fe electrodes in both connection types were found to be pH 8 and an electrolysis time of 2 min. The optimum current density for Fe electrodes in serial and parallel connections was obtained at 10 and 20 A/m(2), respectively. Based on the results obtained, the EC process running with each type of electrode and connection was highly effective for the removal of SS and turbidity from marble processing wastewaters, and operating costs with monopolar Al electrodes in parallel connection were lower than those of the serial connection and of all Fe-electrode configurations.

  13. Quantitative Image Feature Engine (QIFE): an Open-Source, Modular Engine for 3D Quantitative Feature Extraction from Volumetric Medical Images.

    PubMed

    Echegaray, Sebastian; Bakr, Shaimaa; Rubin, Daniel L; Napel, Sandy

    2017-10-06

    The aim of this study was to develop an open-source, modular, locally run or server-based system for 3D radiomics feature computation that can be used on any computer system and included in existing workflows for understanding associations and building predictive models between image features and clinical data, such as survival. The QIFE exploits various levels of parallelization for use on multiprocessor systems. It consists of a managing framework and four stages: input, pre-processing, feature computation, and output. Each stage contains one or more swappable components, allowing run-time customization. We benchmarked the engine using various levels of parallelization on a cohort of CT scans presenting 108 lung tumors. Two versions of the QIFE have been released: (1) the open-source MATLAB code posted to Github, and (2) a compiled version loaded in a Docker container, posted to DockerHub, which can be easily deployed on any computer. The QIFE processed 108 objects (tumors) in 2:12 (hh:mm) using one core, and in 1:04 (hh:mm) using four cores with object-level parallelization. We developed the Quantitative Image Feature Engine (QIFE), an open-source feature-extraction framework that focuses on modularity, standards, parallelism, provenance, and integration. Researchers can easily integrate it with their existing segmentation and imaging workflows by creating input and output components that implement their existing interfaces. Computational efficiency can be improved by parallelizing execution at the cost of memory usage. Different parallelization levels provide different trade-offs, and the optimal setting will depend on the size and composition of the dataset to be processed.
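    The four-stage design with swappable components and object-level parallelism can be sketched as a pipeline of callables mapped over objects in a worker pool. Everything below (stage names, the worker pool, the toy "features") is a hypothetical illustration of the architecture, not QIFE's MATLAB API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch of a QIFE-style modular engine: four swappable stage
# components, with object-level parallelism over independent tumors.

def run_engine(objects, stages, workers=4):
    """Apply input -> pre-processing -> feature computation -> output per object."""
    load, preprocess, compute, emit = stages

    def per_object(obj):
        return emit(compute(preprocess(load(obj))))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(per_object, objects))

stages = (
    lambda o: {"id": o},                          # input component
    lambda d: {**d, "voxels": [1, 2, 3]},         # pre-processing component
    lambda d: {**d, "volume": sum(d["voxels"])},  # feature-computation component
    lambda d: (d["id"], d["volume"]),             # output component
)
results = run_engine(["t1", "t2"], stages)        # [('t1', 6), ('t2', 6)]
```

    Swapping a component means replacing one callable while the managing framework stays unchanged, which is the modularity trade-off the abstract describes.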

  14. 3-D readout-electronics packaging for high-bandwidth massively paralleled imager

    DOEpatents

    Kwiatkowski, Kris; Lyke, James

    2007-12-18

    Dense, massively parallel signal-processing electronics are co-packaged behind associated sensor pixels. Microchips containing a linear or bilinear arrangement of photo-sensors, together with associated complex electronics, are integrated into a simple 3-D structure (a "mirror cube"). An array of photo-sensitive cells is disposed on a stacked CMOS chip's surface at a 45° angle from light-reflecting mirror surfaces formed on a neighboring CMOS chip surface. Image processing electronics are held within the stacked CMOS chip layers. Electrical connections couple each of said stacked CMOS chip layers and a distribution grid, the connections distributing power and signals to components associated with each stacked CMOS chip layer.

  15. Interlayer tunneling in double-layer quantum Hall pseudoferromagnets.

    PubMed

    Balents, L; Radzihovsky, L

    2001-02-26

    We show that the interlayer tunneling I-V in double-layer quantum Hall states displays a rich behavior which depends on the relative magnitude of sample size, voltage length scale, current screening, disorder, and thermal lengths. For weak tunneling, we predict a negative differential conductance of a power-law shape crossing over to a sharp zero-bias peak. An in-plane magnetic field splits this zero-bias peak, leading instead to a "derivative" feature at V_B(B_∥) = 2πℏvB_∥d/(eφ_0), which gives a direct measurement of the dispersion of the Goldstone mode corresponding to the spontaneous symmetry breaking of the double-layer Hall state.

  16. A powered prosthetic ankle joint for walking and running.

    PubMed

    Grimmer, Martin; Holgate, Matthew; Holgate, Robert; Boehler, Alexander; Ward, Jeffrey; Hollander, Kevin; Sugar, Thomas; Seyfarth, André

    2016-12-19

    Current prosthetic ankle joints are designed either for walking or for running. In order to mimic the capabilities of an able-bodied person, a powered prosthetic ankle for walking and running was designed. A powered system has the potential to reduce the limitations in range of motion and positive work output of passive walking and running feet. To perform the experiments, a controller capable of transitions between standing, walking, and running with speed adaptations was developed. In the first case study the system was mounted on an ankle bypass in parallel with the foot of a non-amputee subject. By this method the functionality of hardware and controller was proven. The Walk-Run ankle was capable of mimicking desired torque and angle trajectories in walking and running up to 2.6 m/s. At 4 m/s running, ankle angle could be matched while ankle torque could not. Limited ankle output power resulting from a suboptimal spring stiffness value was identified as a main reason. Further studies have to show to what extent these findings can be transferred to amputees.

  17. Argonne Simulation Framework for Intelligent Transportation Systems

    DOT National Transportation Integrated Search

    1996-01-01

    A simulation framework has been developed which defines a high-level architecture for a large-scale, comprehensive, scalable simulation of an Intelligent Transportation System (ITS). The simulator is designed to run on parallel computers and distribu...

  18. ATLAS Distributed Computing Monitoring tools during the LHC Run I

    NASA Astrophysics Data System (ADS)

    Schovancová, J.; Campana, S.; Di Girolamo, A.; Jézéquel, S.; Ueda, I.; Wenaus, T.; Atlas Collaboration

    2014-06-01

    This contribution summarizes the evolution of the ATLAS Distributed Computing (ADC) Monitoring project during the LHC Run I. The ADC Monitoring targets three groups of customers: the ADC Operations team, to identify malfunctions early and escalate issues to an activity or a service expert; ATLAS national contacts and sites, for real-time monitoring and long-term measurement of the performance of the provided computing resources; and the ATLAS Management, for long-term trends and accounting information about the ATLAS Distributed Computing resources. During the LHC Run I a significant development effort was invested in standardization of the monitoring and accounting applications in order to provide an extensive monitoring and accounting suite. ADC Monitoring applications separate the data layer and the visualization layer. The data layer exposes data in a predefined format. The visualization layer is designed bearing in mind the visual identity of the provided graphical elements and the re-usability of the visualization bits across the different tools. A rich family of filtering and searching options enhancing the available user interfaces comes naturally with the data and visualization layer separation. With a variety of reliable monitoring data accessible through standardized interfaces, automating actions under well-defined conditions correlating multiple data sources has become feasible. In this contribution we also discuss the automated exclusion of degraded resources and their automated recovery in various activities.

  19. User's Guide for TOUGH2-MP - A Massively Parallel Version of the TOUGH2 Code

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Earth Sciences Division; Zhang, Keni

    TOUGH2-MP is a massively parallel (MP) version of the TOUGH2 code, designed for computationally efficient parallel simulation of isothermal and nonisothermal flows of multicomponent, multiphase fluids in one, two, and three-dimensional porous and fractured media. In recent years, computational requirements have become increasingly intensive in large or highly nonlinear problems for applications in areas such as radioactive waste disposal, CO2 geological sequestration, environmental assessment and remediation, reservoir engineering, and groundwater hydrology. The primary objective of developing the parallel-simulation capability is to significantly improve the computational performance of the TOUGH2 family of codes. The particular goal for the parallel simulator is to achieve orders-of-magnitude improvement in computational time for models with ever-increasing complexity. TOUGH2-MP is designed to perform parallel simulation on multi-CPU computational platforms. An earlier version of TOUGH2-MP (V1.0) was based on the TOUGH2 Version 1.4 with EOS3, EOS9, and T2R3D modules, a software previously qualified for applications in the Yucca Mountain project, and was designed for execution on CRAY T3E and IBM SP supercomputers. The current version of TOUGH2-MP (V2.0) includes all fluid property modules of the standard version TOUGH2 V2.0. It provides computationally efficient capabilities using supercomputers, Linux clusters, or multi-core PCs, and also offers many user-friendly features. The parallel simulator inherits all process capabilities from V2.0 together with additional capabilities for handling fractured media from V1.4. 
This report provides a quick-start guide on how to set up and run the TOUGH2-MP program for users with a basic knowledge of running the (standard) version of the TOUGH2 code. The report also gives a brief technical description of the code, including a discussion of parallel methodology, code structure, and the mathematical and numerical methods used. To familiarize users with the parallel code, illustrative sample problems are presented.

  20. Terascale Cluster for Advanced Turbulent Combustion Simulations

    DTIC Science & Technology

    2008-07-25

    We have given the name CATS (for Combustion And Turbulence Simulator) to the terascale system that was obtained through this grant. CATS ...InfiniBand interconnect. CATS includes an interactive login node and a file server, each holding in excess of 1 terabyte of file storage. The 35 active...compute nodes of CATS enable us to run up to 140-core parallel MPI batch jobs; one node is reserved to run the scheduler. CATS is operated and

  1. Coupled Ocean/Atmospheric Mesoscale Prediction System (COAMPS), Version 5.0 (User’s Guide)

    DTIC Science & Technology

    2010-03-30

    provides tools for common modeling functions, as well as regridding, data decomposition, and communication on parallel computers. NRL/MR/7320--10...specified gncomDir. If running COAMPS at the DSRC (e.g. BABBAGE, DAVINCI, or EINSTEIN), the global NCOM files will be copied to /scr/[user]/COAMPS/data...the site (DSRC or local) and the platform (BABBAGE, DAVINCI, EINSTEIN, or local machine) on which COAMPS is being run. site=navy_dsrc (for DSRC

  2. Demonstration and Commercialization of the Sediment Ecosystem Assessment Protocol (SEAP)

    DTIC Science & Technology

    2017-07-09

    undergone severe erosion (Peeling 1975). Zuniga Jetty, which runs parallel to Point Loma at the bay's inlet, was built to control erosion near the inlet...consistent conditions and level of effort required to run the tests. A per site unit cost is less amenable to a field-based deployment, given the many...support in situ testing: 1) a standard exposure of spores to a reference toxicant dilution series; and 2) exposure of sporophyll blades to a

  3. Running Neuroimaging Applications on Amazon Web Services: How, When, and at What Cost?

    PubMed

    Madhyastha, Tara M; Koh, Natalie; Day, Trevor K M; Hernández-Fernández, Moises; Kelley, Austin; Peterson, Daniel J; Rajan, Sabreena; Woelfer, Karl A; Wolf, Jonathan; Grabowski, Thomas J

    2017-01-01

    The contribution of this paper is to identify and describe current best practices for using Amazon Web Services (AWS) to execute neuroimaging workflows "in the cloud." Neuroimaging offers a vast set of techniques by which to interrogate the structure and function of the living brain. However, many of the scientists for whom neuroimaging is an extremely important tool have limited training in parallel computation. At the same time, the field is experiencing a surge in computational demands, driven by a combination of data-sharing efforts, improvements in scanner technology that allow acquisition of images with higher image resolution, and by the desire to use statistical techniques that stress processing requirements. Most neuroimaging workflows can be executed as independent parallel jobs and are therefore excellent candidates for running on AWS, but the overhead of learning to do so and determining whether it is worth the cost can be prohibitive. In this paper we describe how to identify neuroimaging workloads that are appropriate for running on AWS, how to benchmark execution time, and how to estimate cost of running on AWS. By benchmarking common neuroimaging applications, we show that cloud computing can be a viable alternative to on-premises hardware. We present guidelines that neuroimaging labs can use to provide a cluster-on-demand type of service that should be familiar to users, and scripts to estimate cost and create such a cluster.
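    The cost-estimation step the paper recommends amounts to simple arithmetic once a per-subject runtime has been benchmarked: multiply jobs by hours, divide by jobs packed per instance, and multiply by the hourly price. The sketch below is a back-of-envelope model in that spirit; the price and packing figures are illustrative assumptions, not current AWS pricing.

```python
# Back-of-envelope cloud cost model; all numbers below are hypothetical.

def estimate_cost(n_jobs, hours_per_job, price_per_hour, jobs_per_instance=1):
    """Estimate USD cost of running independent jobs on on-demand instances."""
    instance_hours = n_jobs * hours_per_job / jobs_per_instance
    return instance_hours * price_per_hour

# e.g. 100 subjects at 2 h each, a hypothetical $0.40/h instance,
# packing 4 independent jobs per instance:
cost = estimate_cost(100, 2.0, 0.40, jobs_per_instance=4)  # 20.0
```

    Comparing such estimates across instance types against the amortized cost of on-premises hardware is the decision the paper's benchmarks are meant to inform.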

  4. Study of the mapping of Navier-Stokes algorithms onto multiple-instruction/multiple-data-stream computers

    NASA Technical Reports Server (NTRS)

    Eberhardt, D. S.; Baganoff, D.; Stevens, K.

    1984-01-01

    Implicit approximate-factored algorithms have certain properties that are suitable for parallel processing. A particular computational fluid dynamics (CFD) code, using this algorithm, is mapped onto a multiple-instruction/multiple-data-stream (MIMD) computer architecture. An explanation of this mapping procedure is presented, as well as some of the difficulties encountered when trying to run the code concurrently. Timing results are given for runs on the Ames Research Center's MIMD test facility which consists of two VAX 11/780's with a common MA780 multi-ported memory. Speedups exceeding 1.9 for characteristic CFD runs were indicated by the timing results.

  5. Failure in laboratory fault models in triaxial tests

    USGS Publications Warehouse

    Savage, J.C.; Lockner, D.A.; Byerlee, J.D.

    1996-01-01

    A model of a fault in the Earth is a sand-filled saw cut in a granite cylinder subjected to a triaxial test. The saw cut is inclined at an angle α to the cylinder axis, and the sand filling is intended to represent gouge. The triaxial test subjects the granite cylinder to a constant confining pressure and increasing axial stress to maintain a constant rate of shortening of the cylinder. The required axial stress increases at a decreasing rate to a maximum, beyond which a roughly constant axial stress is sufficient to maintain the constant rate of shortening. Such triaxial tests were run for saw cuts inclined at angles α of 20°, 25°, 30°, 35°, 40°, 45°, and 50° to the cylinder axis, and the apparent coefficient of friction μa (ratio of the shear stress to the normal stress, both stresses resolved onto the saw cut) at failure was determined. Subject to the assumption that the observed failure involves slip on Coulomb shears (orientation unspecified), the orientation of the principal compression axis within the gouge can be calculated as a function of μa for a given value of the coefficient of internal friction μi. The rotation of the principal stress axes within the gouge in a triaxial test can then be followed as the shear strain across the gouge layer increases. For μi ≈ 0.8, an appropriate value for highly sheared sand, the observed values of μa imply that the principal axis of compression within the gouge rotates so as to approach being parallel to the cylinder axis for all saw cut angles (20° < α < 50°). In the limiting state (principal compression axis parallel to cylinder axis) the stress state in the gouge layer would be the same as that in the granite cylinder, and the failure criterion would be independent of the saw cut angle.
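    The apparent coefficient of friction can be made concrete with the standard Mohr-circle resolution of the axial stress s1 and confining pressure s3 onto a plane inclined at angle alpha to the cylinder axis. This sketch applies the textbook formulas, not the authors' code, and the stress values are made-up examples.

```python
import math

# Resolve axial stress s1 and confining pressure s3 onto a saw cut inclined
# at alpha degrees to the cylinder axis (standard Mohr-circle relations):
#   tau = (s1 - s3) * sin(a) * cos(a)        shear stress on the cut
#   sn  = s3 + (s1 - s3) * sin(a)^2          normal stress on the cut

def apparent_friction(s1, s3, alpha_deg):
    a = math.radians(alpha_deg)
    tau = (s1 - s3) * math.sin(a) * math.cos(a)
    sn = s3 + (s1 - s3) * math.sin(a) ** 2
    return tau / sn

# e.g. s1 = 300 MPa, s3 = 100 MPa, saw cut at 30 degrees to the axis
mu_a = apparent_friction(300.0, 100.0, 30.0)   # about 0.577
```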

  6. DNA Assembly with De Bruijn Graphs Using an FPGA Platform.

    PubMed

    Poirier, Carl; Gosselin, Benoit; Fortier, Paul

    2018-01-01

    This paper presents an FPGA implementation of a DNA assembly algorithm, called Ray, initially developed to run on parallel CPUs. The OpenCL language is used, and the focus is placed on modifying and optimizing the original algorithm to better suit the new parallelization tool and the radically different hardware architecture. The results show that the execution time is roughly one fourth that of the CPU, and factoring in energy consumption yields a tenfold saving.

  7. Multiprocessor graphics computation and display using transputers

    NASA Technical Reports Server (NTRS)

    Ellis, Graham K.

    1988-01-01

    A package of two-dimensional graphics routines was developed to run on a transputer-based parallel processing system. These routines were designed to enable applications programmers to easily generate and display results from the transputer network in a graphic format. The graphics procedures were designed for the lowest possible network communication overhead for increased performance. The routines were designed for ease of use and to present an intuitive approach to generating graphics on the transputer parallel processing system.

  8. An object-oriented approach to nested data parallelism

    NASA Technical Reports Server (NTRS)

    Sheffler, Thomas J.; Chatterjee, Siddhartha

    1994-01-01

    This paper describes an implementation technique for integrating nested data parallelism into an object-oriented language. Data-parallel programming employs sets of data called 'collections' and expresses parallelism as operations performed over the elements of a collection. When the elements of a collection are also collections, then there is the possibility for 'nested data parallelism.' Few current programming languages support nested data parallelism, however. In an object-oriented framework, a collection is a single object. Its type defines the parallel operations that may be applied to it. Our goal is to design and build an object-oriented data-parallel programming environment supporting nested data parallelism. Our initial approach is built upon three fundamental additions to C++. We add new parallel base types by implementing them as classes, and add a new parallel collection type called a 'vector' that is implemented as a template. Only one new language feature is introduced: the 'foreach' construct, which is the basis for exploiting elementwise parallelism over collections. The strength of the method lies in the compilation strategy, which translates nested data-parallel C++ into ordinary C++. Extracting the potential parallelism in nested 'foreach' constructs is called 'flattening' nested parallelism. We show how to flatten 'foreach' constructs using a simple program transformation. Our prototype system produces vector code which has been successfully run on workstations, a CM-2, and a CM-5.
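    The flattening idea can be sketched in a few lines of Python (the paper's implementation is C++ templates; the names here are illustrative): a nested collection is stored as one flat data vector plus a segment descriptor, so an elementwise operation over the nested structure becomes a single flat data-parallel loop.

```python
def flatten(nested):
    """Split a list of lists into a flat vector plus segment lengths."""
    flat = [x for seg in nested for x in seg]
    lengths = [len(seg) for seg in nested]
    return flat, lengths

def segmented_apply(flat, lengths, op):
    """Apply op over every element in one flat pass, then re-nest."""
    mapped = [op(x) for x in flat]          # the single flat "parallel" loop
    out, i = [], 0
    for n in lengths:
        out.append(mapped[i:i + n])
        i += n
    return out

flat, lengths = flatten([[1, 2], [3], [4, 5, 6]])
print(segmented_apply(flat, lengths, lambda x: x * x))
# → [[1, 4], [9], [16, 25, 36]]
```

The point of the transformation is that the mapped loop has no nesting, so its iterations can be distributed uniformly regardless of how ragged the original collection was.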

  9. A mechanism for efficient debugging of parallel programs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miller, B.P.; Choi, J.D.

    1988-01-01

    This paper addresses the design and implementation of an integrated debugging system for parallel programs running on shared memory multi-processors (SMMP). The authors describe the use of flowback analysis to provide information on causal relationships between events in a program's execution without re-executing the program for debugging. The authors introduce a mechanism called incremental tracing that, by using semantic analyses of the debugged program, makes the flowback analysis practical with only a small amount of trace generated during execution. They extend flowback analysis to apply to parallel programs and describe a method to detect race conditions in the interactions of the co-operating processes.

  10. Parallel separations using capillary electrophoresis on a multilane microchip with multiplexed laser-induced fluorescence detection.

    PubMed

    Nikcevic, Irena; Piruska, Aigars; Wehmeyer, Kenneth R; Seliskar, Carl J; Limbach, Patrick A; Heineman, William R

    2010-08-01

    Parallel separations using CE on a multilane microchip with multiplexed LIF detection is demonstrated. The detection system was developed to simultaneously record data on all channels using an expanded laser beam for excitation, a camera lens to capture emission, and a CCD camera for detection. The detection system enables monitoring of each channel continuously and distinguishing individual lanes without significant crosstalk between adjacent lanes. Multiple analytes can be determined in parallel lanes within a single microchip in a single run, leading to increased sample throughput. The pKa determination of small molecule analytes is demonstrated with the multilane microchip.

  11. Parallel separations using capillary electrophoresis on a multilane microchip with multiplexed laser induced fluorescence detection

    PubMed Central

    Nikcevic, Irena; Piruska, Aigars; Wehmeyer, Kenneth R.; Seliskar, Carl J.; Limbach, Patrick A.; Heineman, William R.

    2010-01-01

    Parallel separations using capillary electrophoresis on a multilane microchip with multiplexed laser induced fluorescence detection is demonstrated. The detection system was developed to simultaneously record data on all channels using an expanded laser beam for excitation, a camera lens to capture emission, and a CCD camera for detection. The detection system enables monitoring of each channel continuously and distinguishing individual lanes without significant crosstalk between adjacent lanes. Multiple analytes can be analyzed on parallel lanes within a single microchip in a single run, leading to increased sample throughput. The pKa determination of small molecule analytes is demonstrated with the multilane microchip. PMID:20737446

  12. A Framework for Load Balancing of Tensor Contraction Expressions via Dynamic Task Partitioning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lai, Pai-Wei; Stock, Kevin; Rajbhandari, Samyam

    In this paper, we introduce the Dynamic Load-balanced Tensor Contractions (DLTC), a domain-specific library for efficient task-parallel execution of tensor contraction expressions, a class of computation encountered in quantum chemistry and physics. Our framework decomposes each contraction into smaller units of work, represented by an abstraction referred to as iterators. We exploit an extra level of parallelism by having tasks across independent contractions executed concurrently through a dynamic load-balancing runtime. We demonstrate the improved performance, scalability, and flexibility for the computation of tensor contraction expressions on parallel computers using examples from coupled cluster methods.

  13. Simulation of ozone production in a complex circulation region using nested grids

    NASA Astrophysics Data System (ADS)

    Taghavi, M.; Cautenet, S.; Foret, G.

    2003-07-01

    During the ESCOMPTE precampaign (15 June to 10 July 2000), three days of intensive pollution (IOP0) were observed and simulated. The comprehensive RAMS model, version 4.3, coupled online with a chemical module including 29 species, has been used to follow the chemistry of the polluted zone over southern France. This online method is feasible because the code is parallelized and runs on a powerful SGI 3800 computer. Two runs have been performed: run 1 with one grid and run 2 with two nested grids. The redistribution of simulated chemical species (ozone, carbon monoxide, sulphur dioxide, and nitrogen oxides) was compared with aircraft measurements and surface stations. The two-grid run gave substantially better results than the one-grid run because only the former takes the outer pollutants into account. This online method helps to explain the dynamics and to retrieve the chemical species redistribution with good agreement.

  14. Simulation of ozone production in a complex circulation region using nested grids

    NASA Astrophysics Data System (ADS)

    Taghavi, M.; Cautenet, S.; Foret, G.

    2004-06-01

    During the ESCOMPTE precampaign (summer 2000, over Southern France), a 3-day period of intensive observation (IOP0), associated with ozone peaks, has been simulated. The comprehensive RAMS model, version 4.3, coupled on-line with a chemical module including 29 species, is used to follow the chemistry of the polluted zone. This efficient but time consuming method can be used because the code is installed on a parallel computer, the SGI 3800. Two runs are performed: run 1 with a single grid and run 2 with two nested grids. The simulated fields of ozone, carbon monoxide, nitrogen oxides and sulfur dioxide are compared with aircraft and surface station measurements. The 2-grid run looks substantially better than the run with one grid because the former takes the outer pollutants into account. This on-line method helps to satisfactorily retrieve the chemical species redistribution and to explain the impact of dynamics on this redistribution.

  15. Application of a hybrid MPI/OpenMP approach for parallel groundwater model calibration using multi-core computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tang, Guoping; D'Azevedo, Ed F; Zhang, Fan

    2010-01-01

    Calibration of groundwater models involves hundreds to thousands of forward solutions, each of which may solve many transient coupled nonlinear partial differential equations, resulting in a computationally intensive problem. We describe a hybrid MPI/OpenMP approach to exploit two levels of parallelism in software and hardware to reduce calibration time on multi-core computers. HydroGeoChem 5.0 (HGC5) is parallelized using OpenMP for direct solutions for a reactive transport model application and a field-scale coupled flow and transport model application. In the reactive transport model, a single parallelizable loop is identified to account for over 97% of the total computational time using GPROF. Addition of a few lines of OpenMP compiler directives to the loop yields a speedup of about 10 on a 16-core compute node. For the field-scale model, parallelizable loops in 14 of 174 HGC5 subroutines that require 99% of the execution time are identified. As these loops are parallelized incrementally, the scalability is found to be limited by a loop where Cray PAT detects over 90% cache miss rates. With this loop rewritten, a speedup similar to that of the first application is achieved. The OpenMP-parallelized code can be run efficiently on multiple workstations in a network or multiple compute nodes on a cluster as slaves using parallel PEST to speed up model calibration. To run calibration on clusters as a single task, the Levenberg-Marquardt algorithm is added to HGC5 with the Jacobian calculation and lambda search parallelized using MPI. With this hybrid approach, 100-200 compute cores are used to reduce the calibration time from weeks to a few hours for these two applications. This approach is applicable to most existing groundwater model codes for many applications.
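    The MPI-level parallelism exploited here works because each Jacobian column of a Levenberg-Marquardt step is an independent forward run. A minimal sketch of that idea (names and the toy model are illustrative, not HGC5's code; threads stand in for MPI ranks):

```python
from concurrent.futures import ThreadPoolExecutor

def model(params):
    """Stand-in forward model; in practice one expensive PDE solve."""
    a, b = params
    return [a * x + b for x in (0.0, 1.0, 2.0)]

def jacobian_column(params, j, h=1e-6):
    """Finite-difference derivative of the model w.r.t. parameter j."""
    bumped = list(params)
    bumped[j] += h
    f0, f1 = model(params), model(bumped)
    return [(y1 - y0) / h for y0, y1 in zip(f0, f1)]

def parallel_jacobian(params):
    """Each column needs only one extra forward run, so farm them out."""
    with ThreadPoolExecutor() as pool:
        cols = list(pool.map(lambda j: jacobian_column(params, j),
                             range(len(params))))
    return cols  # cols[j][i] = d model_i / d param_j

print(parallel_jacobian([2.0, 1.0]))
```

With an expensive forward model, the wall-clock time of one calibration iteration drops roughly in proportion to the number of workers, which is the effect the paper reports at the 100-200 core scale.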

  16. A new scheme for the parameterization of the turbulent planetary boundary layer in the GLAS fourth order GCM

    NASA Technical Reports Server (NTRS)

    Helfand, H. M.

    1985-01-01

    Methods being used to increase the horizontal and vertical resolution and to implement more sophisticated parameterization schemes for general circulation models (GCM) run on newer, more powerful computers are described. Attention is focused on the NASA-Goddard Laboratory for Atmospherics fourth order GCM. A new planetary boundary layer (PBL) model has been developed which features explicit resolution of two or more layers. Numerical models are presented for parameterizing the turbulent vertical heat, momentum and moisture fluxes at the earth's surface and between the layers in the PBL model. An extended Monin-Obukhov similarity scheme is applied to express the relationships between the lowest levels of the GCM and the surface fluxes. On-line weather prediction experiments are to be run to test the effects of the higher resolution thereby obtained for dynamic atmospheric processes.

  17. Utility of NCEP Operational and Emerging Meteorological Models for Driving Air Quality Prediction

    NASA Astrophysics Data System (ADS)

    McQueen, J.; Huang, J.; Huang, H. C.; Shafran, P.; Lee, P.; Pan, L.; Sleinkofer, A. M.; Stajner, I.; Upadhayay, S.; Tallapragada, V.

    2017-12-01

    Operational air quality predictions for the United States (U.S.) are provided at NOAA by the National Air Quality Forecasting Capability (NAQFC). NAQFC provides nationwide operational predictions of ozone and particulate matter twice per day (at the 06 and 12 UTC cycles) at 12 km resolution and 1 hour time intervals through 48 hours, distributed at http://airquality.weather.gov. The NOAA National Centers for Environmental Prediction (NCEP) operational North American Mesoscale (NAM) 12 km weather prediction model is used to drive the Community Multiscale Air Quality (CMAQ) model. In 2017, the NAM was upgraded (to V4) in part to reduce a warm 2 m temperature bias in summer. At the same time, CMAQ was updated to V5.0.2. Both versions of the models were run in parallel for several months, so the impact of improvements from the atmospheric chemistry model versus upgrades to the weather prediction model could be assessed. Improvements in CMAQ ozone predictions were related to the reduced NAM 2 m temperature bias: increasing the opacity of clouds reduced downward shortwave radiation, which in turn reduced ozone photolysis. Higher resolution operational NWP models have recently been introduced as part of the NCEP modeling suite. These include the NAM CONUS Nest (3 km horizontal resolution), run four times per day through 60 hours, and the High Resolution Rapid Refresh (HRRR, 3 km), run hourly out to 18 hours. In addition, NCEP with other NOAA labs has begun to develop and test the Next Generation Global Prediction System (NGGPS) based on the FV3 global model. This presentation also overviews recent developments with operational numerical weather prediction and evaluates the ability of these models to predict low-level temperatures and clouds and to capture boundary layer processes important for driving air quality prediction in complex terrain.
The assessed meteorological model errors could help determine the magnitude of possible pollutant errors from CMAQ if used for driving meteorology. The NWP models will be evaluated against standard and mesonet fields averaged for various regions during summer 2017. An evaluation of meteorological fields important to air quality modeling (e.g., near-surface winds, temperatures, moisture, boundary layer heights, and cloud cover) will be reported on.

  18. Comparative evaluation of the liver in dogs with a splenic mass by using ultrasonography and contrast-enhanced computed tomography

    PubMed Central

    Irausquin, Roelof A.; Scavelli, Thomas D.; Corti, Lisa; Stefanacci, Joseph D.; DeMarco, Joann; Flood, Shannon; Rohrbach, Barton W.

    2008-01-01

    Evaluation of dogs with splenic masses to better educate owners as to the extent of the disease is a goal of many research studies. We compared the use of ultrasonography (US) and contrast-enhanced computed tomography (CT) to evaluate the accuracy of detecting hepatic neoplasia in dogs with splenic masses, independently, in series, or in parallel. No significant difference was found between US and CT. If the presence or absence of ascites, as detected with US, was used as a pretest probability of disease in our population, the positive predictive value increased to 94% if the tests were run in series, and the negative predictive value increased to 95% if the tests were run in parallel. The study showed that CT combined with US could be a valuable tool in evaluation of dogs with splenic masses. PMID:18320977

  19. Evict on write, a management strategy for a prefetch unit and/or first level cache in a multiprocessor system with speculative execution

    DOEpatents

    Gara, Alan; Ohmacht, Martin

    2014-09-16

    In a multiprocessor system with at least two levels of cache, a speculative thread may run on a core processor in parallel with other threads. When the thread seeks to do a write to main memory, this access is to be written through the first level cache to the second level cache. After the write-through, the corresponding line is deleted from the first level cache and/or prefetch unit, so that any further accesses to the same location in main memory have to be retrieved from the second level cache. The second level cache keeps track of multiple versions of data, where more than one speculative thread is running in parallel, while the first level cache does not have any of the versions during speculation. A switch allows choosing between modes of operation of a speculation blind first level cache.
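    A toy software model of the evict-on-write policy (the data structures are mine, not the patent's hardware): a speculative write goes through L1 to L2, and the line is then evicted from L1 so later reads must go to the version-tracking L2.

```python
class TwoLevelCache:
    """Dict-based sketch of an L1/L2 pair with evict-on-write in L1."""

    def __init__(self):
        self.l1 = {}   # address -> value; holds no speculative versions
        self.l2 = {}   # address -> value; stands in for the versioned L2

    def read(self, addr):
        if addr in self.l1:
            return self.l1[addr]
        value = self.l2.get(addr)
        self.l1[addr] = value          # fill L1 on a read miss
        return value

    def speculative_write(self, addr, value):
        self.l2[addr] = value          # write through to L2
        self.l1.pop(addr, None)        # evict on write: drop the L1 copy

cache = TwoLevelCache()
cache.speculative_write(0x10, 42)
print(0x10 in cache.l1)   # → False: the L1 line was evicted
print(cache.read(0x10))   # → 42, served from L2
```

Keeping L1 free of speculative state is what lets the real hardware confine version bookkeeping to the second level cache.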

  20. Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jin, Shuangshuang; Chen, Yousu; Wu, Di

    2015-12-09

    Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operation. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using a single-processor dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on a shared-memory platform, and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.

  1. Parallel adaptive discontinuous Galerkin approximation for thin layer avalanche modeling

    NASA Astrophysics Data System (ADS)

    Patra, A. K.; Nichita, C. C.; Bauer, A. C.; Pitman, E. B.; Bursik, M.; Sheridan, M. F.

    2006-08-01

    This paper describes the development of highly accurate adaptive discontinuous Galerkin schemes for the solution of the equations arising from a thin layer type model of debris flows. Such flows have wide applicability in the analysis of avalanches induced by many natural calamities, e.g. volcanoes, earthquakes, etc. These schemes are coupled with special parallel solution methodologies to produce a simulation tool capable of very high-order numerical accuracy. The methodology successfully replicates cold rock avalanches at Mount Rainier, Washington and hot volcanic particulate flows at Colima Volcano, Mexico.

  2. Synthesis and crystal structure of a novel pentaborate, Na{sub 3}ZnB{sub 5}O{sub 10}

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen Xuean; Li Ming; Chang Xinan

    A novel ternary borate, trisodium zinc pentaborate, Na{sub 3}ZnB{sub 5}O{sub 10}, has been prepared by solid-state reaction at temperatures below 750 °C. The single-crystal X-ray structural analysis showed that Na{sub 3}ZnB{sub 5}O{sub 10} crystallizes in the monoclinic space group P2{sub 1}/n with a=6.6725(7)A, b=18.1730(10)A, c=7.8656(9)A, {beta}=114.604(6){sup o}, Z=4. It represents a new structure type in which double ring [B{sub 5}O{sub 10}]{sup 5-} building units are bridged by ZnO{sub 4} tetrahedra through common O atoms to form a two-dimensional {sub {approx}}{sup 2}[ZnB{sub 5}O{sub 10}]{sup 3-} layer that affords one-dimensional channels running parallel to the [101] direction. Symmetry-center related {sub {approx}}{sup 2}[ZnB{sub 5}O{sub 10}]{sup 3-} layers are stacked along the b-axis, with the interlayer void spaces and intralayer open channels filled by Na{sup +} cations to balance charge. The IR spectrum further confirms the presence of both BO{sub 3} and BO{sub 4} groups, and the UV-vis diffuse reflectance spectrum shows a band gap of about 3.2 eV.

  3. How is a giant sperm ejaculated? Anatomy and function of the sperm pump, or "Zenker organ," in Pseudocandona marchica (Crustacea, Ostracoda, Candonidae)

    NASA Astrophysics Data System (ADS)

    Yamada, Shinnosuke; Matzke-Karasz, Renate

    2012-07-01

    'Giant sperm', in terms of exceptionally long spermatozoa, occur in a variety of taxa in the animal kingdom, predominantly in arthropod groups, but also in flatworms, mollusks, and others. In some freshwater ostracods (Cypridoidea), filamentous sperm cells reach up to ten times the animal's body length; nonetheless, during a single copulation several dozen sperm cells can be transferred to the female's seminal receptacle. This highly effective ejaculation has traditionally been credited to a chitinous-muscular structure within the seminal duct, which has been interpreted as a sperm pump. We investigated this organ, also known as the Zenker organ, of a cypridoid ostracod, Pseudocandona marchica, utilizing light and electron microscope techniques and produced a three-dimensional reconstruction based on serial semi-thin histological sections. This paper shows that numerous muscle fibers surround the central tube of the Zenker organ, running in parallel with it, and that a thin cellular layer underlies the muscular layer. A cellular inner tube exists inside the central tube. A chitinous-cellular structure at the entrance of the organ has been recognized as an ejaculatory valve. In male specimens during copulation, we confirmed a small hole derived from the passage of a single spermatozoon through the valve. The new data allowed for proposing a detailed course of operation of the Zenker organ during giant sperm ejaculation.

  4. "Submesoscale Soup" Vorticity and Tracer Statistics During the Lateral Mixing Experiment

    NASA Astrophysics Data System (ADS)

    Shcherbina, A.; D'Asaro, E. A.; Lee, C. M.; Molemaker, J.; McWilliams, J. C.

    2012-12-01

    A detailed view of upper-ocean velocity, vorticity, and tracer statistics was obtained by a unique synchronized two-vessel survey in the North Atlantic in winter 2012. In winter, the North Atlantic Mode Water region south of the Gulf Stream is filled with an energetic, homogeneous, and well-developed submesoscale turbulence field - the "submesoscale soup". Turbulence in the soup is produced by frontogenesis and the surface layer instability of mesoscale eddy flows in the vicinity of the Gulf Stream. This region is a convenient representation of the inertial range of the geophysical turbulence forward cascade spanning scales of O(1-100 km). During the Lateral Mixing Experiment in February-March 2012, R/Vs Atlantis and Knorr were run on parallel tracks 1 km apart for 500 km in the submesoscale soup region. Synchronous ADCP sampling provided the first in-situ estimates of full 3-D vorticity and divergence without the usual mix of spatial and temporal aliasing. Tracer distributions were also simultaneously sampled by both vessels using the underway and towed instrumentation. The observed vorticity distribution in the mixed layer was markedly asymmetric, with sparse strands of strong anticyclonic vorticity embedded in a weak, predominantly cyclonic background. While the mean vorticity was close to zero, the distribution skewness exceeded 2. These observations confirm theoretical and numerical model predictions for an active submesoscale turbulence field. Submesoscale vorticity spectra also agreed well with the model prediction.

  5. Fast data preprocessing with Graphics Processing Units for inverse problem solving in light-scattering measurements

    NASA Astrophysics Data System (ADS)

    Derkachov, G.; Jakubczyk, T.; Jakubczyk, D.; Archer, J.; Woźniak, M.

    2017-07-01

    Utilising the Compute Unified Device Architecture (CUDA) platform for Graphics Processing Units (GPUs) enables significant reduction of computation time at a moderate cost, by means of parallel computing. In the paper [Jakubczyk et al., Opto-Electron. Rev., 2016] we reported using a GPU for Mie scattering inverse problem solving (up to 800-fold speed-up). Here we report the development of two subroutines utilising the GPU at data preprocessing stages for the inversion procedure: (i) a subroutine, based on ray tracing, for finding the spherical aberration correction function; (ii) a subroutine performing the conversion of an image to a 1D distribution of light intensity versus azimuth angle (i.e. a scattering diagram), fed from a movie-reading CPU subroutine running in parallel. All subroutines are incorporated in the PikeReader application, which we make available in a GitHub repository. PikeReader returns a sequence of intensity distributions versus a common azimuth angle vector, corresponding to the recorded movie. We obtained an overall ∼400-fold speed-up of calculations at data preprocessing stages using CUDA codes running on the GPU in comparison to single-thread MATLAB-only code running on the CPU.
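    The image-to-scattering-diagram conversion can be illustrated on the CPU (the paper's version runs as a CUDA kernel; the function and parameters here are illustrative): each pixel is assigned to an azimuth-angle bin relative to the beam centre, and intensities in each bin are averaged.

```python
import math

def scattering_diagram(image, cx, cy, nbins=8):
    """image: 2D list of intensities; returns mean intensity per azimuth bin."""
    sums, counts = [0.0] * nbins, [0] * nbins
    for y, row in enumerate(image):
        for x, intensity in enumerate(row):
            if x == cx and y == cy:
                continue                        # centre pixel has no azimuth
            phi = math.atan2(y - cy, x - cx) % (2.0 * math.pi)
            b = min(int(phi / (2.0 * math.pi) * nbins), nbins - 1)
            sums[b] += intensity
            counts[b] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

# A uniform 5x5 image averages to 1.0 in every bin.
image = [[1.0] * 5 for _ in range(5)]
print(scattering_diagram(image, 2, 2, nbins=4))
```

On a GPU the per-pixel work is embarrassingly parallel; only the bin accumulation needs atomic updates or a reduction, which is why this stage speeds up so well.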

  6. Quantitative and qualitative measure of intralaboratory two-dimensional protein gel reproducibility and the effects of sample preparation, sample load, and image analysis.

    PubMed

    Choe, Leila H; Lee, Kelvin H

    2003-10-01

    We investigate one approach to assess the quantitative variability in two-dimensional gel electrophoresis (2-DE) separations based on gel-to-gel variability, sample preparation variability, sample load differences, and the effect of automation on image analysis. We observe that 95% of spots present in three out of four replicate gels exhibit less than a 0.52 coefficient of variation (CV) in fluorescent stain intensity (% volume) for a single sample run on multiple gels. When four parallel sample preparations are performed, this value increases to 0.57. We do not observe any significant change in quantitative value for an increase or decrease in sample load of 30% when using appropriate image analysis variables. Increasing use of automation, while necessary in modern 2-DE experiments, does change the observed level of quantitative and qualitative variability among replicate gels. The number of spots that change qualitatively for a single sample run in parallel varies from a CV = 0.03 for fully manual analysis to CV = 0.20 for a fully automated analysis. We present a systematic method by which a single laboratory can measure gel-to-gel variability using only three gel runs.
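    The variability measure used above is the coefficient of variation of a spot's stain intensity (% volume) across replicate gels. A minimal sketch (the intensity values below are made up for illustration):

```python
import statistics

def spot_cv(intensities):
    """Coefficient of variation (stdev / mean) of one spot's %-volume
    across replicate gels."""
    return statistics.stdev(intensities) / statistics.mean(intensities)

# Four replicate gels of a single sample; a CV below 0.52 is within the
# range the authors report for 95% of spots.
print(round(spot_cv([1.10, 0.95, 1.00, 1.05]), 3))
```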

  7. Beyond the single-file fluid limit using transfer matrix method: Exact results for confined parallel hard squares

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gurin, Péter; Varga, Szabolcs

    2015-06-14

    We extend the transfer matrix method of one-dimensional hard core fluids placed between confining walls to the case where the particles can pass each other and at most two layers can form. We derive an eigenvalue equation for a quasi-one-dimensional system of hard squares confined between two parallel walls, where the pore width is between σ and 3σ (σ is the side length of the square). The exact equation of state and the nearest neighbor distribution functions show three different structures: a fluid phase with one layer, a fluid phase with two layers, and a solid-like structure where the fluid layers are strongly correlated. The structural transition between differently ordered fluids develops continuously with increasing density, i.e., no thermodynamic phase transition occurs. The high density structure of the system consists of clusters with two layers which are broken by particles staying in the middle of the pore.

  8. Compute Server Performance Results

    NASA Technical Reports Server (NTRS)

    Stockdale, I. E.; Barton, John; Woodrow, Thomas (Technical Monitor)

    1994-01-01

    Parallel-vector supercomputers have been the workhorses of high performance computing. As expectations of future computing needs have risen faster than projected vector supercomputer performance, much work has been done investigating the feasibility of using Massively Parallel Processor systems as supercomputers. An even more recent development is the availability of high performance workstations which have the potential, when clustered together, to replace parallel-vector systems. We present a systematic comparison of floating point performance and price-performance for various compute server systems. A suite of highly vectorized programs was run on systems including traditional vector systems such as the Cray C90, and RISC workstations such as the IBM RS/6000 590 and the SGI R8000. The C90 system delivers 460 million floating point operations per second (FLOPS), the highest single processor rate of any vendor. However, if the price-performance ratio (PPR) is considered to be most important, then the IBM and SGI processors are superior to the C90 processors. Even without code tuning, the IBM and SGI PPRs of 260 and 220 FLOPS per dollar exceed the C90 PPR of 160 FLOPS per dollar when running our highly vectorized suite.
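    A back-of-envelope sketch using the figures quoted in the abstract: the price-performance ratio is simply delivered FLOPS per dollar, so system prices can be inferred from the quoted rates (the prices below are implied by the abstract's numbers, not stated in it).

```python
def ppr(flops: float, price_dollars: float) -> float:
    """Price-performance ratio: delivered FLOPS per dollar."""
    return flops / price_dollars

# 460 MFLOPS at a PPR of 160 FLOPS/$ implies a ~$2.9M processor.
c90_price = 460e6 / 160
print(f"implied C90 price: ${c90_price:,.0f}")  # → $2,875,000
print(ppr(460e6, c90_price))                    # → 160.0, recovering the PPR
```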

  9. PREMER: a Tool to Infer Biological Networks.

    PubMed

    Villaverde, Alejandro F; Becker, Kolja; Banga, Julio R

    2017-10-04

    Inferring the structure of unknown cellular networks is a main challenge in computational biology. Data-driven approaches based on information theory can determine the existence of interactions among network nodes automatically. However, the elucidation of certain features - such as distinguishing between direct and indirect interactions or determining the direction of a causal link - requires estimating information-theoretic quantities in a multidimensional space. This can be a computationally demanding task, which acts as a bottleneck for the application of elaborate algorithms to large-scale network inference problems. The computational cost of such calculations can be alleviated by the use of compiled programs and parallelization. To this end we have developed PREMER (Parallel Reverse Engineering with Mutual information & Entropy Reduction), a software toolbox that can run in parallel and sequential environments. It uses information-theoretic criteria to recover network topology and determine the strength and causality of interactions, and allows incorporating prior knowledge, imputing missing data, and correcting outliers. PREMER is a free, open source software tool that does not require any commercial software. Its core algorithms are programmed in FORTRAN 90 and implement OpenMP directives. It has user interfaces in Python and MATLAB/Octave, and runs on Windows, Linux and OSX (https://sites.google.com/site/premertoolbox/).
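    The information-theoretic core can be sketched for discrete variables (PREMER itself is FORTRAN 90 and far more elaborate): the mutual information I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x) p(y))] is zero for independent variables and positive when they interact, which is what flags a candidate network edge.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats for two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        pj = c / n
        # pj * n * n / (px[x] * py[y]) == p(x,y) / (p(x) * p(y))
        mi += pj * math.log(pj * n * n / (px[x] * py[y]))
    return mi

x = [0, 0, 1, 1, 0, 1, 0, 1]
print(mutual_information(x, x))  # identical sequences → log(2) ≈ 0.693
```

Estimating such quantities over many variable pairs (and triples, for direct-versus-indirect tests) is the embarrassingly parallel workload the toolbox distributes.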

  10. pFlogger: The Parallel Fortran Logging Utility

    NASA Technical Reports Server (NTRS)

    Clune, Tom; Cruz, Carlos A.

    2017-01-01

    In the context of high performance computing (HPC), software investments in support of text-based diagnostics, which monitor a running application, are typically limited compared to those for other types of IO. Examples of such diagnostics include reiteration of configuration parameters, progress indicators, simple metrics (e.g., mass conservation, convergence of solvers, etc.), and timers. To some degree, this difference in priority is justifiable as other forms of output are the primary products of a scientific model and, due to their large data volume, much more likely to be a significant performance concern. In contrast, text-based diagnostic content is generally not shared beyond the individual or group running an application and is most often used to troubleshoot when something goes wrong. We suggest that a more systematic approach enabled by a logging facility (or 'logger') similar to those routinely used by many communities would provide significant value to complex scientific applications. In the context of high-performance computing, an appropriate logger would provide specialized support for distributed and shared-memory parallelism and have low performance overhead. In this paper, we present our prototype implementation of pFlogger - a parallel Fortran-based logging framework - and assess its suitability for use in a complex scientific application.
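    A hypothetical sketch of what parallel-aware logging provides (pFlogger's real Fortran API differs; the helper below is illustrative): every record is prefixed with the process rank, and a per-rank severity threshold keeps non-root ranks quiet in production runs.

```python
import logging

def make_rank_logger(rank: int, root_level=logging.DEBUG,
                     other_level=logging.WARNING):
    """Logger whose records carry the rank and whose verbosity depends on it."""
    logger = logging.getLogger(f"model.rank{rank}")
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(
        f"[rank {rank}] %(levelname)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(root_level if rank == 0 else other_level)
    return logger

log0 = make_rank_logger(0)
log3 = make_rank_logger(3)
log0.info("mass conservation error = 1.2e-9")   # emitted by the root rank
log3.info("mass conservation error = 1.2e-9")   # suppressed on rank 3
```

Filtering at the logger rather than with scattered `if (rank == 0)` guards is the kind of systematic support the paper argues for.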

  11. A Programming Model Performance Study Using the NAS Parallel Benchmarks

    DOE PAGES

    Shan, Hongzhang; Blagojević, Filip; Min, Seung-Jai; ...

    2010-01-01

Harnessing the power of multicore platforms is challenging due to the additional levels of parallelism present. In this paper we use the NAS Parallel Benchmarks to study three programming models, MPI, OpenMP and PGAS, to understand their performance and memory usage characteristics on current multicore architectures. To understand these characteristics we use the Integrated Performance Monitoring tool and other ways to measure communication versus computation time, as well as the fraction of the run time spent in OpenMP. The benchmarks are run on two different Cray XT5 systems and an InfiniBand cluster. Our results show that in general the three programming models exhibit very similar performance characteristics. In a few cases, OpenMP is significantly faster because it explicitly avoids communication. For these particular cases, we were able to re-write the UPC versions and achieve equal performance to OpenMP. Using OpenMP was also the most advantageous in terms of memory usage. Also we compare performance differences between the two Cray systems, which have quad-core and hex-core processors. We show that at scale the performance is almost always slower on the hex-core system because of increased contention for network resources.

  12. Evaluation of Job Queuing/Scheduling Software: Phase I Report

    NASA Technical Reports Server (NTRS)

    Jones, James Patton

    1996-01-01

The recent proliferation of high-performance workstations and the increased reliability of parallel systems have illustrated the need for robust job management systems to support parallel applications. To address this issue, the Numerical Aerodynamic Simulation (NAS) supercomputer facility compiled a requirements checklist for job queuing/scheduling software. Next, NAS began an evaluation of the leading job management system (JMS) software packages against the checklist. This report describes the three-phase evaluation process, and presents the results of Phase 1: Capabilities versus Requirements. We show that JMS support for running parallel applications on clusters of workstations and parallel systems is still insufficient, even in the leading JMSs. However, by ranking each JMS evaluated against the requirements, we provide data that will be useful to other sites in selecting a JMS.

  13. Massively parallel implementation of 3D-RISM calculation with volumetric 3D-FFT.

    PubMed

    Maruyama, Yutaka; Yoshida, Norio; Tadano, Hiroto; Takahashi, Daisuke; Sato, Mitsuhisa; Hirata, Fumio

    2014-07-05

A new three-dimensional reference interaction site model (3D-RISM) program for massively parallel machines combined with the volumetric 3D fast Fourier transform (3D-FFT) was developed and tested on the RIKEN K supercomputer. The ordinary parallel 3D-RISM program has a limitation on the number of parallelizations because of the limitations of the slab-type 3D-FFT. The volumetric 3D-FFT relieves this limitation drastically. We tested the 3D-RISM calculation on a large, fine calculation cell (2048³ grid points) on 16,384 nodes, each having eight CPU cores. The new 3D-RISM program achieved excellent parallel scalability running on the RIKEN K supercomputer. As a benchmark application, we employed the program, combined with molecular dynamics simulation, to analyze the oligomerization process of a chymotrypsin inhibitor 2 mutant. The results demonstrate that the massively parallel 3D-RISM program is effective for analyzing the hydration properties of large biomolecular systems. Copyright © 2014 Wiley Periodicals, Inc.

  14. Parallelization of the FLAPW method and comparison with the PPW method

    NASA Astrophysics Data System (ADS)

    Canning, Andrew; Mannstadt, Wolfgang; Freeman, Arthur

    2000-03-01

    The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining electronic and magnetic properties of crystals and surfaces. In the past the FLAPW method has been limited to systems of about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell running on up to 512 processors on a Cray T3E parallel supercomputer. Some results will also be presented on a comparison of the plane-wave pseudopotential method and the FLAPW method on large systems.

  15. A parallel simulated annealing algorithm for standard cell placement on a hypercube computer

    NASA Technical Reports Server (NTRS)

    Jones, Mark Howard

    1987-01-01

    A parallel version of a simulated annealing algorithm is presented which is targeted to run on a hypercube computer. A strategy for mapping the cells in a two dimensional area of a chip onto processors in an n-dimensional hypercube is proposed such that both small and large distance moves can be applied. Two types of moves are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described along with a distributed data structure that needs to be stored in the hypercube to support parallel cost evaluation. A novel tree broadcasting strategy is used extensively in the algorithm for updating cell locations in the parallel environment. Studies on the performance of the algorithm on example industrial circuits show that it is faster and gives better final placement results than the uniprocessor simulated annealing algorithms. An improved uniprocessor algorithm is proposed which is based on the improved results obtained from parallelization of the simulated annealing algorithm.
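
The move set described above, cell exchanges and cell displacements accepted under the Metropolis criterion, is easy to sketch as a serial baseline. The following Python toy is our own simplification: it ignores the hypercube mapping, the distributed cost evaluation, and the tree broadcast, and uses total net span as a stand-in wirelength cost:

```python
import math, random

def net_span(nets, pos):
    """Wirelength proxy: sum over nets of (max - min) occupied slot."""
    return sum(max(pos[c] for c in net) - min(pos[c] for c in net)
               for net in nets)

def anneal_placement(nets, n_cells, n_slots, steps=5000, t0=4.0, seed=7):
    rng = random.Random(seed)
    pos = list(range(n_cells))                 # cell i occupies slot pos[i]
    free = list(range(n_cells, n_slots))       # currently empty slots
    cost, temp = net_span(nets, pos), t0
    for _ in range(steps):
        if free and rng.random() < 0.5:        # displacement into a free slot
            i, j = rng.randrange(n_cells), rng.randrange(len(free))
            pos[i], free[j] = free[j], pos[i]
            undo = ('d', i, j)
        else:                                  # pairwise cell exchange
            i, j = rng.sample(range(n_cells), 2)
            pos[i], pos[j] = pos[j], pos[i]
            undo = ('x', i, j)
        new = net_span(nets, pos)
        if new > cost and rng.random() >= math.exp((cost - new) / temp):
            kind, i, j = undo                  # reject: revert the move
            if kind == 'd':
                pos[i], free[j] = free[j], pos[i]
            else:
                pos[i], pos[j] = pos[j], pos[i]
        else:
            cost = new                         # accept (always, if downhill)
        temp *= 0.999                          # geometric cooling schedule
    return pos, cost

pos, cost = anneal_placement([[0, 3], [1, 2]], n_cells=4, n_slots=8)
print(cost)  # typically near the optimum of 2 after cooling
```

The parallel version in the paper evaluates the cost of moves on many processors at once, which is why the distributed data structure and broadcast of updated cell locations become the central design problems.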

  16. Bi-directional series-parallel elastic actuator and overlap of the actuation layers.

    PubMed

    Furnémont, Raphaël; Mathijssen, Glenn; Verstraten, Tom; Lefeber, Dirk; Vanderborght, Bram

    2016-01-27

Several robotics applications require high torque-to-weight ratio and energy efficient actuators. Progress in that direction was made by introducing compliant elements into the actuation. A large variety of actuators were developed, such as series elastic actuators (SEAs), variable stiffness actuators and parallel elastic actuators (PEAs). SEAs can reduce the peak power while PEAs can reduce the torque requirement on the motor. Nonetheless, these actuators still cannot match the performance of human muscle. To combine both advantages, the series parallel elastic actuator (SPEA) was developed. The principle is inspired by biological muscles. Muscles are composed of motor units, placed in parallel, which are variably recruited as the required effort increases. This biological principle is exploited in the SPEA, where springs (layers), placed in parallel, can be recruited one by one. This recruitment is performed by an intermittent mechanism. This paper presents the development of a SPEA using the MACCEPA principle with a self-closing mechanism. This actuator can deliver a bi-directional output torque, variable stiffness and reduced friction. The load on the motor can also be reduced, leading to a lower power consumption. The variable recruitment of the parallel springs can also be tuned in order to further decrease the consumption of the actuator for a given task. First, an explanation of the concept and a brief description of prior work will be given. Next, the design and the model of one of the layers will be presented. The working principle of the full actuator will then be given. Finally, experiments measuring the electric consumption of the actuator demonstrate the advantage of the SPEA over an equivalent stiff actuator.

  17. Document Image Parsing and Understanding using Neuromorphic Architecture

    DTIC Science & Technology

    2015-03-01

processing speed at different layers. In the pattern matching layer, the computing power of multicore processors is explored to reduce the processing... cortex where the complex data is reduced to abstract representations. The abstract representation is compared to stored patterns in massively parallel

  18. Photocapacitive image converter

    NASA Technical Reports Server (NTRS)

    Miller, W. E.; Sher, A.; Tsuo, Y. H. (Inventor)

    1982-01-01

    An apparatus for converting a radiant energy image into corresponding electrical signals including an image converter is described. The image converter includes a substrate of semiconductor material, an insulating layer on the front surface of the substrate, and an electrical contact on the back surface of the substrate. A first series of parallel transparent conductive stripes is on the insulating layer with a processing circuit connected to each of the conductive stripes for detecting the modulated voltages generated thereon. In a first embodiment of the invention, a modulated light stripe perpendicular to the conductive stripes scans the image converter. In a second embodiment a second insulating layer is deposited over the conductive stripes and a second series of parallel transparent conductive stripes perpendicular to the first series is on the second insulating layer. A different frequency current signal is applied to each of the second series of conductive stripes and a modulated image is applied to the image converter.

  19. High-throughput sequence alignment using Graphics Processing Units

    PubMed Central

    Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh

    2007-01-01

    Background The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU. PMID:18070356
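
The data parallelism MUMmerGPU exploits, each query aligned independently against a shared reference, can be mimicked on a CPU with a thread pool. This Python sketch does naive exact substring matching rather than suffix-tree maximal matching, and all names are ours:

```python
from concurrent.futures import ThreadPoolExecutor

def occurrences(reference, query):
    """All start positions of exact (possibly overlapping) occurrences."""
    hits, start = [], reference.find(query)
    while start != -1:
        hits.append(start)
        start = reference.find(query, start + 1)
    return hits

def align_all(reference, queries, workers=4):
    # Each query is independent, so the batch maps cleanly onto parallel
    # workers -- the same data parallelism the GPU kernel exploits.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(queries,
                        pool.map(lambda q: occurrences(reference, q), queries)))

ref = "ACGTACGTTT"
print(align_all(ref, ["ACGT", "TTT", "GGG"]))
# {'ACGT': [0, 4], 'TTT': [7], 'GGG': []}
```

The real program gains its speedup by storing the reference as a suffix tree in GPU texture memory so that each of thousands of CUDA threads walks the tree for one query; the sketch above keeps only the "independent queries" structure of that design.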

  20. 6. Aerial view of turnpike alignment running from lower left ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    6. Aerial view of turnpike alignment running from lower left diagonally up to right along row of trees. Migel Estate and Farm buildings (HABS No. NY-6356) located at lower right of photograph. W.K. Smith house (HABS No. NY-6356-A) located within clump of trees at lower center, with poultry houses (HABS No. NY-6356-F and G) visible left of the clump of trees. View looking south. - Orange Turnpike, Parallel to new Orange Turnpike, Monroe, Orange County, NY

  1. 4. Aerial view of turnpike path running through center of ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    4. Aerial view of turnpike path running through center of photograph along row of trees. South edge of original alignment visible at left at cluster of white trailers. North edge of original alignment visible at right at the W.K. Smith house (HABS No. NY-6356-A) at the top right corner. Migel mansion visible on ridgetop at right-center of photograph, surrounded by trees. View looking west. - Orange Turnpike, Parallel to new Orange Turnpike, Monroe, Orange County, NY

  2. CCC7-119 Reactive Molecular Dynamics Simulations of Hot Spot Growth in Shocked Energetic Materials

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thompson, Aidan P.

    2015-03-01

The purpose of this work is to understand how defects control initiation in energetic materials used in stockpile components. Sequoia gives us the core count to run very large-scale simulations of up to 10 million atoms. Using an OpenMP threaded implementation of the ReaxFF package in LAMMPS, we have been able to achieve good parallel efficiency running on 16k nodes of Sequoia, with 1 hardware thread per core.

  3. User’s Guide for the Coupled Ocean/Atmospheric Mesoscale Prediction System (COAMPS) Version 5.0

    DTIC Science & Technology

    2010-03-30

provides tools for common modeling functions, as well as regridding, data decomposition, and communication on parallel computers. NRL/MR/7320...specified gncomDir. If running COAMPS at the DSRC (e.g. BABBAGE, DAVINCI, or EINSTEIN), the global NCOM files will be copied to /scr/[user]/COAMPS/data...the site (DSRC or local) and the platform (BABBAGE, DAVINCI, EINSTEIN, or local machine) on which COAMPS is being run. site=navy_dsrc (for DSRC

  4. Early Detection Of Failure Mechanisms In Resilient Biostructures: A Network Flow Study

    DTIC Science & Technology

    2017-10-01

    of flat blades of solid cartilage (sawfishes and some sharks) or simple tubes of bone (swordfish, marlin, etc.) and do not vary appreciably in size...cartilage The hard cartilage is formed by two flat sections that are almost parallel to each other and run along the longitudinal axis of the rostrum...rostrum subjected to a uniform pressure: soft cartilage The soft cartilage is located at the center of the rostrum and runs in the longitudinal Z

  5. FPGA-Based High-Performance Embedded Systems for Adaptive Edge Computing in Cyber-Physical Systems: The ARTICo³ Framework.

    PubMed

    Rodríguez, Alfonso; Valverde, Juan; Portilla, Jorge; Otero, Andrés; Riesgo, Teresa; de la Torre, Eduardo

    2018-06-08

Cyber-Physical Systems are experiencing a paradigm shift in which processing has been relocated to the distributed sensing layer and is no longer performed in a centralized manner. This approach, usually referred to as Edge Computing, demands the use of hardware platforms that are able to manage the steadily increasing requirements in computing performance, while keeping energy efficiency and the adaptability imposed by the interaction with the physical world. In this context, SRAM-based FPGAs and their inherent run-time reconfigurability, when coupled with smart power management strategies, are a suitable solution. However, they usually fall short in user accessibility and ease of development. In this paper, an integrated framework to develop FPGA-based high-performance embedded systems for Edge Computing in Cyber-Physical Systems is presented. This framework provides a hardware-based processing architecture, an automated toolchain, and a runtime to transparently generate and manage reconfigurable systems from high-level system descriptions without additional user intervention. Moreover, it provides users with support for dynamically adapting the available computing resources to switch the working point of the architecture in a solution space defined by computing performance, energy consumption and fault tolerance. Results show that it is indeed possible to explore this solution space at run time and prove that the proposed framework is a competitive alternative to software-based edge computing platforms, being able to provide not only faster solutions, but also higher energy efficiency for computing-intensive algorithms with significant levels of data-level parallelism.

  6. Development of the GEM-MACH-FireWork System: An Air Quality Model with On-line Wildfire Emissions within the Canadian Operational Air Quality Forecast System

    NASA Astrophysics Data System (ADS)

    Pavlovic, Radenko; Chen, Jack; Beaulieu, Paul-Andre; Anselmp, David; Gravel, Sylvie; Moran, Mike; Menard, Sylvain; Davignon, Didier

    2014-05-01

    A wildfire emissions processing system has been developed to incorporate near-real-time emissions from wildfires and large prescribed burns into Environment Canada's real-time GEM-MACH air quality (AQ) forecast system. Since the GEM-MACH forecast domain covers Canada and most of the U.S.A., including Alaska, fire location information is needed for both of these large countries. During AQ model runs, emissions from individual fire sources are injected into elevated model layers based on plume-rise calculations and then transport and chemistry calculations are performed. This "on the fly" approach to the insertion of the fire emissions provides flexibility and efficiency since on-line meteorology is used and computational overhead in emissions pre-processing is reduced. GEM-MACH-FireWork, an experimental wildfire version of GEM-MACH, was run in real-time mode for the summers of 2012 and 2013 in parallel with the normal operational version. 48-hour forecasts were generated every 12 hours (at 00 and 12 UTC). Noticeable improvements in the AQ forecasts for PM2.5 were seen in numerous regions where fire activity was high. Case studies evaluating model performance for specific regions and computed objective scores will be included in this presentation. Using the lessons learned from the last two summers, Environment Canada will continue to work towards the goal of incorporating near-real-time intermittent wildfire emissions into the operational air quality forecast system.

  7. NAS Parallel Benchmarks. 2.4

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob; Biegel, Bryan A. (Technical Monitor)

    2002-01-01

    We describe a new problem size, called Class D, for the NAS Parallel Benchmarks (NPB), whose MPI source code implementation is being released as NPB 2.4. A brief rationale is given for how the new class is derived. We also describe the modifications made to the MPI (Message Passing Interface) implementation to allow the new class to be run on systems with 32-bit integers, and with moderate amounts of memory. Finally, we give the verification values for the new problem size.

  8. Linux Kernel Co-Scheduling For Bulk Synchronous Parallel Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jones, Terry R

    2011-01-01

This paper describes a kernel scheduling algorithm that is based on co-scheduling principles and that is intended for parallel applications running on 1000 cores or more, where inter-node scalability is key. Experimental results for a Linux implementation on a Cray XT5 machine are presented. The results indicate that Linux is a suitable operating system for this new scheduling scheme, and that this design provides a dramatic improvement in scaling performance for synchronizing collective operations at scale.

  9. Queueing Network Models for Parallel Processing of Task Systems: an Operational Approach

    NASA Technical Reports Server (NTRS)

    Mak, Victor W. K.

    1986-01-01

    Computer performance modeling of possibly complex computations running on highly concurrent systems is considered. Earlier works in this area either dealt with a very simple program structure or resulted in methods with exponential complexity. An efficient procedure is developed to compute the performance measures for series-parallel-reducible task systems using queueing network models. The procedure is based on the concept of hierarchical decomposition and a new operational approach. Numerical results for three test cases are presented and compared to those of simulations.
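
For deterministic task times, the hierarchical-decomposition idea reduces to a simple recursion: series stages compose by addition, parallel branches join at the maximum. The Python toy below is only the skeleton of that approach; the paper's operational queueing analysis additionally handles stochastic service times and resource contention, which this ignores, and the tuple encoding is ours:

```python
def completion_time(node):
    """Hierarchical decomposition of a series-parallel task system:
    ('task', t) is a leaf with run time t, ('series', [...]) runs its
    children in order, ('parallel', [...]) joins after all children finish."""
    kind, body = node
    if kind == 'task':
        return body
    times = [completion_time(child) for child in body]
    return sum(times) if kind == 'series' else max(times)

# Two parallel branches joined, followed by a final task.
system = ('series', [
    ('parallel', [('task', 3.0),
                  ('series', [('task', 1.0), ('task', 1.5)])]),
    ('task', 2.0),
])
print(completion_time(system))  # 5.0
```

Because the reduction visits each node once, the evaluation stays linear in the size of the task graph, which is the efficiency property the operational approach preserves while replacing these deterministic leaves with queueing-network submodels.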

  10. Hairpin vortices in turbulent boundary layers

    NASA Astrophysics Data System (ADS)

    Eitel-Amor, G.; Örlü, R.; Schlatter, P.; Flores, O.

    2015-02-01

The present work presents a number of parallel and spatially developing simulations of boundary layers to address the question of whether hairpin vortices are a dominant feature of near-wall turbulence, and what role they play during transition. In the first part, the parent-offspring regeneration mechanism is investigated in parallel (temporal) simulations of a single hairpin vortex introduced in a mean shear flow corresponding to either turbulent channels or boundary layers (Reτ ≲ 590). The effect of a turbulent background superimposed on the mean flow is considered by using an eddy viscosity computed from resolved simulations. Tracking the vortical structure downstream, it is found that secondary hairpins are only created shortly after initialization, with all rotational structures decaying for later times. For hairpins in a clean (laminar) environment, the decay is relatively slow, while hairpins in weak turbulent environments (10% of νt) dissipate after a couple of eddy turnover times. In the second part, the role of hairpin vortices in laminar-turbulent transition is studied using simulations of spatial boundary layers tripped by hairpin vortices. These vortices are generated by means of specific volumetric forces representing an ejection event, creating a synthetic turbulent boundary layer initially dominated by hairpin-like vortices. These hairpins are advected towards the wake region of the boundary layer, while a sinusoidal instability of the streaks near the wall results in rapid development of a turbulent boundary layer. For Reθ > 400, the boundary layer is fully developed, with no evidence of hairpin vortices reaching into the wall region. The results from both the parallel and spatial simulations strongly suggest that the regeneration process is rather short-lived and may not be sustained once a turbulent background has developed.
From the transitional flow simulations, it is conjectured that the forest of hairpins reported in former direct numerical simulation studies is reminiscent of the transitional boundary layer and may not be connected to some aspects of the dynamics of the fully developed wall-bounded turbulence.

  11. Valles Marineris as a Cryokarstic Structure Formed by a Giant Dyke System: Support From New Analogue Experiments

    NASA Astrophysics Data System (ADS)

    Ozeren, M. S.; Sengor, A. M. C.; Acar, D.; Ülgen, S. C.; Onsel, I. E.

    2014-12-01

Valles Marineris is the most significant near-linear depression on Mars. It is some 4000 km long, up to about 200 km wide and some 7 km deep. Although its margins look parallel at first sight, the entire structure has a long spindle shape with significant enlargement in its middle (Melas Chasma) caused by cuspate slope retreat mechanisms. Farther to its north is Hebes Chasma which is an entirely closed depression with a more pronounced spindle shape. Tithonium Chasma is a parallel, but much narrower depression to its northeast. All these chasmae have axes parallel with one another and such structures occur nowhere else on Mars. A scabland surface exists to the east of the Valles Marineris and the causative water mass seems to have issued from it. The great resemblance of these chasmae on Mars to poljes in the karstic regions on Earth has led us to assume that they owed their existence to dissolution of rock layers underlying them. We assumed that the dissolving layer consisted of water ice forming substantial layers, in fact entirely frozen seas of several km depth. We have simulated this geometry by using bentonite and flour layers (in different experiments) overlying layers of ice in which a resistant coil was used to simulate a dyke. We used different thicknesses of bentonite and flour overlying ice layers again of various thicknesses. The flour seems to simulate the Martian crust better because on Mars, g is only about 3/8ths of its value on Earth, so (for equal crustal density) the depth to which the cohesion term C remains important in the Mohr-Coulomb shear failure criterion is about 8/3 times greater. As examples we show two of those experiments in which both the rock analogue and ice layers were 1.5 cm thick.
Perfect analogues of the Valles Marineris formed above the dyke analogue thermal source, complete with the near-linear structure, overall flat spindle shape, cuspate margins, a central ridge, parallel side faults, and parallel depressions resembling the Tithonium Chasma. When water was allowed to drain from the beginning, closed depressions formed that bear an amazing resemblance to Hebes Chasma. We postulate that the entire system of chasmae here discussed formed atop a major dyke swarm some 4000 km in length, not dissimilar to the 3500 km long Mesoproterozoic (Ectasian) dyke swarm disrupting the Canadian Shield.

  12. Secure web-based invocation of large-scale plasma simulation codes

    NASA Astrophysics Data System (ADS)

    Dimitrov, D. A.; Busby, R.; Exby, J.; Bruhwiler, D. L.; Cary, J. R.

    2004-12-01

    We present our design and initial implementation of a web-based system for running, both in parallel and serial, Particle-In-Cell (PIC) codes for plasma simulations with automatic post processing and generation of visual diagnostics.

  13. 78 FR 9687 - Prineville Energy Storage, LLC; Notice of Preliminary Permit Application Accepted for Filing and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-02-11

    ... PDCI to the Ponderosa substation, or (ii) the Bonneville Power Administration (BPA) existing transmission line corridor and then running parallel to the BPA line to the Ponderosa substation; and (9...

  14. Exploration of ground instability factor causing slumping and related dewatering in high methane-flux and gentle continental slope off Shimokita Peninsula, NE Japan

    NASA Astrophysics Data System (ADS)

    Morita, S.; Nakajima, T.; Goto, S.; Yamada, Y.; Kawamura, K.

    2012-12-01

A great number of slump (submarine landslide) units have been identified by reflection seismic surveys performed off Shimokita Peninsula, NE Japan (Morita, et al., 2011). The 3-D seismic data revealed typical deformations caused by slumping and related dewatering in the Pliocene and upper formations. The slumping was generated primarily by layer-parallel slip in a very gentle (<1 degree) and flat continental slope. The size of slump units extends up to 30 km in both width and slip direction. The slump units often exhibit an imbrication structure formed by repeated thrusting in the bottom layers, being mostly composed of the thrust blocks and little matrix. The dewatering structure is observed as widespread parallel dikes whose distribution is strongly dependent on the imbrication of the slump units. Slip planes of the slumps are traceable in seismic data because of the layer-parallel slip. The layers which correspond to the slip planes proved to be generally characterized as low-amplitude layers having some thickness, and some of the slip planes exhibit flattened features under the slump units of the imbrication structure accompanied by parallel dikes. This implies that excess fluid in the slip plane caused the lubrication to enhance the slumping and was drained through the parallel dikes during slumping. Some typical structures related to natural gas, e.g. enhanced reflection, gas chimney, have been identified in the seismic data. The shakedown cruise of D/V Chikyu in 2006 reported a recovery of gas hydrate in a nearby area (Higuchi et al., 2009). A shallow sulfate-methane interface (SMI) of 3.5-12 mbsf has been reported in the survey area (Kotani et al., 2007). These features indicate that a high methane flux in the area is likely an important ground instability factor causing the slumping and the dewatering phenomena. 
We regard the set of slump units in the survey area as one of the most suitable targets for investigating the mechanism of submarine landslides, and we have therefore started exploring the feasibility of scientific drilling in this survey area.

  15. A parallel Monte Carlo code for planar and SPECT imaging: implementation, verification and applications in (131)I SPECT.

    PubMed

    Dewaraja, Yuni K; Ljungberg, Michael; Majumdar, Amitava; Bose, Abhijit; Koral, Kenneth F

    2002-02-01

    This paper reports the implementation of the SIMIND Monte Carlo code on an IBM SP2 distributed memory parallel computer. Basic aspects of running Monte Carlo particle transport calculations on parallel architectures are described. Our parallelization is based on equally partitioning photons among the processors and uses the Message Passing Interface (MPI) library for interprocessor communication and the Scalable Parallel Random Number Generator (SPRNG) to generate uncorrelated random number streams. These parallelization techniques are also applicable to other distributed memory architectures. A linear increase in computing speed with the number of processors is demonstrated for up to 32 processors. This speed-up is especially significant in Single Photon Emission Computed Tomography (SPECT) simulations involving higher energy photon emitters, where explicit modeling of the phantom and collimator is required. For (131)I, the accuracy of the parallel code is demonstrated by comparing simulated and experimental SPECT images from a heart/thorax phantom. Clinically realistic SPECT simulations using the voxel-man phantom are carried out to assess scatter and attenuation correction.
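
The parallelization strategy described, equal partitioning of photon histories with uncorrelated per-rank random streams, is easy to sketch. In the toy Python version below each "rank" is just a loop iteration with its own seeded generator (a stand-in for an SPRNG stream); the names, the simple seeding scheme, and the 30% detection probability are ours, and summing the tallies stands in for the MPI reduction:

```python
import random

def simulate_rank(rank, photons, base_seed=12345):
    """One rank's share of the photon histories, drawn from its own
    reproducible random stream (a stand-in for an SPRNG stream)."""
    rng = random.Random(base_seed + 1000003 * rank)  # distinct per-rank seed
    # Toy physics: each photon history is 'detected' with probability 0.3.
    return sum(1 for _ in range(photons) if rng.random() < 0.3)

def run(total_photons=100000, n_ranks=4):
    """Equal partition of histories among ranks, as in the paper; summing
    the per-rank tallies plays the role of the MPI reduction."""
    share = total_photons // n_ranks
    return sum(simulate_rank(r, share) for r in range(n_ranks))

print(run())  # deterministic given the seeds; close to 30000
```

Because the histories are independent, the speedup is close to linear until per-rank startup and the final reduction dominate, which matches the linear scaling to 32 processors reported above. A production code would use a generator designed for parallel streams (as SPRNG is) rather than this additive seeding.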

  16. Investigation of the effect of resistivity on scrape off layer filaments using three-dimensional simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Easy, L., E-mail: le590@york.ac.uk; CCFE, Culham Science Centre, Abingdon OX14 3DB; Militello, F.

    2016-01-15

The propagation of filaments in the Scrape Off Layer (SOL) of tokamaks largely determines the plasma profiles in the region. In a conduction limited SOL, parallel temperature gradients are expected, such that the resistance to parallel currents is greater at the target than further upstream. Since the perpendicular motion of an isolated filament is largely determined by balance of currents that flow through it, this may be expected to affect filament transport. 3D simulations have thus been used to study the influence of enhanced parallel resistivity on the dynamics of filaments. Filaments with the smallest perpendicular length scales, which were inertially limited at low resistivity (meaning that polarization rather than parallel currents determines their radial velocities), were unaffected by resistivity. For larger filaments, faster velocities were produced at higher resistivities due to two mechanisms. First parallel currents were reduced and polarization currents were enhanced, meaning that the inertial regime extended to larger filaments, and second, a potential difference formed along the parallel direction so that higher potentials were produced in the region of the filament for the same amount of current to flow into the sheath. These results indicate that broader SOL profiles could be produced at higher resistivities.

  17. Numerical simulations of Hurricane Katrina (2005) in the turbulent gray zone

    NASA Astrophysics Data System (ADS)

    Green, Benjamin W.; Zhang, Fuqing

    2015-03-01

    Current numerical simulations of tropical cyclones (TCs) use a horizontal grid spacing as small as Δx = 10³ m, with all boundary layer (BL) turbulence parameterized. Eventually, TC simulations can be conducted at Large Eddy Simulation (LES) resolution, which requires Δx to fall in the inertial subrange (often <10² m) to adequately resolve the large, energy-containing eddies. Between the two lies the so-called "terra incognita" because some of the assumptions used by mesoscale models and LES to treat BL turbulence are invalid. This study performs several 4-6 h simulations of Hurricane Katrina (2005) without a BL parameterization at extremely fine Δx [333, 200, and 111 m, hereafter "Large Eddy Permitting (LEP) runs"] and compares with mesoscale simulations with BL parameterizations (Δx = 3 km, 1 km, and 333 m, hereafter "PBL runs"). There are profound differences in the hurricane BL structure between the PBL and LEP runs: the former have a deeper inflow layer and secondary eyewall formation, whereas the latter have a shallow inflow layer without a secondary eyewall. Among the LEP runs, decreased Δx yields weaker subgrid-scale vertical momentum fluxes, but the sum of subgrid-scale and "grid-scale" fluxes remains similar. There is also evidence that the size of the prevalent BL eddies depends upon Δx, suggesting that convergence to true LES has not yet been reached. Nevertheless, the similarities in the storm-scale BL structure among the LEP runs indicate that the net effect of the BL on the rest of the hurricane may be somewhat independent of Δx.

  18. Numerical solution to the glancing sidewall oblique shock wave/turbulent boundary layer interaction in three dimensions

    NASA Technical Reports Server (NTRS)

    Anderson, B. H.; Benson, T. J.

    1983-01-01

    A supersonic three-dimensional viscous forward-marching computer design code called PEPSIS is used to obtain a numerical solution of the three-dimensional problem of the interaction of a glancing sidewall oblique shock wave and a turbulent boundary layer. Very good results are obtained for a test case that was run to investigate the use of the wall-function boundary-condition approximation for a highly complex three-dimensional shock-boundary layer interaction. Two additional test cases (coarse mesh and medium mesh) are run to examine the question of near-wall resolution when no-slip boundary conditions are applied. A comparison with experimental data shows that the PEPSIS code gives excellent results in general and is practical for three-dimensional supersonic inlet calculations.

  20. Long-time self-diffusion of charged spherical colloidal particles in parallel planar layers.

    PubMed

    Contreras-Aburto, Claudio; Báez, César A; Méndez-Alcaraz, José M; Castañeda-Priego, Ramón

    2014-06-28

    The long-time self-diffusion coefficient, D(L), of charged spherical colloidal particles in parallel planar layers is studied by means of Brownian dynamics computer simulations and mode-coupling theory. All particles (regardless of which layer they are located on) interact with each other via the screened Coulomb potential and there is no particle transfer between layers. As a result of the geometrical constraint on particle positions, the simulation results show that D(L) is strongly controlled by the separation between layers. On the basis of the so-called contraction of the description formalism [C. Contreras-Aburto, J. M. Méndez-Alcaraz, and R. Castañeda-Priego, J. Chem. Phys. 132, 174111 (2010)], the effective potential between particles in a layer (the so-called observed layer) is obtained by integrating out the degrees of freedom of particles in the remaining layers. We have shown in a previous work that the effective potential performs well in describing the static structure of the observed layer (loc. cit.). In this work, we find that the D(L) values determined from the simulations of the observed layer, where the particles interact via the effective potential, do not agree with the exact values of D(L). Our findings confirm that even when an effective potential can perform well in describing the static properties, there is no guarantee that it will correctly describe the dynamic properties of colloidal systems.

  1. Design of co-existence parallel periodic surface structure induced by picosecond laser pulses on the Al/Ti multilayers

    NASA Astrophysics Data System (ADS)

    Petrović, Suzana; Peruško, D.; Kovač, J.; Panjan, P.; Mitrić, M.; Pjević, D.; Kovačević, A.; Jelenković, B.

    2017-09-01

    Formation of periodic nanostructures on the Ti/5x(Al/Ti)/Si multilayers induced by picosecond laser pulses is studied in order to better understand the formation of a laser-induced periodic surface structure (LIPSS). At a fluence slightly below the ablation threshold, the formation of a low spatial frequency LIPSS (LSFL), oriented perpendicular to the direction of the laser polarization, is observed on the irradiated area. Prolonged irradiation while scanning results in the formation of a high spatial frequency LIPSS (HSFL) on top of the LSFLs, creating a co-existing parallel periodic structure. The HSFL was oriented parallel to the incident laser polarization. Intermixing between the Al and Ti layers, with the formation of Al-Ti intermetallic compounds, was achieved during the irradiation. The intermetallic region formed mostly within the heat-affected zone of the sample. Surface segregation of aluminium, with partial ablation of the top layer of titanium, was followed by the formation of an ultra-thin Al2O3 film on the surface of the multi-layered structure.

  2. Sediments from the Boxing Day tsunami on the coasts of southeastern India and Kenya

    NASA Astrophysics Data System (ADS)

    Weiss, R.; Bahlburg, H.

    2006-12-01

    On Boxing Day 2004, the world community experienced a catastrophic tsunami in the Indian Ocean and could also see how unprepared and unaware the countries along the Indian Ocean were. Beyond the tragedy of the tremendous loss of lives, this event is an opportunity to study a global tsunami (mega-tsunami) in many regards. Here, we report on tsunami sediments left behind on beaches at the coast of Tamil Nadu (India) and on beaches between Malindi and Lamu (Kenya). Characteristic debris accumulations on the beach surface at Tamil Nadu (India) showed the impact of three tsunami waves. In this area, the tsunami climbed ~5 m up the beach; the last traces of a tsunami wave were found ~580 m away from the shoreline. Palm trees indicated an overland flow depth of 3.5 m, ~50 m from the shoreline. The tsunami deposits were up to 30 cm thick. They had an erosional base to the underlying soil and pre-tsunami beach deposits and were made up of moderately well- to well-sorted coarse and medium sand. The sand sheet thins inland, but without a decrease in grain size. Three distinct layers could be identified within the tsunami deposit. The lower one occasionally displayed cross-bedding with foresets dipping landward, indicating deposition during run-up. The two upper layers were graded or parallel-laminated without indicators of flow directions. The boundaries between the different layers were marked by dark laminae, rich in heavy minerals. Also, the presence of benthic foraminifera indicates entrainment of sediment into the water column by the incoming tsunami wave in water depths less than 30 m. On beaches between Malindi and Lamu, Kenya, the traces of only one tsunami wave could be found, which attained a run-up height of about 3 m and traveled ~35 m inland with respect to the tidal stage at tsunami impact. The tsunami sediments consist of one layer of fine sand and are predominantly composed of heavy minerals supplied to the sea by nearby rivers. 
A slight fining-inland trend could be identified in the thinning-inland sand layer. Benthic foraminifera also indicate an entrainment of sediment by the incoming tsunami wave in a water depth less than 30 m; however, there are indications that sediment might have been entrained at a water depth of 80 m. The fact that only one sand layer occurs in Kenya as opposed to three at Tamil Nadu might lead to the conclusion that only one wave approached the Kenyan coast. This interpretation is misleading because the Kenyan coast is several thousand kilometers away from the source area of the tsunami; the non-linear behavior of the incoming tsunami waves, especially the interaction with the nearby reef, may have resulted in the discovered sedimentologic evidence of the tsunami impact on the Kenyan coast.

  3. Parallel matrix multiplication on the Connection Machine

    NASA Technical Reports Server (NTRS)

    Tichy, Walter F.

    1988-01-01

    Matrix multiplication is a computation- and communication-intensive problem. Six parallel algorithms for matrix multiplication on the Connection Machine are presented and compared with respect to their performance and processor usage. For n by n matrices, the algorithms have theoretical running times of O(n² log n), O(n log n), O(n), and O(log n), and require n, n², n², and n³ processors, respectively. With careful attention to communication patterns, the theoretically predicted runtimes can indeed be achieved in practice. The parallel algorithms illustrate the tradeoffs between performance, communication cost, and processor usage.
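The basic decomposition behind such tradeoffs (assign independent pieces of the output to different processors) can be illustrated with a row-partitioned product. This is a generic sketch, not one of the six Connection Machine algorithms, and the worker pool here is only a stand-in for real processors:

```python
from concurrent.futures import ThreadPoolExecutor

def row_block(A, B, rows):
    """Compute the listed output rows of C = A x B with plain lists."""
    k, n = len(B), len(B[0])
    return [(i, [sum(A[i][t] * B[t][j] for t in range(k)) for j in range(n)])
            for i in rows]

def parallel_matmul(A, B, workers=4):
    """Row-partitioned matrix product: each worker independently
    computes a strided block of output rows, the simplest way to
    spread the O(n^3) work over p processors."""
    m = len(A)
    chunks = [range(w, m, workers) for w in range(workers)]
    C = [None] * m
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for block in pool.map(lambda r: row_block(A, B, r), chunks):
            for i, row in block:
                C[i] = row
    return C
```

With n² or n³ processors, finer decompositions (one processor per output element, or per multiply) give the faster O(n) and O(log n) bounds at the cost of much more communication.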

  4. Parallel simulations of Grover's algorithm for closest match search in neutron monitor data

    NASA Astrophysics Data System (ADS)

    Kussainov, Arman; White, Yelena

    We are studying parallel implementations of Grover's closest match search algorithm for neutron monitor data analysis. This includes data formatting and the mapping of quantum search parameters onto the conventional structures of a chosen programming language and the selected experimental data type. We have employed several workload distribution models based on the acquired data and search parameters. As a result of these simulations, we have an understanding of potential problems that may arise during configuration of real quantum computational devices and the way they could run tasks in parallel. The work was supported by the Science Committee of the Ministry of Science and Education of the Republic of Kazakhstan Grant #2532/GF3.
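The workload-distribution aspect can be sketched classically: shard the data set among workers, let each find its local closest match, then reduce to a global winner. This is a classical stand-in for the simulated quantum search, with absolute difference as an assumed distance metric:

```python
from concurrent.futures import ThreadPoolExecutor

def local_best(chunk, target):
    """Each worker scans its shard for the record closest to the
    target (absolute difference as the distance metric)."""
    return min(chunk, key=lambda x: abs(x - target))

def closest_match(data, target, workers=4):
    """Block-partition the data among workers, take each worker's
    local winner, then reduce to the global closest match."""
    size = max(1, -(-len(data) // workers))          # ceil division
    shards = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        winners = pool.map(lambda s: local_best(s, target), shards)
        return min(winners, key=lambda x: abs(x - target))
```

The same partition/reduce shape applies whether each shard is searched classically or handed to a simulated Grover instance.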

  5. Multitasking domain decomposition fast Poisson solvers on the Cray Y-MP

    NASA Technical Reports Server (NTRS)

    Chan, Tony F.; Fatoohi, Rod A.

    1990-01-01

    The results of multitasking implementation of a domain decomposition fast Poisson solver on eight processors of the Cray Y-MP are presented. The object of this research is to study the performance of domain decomposition methods on a Cray supercomputer and to analyze the performance of different multitasking techniques using highly parallel algorithms. Two implementations of multitasking are considered: macrotasking (parallelism at the subroutine level) and microtasking (parallelism at the do-loop level). A conventional FFT-based fast Poisson solver is also multitasked. The results of different implementations are compared and analyzed. A speedup of over 7.4 on the Cray Y-MP running in a dedicated environment is achieved for all cases.

  6. Eigensolver for a Sparse, Large Hermitian Matrix

    NASA Technical Reports Server (NTRS)

    Tisdale, E. Robert; Oyafuso, Fabiano; Klimeck, Gerhard; Brown, R. Chris

    2003-01-01

    A parallel-processing computer program finds a few eigenvalues in a sparse Hermitian matrix that contains as many as 100 million diagonal elements. This program finds the eigenvalues faster, using less memory, than do other, comparable eigensolver programs. This program implements a Lanczos algorithm in the American National Standards Institute/ International Organization for Standardization (ANSI/ISO) C computing language, using the Message Passing Interface (MPI) standard to complement an eigensolver in PARPACK. [PARPACK (Parallel Arnoldi Package) is an extension, to parallel-processing computer architectures, of ARPACK (Arnoldi Package), which is a collection of Fortran 77 subroutines that solve large-scale eigenvalue problems.] The eigensolver runs on Beowulf clusters of computers at the Jet Propulsion Laboratory (JPL).

  7. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boman, Erik G.

    This LDRD project was a campus exec fellowship to fund (in part) Donald Nguyen’s PhD research at UT-Austin. His work has focused on parallel programming models, and scheduling irregular algorithms on shared-memory systems using the Galois framework. Galois provides a simple but powerful way for users and applications to automatically obtain good parallel performance using certain supported data containers. The naïve user can write serial code, while advanced users can optimize performance through advanced features, such as specifying the scheduling policy. Galois was used to parallelize two sparse matrix reordering schemes: RCM and Sloan. Such reordering is important in high-performance computing to obtain better data locality and thus reduce run times.
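For context, the RCM scheme mentioned above is Reverse Cuthill-McKee: a degree-ordered BFS whose visitation order, reversed, reduces matrix bandwidth. A minimal serial sketch (the Galois work parallelizes this; only the serial idea is shown here):

```python
from collections import deque

def rcm_order(adj):
    """Serial Reverse Cuthill-McKee on an adjacency list:
    BFS from a low-degree start node, visiting neighbors in order
    of increasing degree, then reverse the visitation order."""
    n = len(adj)
    visited = [False] * n
    order = []
    # handle disconnected graphs: try start nodes by increasing degree
    for start in sorted(range(n), key=lambda v: len(adj[v])):
        if visited[start]:
            continue
        visited[start] = True
        q = deque([start])
        while q:
            v = q.popleft()
            order.append(v)
            for w in sorted(adj[v], key=lambda u: len(adj[u])):
                if not visited[w]:
                    visited[w] = True
                    q.append(w)
    return order[::-1]
```

The irregularity that makes this hard to parallelize is visible even here: the work available at each step depends on the frontier produced by the previous one, which is exactly the kind of data-driven scheduling Galois targets.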

  8. Alpine Fault, New Zealand, SRTM Shaded Relief and Colored Height

    NASA Image and Video Library

    2005-01-06

    The Alpine Fault runs parallel to, and just inland of, much of the west coast of New Zealand's South Island. This view was created from the near-global digital elevation model produced by the NASA Shuttle Radar Topography Mission (SRTM).

  9. Thread concept for automatic task parallelization in image analysis

    NASA Astrophysics Data System (ADS)

    Lueckenhaus, Maximilian; Eckstein, Wolfgang

    1998-09-01

    Parallel processing of image analysis tasks is an essential method to speed up image processing and helps to exploit the full capacity of distributed systems. However, writing parallel code is a difficult and time-consuming process and often leads to an architecture-dependent program that has to be re-implemented when changing the hardware. Therefore it is highly desirable to do the parallelization automatically. For this we have developed a special kind of thread concept for image analysis tasks. Threads derived from one subtask may share objects and run in the same context but may follow different threads of execution and work on different data in parallel. In this paper we describe the basics of our thread concept and show how it can be used as the basis of an automatic task parallelization to speed up image processing. We further illustrate the design and implementation of an agent-based system that uses image analysis threads for generating and processing parallel programs by taking into account the available hardware. The tests made with our system prototype show that the thread concept combined with the agent paradigm is suitable to speed up image processing by an automatic parallelization of image analysis tasks.

  10. Evolution of CMS workload management towards multicore job support

    NASA Astrophysics Data System (ADS)

    Pérez-Calero Yzquierdo, A.; Hernández, J. M.; Khan, F. A.; Letts, J.; Majewski, K.; Rodrigues, A. M.; McCrea, A.; Vaandering, E.

    2015-12-01

    The successful exploitation of multicore processor architectures is a key element of the LHC distributed computing system in the coming era of the LHC Run 2. High-pileup complex-collision events represent a challenge for the traditional sequential programming in terms of memory and processing time budget. The CMS data production and processing framework is introducing the parallel execution of the reconstruction and simulation algorithms to overcome these limitations. CMS plans to execute multicore jobs while still supporting single-core processing for other tasks difficult to parallelize, such as user analysis. The CMS strategy for job management thus aims at integrating single and multicore job scheduling across the Grid. This is accomplished by employing multicore pilots with internal dynamic partitioning of the allocated resources, capable of running payloads of various core counts simultaneously. An extensive test programme has been conducted to enable multicore scheduling with the various local batch systems available at CMS sites, with the focus on the Tier-0 and Tier-1s, responsible during 2015 for the prompt data reconstruction. Scale tests have been run to analyse the performance of this scheduling strategy and ensure an efficient use of the distributed resources. This paper presents the evolution of the CMS job management and resource provisioning systems in order to support this hybrid scheduling model, as well as its deployment and performance tests, which will enable CMS to transition to a multicore production model for the second LHC run.
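The pilot's internal partitioning can be pictured as a packing problem: a fixed core allocation must host payloads of various core counts at once. A much-simplified first-fit sketch (the names and policy here are illustrative, not CMS's actual scheduler):

```python
def schedule_payloads(total_cores, payloads):
    """First-fit packing of single- and multi-core payloads into one
    pilot's core allocation. payloads is a list of (name, cores);
    returns (running, free_cores, leftover) where leftover holds
    payloads that did not fit and must wait for cores to free up."""
    running, leftover = [], []
    free = total_cores
    for name, cores in payloads:
        if cores <= free:
            running.append((name, cores))
            free -= cores
        else:
            leftover.append((name, cores))
    return running, free, leftover
```

A real pilot re-runs this kind of decision dynamically as payloads finish, which is what lets multicore reconstruction and single-core analysis share the same Grid slot.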

  11. Running Neuroimaging Applications on Amazon Web Services: How, When, and at What Cost?

    PubMed Central

    Madhyastha, Tara M.; Koh, Natalie; Day, Trevor K. M.; Hernández-Fernández, Moises; Kelley, Austin; Peterson, Daniel J.; Rajan, Sabreena; Woelfer, Karl A.; Wolf, Jonathan; Grabowski, Thomas J.

    2017-01-01

    The contribution of this paper is to identify and describe current best practices for using Amazon Web Services (AWS) to execute neuroimaging workflows “in the cloud.” Neuroimaging offers a vast set of techniques by which to interrogate the structure and function of the living brain. However, many of the scientists for whom neuroimaging is an extremely important tool have limited training in parallel computation. At the same time, the field is experiencing a surge in computational demands, driven by a combination of data-sharing efforts, improvements in scanner technology that allow acquisition of images with higher image resolution, and by the desire to use statistical techniques that stress processing requirements. Most neuroimaging workflows can be executed as independent parallel jobs and are therefore excellent candidates for running on AWS, but the overhead of learning to do so and determining whether it is worth the cost can be prohibitive. In this paper we describe how to identify neuroimaging workloads that are appropriate for running on AWS, how to benchmark execution time, and how to estimate cost of running on AWS. By benchmarking common neuroimaging applications, we show that cloud computing can be a viable alternative to on-premises hardware. We present guidelines that neuroimaging labs can use to provide a cluster-on-demand type of service that should be familiar to users, and scripts to estimate cost and create such a cluster. PMID:29163119
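The cost-estimation step the paper describes reduces to simple arithmetic once a workload has been benchmarked: pack jobs onto instances, then bill whole instance-hours. The sketch below uses hypothetical instance sizes and prices, not quoted AWS rates, and assumes all subjects run in a single concurrent wave:

```python
import math

def estimate_cost(n_subjects, minutes_per_subject, vcpus_per_job,
                  vcpus_per_instance, price_per_hour):
    """Rough cloud cost for an embarrassingly parallel neuroimaging
    workload: each subject is an independent job, jobs are packed by
    vCPU count, and whole instance-hours are billed."""
    jobs_per_instance = max(1, vcpus_per_instance // vcpus_per_job)
    instances = math.ceil(n_subjects / jobs_per_instance)
    hours = math.ceil(minutes_per_subject / 60)   # billed per started hour
    return instances * hours * price_per_hour
```

Comparing this figure against amortized on-premises hardware cost (plus queue wait time) is the decision the paper's benchmarks inform.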

  12. Evolution of CMS Workload Management Towards Multicore Job Support

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Perez-Calero Yzquierdo, A.; Hernández, J. M.; Khan, F. A.

    The successful exploitation of multicore processor architectures is a key element of the LHC distributed computing system in the coming era of the LHC Run 2. High-pileup complex-collision events represent a challenge for the traditional sequential programming in terms of memory and processing time budget. The CMS data production and processing framework is introducing the parallel execution of the reconstruction and simulation algorithms to overcome these limitations. CMS plans to execute multicore jobs while still supporting single-core processing for other tasks difficult to parallelize, such as user analysis. The CMS strategy for job management thus aims at integrating single and multicore job scheduling across the Grid. This is accomplished by employing multicore pilots with internal dynamic partitioning of the allocated resources, capable of running payloads of various core counts simultaneously. An extensive test programme has been conducted to enable multicore scheduling with the various local batch systems available at CMS sites, with the focus on the Tier-0 and Tier-1s, responsible during 2015 for the prompt data reconstruction. Scale tests have been run to analyse the performance of this scheduling strategy and ensure an efficient use of the distributed resources. This paper presents the evolution of the CMS job management and resource provisioning systems in order to support this hybrid scheduling model, as well as its deployment and performance tests, which will enable CMS to transition to a multicore production model for the second LHC run.

  13. Simplified and quick electrical modeling for dye sensitized solar cells: An experimental and theoretical investigation

    NASA Astrophysics Data System (ADS)

    de Andrade, Rocelito Lopes; de Oliveira, Matheus Costa; Kohlrausch, Emerson Cristofer; Santos, Marcos José Leite

    2018-05-01

    This work presents a new and simple method for determining IPH (current source dependent on illumination), I0 (reverse saturation current), n (ideality factor), and RP and RS (parallel and series resistances) to build an electrical model for dye sensitized solar cells (DSSCs). The electrical circuit parameters used in the simulation and to generate theoretical curves for the single diode electrical model were extracted from I-V curves of assembled DSSCs. Model validation was performed by assembling five different types of DSSCs and evaluating the following parameters: effect of a TiO2 blocking/adhesive layer, thickness of the TiO2 layer, and the presence of a light scattering layer. In addition, irradiance, temperature, series and parallel resistance, ideality factor and reverse saturation current were simulated.
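The single-diode model those five parameters feed is the implicit equation I = IPH − I0·(exp((V + I·RS)/(n·VT)) − 1) − (V + I·RS)/RP, which must be solved numerically because I appears on both sides. A minimal sketch using fixed-point iteration, with illustrative parameter values rather than values fitted to the paper's cells:

```python
import math

def diode_current(V, IPH=0.5, I0=1e-9, n=1.5, RS=0.01, RP=100.0,
                  VT=0.02585, iters=60):
    """Solve the single-diode model
        I = IPH - I0*(exp((V + I*RS)/(n*VT)) - 1) - (V + I*RS)/RP
    for I at a given terminal voltage V by fixed-point iteration.
    VT is the thermal voltage at ~300 K; all parameter values here
    are illustrative placeholders."""
    I = IPH  # start from the short-circuit estimate
    for _ in range(iters):
        I = IPH - I0 * (math.exp((V + I * RS) / (n * VT)) - 1) \
                - (V + I * RS) / RP
    return I
```

Sweeping V through this function reproduces the familiar I-V curve shape: near short circuit I ≈ IPH (less the shunt leakage set by RP), and the diode term collapses the current as V approaches open circuit, with RS rounding the knee.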

  14. Three-dimensional magnetic bubble memory system

    NASA Technical Reports Server (NTRS)

    Stadler, Henry L. (Inventor); Katti, Romney R. (Inventor); Wu, Jiin-Chuan (Inventor)

    1994-01-01

    A compact memory uses magnetic bubble technology for providing data storage. A three-dimensional arrangement, in the form of stacks of magnetic bubble layers, is used to achieve high volumetric storage density. Output tracks are used within each layer to allow data to be accessed uniquely and unambiguously. Storage can be achieved using either current access or field access magnetic bubble technology. Optical sensing via the Faraday effect is used to detect data. Optical sensing facilitates the accessing of data from within the three-dimensional package and lends itself to parallel operation for supporting high data rates and vector and parallel processing.

  15. Parallel inhomogeneity and the Alfven resonance. 1: Open field lines

    NASA Technical Reports Server (NTRS)

    Hansen, P. J.; Harrold, B. G.

    1994-01-01

    In light of a recent demonstration of the general nonexistence of a singularity at the Alfven resonance in cold, ideal, linearized magnetohydrodynamics, we examine the effect of a small density gradient parallel to uniform, open ambient magnetic field lines. To lowest order, energy deposition is quantitatively unaffected but occurs continuously over a thickened layer. This effect is illustrated in a numerical analysis of a plasma sheet boundary layer model with perfectly absorbing boundary conditions. Consequences of the results are discussed, both for the open field line approximation and for the ensuing closed field line analysis.

  16. Parallel algorithm of VLBI software correlator under multiprocessor environment

    NASA Astrophysics Data System (ADS)

    Zheng, Weimin; Zhang, Dong

    2007-11-01

    The correlator is the key signal processing equipment of a Very Long Baseline Interferometry (VLBI) synthetic aperture telescope. It receives the mass of data collected by the VLBI observatories and produces the visibility function of the target, which can be used for spacecraft positioning, baseline length measurement, synthesis imaging, and other scientific applications. VLBI data correlation is a data-intensive and computation-intensive task. This paper presents the algorithms of two parallel software correlators under multiprocessor environments. A near real-time correlator for spacecraft tracking adopts pipelining and thread-parallel technology, and runs on SMP (Symmetric Multiple Processor) servers. Another high speed prototype correlator using a mixed Pthreads and MPI (Message Passing Interface) parallel algorithm is realized on a small Beowulf cluster platform. Both correlators have the characteristics of a flexible structure, scalability, and the ability to correlate data from 10 stations.
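The pipelining pattern used by the near real-time correlator can be sketched generically: each processing stage runs in its own thread and hands results downstream through a queue, so frame k+1 can be unpacked while frame k is still being correlated. The stages below are placeholders, not the actual correlator's signal chain:

```python
import queue
import threading

def run_pipeline(frames, stages):
    """Thread-per-stage software pipeline. `stages` is an ordered list
    of functions; items flow stage-to-stage through FIFO queues, and
    a None sentinel shuts each stage down in turn."""
    qs = [queue.Queue() for _ in range(len(stages) + 1)]

    def run_stage(fn, qin, qout):
        while True:
            item = qin.get()
            if item is None:          # propagate shutdown downstream
                qout.put(None)
                return
            qout.put(fn(item))

    threads = [threading.Thread(target=run_stage, args=(fn, qs[i], qs[i + 1]))
               for i, fn in enumerate(stages)]
    for t in threads:
        t.start()
    for f in frames:
        qs[0].put(f)
    qs[0].put(None)

    results = []
    while (item := qs[-1].get()) is not None:
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Because each stage is a single thread reading a FIFO queue, output order matches input order, which matters when correlated frames must be reassembled into a time series.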

  17. Is fault surface roughness indicative of fault mechanisms? Observations from experimental Limestone faults

    NASA Astrophysics Data System (ADS)

    Sagy, A.; Tesei, T.; Collettini, C.

    2016-12-01

    Geometrical irregularity of contacting surfaces is a fundamental factor controlling friction and energy dissipation during sliding. We performed direct shear experiments on 20x20 cm limestone surfaces by applying constant normal load (40-200 kN) and sliding velocity of 1-300 µm/s. Before shearing, the surfaces were polished, with maximal measured amplitudes of less than 0.1 mm. After shear, elongated islands of shear zones are observed, characterized by grooves ploughed into the limestone surfaces and by layers of fine-grained wear. These structures indicate that the contact areas during shear are scattered and occupy a limited portion of the entire surface area. The surfaces were scanned by a laser profilometer that measures topography using 640 parallel beams in a single run, offering up to 10 µm accuracy and a working range of 200 mm. Two distinctive types of topographical end members are defined: rough wavy sections and smooth polished ones. The rough zones display ridges with typical amplitudes of 0.1-1 mm that cross the grooves perpendicular to the slip direction. These features are associated with penetrative brittle damage and with fragmentation. The smoother zones display reflective mirror-like surfaces bordered by topographically sharp steps at heights of 0.3-0.5 mm. These sections are localized inside the wear layer or between the wear layer and the host rock, and are not associated with observed penetrative damage. Preliminary statistical analysis suggests that the roughness of the ridged zones can be characterized using a power-law relationship between profile length and mean roughness, with relatively high values of Hurst exponents (e.g. H > 0.65) parallel to the slip direction. The polished zones, on the other hand, correspond to lower values of Hurst exponents (e.g. H ≤ 0.6). 
Both structural and roughness measurements indicate that the distinctive topographic variations on the surfaces reflect competing mechanical processes which occur simultaneously during shear. The wavy ridged zone is the surface expression of penetrative cracking and fragmentation which widen the shear zone, while the smooth zones reflect localized flow and plastic deformation of the wear material. The similarity in topography of shear structures between experimental and natural faults suggests similar mechanical processes.

  18. Radiative instabilities in sheared magnetic field

    NASA Technical Reports Server (NTRS)

    Drake, J. F.; Sparks, L.; Van Hoven, G.

    1988-01-01

    The structure and growth rate of the radiative instability in a sheared magnetic field B have been calculated analytically using the Braginskii fluid equations. In a shear layer, temperature and density perturbations are linked by the propagation of sound waves parallel to the local magnetic field. As a consequence, density clumping or condensation plays an important role in driving the instability. Parallel thermal conduction localizes the mode to a narrow layer where k(parallel) is small and stabilizes short wavelengths k > k(c), where k(c) depends on the local radiation and conduction rates. Thermal coupling to ions also limits the width of the unstable spectrum. It is shown that a broad spectrum of modes is typically unstable in tokamak edge plasmas and it is argued that this instability is sufficiently robust to drive the large-amplitude density fluctuations often measured there.

  19. Algorithm for computing descriptive statistics for very large data sets and the exa-scale era

    NASA Astrophysics Data System (ADS)

    Beekman, Izaak

    2017-11-01

    An algorithm for Single-point, Parallel, Online, Converging Statistics (SPOCS) is presented. It is suited for in situ analysis that traditionally would be relegated to post-processing, and can be used to monitor the statistical convergence and estimate the error/residual in the quantity, which is useful for uncertainty quantification as well. Today, data may be generated at an overwhelming rate by numerical simulations and proliferating sensing apparatuses in experiments and engineering applications. Monitoring descriptive statistics in real time lets costly computations and experiments be gracefully aborted if an error has occurred, and monitoring the level of statistical convergence allows them to be run for the shortest amount of time required to obtain good results. This algorithm extends work by Pébay (Sandia Report SAND2008-6212). Pébay's algorithms are recast into a converging delta formulation, with provably favorable properties. The mean, variance, covariances and arbitrary higher order statistical moments are computed in one pass. The algorithm is tested using Sillero, Jiménez, & Moser's (2013, 2014) publicly available UPM high Reynolds number turbulent boundary layer data set, demonstrating numerical robustness, efficiency and other favorable properties.
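The one-pass, mergeable moment updates that Pébay-style algorithms build on can be sketched for mean and variance only (the arbitrary-order moments and the converging-delta formulation of the paper are omitted). The single-sample update is Welford's recurrence; the merge uses the standard pairwise combination of partial sums of squared deviations:

```python
class RunningStats:
    """One-pass mean/variance accumulator that supports merging,
    so partial statistics from parallel streams can be combined."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0    # sum of squared deviations from the mean

    def update(self, x):
        """Welford's single-sample update: O(1) per sample, one pass."""
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)

    def merge(self, other):
        """Combine two partial accumulators (e.g. from two MPI ranks)."""
        n = self.n + other.n
        d = other.mean - self.mean
        out = RunningStats()
        out.n = n
        out.mean = self.mean + d * other.n / n
        out.m2 = self.m2 + other.m2 + d * d * self.n * other.n / n
        return out

    def variance(self):
        """Unbiased sample variance."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0
```

Because `merge` is exact, statistics can be accumulated per processor and reduced in a tree, which is what makes the approach suitable for in situ use at exascale concurrency.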

  20. Crater Moreux

    NASA Technical Reports Server (NTRS)

    1997-01-01

    Color image of part of the Ismenius Lacus region of Mars (MC-5 quadrangle) containing the impact crater Moreux (right center); north toward top. The scene shows heavily cratered highlands in the south and relatively smooth lowland plains in the north, separated by a belt of dissected terrain containing flat-floored valleys, mesas, and buttes. This image is a composite of Viking medium-resolution images in black and white and low-resolution images in color. The image extends from latitude 36 degrees N. to 50 degrees N. and from longitude 310 degrees to 340 degrees; Lambert conformal conic projection. The dissected terrain along the highlands/lowlands boundary consists of the flat-floored valleys of Deuteronilus Mensae (on left) and Protonilus Mensae (on right) and, farther north, the small, rounded hills of knobby terrain. Flows on the mensae floors contain striae that run parallel to the valley walls; where valleys meet, the striae merge, similar to medial moraines on glaciers. Terraces within the valleys have been interpreted as either layered rocks or wave terraces. The knobby terrain has been interpreted as remnants of the old, densely cratered highland terrain, perhaps eroded by mass wasting.

  1. Apparatus for precision micromachining with lasers

    DOEpatents

    Chang, J.J.; Dragon, E.P.; Warner, B.E.

    1998-04-28

    A new material processing apparatus using a short-pulsed, high-repetition-rate visible laser for precision micromachining utilizes a near diffraction limited laser, a high-speed precision two-axis tilt-mirror for steering the laser beam, an optical system for either focusing or imaging the laser beam on the part, and a part holder that may consist of a cover plate and a back plate. The system is generally useful for precision drilling, cutting, milling and polishing of metals and ceramics, and has broad application in manufacturing precision components. Precision machining has been demonstrated through percussion drilling and trepanning using this system. With a 30 W copper vapor laser running at multi-kHz pulse repetition frequency, straight parallel holes with size varying from 500 microns to less than 25 microns and with aspect ratios up to 1:40 have been consistently drilled with good surface finish on a variety of metals. Micromilling and microdrilling on ceramics using a 250 W copper vapor laser have also been demonstrated with good results. Materialographic sections of machined parts show little (submicron scale) recast layer and heat affected zone. 1 fig.

  2. Apparatus for precision micromachining with lasers

    DOEpatents

    Chang, Jim J.; Dragon, Ernest P.; Warner, Bruce E.

    1998-01-01

    A new material processing apparatus using a short-pulsed, high-repetition-rate visible laser for precision micromachining utilizes a near diffraction limited laser, a high-speed precision two-axis tilt-mirror for steering the laser beam, an optical system for either focusing or imaging the laser beam on the part, and a part holder that may consist of a cover plate and a back plate. The system is generally useful for precision drilling, cutting, milling and polishing of metals and ceramics, and has broad application in manufacturing precision components. Precision machining has been demonstrated through percussion drilling and trepanning using this system. With a 30 W copper vapor laser running at multi-kHz pulse repetition frequency, straight parallel holes with size varying from 500 microns to less than 25 microns and with aspect ratios up to 1:40 have been consistently drilled with good surface finish on a variety of metals. Micromilling and microdrilling on ceramics using a 250 W copper vapor laser have also been demonstrated with good results. Materialographic sections of machined parts show little (submicron scale) recast layer and heat affected zone.

  3. Development of the US3D Code for Advanced Compressible and Reacting Flow Simulations

    NASA Technical Reports Server (NTRS)

    Candler, Graham V.; Johnson, Heath B.; Nompelis, Ioannis; Subbareddy, Pramod K.; Drayna, Travis W.; Gidzak, Vladimyr; Barnhardt, Michael D.

    2015-01-01

    Aerothermodynamics and hypersonic flows involve complex multi-disciplinary physics, including finite-rate gas-phase kinetics, finite-rate internal energy relaxation, gas-surface interactions with finite-rate oxidation and sublimation, transition to turbulence, large-scale unsteadiness, shock-boundary layer interactions, fluid-structure interactions, and thermal protection system ablation and thermal response. Many of the flows have a large range of length and time scales, requiring large computational grids, implicit time integration, and large solution run times. The University of Minnesota NASA US3D code was designed for the simulation of these complex, highly-coupled flows. It has many of the features of the well-established DPLR code, but uses unstructured grids and has many advanced numerical capabilities and physical models for multi-physics problems. The main capabilities of the code are described, the physical modeling approaches are discussed, the different types of numerical flux functions and time integration approaches are outlined, and the parallelization strategy is overviewed. Comparisons between US3D and the NASA DPLR code are presented, and several advanced simulations are presented to illustrate some of the novel features of the code.

  4. Revealing Roosevelt

    NASA Technical Reports Server (NTRS)

    2006-01-01

    This image mosaic from the microscopic imager aboard NASA's Mars Exploration Rover Opportunity shows detailed structure of a small fin-like structure dubbed 'Roosevelt,' which sticks out from the outcrop pavement at the edge of 'Erebus Crater.'

    Roosevelt lines a fracture in the local pavement and scientists hypothesize that it is a fracture fill, formed by water that percolated through the fracture. This would mean the feature is younger than surrounding rocks and, therefore, might provide evidence of water that was present some time after the formation of Meridiani Planum sedimentary rocks.

    The image shows fine laminations (layers about 1 millimeter or 0.04 inch thick) that run parallel to the axis of the fin. Some of the textures visible in the image likely indicate that minerals precipitated from the outcrop rocks, but sediment grains are also apparent.

    The three frames combined into this mosaic were taken during Opportunity's 727th Martian day, or sol (Feb. 8, 2006). In subsequent days, the rover completed textural and chemical inspection of Roosevelt to help the science team understand this structure's significance for Martian history.

  5. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Foltyn, Stephen R; Jia, Quanxi; Arendt, Paul N

    A superconducting tape having reduced AC losses. The tape has a high temperature superconductor layer that is segmented. Disruptive strips, formed in one of the tape substrate, a buffer layer, or the superconducting layer, create parallel discontinuities in the superconducting layer that separate the current-carrying elements of the superconducting layer into strips or filament-like structures. Segmentation of the current-carrying elements has the effect of reducing AC current losses. Methods of making such a superconducting tape and of reducing AC losses in such tapes are also disclosed.

  6. NWChem: A comprehensive and scalable open-source solution for large scale molecular simulations

    NASA Astrophysics Data System (ADS)

    Valiev, M.; Bylaska, E. J.; Govind, N.; Kowalski, K.; Straatsma, T. P.; Van Dam, H. J. J.; Wang, D.; Nieplocha, J.; Apra, E.; Windus, T. L.; de Jong, W. A.

    2010-09-01

    The latest release of NWChem delivers an open-source computational chemistry package with extensive capabilities for large scale simulations of chemical and biological systems. Utilizing a common computational framework, diverse theoretical descriptions can be used to provide the best solution for a given scientific problem. Scalable parallel implementations and modular software design enable efficient utilization of current computational architectures. This paper provides an overview of NWChem focusing primarily on the core theoretical modules provided by the code and their parallel performance.

    Program summary
    Program title: NWChem
    Catalogue identifier: AEGI_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEGI_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Open Source Educational Community License
    No. of lines in distributed program, including test data, etc.: 11 709 543
    No. of bytes in distributed program, including test data, etc.: 680 696 106
    Distribution format: tar.gz
    Programming language: Fortran 77, C
    Computer: all Linux based workstations and parallel supercomputers, Windows and Apple machines
    Operating system: Linux, OS X, Windows
    Has the code been vectorised or parallelized?: Code is parallelized
    Classification: 2.1, 2.2, 3, 7.3, 7.7, 16.1, 16.2, 16.3, 16.10, 16.13
    Nature of problem: Large-scale atomistic simulations of chemical and biological systems require efficient and reliable methods for ground and excited solutions of the many-electron Hamiltonian, analysis of the potential energy surface, and dynamics.
    Solution method: Ground and excited solutions of the many-electron Hamiltonian are obtained utilizing density-functional theory, many-body perturbation approaches, and coupled cluster expansions. These solutions, or a combination thereof with classical descriptions, are then used to analyze the potential energy surface and perform dynamical simulations.
    Additional comments: Full documentation is provided in the distribution file. This includes an INSTALL file giving details of how to build the package. A set of test runs is provided in the examples directory. The distribution file for this program is over 90 Mbytes and therefore is not delivered directly when download or Email is requested. Instead an html file giving details of how the program can be obtained is sent.
    Running time: Running time depends on the size of the chemical system, the complexity of the method, the number of CPUs, and the computational task. It ranges from several seconds for serial DFT energy calculations on a few atoms to several hours for parallel coupled cluster energy calculations on tens of atoms or ab initio molecular dynamics simulations on hundreds of atoms.

  7. Turbulent boundary layers with secondary flow

    NASA Technical Reports Server (NTRS)

    Gruschwitz, E.

    1984-01-01

    An experimental analysis is presented of the boundary layer on a plane wall along which a flow occurs whose potential flow lines are curved in a plane parallel to the wall. From the equation frequently applied to boundary layers in plane flow, usually obtained by means of the momentum law, a generalization is derived which is valid for boundary layers with spatial flow. The wall shear stresses were calculated with this equation.

  8. Fenix, A Fault Tolerant Programming Framework for MPI Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gamell, Marc; Teranishi, Keita; Valenzuela, Eric

    2016-10-05

    Fenix provides APIs to allow the users to add fault tolerance capability to MPI-based parallel programs in a transparent manner. Fenix-enabled programs can run through process failures during program execution using a pool of spare processes accommodated by Fenix.

  9. Data intensive computing at Sandia.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wilson, Andrew T.

    2010-09-01

    Data-Intensive Computing is parallel computing where you design your algorithms and your software around efficient access and traversal of a data set, and where hardware requirements are dictated by data size as much as by desired run times, usually distilling compact results from massive data.

  10. 1H-Indole-3-carbaldehyde.

    PubMed

    Dileep, C S; Abdoh, M M M; Chakravarthy, M P; Mohana, K N; Sridhar, M A

    2012-11-01

    In the title compound, C(9)H(7)NO, the benzene ring forms a dihedral angle of 3.98 (12)° with the pyrrole ring. In the crystal, N-H⋯O hydrogen bonds link the molecules into chains that run parallel to [02-1].

  11. The Relation between Reconnected Flux, the Parallel Electric Field, and the Reconnection Rate in a Three-Dimensional Kinetic Simulation of Magnetic Reconnection

    NASA Astrophysics Data System (ADS)

    Wendel, D. E.; Olson, D. K.; Hesse, M.; Karimabadi, H.; Daughton, W. S.

    2013-12-01

    We investigate the distribution of parallel electric fields and their relationship to the location and rate of magnetic reconnection in a large particle-in-cell simulation of 3D turbulent magnetic reconnection with open boundary conditions. The simulation's guide field geometry inhibits the formation of topological features such as separators and null points. Therefore, we derive the location of potential changes in magnetic connectivity by finding the field lines that experience a large relative change between their endpoints, i.e., the quasi-separatrix layer. We find a correspondence between the locus of changes in magnetic connectivity, or the quasi-separatrix layer, and the map of large gradients in the integrated parallel electric field (or quasi-potential). Furthermore, we compare the distribution of parallel electric fields along field lines with the reconnection rate. We find the reconnection rate is controlled by only the low-amplitude, zeroth and first-order trends in the parallel electric field, while the contribution from high amplitude parallel fluctuations, such as electron holes, is negligible. The results impact the determination of reconnection sites within models of 3D turbulent reconnection as well as the inference of reconnection rates from in situ spacecraft measurements. It is difficult through direct observation to isolate the locus of the reconnection parallel electric field amidst the large amplitude fluctuations. However, we demonstrate that a positive slope of the partial sum of the parallel electric field along the field line as a function of field line length indicates where reconnection is occurring along the field line.
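
    The partial-sum diagnostic described above can be sketched in a few lines: integrate the parallel electric field along the field line and flag the intervals where the running integral has positive slope (the sample data in the test are hypothetical placeholders, not simulation output):

```python
def reconnection_intervals(s, e_par):
    """Return (start, end) arc-length intervals where the running integral
    of the parallel electric field along a field line increases, i.e. where
    a net reconnection-supporting E-field accumulates despite fluctuations.

    s     : monotonically increasing arc-length samples along the field line
    e_par : parallel electric field at those samples
    """
    # trapezoidal running integral: the "partial sum" of E_parallel
    phi = [0.0]
    for i in range(1, len(s)):
        phi.append(phi[-1] + 0.5 * (e_par[i] + e_par[i - 1]) * (s[i] - s[i - 1]))

    intervals, start = [], None
    for i in range(1, len(phi)):
        rising = phi[i] > phi[i - 1]
        if rising and start is None:
            start = s[i - 1]                # slope turns positive here
        elif not rising and start is not None:
            intervals.append((start, s[i - 1]))
            start = None
    if start is not None:
        intervals.append((start, s[-1]))
    return intervals
```

    High-amplitude fluctuations such as electron holes largely cancel in the running integral, which is why the low-order trend survives this diagnostic.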

  12. PARAMO: A Parallel Predictive Modeling Platform for Healthcare Analytic Research using Electronic Health Records

    PubMed Central

    Ng, Kenney; Ghoting, Amol; Steinhubl, Steven R.; Stewart, Walter F.; Malin, Bradley; Sun, Jimeng

    2014-01-01

    Objective Healthcare analytics research increasingly involves the construction of predictive models for disease targets across varying patient cohorts using electronic health records (EHRs). To facilitate this process, it is critical to support a pipeline of tasks: 1) cohort construction, 2) feature construction, 3) cross-validation, 4) feature selection, and 5) classification. To develop an appropriate model, it is necessary to compare and refine models derived from a diversity of cohorts, patient-specific features, and statistical frameworks. The goal of this work is to develop and evaluate a predictive modeling platform that can be used to simplify and expedite this process for health data. Methods To support this goal, we developed a PARAllel predictive MOdeling (PARAMO) platform which 1) constructs a dependency graph of tasks from specifications of predictive modeling pipelines, 2) schedules the tasks in a topological ordering of the graph, and 3) executes those tasks in parallel. We implemented this platform using Map-Reduce to enable independent tasks to run in parallel in a cluster computing environment. Different task scheduling preferences are also supported. Results We assess the performance of PARAMO on various workloads using three datasets derived from the EHR systems in place at Geisinger Health System and Vanderbilt University Medical Center and an anonymous longitudinal claims database. We demonstrate significant gains in computational efficiency against a standard approach. In particular, PARAMO can build 800 different models on a 300,000 patient data set in 3 hours in parallel compared to 9 days if running sequentially. Conclusion This work demonstrates that an efficient parallel predictive modeling platform can be developed for EHR data. This platform can facilitate large-scale modeling endeavors and speed up the research workflow and reuse of health information. 
This platform is only a first step and provides the foundation for our ultimate goal of building analytic pipelines that are specialized for health data researchers. PMID:24370496
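
    The three-step pattern in the Methods (build a dependency graph, schedule tasks in topological order, execute independent tasks in parallel) can be sketched with Python's standard library; the task names are hypothetical, and a thread pool stands in for PARAMO's Map-Reduce cluster back end:

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical pipeline specification: each task lists its prerequisites.
PIPELINE = {
    "cohort_construction": [],
    "feature_construction": ["cohort_construction"],
    "cross_validation": ["feature_construction"],
    "feature_selection": ["feature_construction"],
    "classification": ["cross_validation", "feature_selection"],
}

def run_pipeline(tasks, work):
    """Execute work[name]() for every task, honoring dependencies and
    running mutually independent tasks concurrently."""
    ts = TopologicalSorter(tasks)
    ts.prepare()
    results = {}
    with ThreadPoolExecutor() as pool:
        while ts.is_active():
            ready = list(ts.get_ready())   # all of these are independent
            for name, out in zip(ready, pool.map(lambda n: work[n](), ready)):
                results[name] = out
                ts.done(name)
    return results
```

    In this sketch, cross-validation and feature selection land in the same ready batch and run concurrently, mirroring how the platform gains efficiency when hundreds of model variants share early pipeline stages.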

  13. PARAMO: a PARAllel predictive MOdeling platform for healthcare analytic research using electronic health records.

    PubMed

    Ng, Kenney; Ghoting, Amol; Steinhubl, Steven R; Stewart, Walter F; Malin, Bradley; Sun, Jimeng

    2014-04-01

    Healthcare analytics research increasingly involves the construction of predictive models for disease targets across varying patient cohorts using electronic health records (EHRs). To facilitate this process, it is critical to support a pipeline of tasks: (1) cohort construction, (2) feature construction, (3) cross-validation, (4) feature selection, and (5) classification. To develop an appropriate model, it is necessary to compare and refine models derived from a diversity of cohorts, patient-specific features, and statistical frameworks. The goal of this work is to develop and evaluate a predictive modeling platform that can be used to simplify and expedite this process for health data. To support this goal, we developed a PARAllel predictive MOdeling (PARAMO) platform which (1) constructs a dependency graph of tasks from specifications of predictive modeling pipelines, (2) schedules the tasks in a topological ordering of the graph, and (3) executes those tasks in parallel. We implemented this platform using Map-Reduce to enable independent tasks to run in parallel in a cluster computing environment. Different task scheduling preferences are also supported. We assess the performance of PARAMO on various workloads using three datasets derived from the EHR systems in place at Geisinger Health System and Vanderbilt University Medical Center and an anonymous longitudinal claims database. We demonstrate significant gains in computational efficiency against a standard approach. In particular, PARAMO can build 800 different models on a 300,000 patient data set in 3 hours in parallel compared to 9 days if running sequentially. This work demonstrates that an efficient parallel predictive modeling platform can be developed for EHR data. This platform can facilitate large-scale modeling endeavors and speed up the research workflow and reuse of health information. 
This platform is only a first step and provides the foundation for our ultimate goal of building analytic pipelines that are specialized for health data researchers. Copyright © 2013 Elsevier Inc. All rights reserved.

  14. Architecture Adaptive Computing Environment

    NASA Technical Reports Server (NTRS)

    Dorband, John E.

    2006-01-01

    Architecture Adaptive Computing Environment (aCe) is a software system that includes a language, compiler, and run-time library for parallel computing. aCe was developed to enable programmers to write programs, more easily than was previously possible, for a variety of parallel computing architectures. Heretofore, it has been perceived to be difficult to write parallel programs for parallel computers and more difficult to port the programs to different parallel computing architectures. In contrast, aCe is supportable on all high-performance computing architectures. Currently, it is supported on LINUX clusters. aCe uses parallel programming constructs that facilitate writing of parallel programs. Such constructs were used in single-instruction/multiple-data (SIMD) programming languages of the 1980s, including Parallel Pascal, Parallel Forth, C*, *LISP, and MasPar MPL. In aCe, these constructs are extended and implemented for both SIMD and multiple-instruction/multiple-data (MIMD) architectures. Two new constructs incorporated in aCe are those of (1) scalar and virtual variables and (2) pre-computed paths. The scalar-and-virtual-variables construct increases flexibility in optimizing memory utilization in various architectures. The pre-computed-paths construct enables the compiler to pre-compute part of a communication operation once, rather than computing it every time the communication operation is performed.
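
    The pre-computed-paths idea, computing a communication pattern's index arithmetic once and reusing it on every exchange, can be illustrated outside of aCe's own syntax (a plain-Python analogy, not aCe code):

```python
def precompute_shift_path(n, offset):
    """Compute once the source index each of n positions reads from in a
    cyclic-shift communication; this list is the 'pre-computed path'."""
    return [(i + offset) % n for i in range(n)]

def shift(data, path):
    """Perform one communication step by reusing the precomputed path,
    avoiding the per-step index arithmetic a naive version would redo."""
    return [data[src] for src in path]
```

    In use, the path is built once (e.g. path = precompute_shift_path(len(data), 1)) and then applied every timestep with shift(data, path).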

  15. Stability of Large Parallel Tunnels Excavated in Weak Rocks: A Case Study

    NASA Astrophysics Data System (ADS)

    Ding, Xiuli; Weng, Yonghong; Zhang, Yuting; Xu, Tangjin; Wang, Tuanle; Rao, Zhiwen; Qi, Zufang

    2017-09-01

    Diversion tunnels are important structures for hydropower projects but are always placed in locations with less favorable geological conditions than those in which other structures are placed. Because diversion tunnels are usually large and closely spaced, the rock pillar between adjacent tunnels in weak rocks is affected on both sides, and conventional support measures may not be adequate to achieve the required stability. Thus, appropriate reinforcement support measures are needed, and the design philosophy regarding large parallel tunnels in weak rocks should be updated. This paper reports a recent case in which two large parallel diversion tunnels are excavated. The rock masses are thin- to ultra-thin-layered strata coated with phyllitic films, which significantly decrease the soundness and strength of the strata and weaken the rocks. The behaviors of the surrounding rock masses under original (and conventional) support measures are detailed in terms of rock mass deformation, anchor bolt stress, and the extent of the excavation disturbed zone (EDZ), as obtained from safety monitoring and field testing. In situ observed phenomena and their interpretation are also included. The sidewall deformations exhibit significant time-dependent characteristics, and large magnitudes are recorded. The stresses in the anchor bolts are small, but the extents of the EDZs are large. The stability condition under the original support measures is evaluated as poor. To enhance rock mass stability, attempts are made to reinforce support design and improve safety monitoring programs. The main feature of these attempts is the use of prestressed cables that run through the rock pillar between the parallel tunnels. The efficacy of reinforcement support measures is verified by further safety monitoring data and field test results. Numerical analysis is constantly performed during the construction process to provide a useful reference for decision making. 
The calculated deformations are in good agreement with the measured data, and the calculated forces of newly added cables show that the designed reinforcement is necessary and ensures sufficient stability. Finally, the role of safety monitoring in the evaluation of rock mass stability and the consideration of tunnel group effect are discussed. The work described in this paper aims to deepen the understanding of rock mass behaviors of large parallel tunnels in weak rocks and to improve the design philosophy.

  16. Entropy generation in a parallel-plate active magnetic regenerator with insulator layers

    NASA Astrophysics Data System (ADS)

    Mugica Guerrero, Ibai; Poncet, Sébastien; Bouchard, Jonathan

    2017-02-01

    This paper proposes a feasible solution to diminish conduction losses in active magnetic regenerators. Higher performances of these machines are linked to a lower thermal conductivity of the Magneto-Caloric Material (MCM) in the streamwise direction. The concept presented here involves the insertion of insulator layers along the length of a parallel-plate magnetic regenerator in order to reduce the heat conduction within the MCM. This idea is investigated by means of a 1D numerical model. This model solves not only the energy equations for the fluid and solid domains but also the magnetic circuit that conforms to the reference experimental setup. In conclusion, the addition of insulator layers within the MCM increases the temperature span, cooling load, and coefficient of performance by a combination of lower heat conduction losses and an increment of the global Magneto-Caloric Effect. The entropy generated by solid conduction, fluid convection and conduction, and viscous losses is calculated to help understand the implications of introducing insulator layers in magnetic regenerators. Finally, the optimal number of insulator layers is studied.
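
    The first-order effect of the inserted layers on streamwise conduction follows from thermal resistances in series; a minimal sketch (all material values in the test are hypothetical, not taken from the paper's setup):

```python
def axial_conductance(length, area, k_mcm, n_layers, t_ins, k_ins):
    """Effective streamwise thermal conductance (W/K) of a regenerator
    plate of total length `length` and cross-section `area` in which
    n_layers insulator slabs (thickness t_ins, conductivity k_ins) are
    inserted in series with the MCM (conductivity k_mcm).
    Thermal resistances in series simply add."""
    l_mcm = length - n_layers * t_ins        # MCM length that remains
    r_mcm = l_mcm / (k_mcm * area)
    r_ins = n_layers * t_ins / (k_ins * area)
    return 1.0 / (r_mcm + r_ins)
```

    Even a few thin low-conductivity slabs dominate the series resistance, which is the mechanism by which the layers suppress conduction losses along the flow direction.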

  17. Dynamical Generation of Quasi-Stationary Alfvenic Double Layers and Charge Holes and Unified Theory of Quasi-Static and Alfvenic Auroral Arc Formation

    NASA Astrophysics Data System (ADS)

    Song, Y.; Lysak, R. L.

    2015-12-01

    Parallel E-fields play a crucial role in the acceleration of charged particles, creating discrete aurorae. However, once parallel electric fields are produced, they will disappear right away unless they can be continuously generated and sustained for a fairly long time. Thus, the crucial question in auroral physics is how to generate powerful, self-sustained parallel electric fields that can effectively accelerate charged particles to high energies over a fairly long time. We propose that the nonlinear interaction of incident and reflected Alfven wave packets in the inhomogeneous auroral acceleration region can produce quasi-stationary, non-propagating electromagnetic plasma structures such as Alfvenic double layers (DLs) and charge holes. Such Alfvenic quasi-static structures often constitute powerful high-energy particle accelerators. The Alfvenic DL consists of localized, self-sustained, powerful electrostatic electric fields nested in a low-density cavity and surrounded by enhanced magnetic and mechanical stresses. The enhanced magnetic and velocity fields carrying the free energy serve as a local dynamo, which continuously creates the electrostatic parallel electric field for a fairly long time. The generated parallel electric fields will deepen the seed low-density cavity, which in turn quickly boosts the parallel electric fields further, creating both Alfvenic and quasi-static discrete aurorae. The parallel electrostatic electric field can also cause ion outflow, perpendicular ion acceleration and heating, and may excite Auroral Kilometric Radiation.

  18. The NEST Dry-Run Mode: Efficient Dynamic Analysis of Neuronal Network Simulation Code.

    PubMed

    Kunkel, Susanne; Schenck, Wolfram

    2017-01-01

    NEST is a simulator for spiking neuronal networks that commits to a general purpose approach: It allows for high flexibility in the design of network models, and its applications range from small-scale simulations on laptops to brain-scale simulations on supercomputers. Hence, developers need to test their code for various use cases and ensure that changes to code do not impair scalability. However, running a full set of benchmarks on a supercomputer takes up precious compute-time resources and can entail long queuing times. Here, we present the NEST dry-run mode, which enables comprehensive dynamic code analysis without requiring access to high-performance computing facilities. A dry-run simulation is carried out by a single process, which performs all simulation steps except communication as if it was part of a parallel environment with many processes. We show that measurements of memory usage and runtime of neuronal network simulations closely match the corresponding dry-run data. Furthermore, we demonstrate the successful application of the dry-run mode in the areas of profiling and performance modeling.
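
    The dry-run idea, one real process performing every step except communication as if it were one rank among many, can be caricatured with a stand-in communicator (a toy illustration, not NEST's implementation):

```python
class DryRunComm:
    """One real process pretends to be rank `rank` of `size` ranks:
    ownership and bookkeeping proceed as in a true parallel run, but the
    exchange step is skipped and only its cost is recorded."""

    def __init__(self, rank, size):
        self.rank, self.size = rank, size
        self.bytes_skipped = 0

    def owns(self, neuron_id):
        # round-robin distribution of neurons over the virtual processes
        return neuron_id % self.size == self.rank

    def exchange(self, payload):
        # a real run would communicate here; the dry run only records what
        # would have been sent, which suffices for memory/runtime profiling
        self.bytes_skipped += len(payload) * (self.size - 1)
        return []  # no remote spikes arrive in a dry run
```

    Because allocation and iteration are driven by ownership, memory and runtime measured this way track the per-process figures of a real parallel run, which is the property the dry-run mode exploits for profiling and performance modeling.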

  19. The NEST Dry-Run Mode: Efficient Dynamic Analysis of Neuronal Network Simulation Code

    PubMed Central

    Kunkel, Susanne; Schenck, Wolfram

    2017-01-01

    NEST is a simulator for spiking neuronal networks that commits to a general purpose approach: It allows for high flexibility in the design of network models, and its applications range from small-scale simulations on laptops to brain-scale simulations on supercomputers. Hence, developers need to test their code for various use cases and ensure that changes to code do not impair scalability. However, running a full set of benchmarks on a supercomputer takes up precious compute-time resources and can entail long queuing times. Here, we present the NEST dry-run mode, which enables comprehensive dynamic code analysis without requiring access to high-performance computing facilities. A dry-run simulation is carried out by a single process, which performs all simulation steps except communication as if it was part of a parallel environment with many processes. We show that measurements of memory usage and runtime of neuronal network simulations closely match the corresponding dry-run data. Furthermore, we demonstrate the successful application of the dry-run mode in the areas of profiling and performance modeling. PMID:28701946

  20. OpenMM 4: A Reusable, Extensible, Hardware Independent Library for High Performance Molecular Simulation.

    PubMed

    Eastman, Peter; Friedrichs, Mark S; Chodera, John D; Radmer, Randall J; Bruns, Christopher M; Ku, Joy P; Beauchamp, Kyle A; Lane, Thomas J; Wang, Lee-Ping; Shukla, Diwakar; Tye, Tony; Houston, Mike; Stich, Timo; Klein, Christoph; Shirts, Michael R; Pande, Vijay S

    2013-01-08

    OpenMM is a software toolkit for performing molecular simulations on a range of high performance computing architectures. It is based on a layered architecture: the lower layers function as a reusable library that can be invoked by any application, while the upper layers form a complete environment for running molecular simulations. The library API hides all hardware-specific dependencies and optimizations from the users and developers of simulation programs: they can be run without modification on any hardware on which the API has been implemented. The current implementations of OpenMM include support for graphics processing units using the OpenCL and CUDA frameworks. In addition, OpenMM was designed to be extensible, so new hardware architectures can be accommodated and new functionality (e.g., energy terms and integrators) can be easily added.

  1. OpenMM 4: A Reusable, Extensible, Hardware Independent Library for High Performance Molecular Simulation

    PubMed Central

    Eastman, Peter; Friedrichs, Mark S.; Chodera, John D.; Radmer, Randall J.; Bruns, Christopher M.; Ku, Joy P.; Beauchamp, Kyle A.; Lane, Thomas J.; Wang, Lee-Ping; Shukla, Diwakar; Tye, Tony; Houston, Mike; Stich, Timo; Klein, Christoph; Shirts, Michael R.; Pande, Vijay S.

    2012-01-01

    OpenMM is a software toolkit for performing molecular simulations on a range of high performance computing architectures. It is based on a layered architecture: the lower layers function as a reusable library that can be invoked by any application, while the upper layers form a complete environment for running molecular simulations. The library API hides all hardware-specific dependencies and optimizations from the users and developers of simulation programs: they can be run without modification on any hardware on which the API has been implemented. The current implementations of OpenMM include support for graphics processing units using the OpenCL and CUDA frameworks. In addition, OpenMM was designed to be extensible, so new hardware architectures can be accommodated and new functionality (e.g., energy terms and integrators) can be easily added. PMID:23316124

  2. Visual Computing Environment

    NASA Technical Reports Server (NTRS)

    Lawrence, Charles; Putt, Charles W.

    1997-01-01

    The Visual Computing Environment (VCE) is a NASA Lewis Research Center project to develop a framework for intercomponent and multidisciplinary computational simulations. Many current engineering analysis codes simulate various aspects of aircraft engine operation. For example, existing computational fluid dynamics (CFD) codes can model the airflow through individual engine components such as the inlet, compressor, combustor, turbine, or nozzle. Currently, these codes are run in isolation, making intercomponent and complete system simulations very difficult to perform. In addition, management and utilization of these engineering codes for coupled component simulations is a complex, laborious task, requiring substantial experience and effort. To facilitate multicomponent aircraft engine analysis, the CFD Research Corporation (CFDRC) is developing the VCE system. This system, which is part of NASA's Numerical Propulsion Simulation System (NPSS) program, can couple various engineering disciplines, such as CFD, structural analysis, and thermal analysis. The objectives of VCE are to (1) develop a visual computing environment for controlling the execution of individual simulation codes that are running in parallel and are distributed on heterogeneous host machines in a networked environment, (2) develop numerical coupling algorithms for interchanging boundary conditions between codes with arbitrary grid matching and different levels of dimensionality, (3) provide a graphical interface for simulation setup and control, and (4) provide tools for online visualization and plotting. VCE was designed to provide a distributed, object-oriented environment. Mechanisms are provided for creating and manipulating objects, such as grids, boundary conditions, and solution data. This environment includes parallel virtual machine (PVM) for distributed processing. 
Users can interactively select and couple any set of codes that have been modified to run in a parallel distributed fashion on a cluster of heterogeneous workstations. A scripting facility allows users to dictate the sequence of events that make up the particular simulation.
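    One of the stated VCE objectives, interchanging boundary conditions between codes with arbitrary grid matching, can be illustrated with a minimal sketch. This is a hypothetical helper, not VCE code: it assumes the simplest case of two sorted 1-D boundary grids coupled by piecewise-linear interpolation.

```python
def interchange_bc(donor_x, donor_vals, receiver_x):
    """Transfer a boundary profile from one code's grid to another's by
    piecewise-linear interpolation; a minimal stand-in for coupling codes
    whose boundary grids do not match point-for-point.
    donor_x must be sorted ascending."""
    out = []
    for x in receiver_x:
        # Clamp points outside the donor range, else interpolate in the bracket.
        if x <= donor_x[0]:
            out.append(donor_vals[0]); continue
        if x >= donor_x[-1]:
            out.append(donor_vals[-1]); continue
        j = max(i for i in range(len(donor_x)) if donor_x[i] <= x)
        t = (x - donor_x[j]) / (donor_x[j + 1] - donor_x[j])
        out.append((1 - t) * donor_vals[j] + t * donor_vals[j + 1])
    return out
```

    VCE's coupling algorithms also handle differing levels of dimensionality; the sketch covers only this simplest 1-D case.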

  3. PeakRanger: A cloud-enabled peak caller for ChIP-seq data

    PubMed Central

    2011-01-01

    Background Chromatin immunoprecipitation (ChIP), coupled with massively parallel short-read sequencing (seq) is used to probe chromatin dynamics. Although there are many algorithms to call peaks from ChIP-seq datasets, most are tuned either to handle punctate sites, such as transcription factor binding sites, or broad regions, such as histone modification marks; few can do both. Other algorithms are limited in their configurability, performance on large data sets, and ability to distinguish closely-spaced peaks. Results In this paper, we introduce PeakRanger, a peak caller software package that works equally well on punctate and broad sites, can resolve closely-spaced peaks, has excellent performance, and is easily customized. In addition, PeakRanger can be run in a parallel cloud computing environment to obtain extremely high performance on very large data sets. We present a series of benchmarks to evaluate PeakRanger against 10 other peak callers, and demonstrate the performance of PeakRanger on both real and synthetic data sets. We also present real-world usages of PeakRanger, including peak-calling in the modENCODE project. Conclusions Compared to other peak callers tested, PeakRanger offers improved resolution in distinguishing extremely closely-spaced peaks. PeakRanger has above-average spatial accuracy in terms of identifying the precise location of binding events. PeakRanger also has excellent sensitivity and specificity in all benchmarks evaluated. In addition, PeakRanger offers significant improvements in run time when running on a single processor system, and very marked improvements when allowed to take advantage of the MapReduce parallel environment offered by a cloud computing resource. PeakRanger can be downloaded at the official site of modENCODE project: http://www.modencode.org/software/ranger/ PMID:21554709
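    As a toy illustration of what a peak caller does (this is not PeakRanger's algorithm; the binning scheme and fixed threshold are invented for the example), read start positions can be binned and maximal runs of high-coverage bins reported as peak intervals:

```python
def call_peaks(read_starts, genome_len, bin_size=50, threshold=5):
    """Toy peak caller: bin read start positions and report maximal runs
    of bins whose coverage meets a threshold as [start, end) intervals."""
    n_bins = (genome_len + bin_size - 1) // bin_size
    cov = [0] * n_bins
    for s in read_starts:
        cov[s // bin_size] += 1
    peaks, start = [], None
    for i, c in enumerate(cov + [0]):          # sentinel flushes the last run
        if c >= threshold and start is None:
            start = i
        elif c < threshold and start is not None:
            peaks.append((start * bin_size, i * bin_size))
            start = None
    return peaks
```

    Real callers additionally model local background, control samples, and statistical significance; the sketch shows only the core binning step.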

  4. Limits to high-speed simulations of spiking neural networks using general-purpose computers.

    PubMed

    Zenke, Friedemann; Gerstner, Wulfram

    2014-01-01

    To understand how the central nervous system performs computations using recurrent neuronal circuitry, simulations have become an indispensable tool for theoretical neuroscience. To study neuronal circuits and their ability to self-organize, increasing attention has been directed toward synaptic plasticity. In particular spike-timing-dependent plasticity (STDP) creates specific demands for simulations of spiking neural networks. On the one hand a high temporal resolution is required to capture the millisecond timescale of typical STDP windows. On the other hand network simulations have to evolve over hours up to days, to capture the timescale of long-term plasticity. To do this efficiently, fast simulation speed is the crucial ingredient rather than large neuron numbers. Using different medium-sized network models consisting of several thousands of neurons and off-the-shelf hardware, we compare the simulation speed of the simulators: Brian, NEST and Neuron as well as our own simulator Auryn. Our results show that real-time simulations of different plastic network models are possible in parallel simulations in which numerical precision is not a primary concern. Even so, the speed-up margin of parallelism is limited and boosting simulation speeds beyond one tenth of real-time is difficult. By profiling simulation code we show that the run times of typical plastic network simulations encounter a hard boundary. This limit is partly due to latencies in the inter-process communications and thus cannot be overcome by increased parallelism. Overall, these results show that to study plasticity in medium-sized spiking neural networks, adequate simulation tools are readily available which run efficiently on small clusters. However, to run simulations substantially faster than real-time, special hardware is a prerequisite.
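    The millisecond-scale STDP window that forces such high temporal resolution is commonly modeled with a pair-based exponential rule; a minimal sketch follows (parameter values are illustrative, not those of any simulator benchmarked above):

```python
import math

def stdp_dw(dt, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based STDP window: dt = t_post - t_pre in milliseconds.
    Pre-before-post spiking (dt > 0) potentiates the synapse; the reverse
    order depresses it, with both effects decaying on a ~tau ms timescale."""
    if dt > 0:
        return a_plus * math.exp(-dt / tau)
    return -a_minus * math.exp(dt / tau)
```

    Because the window decays within tens of milliseconds, a simulation time step much coarser than a millisecond would blur the sign and magnitude of these updates.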

  5. Massively parallel algorithms for trace-driven cache simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Greenberg, Albert G.; Lubachevsky, Boris D.

    1991-01-01

    Trace driven cache simulation is central to computer design. A trace is a very long sequence of reference lines from main memory. At the t-th instant, reference x_t is hashed into a set of cache locations, the contents of which are then compared with x_t. If at the t-th instant x_t is not present in the cache, then it is said to be a miss, and is loaded into the cache set, possibly forcing the replacement of some other memory line, and making x_t present for the (t+1)-st instant. The problem of parallel simulation of a subtrace of N references directed to a C-line cache set is considered, with the aim of determining which references are misses and related statistics. A simulation method is presented for the Least Recently Used (LRU) policy, which regardless of the set size C runs in time O(log N) using N processors on the exclusive read, exclusive write (EREW) parallel model. A simpler LRU simulation algorithm is given that runs in O(C log N) time using N/log N processors. Timings are presented of the second algorithm's implementation on the MasPar MP-1, a machine with 16384 processors. A broad class of reference-based line replacement policies is considered, which includes LRU as well as the Least Frequently Used and Random replacement policies. A simulation method is presented for any such policy that on any trace of length N directed to a C-line set runs in O(C log N) time with high probability using N processors on the EREW model. The algorithms are simple, have very little space overhead, and are well suited for SIMD implementation.
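    The serial computation that these parallel algorithms accelerate can be sketched directly; a minimal hit/miss simulation of a single C-line LRU set (an illustrative baseline only, not the paper's O(log N) EREW construction):

```python
from collections import OrderedDict

def simulate_lru(trace, capacity):
    """Sequentially simulate one C-line cache set under LRU replacement,
    returning a per-reference list where True marks a miss."""
    cache = OrderedDict()          # keys kept in LRU -> MRU order
    misses = []
    for x in trace:
        if x in cache:
            cache.move_to_end(x)   # hit: refresh recency of x
            misses.append(False)
        else:
            misses.append(True)    # miss: load x, evicting the LRU line if full
            if len(cache) >= capacity:
                cache.popitem(last=False)
            cache[x] = None
    return misses
```

    The serial loop is inherently sequential in t, which is exactly the dependence the paper's parallel methods break.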

  6. VINE-A NUMERICAL CODE FOR SIMULATING ASTROPHYSICAL SYSTEMS USING PARTICLES. II. IMPLEMENTATION AND PERFORMANCE CHARACTERISTICS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nelson, Andrew F.; Wetzstein, M.; Naab, T.

    2009-10-01

    We continue our presentation of VINE. In this paper, we begin with a description of relevant architectural properties of the serial and shared memory parallel computers on which VINE is intended to run, and describe their influences on the design of the code itself. We continue with a detailed description of a number of optimizations made to the layout of the particle data in memory and to our implementation of a binary tree used to access that data for use in gravitational force calculations and searches for smoothed particle hydrodynamics (SPH) neighbor particles. We describe the modifications to the code necessary to obtain forces efficiently from special purpose 'GRAPE' hardware, the interfaces required to allow transparent substitution of those forces in the code instead of those obtained from the tree, and the modifications necessary to use both tree and GRAPE together as a fused GRAPE/tree combination. We conclude with an extensive series of performance tests, which demonstrate that the code can be run efficiently and without modification in serial on small workstations or in parallel using the OpenMP compiler directives on large-scale, shared memory parallel machines. We analyze the effects of the code optimizations and estimate that they improve its overall performance by more than an order of magnitude over that obtained by many other tree codes. Scaled parallel performance of the gravity and SPH calculations, together the most costly components of most simulations, is nearly linear up to at least 120 processors on moderate sized test problems using the Origin 3000 architecture, and to the maximum machine sizes available to us on several other architectures. At similar accuracy, performance of VINE, used in GRAPE-tree mode, is approximately a factor of 2 slower than that of VINE, used in host-only mode. Further optimizations of the GRAPE/host communications could improve the speed by as much as a factor of 3, but have not yet been implemented in VINE. Finally, we find that although parallel performance on small problems may reach a plateau beyond which more processors bring no additional speedup, performance never decreases, a factor important for running large simulations on many processors with individual time steps, where only a small fraction of the total particles require updates at any given moment.

  7. Frog: Asynchronous Graph Processing on GPU with Hybrid Coloring Model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shi, Xuanhua; Luo, Xuan; Liang, Junling

    GPUs have been increasingly used to accelerate graph processing for complicated computational problems regarding graph theory. Many parallel graph algorithms adopt the asynchronous computing model to accelerate the iterative convergence. Unfortunately, the consistent asynchronous computing requires locking or atomic operations, leading to significant penalties/overheads when implemented on GPUs. As such, coloring algorithm is adopted to separate the vertices with potential updating conflicts, guaranteeing the consistency/correctness of the parallel processing. Common coloring algorithms, however, may suffer from low parallelism because of a large number of colors generally required for processing a large-scale graph with billions of vertices. We propose a light-weight asynchronous processing framework called Frog with a preprocessing/hybrid coloring model. The fundamental idea is based on Pareto principle (or 80-20 rule) about coloring algorithms as we observed through masses of real-world graph coloring cases. We find that a majority of vertices (about 80%) are colored with only a few colors, such that they can be read and updated in a very high degree of parallelism without violating the sequential consistency. Accordingly, our solution separates the processing of the vertices based on the distribution of colors. In this work, we mainly answer three questions: (1) how to partition the vertices in a sparse graph with maximized parallelism, (2) how to process large-scale graphs that cannot fit into GPU memory, and (3) how to reduce the overhead of data transfers on PCIe while processing each partition. We conduct experiments on real-world data (Amazon, DBLP, YouTube, RoadNet-CA, WikiTalk and Twitter) to evaluate our approach and make comparisons with well-known non-preprocessed (such as Totem, Medusa, MapGraph and Gunrock) and preprocessed (CuSha) approaches, by testing four classical algorithms (BFS, PageRank, SSSP and CC). On all the tested applications and datasets, Frog is able to significantly outperform existing GPU-based graph processing systems except Gunrock and MapGraph. MapGraph gets better performance than Frog when running BFS on RoadNet-CA. The comparison between Gunrock and Frog is inconclusive. Frog can outperform Gunrock by more than 1.04X when running PageRank and SSSP, while the advantage of Frog is not obvious when running BFS and CC on some datasets especially for RoadNet-CA.
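    The coloring idea underlying Frog, separating vertices so that no two neighbors share a color and each color class can then be updated in parallel without conflicts, can be sketched with the classic greedy heuristic (a generic sketch, not Frog's hybrid preprocessing model):

```python
def greedy_color(adj):
    """Greedy vertex coloring of an undirected graph given as an adjacency
    dict. Each vertex gets the smallest color not used by its already-colored
    neighbors, so vertices of the same color share no edge and form a
    conflict-free batch for parallel updates."""
    color = {}
    for v in adj:                      # visit in insertion order; heuristics vary
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color
```

    Frog's observation is that in real graphs most vertices end up in the first few color classes, so those large classes dominate the work and parallelize well.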

  8. Scalable Metropolis Monte Carlo for simulation of hard shapes

    NASA Astrophysics Data System (ADS)

    Anderson, Joshua A.; Eric Irrgang, M.; Glotzer, Sharon C.

    2016-07-01

    We design and implement a scalable hard particle Monte Carlo simulation toolkit (HPMC), and release it open source as part of HOOMD-blue. HPMC runs in parallel on many CPUs and many GPUs using domain decomposition. We employ BVH trees instead of cell lists on the CPU for fast performance, especially with large particle size disparity, and optimize inner loops with SIMD vector intrinsics on the CPU. Our GPU kernel proposes many trial moves in parallel on a checkerboard and uses a block-level queue to redistribute work among threads and avoid divergence. HPMC supports a wide variety of shape classes, including spheres/disks, unions of spheres, convex polygons, convex spheropolygons, concave polygons, ellipsoids/ellipses, convex polyhedra, convex spheropolyhedra, spheres cut by planes, and concave polyhedra. NVT and NPT ensembles can be run in 2D or 3D triclinic boxes. Additional integration schemes permit Frenkel-Ladd free energy computations and implicit depletant simulations. In a benchmark system of a fluid of 4096 pentagons, HPMC performs 10 million sweeps in 10 min on 96 CPU cores on XSEDE Comet. The same simulation would take 7.6 h in serial. HPMC also scales to large system sizes, and the same benchmark with 16.8 million particles runs in 1.4 h on 2048 GPUs on OLCF Titan.
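    Because the hard-core potential is zero or infinite, the Metropolis acceptance test for hard shapes reduces to an overlap check; a serial hard-disk sketch under periodic boundaries (a minimal illustration, not HOOMD-blue/HPMC code):

```python
import random

def mc_sweep(pos, sigma, box, delta, rng):
    """One Metropolis sweep for hard disks of diameter sigma in a periodic
    square box of side `box`: each trial displacement is accepted iff it
    creates no overlap (energy change is 0 or infinity, so no exp() needed)."""
    n = len(pos)
    for i in range(n):
        x = (pos[i][0] + rng.uniform(-delta, delta)) % box
        y = (pos[i][1] + rng.uniform(-delta, delta)) % box
        ok = True
        for j in range(n):
            if j == i:
                continue
            dx = (x - pos[j][0] + box / 2) % box - box / 2   # minimum image
            dy = (y - pos[j][1] + box / 2) % box - box / 2
            if dx * dx + dy * dy < sigma * sigma:
                ok = False
                break
        if ok:
            pos[i] = (x, y)
    return pos
```

    HPMC replaces the O(n) overlap scan with BVH trees or cell lists and proposes many such moves in parallel on a checkerboard; the physics of each move is the same accept-iff-no-overlap rule.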

  9. A Comparison of Hybrid Reynolds Averaged Navier Stokes/Large Eddy Simulation (RANS/LES) and Unsteady RANS Predictions of Separated Flow for a Variable Speed Power Turbine Blade Operating with Low Inlet Turbulence Levels

    DTIC Science & Technology

    2017-10-01

    Facility is a large-scale cascade that allows detailed flow field surveys and blade surface measurements. The facility has a continuous run ... structured grids at 2 flow conditions, cruise and takeoff, of the VSPT blade. Computations were run in parallel on a Department of Defense ... (RANS/LES) and Unsteady RANS Predictions of Separated Flow for a Variable-Speed Power-Turbine Blade Operating with Low Inlet Turbulence Levels

  10. Position Paper - pFLogger: The Parallel Fortran Logging framework for HPC Applications

    NASA Technical Reports Server (NTRS)

    Clune, Thomas L.; Cruz, Carlos A.

    2017-01-01

    In the context of high performance computing (HPC), software investments in support of text-based diagnostics, which monitor a running application, are typically limited compared to those for other types of IO. Examples of such diagnostics include reiteration of configuration parameters, progress indicators, simple metrics (e.g., mass conservation, convergence of solvers, etc.), and timers. To some degree, this difference in priority is justifiable as other forms of output are the primary products of a scientific model and, due to their large data volume, much more likely to be a significant performance concern. In contrast, text-based diagnostic content is generally not shared beyond the individual or group running an application and is most often used to troubleshoot when something goes wrong. We suggest that a more systematic approach enabled by a logging facility (or logger) similar to those routinely used by many communities would provide significant value to complex scientific applications. In the context of high-performance computing, an appropriate logger would provide specialized support for distributed and shared-memory parallelism and have low performance overhead. In this paper, we present our prototype implementation of pFlogger, a parallel Fortran-based logging framework, and assess its suitability for use in a complex scientific application.

  11. POSITION PAPER - pFLogger: The Parallel Fortran Logging Framework for HPC Applications

    NASA Technical Reports Server (NTRS)

    Clune, Thomas L.; Cruz, Carlos A.

    2017-01-01

    In the context of high performance computing (HPC), software investments in support of text-based diagnostics, which monitor a running application, are typically limited compared to those for other types of IO. Examples of such diagnostics include reiteration of configuration parameters, progress indicators, simple metrics (e.g., mass conservation, convergence of solvers, etc.), and timers. To some degree, this difference in priority is justifiable as other forms of output are the primary products of a scientific model and, due to their large data volume, much more likely to be a significant performance concern. In contrast, text-based diagnostic content is generally not shared beyond the individual or group running an application and is most often used to troubleshoot when something goes wrong. We suggest that a more systematic approach enabled by a logging facility (or 'logger') similar to those routinely used by many communities would provide significant value to complex scientific applications. In the context of high-performance computing, an appropriate logger would provide specialized support for distributed and shared-memory parallelism and have low performance overhead. In this paper, we present our prototype implementation of pFlogger - a parallel Fortran-based logging framework, and assess its suitability for use in a complex scientific application.

  12. Comparison of Origin 2000 and Origin 3000 Using NAS Parallel Benchmarks

    NASA Technical Reports Server (NTRS)

    Turney, Raymond D.

    2001-01-01

    This report describes results of benchmark tests on the Origin 3000 system currently being installed at the NASA Ames National Advanced Supercomputing facility. This machine will ultimately contain 1024 R14K processors. The first part of the system, installed in November 2000 and named mendel, is an Origin 3000 with 128 R12K processors. For comparison purposes, the tests were also run on lomax, an Origin 2000 with R12K processors. The BT, LU, and SP application benchmarks in the NAS Parallel Benchmark Suite and the kernel benchmark FT were chosen to determine system performance and measure the impact of changes on the machine as it evolves. Having been written to measure performance on Computational Fluid Dynamics applications, these benchmarks are assumed appropriate to represent the NAS workload. Since the NAS runs both message passing (MPI) and shared-memory, compiler directive type codes, both MPI and OpenMP versions of the benchmarks were used. The MPI versions used were the latest official release of the NAS Parallel Benchmarks, version 2.3. The OpenMP versions used were PBN3b2, a beta version that is in the process of being released. NPB 2.3 and PBN3b2 are technically different benchmarks, and NPB results are not directly comparable to PBN results.

  13. Multi-Resolution Climate Ensemble Parameter Analysis with Nested Parallel Coordinates Plots.

    PubMed

    Wang, Junpeng; Liu, Xiaotong; Shen, Han-Wei; Lin, Guang

    2017-01-01

    Due to the uncertain nature of weather prediction, climate simulations are usually performed multiple times with different spatial resolutions. The outputs of simulations are multi-resolution spatial temporal ensembles. Each simulation run uses a unique set of values for multiple convective parameters. Distinct parameter settings from different simulation runs in different resolutions constitute a multi-resolution high-dimensional parameter space. Understanding the correlation between the different convective parameters, and establishing a connection between the parameter settings and the ensemble outputs are crucial to domain scientists. The multi-resolution high-dimensional parameter space, however, presents a unique challenge to the existing correlation visualization techniques. We present Nested Parallel Coordinates Plot (NPCP), a new type of parallel coordinates plots that enables visualization of intra-resolution and inter-resolution parameter correlations. With flexible user control, NPCP integrates superimposition, juxtaposition and explicit encodings in a single view for comparative data visualization and analysis. We develop an integrated visual analytics system to help domain scientists understand the connection between multi-resolution convective parameters and the large spatial temporal ensembles. Our system presents intricate climate ensembles with a comprehensive overview and on-demand geographic details. We demonstrate NPCP, along with the climate ensemble visualization system, based on real-world use-cases from our collaborators in computational and predictive science.

  14. TOUGH2_MP: A parallel version of TOUGH2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Keni; Wu, Yu-Shu; Ding, Chris

    2003-04-09

    TOUGH2_MP is a massively parallel version of TOUGH2. It was developed for running on distributed-memory parallel computers to simulate large simulation problems that may not be solved by the standard, single-CPU TOUGH2 code. The new code implements an efficient massively parallel scheme, while preserving the full capacity and flexibility of the original TOUGH2 code. The new software uses the METIS software package for grid partitioning and the AZTEC software package for linear-equation solving. The standard message-passing interface is adopted for communication among processors. Numerical performance of the current version code has been tested on CRAY-T3E and IBM RS/6000 SP platforms. In addition, the parallel code has been successfully applied to real field problems of multi-million-cell simulations for three-dimensional multiphase and multicomponent fluid and heat flow, as well as solute transport. In this paper, we will review the development of the TOUGH2_MP, and discuss the basic features, modules, and their applications.
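    Grid partitioning for distributed-memory simulation, handled in TOUGH2_MP by METIS, amounts to giving each processor a balanced piece of the mesh plus the ghost cells it must receive from neighbors at each step; a 1-D sketch of the idea (a hypothetical helper, not TOUGH2_MP or METIS code, which partition general graphs):

```python
def partition_1d(n_cells, n_ranks):
    """Split n_cells contiguous grid cells across n_ranks as evenly as
    possible. Each rank also lists the ghost cells it must receive from
    its neighbors to evaluate fluxes across partition boundaries."""
    base, extra = divmod(n_cells, n_ranks)
    parts, start = [], 0
    for r in range(n_ranks):
        size = base + (1 if r < extra else 0)
        own = list(range(start, start + size))
        ghosts = []
        if start > 0:                       # left neighbor's last cell
            ghosts.append(start - 1)
        if start + size < n_cells:          # right neighbor's first cell
            ghosts.append(start + size)
        parts.append({"own": own, "ghost": ghosts})
        start += size
    return parts
```

    On a real unstructured mesh the partitioner minimizes the number of such ghost (cut) connections, since they determine the MPI communication volume.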

  15. Large-scale three-dimensional phase-field simulations for phase coarsening at ultrahigh volume fraction on high-performance architectures

    NASA Astrophysics Data System (ADS)

    Yan, Hui; Wang, K. G.; Jones, Jim E.

    2016-06-01

    A parallel algorithm for large-scale three-dimensional phase-field simulations of phase coarsening is developed and implemented on high-performance architectures. From the large-scale simulations, a new kinetics in phase coarsening in the region of ultrahigh volume fraction is found. The parallel implementation is capable of harnessing the greater computer power available from high-performance architectures. The parallelized code enables increase in three-dimensional simulation system size up to a 512³ grid cube. Through the parallelized code, practical runtime can be achieved for three-dimensional large-scale simulations, and the statistical significance of the results from these high resolution parallel simulations is greatly improved over that obtainable from serial simulations. A detailed performance analysis on speed-up and scalability is presented, showing good scalability which improves with increasing problem size. In addition, a model for prediction of runtime is developed, which shows a good agreement with actual run time from numerical tests.

  16. Parallel VLSI architecture emulation and the organization of APSA/MPP

    NASA Technical Reports Server (NTRS)

    Odonnell, John T.

    1987-01-01

    The Applicative Programming System Architecture (APSA) combines an applicative language interpreter with a novel parallel computer architecture that is well suited for Very Large Scale Integration (VLSI) implementation. The Massively Parallel Processor (MPP) can simulate VLSI circuits by allocating one processing element in its square array to an area on a square VLSI chip. As long as there are not too many long data paths, the MPP can simulate a VLSI clock cycle very rapidly. The APSA circuit contains a binary tree with a few long paths and many short ones. A skewed H-tree layout allows every processing element to simulate a leaf cell and up to four tree nodes, with no loss in parallelism. Emulation of a key APSA algorithm on the MPP resulted in performance 16,000 times faster than a Vax. This speed will make it possible for the APSA language interpreter to run fast enough to support research in parallel list processing algorithms.

  17. Running With an Elastic Lower Limb Exoskeleton.

    PubMed

    Cherry, Michael S; Kota, Sridhar; Young, Aaron; Ferris, Daniel P

    2016-06-01

    Although there have been many lower limb robotic exoskeletons that have been tested for human walking, few devices have been tested for assisting running. It is possible that a pseudo-passive elastic exoskeleton could benefit human running without the addition of electrical motors due to the spring-like behavior of the human leg. We developed an elastic lower limb exoskeleton that added stiffness in parallel with the entire lower limb. Six healthy, young subjects ran on a treadmill at 2.3 m/s with and without the exoskeleton. Although the exoskeleton was designed to provide ~50% of normal leg stiffness during running, it only provided 24% of leg stiffness during testing. The difference in added leg stiffness was primarily due to soft tissue compression and harness compliance decreasing exoskeleton displacement during stance. As a result, the exoskeleton only supported about 7% of the peak vertical ground reaction force. There was a significant increase in metabolic cost when running with the exoskeleton compared with running without the exoskeleton (ANOVA, P < .01). We conclude that 2 major roadblocks to designing successful lower limb robotic exoskeletons for human running are human-machine interface compliance and the extra lower limb inertia from the exoskeleton.
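    The gap between the designed (~50% of leg stiffness) and delivered (24%) assistance is consistent with compliant elements acting in series with the exoskeleton spring; a back-of-the-envelope check with hypothetical numbers (the interface stiffness below is an assumed value for illustration, not a measurement from the study):

```python
def series_stiffness(*k):
    """Effective stiffness of springs in series: 1/k_eff = sum(1/k_i).
    Any compliant element in series can only reduce the delivered stiffness."""
    return 1.0 / sum(1.0 / ki for ki in k)

# Hypothetical values: an exoskeleton spring sized at 50% of a 20 kN/m
# leg stiffness, in series with assumed soft-tissue/harness compliance.
k_exo = 10.0        # kN/m, designed contribution
k_interface = 8.5   # kN/m, soft tissue + harness (assumed)
k_delivered = series_stiffness(k_exo, k_interface)
```

    With these assumed numbers k_delivered is about 4.6 kN/m, roughly 23% of the 20 kN/m leg stiffness, in the neighborhood of the 24% the study reports, illustrating why interface compliance is named as a major roadblock.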

  18. Single-Run Single-Mask Inductively-Coupled-Plasma Reactive-Ion-Etching Process for Fabricating Suspended High-Aspect-Ratio Microstructures

    NASA Astrophysics Data System (ADS)

    Yang, Yao-Joe; Kuo, Wen-Cheng; Fan, Kuang-Chao

    2006-01-01

    In this work, we present a single-run single-mask (SRM) process for fabricating suspended high-aspect-ratio structures on standard silicon wafers using an inductively coupled plasma-reactive ion etching (ICP-RIE) etcher. This process eliminates extra fabrication steps which are required for structure release after trench etching. Released microstructures with 120 μm thickness are obtained by this process. The corresponding maximum aspect ratio of the trench is 28. The SRM process is an extended version of the standard process proposed by BOSCH GmbH (BOSCH process). The first step of the SRM process is a standard BOSCH process for trench etching, then a polymer layer is deposited on trench sidewalls as a protective layer for the subsequent structure-releasing step. The structure is released by dry isotropic etching after the polymer layer on the trench floor is removed. All the steps can be integrated into a single-run ICP process. Also, only one mask is required. Therefore, the process complexity and fabrication cost can be effectively reduced. Discussions on each SRM step and considerations for avoiding undesired etching of the silicon structures during the release process are also presented.

  19. Scalable Unix tools on parallel processors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gropp, W.; Lusk, E.

    1994-12-31

    The introduction of parallel processors that run a separate copy of Unix on each processor has introduced new problems in managing the user's environment. This paper discusses some generalizations of common Unix commands for managing files (e.g., ls) and processes (e.g., ps) that are convenient and scalable. These basic tools, just like their Unix counterparts, are text-based. We also discuss a way to use these with a graphical user interface (GUI). Some notes on the implementation are provided. Prototypes of these commands are publicly available.
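    The core pattern behind such scalable variants of ls and ps, running the same command on every node concurrently and gathering the output, can be sketched as follows (a hypothetical helper, not the paper's prototypes; `runner` would be ("ssh",) in practice, and the test substitutes a local command so the sketch runs anywhere):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_everywhere(hosts, command, runner=("ssh",)):
    """Run `command` on every host concurrently and collect the output,
    returning {host: stdout}. With runner=("ssh",) this executes
    `ssh <host> <command...>` per host; tests can inject a local stand-in."""
    def one(host):
        out = subprocess.run(
            list(runner) + [host] + command,
            capture_output=True, text=True)
        return host, out.stdout.strip()
    # One thread per host: the work is I/O-bound waiting on remote commands.
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        return dict(pool.map(one, hosts))
```

    A scalable `ps` is then just this helper with command=["ps", "aux"] plus merged, host-tagged output formatting.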

  20. A Queue Simulation Tool for a High Performance Scientific Computing Center

    NASA Technical Reports Server (NTRS)

    Spear, Carrie; McGalliard, James

    2007-01-01

    The NASA Center for Computational Sciences (NCCS) at the Goddard Space Flight Center provides high performance highly parallel processors, mass storage, and supporting infrastructure to a community of computational Earth and space scientists. Long running (days) and highly parallel (hundreds of CPUs) jobs are common in the workload. NCCS management structures batch queues and allocates resources to optimize system use and prioritize workloads. NCCS technical staff use a locally developed discrete event simulation tool to model the impacts of evolving workloads, potential system upgrades, alternative queue structures and resource allocation policies.

  1. Parallel File System I/O Performance Testing On LANL Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wiens, Isaac Christian; Green, Jennifer Kathleen

    2016-08-18

    These are slides from a presentation on parallel file system I/O performance testing on LANL clusters. I/O is a known bottleneck for HPC applications. Performance optimization of I/O is often required. This summer project entailed integrating IOR under Pavilion and automating the results analysis. The slides cover the following topics: scope of the work, tools utilized, IOR-Pavilion test workflow, build script, IOR parameters, how parameters are passed to IOR, *run_ior: functionality, Python IOR-Output Parser, Splunk data format, Splunk dashboard and features, and future work.

  2. Parallel Climate Data Assimilation PSAS Package

    NASA Technical Reports Server (NTRS)

    Ding, Hong Q.; Chan, Clara; Gennery, Donald B.; Ferraro, Robert D.

    1996-01-01

    We have designed and implemented a set of highly efficient and highly scalable algorithms for an unstructured computational package, the PSAS data assimilation package, as demonstrated by detailed performance analysis of systematic runs on up to a 512-node Intel Paragon. The equation solver achieves a sustained 18 Gflops performance. As a result, we achieved an unprecedented 100-fold solution time reduction on the Intel Paragon parallel platform over the Cray C90. This not only meets and exceeds the DAO time requirements, but also significantly enlarges the window of exploration in climate data assimilations.

  3. Anomalous transport in discrete arcs and simulation of double layers in a model auroral circuit

    NASA Technical Reports Server (NTRS)

    Smith, Robert A.

    1987-01-01

    The evolution and long-time stability of a double layer (DL) in a discrete auroral arc requires that the parallel current in the arc, which may be considered uniform at the source, be diverted within the arc to charge the flanks of the U-shaped double layer potential structure. A simple model is presented in which this current redistribution is effected by anomalous transport based on electrostatic lower hybrid waves driven by the flank structure itself. This process provides the limiting constraint on the double layer potential. The flank charging may be represented as that of a nonlinear transmission line. A simplified model circuit, in which the transmission line is represented by a nonlinear impedance in parallel with a variable resistor, is incorporated in a one-dimensional simulation model to give the current density at the DL boundaries. Results are presented for the scaling of the DL potential as a function of the width of the arc and the saturation efficiency of the lower hybrid instability mechanism.

  4. Anomalous transport in discrete arcs and simulation of double layers in a model auroral circuit

    NASA Technical Reports Server (NTRS)

    Smith, Robert A.

    1987-01-01

    The evolution and long-time stability of a double layer in a discrete auroral arc requires that the parallel current in the arc, which may be considered uniform at the source, be diverted within the arc to charge the flanks of the U-shaped double-layer potential structure. A simple model is presented in which this current re-distribution is effected by anomalous transport based on electrostatic lower hybrid waves driven by the flank structure itself. This process provides the limiting constraint on the double-layer potential. The flank charging may be represented as that of a nonlinear transmission line. A simplified model circuit, in which the transmission line is represented by a nonlinear impedance in parallel with a variable resistor, is incorporated in a 1-d simulation model to give the current density at the DL boundaries. Results are presented for the scaling of the DL potential as a function of the width of the arc and the saturation efficiency of the lower hybrid instability mechanism.

  5. Broadband hybrid electromagnetic and piezoelectric energy harvesting from ambient vibrations and pneumatic vortices induced by running subway trains.

    DOT National Transportation Integrated Search

    2017-05-01

    The airfoil-based electromagnetic energy harvester containing parallel array motion between moving coil and trajectory-matching multi-pole magnets was investigated. The magnets were aligned in an alternatively magnetized formation of 6 magnets to...

  6. 3. View looking S down West Broad Street sidewalk showing ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    3. View looking S down West Broad Street sidewalk showing S half of Gate in foreground, Wickersham fence running parallel to West Broad St. and Passenger Station in background. - Central of Georgia Railway, Cotton Yard Gates, West Broad Street, Savannah, Chatham County, GA

  7. Enhancing the usability and performance of structured association mapping algorithms using automation, parallelization, and visualization in the GenAMap software system

    PubMed Central

    2012-01-01

    Background Structured association mapping is proving to be a powerful strategy to find genetic polymorphisms associated with disease. However, these algorithms are often distributed as command line implementations that require expertise and effort to customize and put into practice. Because of the difficulty required to use these cutting-edge techniques, geneticists often revert to simpler, less powerful methods. Results To make structured association mapping more accessible to geneticists, we have developed an automatic processing system called Auto-SAM. Auto-SAM enables geneticists to run structured association mapping algorithms automatically, using parallelization. Auto-SAM includes algorithms to discover gene-networks and find population structure. Auto-SAM can also run popular association mapping algorithms, in addition to five structured association mapping algorithms. Conclusions Auto-SAM is available through GenAMap, a front-end desktop visualization tool. GenAMap and Auto-SAM are implemented in JAVA; binaries for GenAMap can be downloaded from http://sailing.cs.cmu.edu/genamap. PMID:22471660

  8. A Component-Based Extension Framework for Large-Scale Parallel Simulations in NEURON

    PubMed Central

    King, James G.; Hines, Michael; Hill, Sean; Goodman, Philip H.; Markram, Henry; Schürmann, Felix

    2008-01-01

    As neuronal simulations approach larger scales with increasing levels of detail, the neurosimulator software represents only a part of a chain of tools ranging from setup, simulation, and interaction with virtual environments to analysis and visualization. Previously published approaches to abstracting simulator engines have not received widespread acceptance, which in part may be due to the fact that they tried to address the challenge of solving the model specification problem. Here, we present an approach that uses a neurosimulator, in this case NEURON, to describe and instantiate the network model in the simulator's native model language but then replaces the main integration loop with its own. Existing parallel network models are easily adapted to run in the presented framework. The presented approach is thus an extension to NEURON but uses a component-based architecture to allow for replaceable spike exchange components and pluggable components for monitoring, analysis, or control that can run in this framework alongside the simulation. PMID:19430597

  9. Integrated bioassays in microfluidic devices: botulinum toxin assays.

    PubMed

    Mangru, Shakuntala; Bentz, Bryan L; Davis, Timothy J; Desai, Nitin; Stabile, Paul J; Schmidt, James J; Millard, Charles B; Bavari, Sina; Kodukula, Krishna

    2005-12-01

    A microfluidic assay was developed for screening botulinum neurotoxin serotype A (BoNT-A) by using a fluorescent resonance energy transfer (FRET) assay. Molded silicone microdevices with integral valves, pumps, and reagent reservoirs were designed and fabricated. Electrical and pneumatic control hardware were constructed, and software was written to automate the assay protocol and data acquisition. Detection was accomplished by fluorescence microscopy. The system was validated with a peptide inhibitor, running 2 parallel assays, as a feasibility demonstration. The small footprint of each bioreactor cell (0.5 cm2) and scalable fluidic architecture enabled many parallel assays on a single chip. The chip is programmable to run a dilution series in each lane, generating concentration-response data for multiple inhibitors. The assay results showed good agreement with the corresponding experiments done at a macroscale level. Although the system has been developed for BoNT-A screening, a wide variety of assays can be performed on the microfluidic chip with little or no modification.

  10. Predicting the stability of a compressible periodic parallel jet flow

    NASA Technical Reports Server (NTRS)

    Miles, Jeffrey H.

    1996-01-01

    It is known that mixing enhancement in compressible free shear layer flows with high convective Mach numbers is difficult. One design strategy to get around this is to use multiple nozzles. Extrapolating this design concept in a one-dimensional manner, one arrives at an array of parallel rectangular nozzles where the smaller dimension is ω and the longer dimension, b, is taken to be infinite. In this paper, the feasibility of predicting the stability of this type of compressible periodic parallel jet flow is discussed. The problem is treated using Floquet-Bloch theory. Numerical solutions to this eigenvalue problem are presented. For the case presented, the interjet spacing, s, was selected so that s/ω = 2.23. Typical plots of the eigenvalue and stability curves are presented. Results obtained for a range of convective Mach numbers from 3 to 5 show growth rates ω_i = kc_i/2 ranging from 0.25 to 0.29. These results indicate that coherent two-dimensional structures can occur without difficulty in multiple parallel periodic jet nozzles and that shear layer mixing should occur with this type of nozzle design.

  11. Functional cartilage MRI T2 mapping: evaluating the effect of age and training on knee cartilage response to running.

    PubMed

    Mosher, T J; Liu, Y; Torok, C M

    2010-03-01

    To characterize the effects of age and physical activity level on cartilage thickness and T2 response immediately after running. Institutional review board approval was obtained and all subjects provided informed consent prior to study participation. Cartilage thickness and magnetic resonance imaging (MRI) T2 values of 22 marathon runners and 15 sedentary controls were compared before and after 30 min of running. Runner and control groups were stratified by age (≤ 45 years or ≥ 46 years). Multi-echo (Time to Repetition (TR)/Time to Echo (TE) 1500 ms/9-109 ms) MR images obtained using a 3.0 T scanner were used to calculate thickness and T2 values from the central femoral and tibial cartilage. Baseline cartilage T2 values, and changes in cartilage thickness and T2 values after running, were compared between the four groups using one-way analysis of variance (ANOVA). After running, MRI T2 values decreased in superficial femoral (2 ms-4 ms) and tibial (1 ms-3 ms) cartilage along with a decrease in cartilage thickness (femoral: 4%-8%, tibial: 0%-12%). Smaller decreases in cartilage T2 values were observed in the middle zone of cartilage, and no change was observed in the deepest layer. There was no difference in cartilage deformation or T2 response to running as a function of age or level of physical activity. Running results in a measurable decrease in cartilage thickness and MRI T2 values of superficial cartilage, consistent with greater compressibility of the superficial cartilage layer. Age and level of physical activity did not alter the T2 response to running. Copyright 2009 Osteoarthritis Research Society International. Published by Elsevier Ltd. All rights reserved.

  12. Parallelization and checkpointing of GPU applications through program transformation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Solano-Quinde, Lizandro Damian

    2012-01-01

    GPUs have emerged as a powerful tool for accelerating general-purpose applications. The availability of programming languages that make writing general-purpose applications for running on GPUs tractable has consolidated GPUs as an alternative for accelerating general-purpose applications. Among the areas that have benefited from GPU acceleration are signal and image processing, computational fluid dynamics, quantum chemistry, and, in general, the High Performance Computing (HPC) industry. In order to continue to exploit higher levels of parallelism with GPUs, multi-GPU systems are gaining popularity. In this context, single-GPU applications are parallelized for running in multi-GPU systems. Furthermore, multi-GPU systems help to solve the GPU memory limitation for applications with a large application memory footprint. Parallelizing single-GPU applications has been approached by libraries that distribute the workload at runtime; however, they impose execution overhead and are not portable. On the other hand, on traditional CPU systems, parallelization has been approached through application transformation at pre-compile time, which enhances the application to distribute the workload at application level and does not have the issues of library-based approaches. Hence, a parallelization scheme for GPU systems based on application transformation is needed. Like any computing engine of today, reliability is also a concern in GPUs. GPUs are vulnerable to transient and permanent failures. Current checkpoint/restart techniques are not suitable for systems with GPUs. Checkpointing for GPU systems presents new and interesting challenges, primarily due to the natural differences imposed by the hardware design, the memory subsystem architecture, the massive number of threads, and the limited amount of synchronization among threads. Therefore, a checkpoint/restart technique suitable for GPU systems is needed. The goal of this work is to exploit higher levels of parallelism and to develop support for application-level fault tolerance in applications using multiple GPUs. Our techniques reduce the burden of enhancing single-GPU applications to support these features. To achieve our goal, this work designs and implements a framework for enhancing a single-GPU OpenCL application through application transformation.

  13. The emergence of asymmetric normal fault systems under symmetric boundary conditions

    NASA Astrophysics Data System (ADS)

    Schöpfer, Martin P. J.; Childs, Conrad; Manzocchi, Tom; Walsh, John J.; Nicol, Andrew; Grasemann, Bernhard

    2017-11-01

    Many normal fault systems and, on a smaller scale, fracture boudinage often exhibit asymmetry with one fault dip direction dominating. It is a common belief that the formation of domino and shear band boudinage with a monoclinic symmetry requires a component of layer parallel shearing. Moreover, domains of parallel faults are frequently used to infer the presence of a décollement. Using Distinct Element Method (DEM) modelling we show, that asymmetric fault systems can emerge under symmetric boundary conditions. A statistical analysis of DEM models suggests that the fault dip directions and system polarities can be explained using a random process if the strength contrast between the brittle layer and the surrounding material is high. The models indicate that domino and shear band boudinage are unreliable shear-sense indicators. Moreover, the presence of a décollement should not be inferred on the basis of a domain of parallel faults alone.

  14. Crystallographic structure and superconductive properties of Nb-Ti films with an artificially layered structure

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sato, N.

    1990-06-15

    Artificially layered niobium-titanium (Nb-Ti) films with various thickness ratios (3/1-1/3) and periodicities (2-100 Å) are made in an argon or in a mixed argon/nitrogen atmosphere by a dc magnetron sputtering method. Films with small periodicities (less than 30 Å) have an artificial superlattice structure (ASL) with crystallographic coherence between constituent layers, where Nb and Ti grow epitaxially on the closest planes. The crystallographic structures of the films are bcc with the (110) plane parallel to the film for films with the same or a thicker Nb layer than a Ti layer, and hcp with the (001) plane parallel to the film for films with a thinner Nb layer than a Ti layer. Films with large periodicities have an artificial superstructure (ASS) with only periodic stacking of constituent layers. Films deposited in the Ar/N atmosphere also have the artificially layered structures of ASL or ASS. The artificially layered structure is thermally stable at temperatures up to 500 °C. The superconducting properties of the films depend strongly on the periodicity and thickness ratio of the Nb and Ti layers. The dependence of the transition temperature on the periodicity and thickness ratio is qualitatively explained by a proximity effect with a three-region model. Films with periodicities less than 20 Å, composed of the same or a thicker Nb layer than a Ti layer, show high transition temperatures (above 9.3 K). The highest T_c of about 13.6 K is obtained in the film composed of monatomic layers of constituents deposited in an Ar atmosphere including 30 vol % N.

  15. Use of parallel computing for analyzing big data in EEG studies of ambiguous perception

    NASA Astrophysics Data System (ADS)

    Maksimenko, Vladimir A.; Grubov, Vadim V.; Kirsanov, Daniil V.

    2018-02-01

    The problem of interaction between humans and machine systems through neuro-interfaces (or brain-computer interfaces) is an urgent task which requires analysis of large amounts of neurophysiological EEG data. In the present paper we consider methods of parallel computing as one of the most powerful tools for processing experimental data in real time, with respect to the multichannel structure of EEG. In this context we demonstrate the application of parallel computing to the estimation of the spectral properties of multichannel EEG signals associated with visual perception. Using the CUDA C library we run a wavelet-based algorithm on GPUs and show the possibility of detecting specific patterns in a multichannel set of EEG data in real time.
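    The per-channel and per-frequency independence that makes this workload map well onto GPUs can be sketched in plain NumPy. The following is a hypothetical sequential stand-in for the paper's CUDA C kernel; the Morlet parameterization (the `cycles` argument and envelope width) is an assumption, not taken from the paper.

```python
import numpy as np

def morlet_power(signal, fs, freqs, cycles=6.0):
    """Time-frequency power of one EEG channel via complex Morlet wavelets.
    Each channel and each frequency is computed independently, which is
    what allows the work to be spread across GPU threads."""
    power = np.empty((len(freqs), len(signal)))
    for k, f in enumerate(freqs):
        sigma = cycles / (2.0 * np.pi * f)            # envelope width (s)
        t = np.arange(-4 * sigma, 4 * sigma, 1.0 / fs)
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * sigma**2))
        wavelet /= np.linalg.norm(wavelet)            # unit-energy wavelet
        power[k] = np.abs(np.convolve(signal, wavelet, mode="same")) ** 2
    return power

# toy single-channel check: a 10 Hz oscillation peaks in the 10 Hz band
fs = 250.0
t = np.arange(0.0, 2.0, 1.0 / fs)
eeg = np.sin(2 * np.pi * 10.0 * t)
spec = morlet_power(eeg, fs, freqs=[5.0, 10.0, 20.0])
```

    In a multichannel setting each row of a channels-by-samples array would be handed to a separate worker or GPU block, which is the parallel structure the abstract describes.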

  16. MHD Code Optimizations and Jets in Dense Gaseous Halos

    NASA Astrophysics Data System (ADS)

    Gaibler, Volker; Vigelius, Matthias; Krause, Martin; Camenzind, Max

    We have further optimized and extended the 3D-MHD-code NIRVANA. The magnetized part runs in parallel, reaching 19 Gflops per SX-6 node, and has a passively advected particle population. In addition, the code is now MPI-parallel on top of the shared-memory parallelization. On a 512^3 grid, we reach 561 Gflops with 32 nodes on the SX-8. Also, we have successfully used FLASH on the Opteron cluster. Scientific results are preliminary so far. We report one computation of highly resolved cocoon turbulence. While we find some similarities to earlier 2D work by us and others, we note a strange reluctance of cold material to enter the low-density cocoon, which has to be investigated further.

  17. Methods and results of boundary layer measurements on a glider

    NASA Technical Reports Server (NTRS)

    Nes, W. V.

    1978-01-01

    Boundary layer measurements were carried out on a glider under natural conditions. Two effects are investigated: the effect of inconstancy of the development of static pressure within the boundary layer and the effect of the negative pressure difference in a sublaminar boundary layer. The results obtained by means of an ion probe in parallel connection confirm those results obtained by means of a pressure probe. Additional effects which have occurred during these measurements are briefly dealt with.

  18. Spin-valve Josephson junctions for cryogenic memory

    NASA Astrophysics Data System (ADS)

    Niedzielski, Bethany M.; Bertus, T. J.; Glick, Joseph A.; Loloee, R.; Pratt, W. P.; Birge, Norman O.

    2018-01-01

    Josephson junctions containing two ferromagnetic layers are being considered for use in cryogenic memory. Our group recently demonstrated that the ground-state phase difference across such a junction with carefully chosen layer thicknesses could be controllably toggled between zero and π by switching the relative magnetization directions of the two layers between the antiparallel and parallel configurations. However, several technological issues must be addressed before those junctions can be used in a large-scale memory. Many of these issues can be more easily studied in single junctions, rather than in the superconducting quantum interference device (SQUID) used for phase-sensitive measurements. In this work, we report a comprehensive study of spin-valve junctions containing a Ni layer with a fixed thickness of 2.0 nm and a NiFe layer of thickness varying between 1.1 and 1.8 nm in steps of 0.1 nm. We extract the field shift of the Fraunhofer patterns and the critical currents of the junctions in the parallel and antiparallel magnetic states, as well as the switching fields of both magnetic layers. We also report a partial study of similar junctions containing a slightly thinner Ni layer of 1.6 nm and the same range of NiFe thicknesses. These results represent the first step toward mapping out a "phase diagram" for phase-controllable spin-valve Josephson junctions as a function of the two magnetic layer thicknesses.

  19. A tool for simulating parallel branch-and-bound methods

    NASA Astrophysics Data System (ADS)

    Golubeva, Yana; Orlov, Yury; Posypkin, Mikhail

    2016-01-01

    The Branch-and-Bound method is known as one of the most powerful but very resource-consuming global optimization methods. Parallel and distributed computing can efficiently cope with this issue. The major difficulty in the parallel B&B method is the need for dynamic load redistribution. Therefore, the design and study of load balancing algorithms is a separate and very important research topic. This paper presents a tool for simulating the parallel Branch-and-Bound method. The simulator allows one to run load balancing algorithms with various numbers of processors, sizes of the search tree, and characteristics of the supercomputer's interconnect, thereby fostering deep study of load distribution strategies. The process of resolution of the optimization problem by the B&B method is replaced by a stochastic branching process. Data exchanges are modeled using the concept of logical time. The user-friendly graphical interface to the simulator provides efficient visualization and convenient performance analysis.
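    The substitution of a stochastic branching process for real subproblem resolution can be illustrated with a minimal sketch; the fixed branching probability and fathoming rule below are illustrative assumptions, not the simulator's actual model.

```python
import random

def simulate_bnb_tree(branch_prob=0.48, max_nodes=100_000, seed=42):
    """Toy stand-in for a B&B search tree: each expanded node either
    branches into two children (with probability branch_prob) or is
    fathomed by its bound. With branch_prob < 0.5 the branching process
    is subcritical, so the simulated tree is almost surely finite."""
    rng = random.Random(seed)
    frontier = [None]              # the root subproblem
    expanded = 0
    while frontier and expanded < max_nodes:
        frontier.pop()             # "solve" one subproblem instantly
        expanded += 1
        if rng.random() < branch_prob:
            frontier.extend((None, None))
    return expanded
```

    A load-balancing study would then distribute such synthetic trees across simulated processors, with data exchanges stamped in logical time rather than measured wall-clock time.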

  20. Dynamic file-access characteristics of a production parallel scientific workload

    NASA Technical Reports Server (NTRS)

    Kotz, David; Nieuwejaar, Nils

    1994-01-01

    Multiprocessors have permitted astounding increases in computational performance, but many cannot meet the intense I/O requirements of some scientific applications. An important component of any solution to this I/O bottleneck is a parallel file system that can provide high-bandwidth access to tremendous amounts of data in parallel to hundreds or thousands of processors. Most successful systems are based on a solid understanding of the expected workload, but thus far there have been no comprehensive workload characterizations of multiprocessor file systems. This paper presents the results of a three week tracing study in which all file-related activity on a massively parallel computer was recorded. Our instrumentation differs from previous efforts in that it collects information about every I/O request and about the mix of jobs running in a production environment. We also present the results of a trace-driven caching simulation and recommendations for designers of multiprocessor file systems.

  1. A parallel computational model for GATE simulations.

    PubMed

    Rannou, F R; Vega-Acevedo, N; El Bitar, Z

    2013-12-01

    GATE/Geant4 Monte Carlo simulations are computationally demanding applications, requiring thousands of processor hours to produce realistic results. The classical strategy of distributing the simulation of individual events does not apply efficiently for Positron Emission Tomography (PET) experiments, because it requires a centralized coincidence processing and large communication overheads. We propose a parallel computational model for GATE that handles event generation and coincidence processing in a simple and efficient way by decentralizing event generation and processing but maintaining a centralized event and time coordinator. The model is implemented with the inclusion of a new set of factory classes that can run the same executable in sequential or parallel mode. A Mann-Whitney test shows that the output produced by this parallel model in terms of number of tallies is equivalent (but not equal) to its sequential counterpart. Computational performance evaluation shows that the software is scalable and well balanced. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  2. Tutorial: Parallel Computing of Simulation Models for Risk Analysis.

    PubMed

    Reilly, Allison C; Staid, Andrea; Gao, Michael; Guikema, Seth D

    2016-10-01

    Simulation models are widely used in risk analysis to study the effects of uncertainties on outcomes of interest in complex problems. Often, these models are computationally complex and time consuming to run. This latter point may be at odds with time-sensitive evaluations or may limit the number of parameters that are considered. In this article, we give an introductory tutorial focused on parallelizing simulation code to better leverage modern computing hardware, enabling risk analysts to better utilize simulation-based methods for quantifying uncertainty in practice. This article is aimed primarily at risk analysts who use simulation methods but do not yet utilize parallelization to decrease the computational burden of these models. The discussion is focused on conceptual aspects of embarrassingly parallel computer code and software considerations. Two complementary examples are shown using the languages MATLAB and R. A brief discussion of hardware considerations is located in the Appendix. © 2016 Society for Risk Analysis.
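    The article's examples use MATLAB and R; the same embarrassingly parallel pattern can be sketched in Python (an illustrative stand-in, not code from the article; the toy loss model and its parameters are assumptions).

```python
import random
from multiprocessing import Pool

def simulate_once(seed):
    """One independent Monte Carlo replication of a toy risk model:
    a loss event occurs with probability 0.1, with severity uniform
    on [0, 100). Replications share no state, so they can be farmed
    out to worker processes without communication."""
    rng = random.Random(seed)
    return rng.uniform(0.0, 100.0) if rng.random() < 0.1 else 0.0

def expected_loss(n_reps, n_workers=4):
    """Run n_reps independent replications across n_workers processes."""
    with Pool(n_workers) as pool:
        losses = pool.map(simulate_once, range(n_reps))
    return sum(losses) / n_reps
```

    Because each replication is seeded independently and returns a single number, the only serial steps are distributing the seeds and averaging the results, which is what makes this the easy, "embarrassingly parallel" case the tutorial focuses on.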

  3. Evaluation of a parallel implementation of the learning portion of the backward error propagation neural network: experiments in artifact identification.

    PubMed Central

    Sittig, D. F.; Orr, J. A.

    1991-01-01

    Various methods have been proposed in an attempt to solve problems in artifact and/or alarm identification including expert systems, statistical signal processing techniques, and artificial neural networks (ANN). ANNs consist of a large number of simple processing units connected by weighted links. To develop truly robust ANNs, investigators are required to train their networks on huge training data sets, requiring enormous computing power. We implemented a parallel version of the backward error propagation neural network training algorithm in the widely portable parallel programming language C-Linda. A maximum speedup of 4.06 was obtained with six processors. This speedup represents a reduction in total run-time from approximately 6.4 hours to 1.5 hours. We conclude that use of the master-worker model of parallel computation is an excellent method for obtaining speedups in the backward error propagation neural network training algorithm. PMID:1807607
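    As a rough consistency check (an illustrative calculation, not one from the paper), Amdahl's law relates the reported speedup of 4.06 on six processors to the fraction of the training algorithm that parallelizes:

```python
def amdahl_speedup(p, n):
    """Predicted speedup for parallel fraction p on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

def parallel_fraction(speedup, n):
    """Invert Amdahl's law: the fraction p consistent with a measured
    speedup on n processors."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)

p = parallel_fraction(4.06, 6)
print(round(p, 3))                     # about 0.904
print(round(amdahl_speedup(p, 6), 2))  # recovers 4.06
```

    On this reading, roughly 90% of the work (plausibly the per-pattern gradient computations handed to the workers) scales with processor count, while the remaining serial fraction caps the achievable speedup.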

  4. Second Evaluation of Job Queuing/Scheduling Software. Phase 1

    NASA Technical Reports Server (NTRS)

    Jones, James Patton; Brickell, Cristy; Chancellor, Marisa (Technical Monitor)

    1997-01-01

    The recent proliferation of high performance workstations and the increased reliability of parallel systems have illustrated the need for robust job management systems to support parallel applications. To address this issue, NAS compiled a requirements checklist for job queuing/scheduling software. Next, NAS evaluated the leading job management system (JMS) software packages against the checklist. A year has now elapsed since the first comparison was published, and NAS has repeated the evaluation. This report describes this second evaluation, and presents the results of Phase 1: Capabilities versus Requirements. We show that JMS support for running parallel applications on clusters of workstations and parallel systems is still lacking; however, definite progress has been made by the vendors to correct the deficiencies. This report is supplemented by a WWW interface to the data collected, to aid other sites in extracting the evaluation information on specific requirements of interest.

  5. Symplectic molecular dynamics simulations on specially designed parallel computers.

    PubMed

    Borstnik, Urban; Janezic, Dusanka

    2005-01-01

    We have developed a computer program for molecular dynamics (MD) simulation that implements the Split Integration Symplectic Method (SISM) and is designed to run on specialized parallel computers. The MD integration is performed by the SISM, which analytically treats high-frequency vibrational motion and thus enables the use of longer simulation time steps. The low-frequency motion is treated numerically on specially designed parallel computers, which decreases the computational time of each simulation time step. The combination of these approaches means that less time is required and fewer steps are needed and so enables fast MD simulations. We study the computational performance of MD simulation of molecular systems on specialized computers and provide a comparison to standard personal computers. The combination of the SISM with two specialized parallel computers is an effective way to increase the speed of MD simulations up to 16-fold over a single PC processor.

  6. 3D Data Denoising via Nonlocal Means Filter by Using Parallel GPU Strategies

    PubMed Central

    Cuomo, Salvatore; De Michele, Pasquale; Piccialli, Francesco

    2014-01-01

    The Nonlocal Means (NLM) algorithm is widely considered a state-of-the-art denoising filter in many research fields. Its high computational complexity leads researchers to the development of parallel programming approaches and the use of massively parallel architectures such as GPUs. In recent years, GPU devices have made it possible to achieve reasonable running times when filtering 3D datasets slice-by-slice with a 2D NLM algorithm. In our approach we design and implement a fully 3D Nonlocal Means parallel approach, adopting different algorithm mapping strategies on a GPU architecture and a multi-GPU framework, in order to demonstrate its high applicability and scalability. The experimental results we obtained encourage the usability of our approach in a large spectrum of applicative scenarios such as magnetic resonance imaging (MRI) or video sequence denoising. PMID:25045397
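    The reason NLM maps so naturally onto GPUs is that every output pixel is computed independently from a read-only input. A naive sequential 2D version makes that structure explicit (an illustrative sketch, not the authors' implementation; parameter names and defaults are assumptions):

```python
import numpy as np

def nlm_denoise(img, h=0.3, patch=1, window=3):
    """Naive 2D non-local means: each output pixel is a weighted average
    over a search window, with weights given by patch similarity. The
    per-pixel computations are fully independent, so on a GPU each pixel
    can be assigned to its own thread."""
    pad = patch + window
    padded = np.pad(img, pad, mode="reflect")
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            ci, cj = i + pad, j + pad
            ref = padded[ci - patch:ci + patch + 1, cj - patch:cj + patch + 1]
            acc, weights = 0.0, 0.0
            for di in range(-window, window + 1):
                for dj in range(-window, window + 1):
                    ni, nj = ci + di, cj + dj
                    cand = padded[ni - patch:ni + patch + 1,
                                  nj - patch:nj + patch + 1]
                    w = np.exp(-np.mean((ref - cand) ** 2) / (h * h))
                    weights += w
                    acc += w * padded[ni, nj]
            out[i, j] = acc / weights
    return out
```

    A 3D variant extends the patch and search window into the third dimension, which is the step the paper parallelizes across GPUs rather than falling back to slice-by-slice 2D filtering.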

  7. Design, Results, Evolution and Status of the ATLAS Simulation at Point1 Project

    NASA Astrophysics Data System (ADS)

    Ballestrero, S.; Batraneanu, S. M.; Brasolin, F.; Contescu, C.; Fazio, D.; Di Girolamo, A.; Lee, C. J.; Pozo Astigarraga, M. E.; Scannicchio, D. A.; Sedov, A.; Twomey, M. S.; Wang, F.; Zaytsev, A.

    2015-12-01

    During the LHC Long Shutdown 1 (LS1) period, which started in 2013, the Simulation at Point1 (Sim@P1) project takes advantage, in an opportunistic way, of the TDAQ (Trigger and Data Acquisition) HLT (High-Level Trigger) farm of the ATLAS experiment. This farm provides more than 1300 compute nodes, which are particularly suited for running event generation and Monte Carlo production jobs that are mostly CPU and not I/O bound. It is capable of running up to 2700 Virtual Machines (VMs) each with 8 CPU cores, for a total of up to 22000 parallel jobs. This contribution gives a review of the design, the results, and the evolution of the Sim@P1 project, operating a large scale OpenStack based virtualized platform deployed on top of the ATLAS TDAQ HLT farm computing resources. During LS1, Sim@P1 was one of the most productive ATLAS sites: it delivered more than 33 million CPU-hours and it generated more than 1.1 billion Monte Carlo events. The design aspects are presented: the virtualization platform exploited by Sim@P1 avoids interferences with TDAQ operations and it guarantees the security and the usability of the ATLAS private network. The cloud mechanism allows the separation of the needed support on both infrastructural (hardware, virtualization layer) and logical (Grid site support) levels. This paper focuses on the operational aspects of such a large system during the upcoming LHC Run 2 period: simple, reliable, and efficient tools are needed to quickly switch from Sim@P1 to TDAQ mode and back, to exploit the resources when they are not used for the data acquisition, even for short periods. The evolution of the central OpenStack infrastructure is described, as it was upgraded from the Folsom to the Icehouse release, including the scalability issues addressed.

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Agosta, C. C.; Jin, J.; Coniglio, W. A.

    We present upper critical field data for κ-(BEDT-TTF)2Cu(NCS)2 with the magnetic field close to parallel and parallel to the conducting layers. We show that we can eliminate the effect of vortex dynamics in these layered materials if the layers are oriented within 0.3° of parallel to the applied magnetic field. Eliminating vortex effects leaves one remaining feature in the data that corresponds to the Pauli paramagnetic limit (H_p). We propose a semiempirical method to calculate H_p in quasi-2D superconductors. This method takes into account the energy gap of each of the quasi-2D superconductors, which is calculated from specific-heat data, and the influence of many-body effects. The calculated Pauli paramagnetic limits are then compared to critical field data for the title compound and other organic conductors. Many of the examined quasi-2D superconductors, including the above organic superconductors and CeCoIn5, exhibit upper critical fields that exceed their calculated H_p, suggesting unconventional superconductivity. We show that the high-field low-temperature state in κ-(BEDT-TTF)2Cu(NCS)2 is consistent with the Fulde-Ferrell-Larkin-Ovchinnikov state.

  9. Soft lubrication: The elastohydrodynamics of nonconforming and conforming contacts

    NASA Astrophysics Data System (ADS)

    Skotheim, J. M.; Mahadevan, L.

    2005-09-01

    We study the lubrication of fluid-immersed soft interfaces and show that elastic deformation couples tangential and normal forces and thus generates lift. We consider materials that deform easily, due to either geometry (e.g., a shell) or constitutive properties (e.g., a gel or a rubber), so that the effects of pressure and temperature on the fluid properties may be neglected. Four different system geometries are considered: a rigid cylinder moving parallel to a soft layer coating a rigid substrate; a soft cylinder moving parallel to a rigid substrate; a cylindrical shell moving parallel to a rigid substrate; and finally a cylindrical conforming journal bearing coated with a thin soft layer. In addition, for the particular case of a soft layer coating a rigid substrate, we consider both elastic and poroelastic material responses. For all these cases, we find the same generic behavior: there is an optimal combination of geometric and material parameters that maximizes the dimensionless normal force as a function of the softness parameter η = hydrodynamic pressure / elastic stiffness = surface deflection / gap thickness, which characterizes the fluid-induced deformation of the interface. The corresponding cases for a spherical slider are treated using scaling concepts.

  10. Simulation of Powder Layer Deposition in Additive Manufacturing Processes Using the Discrete Element Method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Herbold, E. B.; Walton, O.; Homel, M. A.

    2015-10-26

    This document serves as a final report to a small effort in which several improvements were added to the LLNL code GEODYN-L to develop Discrete Element Method (DEM) algorithms coupled to Lagrangian Finite Element (FE) solvers to investigate powder-bed formation problems for additive manufacturing. The results from these simulations will be assessed for inclusion as the initial conditions for Direct Metal Laser Sintering (DMLS) simulations performed with ALE3D. The algorithms were written and performed on parallel computing platforms at LLNL. The total funding level was 3-4 weeks of an FTE split amongst two staff scientists and one post-doc. The DEM simulations emulated, as much as was feasible, the physical process of depositing a new layer of powder over a bed of existing powder. The DEM simulations utilized truncated size distributions spanning realistic size ranges with a size distribution profile consistent with a realistic sample set. A minimum simulation sample size on the order of 40 particles square by 10 particles deep was utilized in these scoping studies in order to evaluate the potential effects of size-segregation variation with distance displaced in front of a screed blade. A reasonable method for evaluating the problem was developed and validated. Several simulations were performed to show the viability of the approach. Future investigations will focus on running various simulations investigating powder particle sizing and screed geometries.
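A truncated particle-size distribution of the kind the report describes can be sampled by simple rejection. The sketch below uses a lognormal with hypothetical parameters and cutoffs (not values from the report) to generate diameters for a DEM powder bed.

```python
# Sketch of sampling a truncated particle-size distribution for a DEM powder bed.
# The lognormal parameters and the 15-60 um cutoffs are hypothetical illustrations.
import random

def sample_truncated_lognormal(n, mu, sigma, d_min, d_max, seed=0):
    """Draw n particle diameters from a lognormal, rejecting values outside [d_min, d_max]."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        d = rng.lognormvariate(mu, sigma)
        if d_min <= d <= d_max:
            out.append(d)
    return out

# Powder diameters nominally around exp(3.4) ~ 30 um, truncated to 15-60 um.
diams = sample_truncated_lognormal(1000, mu=3.4, sigma=0.3, d_min=15.0, d_max=60.0)
```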

  11. Parallel electric fields in extragalactic jets - Double layers and anomalous resistivity in symbiotic relationships

    NASA Technical Reports Server (NTRS)

    Borovsky, J. E.

    1986-01-01

    After examining the properties of Coulomb-collision resistivity, anomalous (collective) resistivity, and double layers, a hybrid anomalous-resistivity/double-layer model is introduced. In this model, beam-driven waves on both sides of a double layer provide electrostatic plasma-wave turbulence that greatly reduces the mobility of charged particles. These regions then act to hold open a density cavity within which the double layer resides. In the double layer, electrical energy is dissipated with 100 percent efficiency into high-energy particles, creating conditions optimal for the collective emission of polarized radio waves.

  12. Xyce parallel electronic simulator users guide, version 6.1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase (a message-passing parallel implementation), which allows it to run efficiently on a wide range of computing platforms, including serial, shared-memory, and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
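The DAE viewpoint the manual describes can be illustrated on a toy circuit: the circuit equations are posed as a residual F(x, dx/dt, t) = 0 and advanced with an implicit integrator plus Newton iteration. This is a generic sketch of that formulation, not Xyce's actual internals.

```python
# DAE-style residual formulation F(v, dv/dt, t) = 0 for a series R-C circuit driven
# by a voltage source, solved with backward Euler + Newton iteration.
# Generic illustration of the DAE approach, not Xyce's implementation.

def residual(v_c, v_c_prev, dt, v_src, R, C):
    # KCL at the capacitor node: C * dv/dt - (v_src - v_c) / R = 0
    return C * (v_c - v_c_prev) / dt - (v_src - v_c) / R

def step(v_c_prev, dt, v_src, R, C, iters=20):
    v = v_c_prev
    for _ in range(iters):
        f = residual(v, v_c_prev, dt, v_src, R, C)
        df = C / dt + 1.0 / R          # analytic Jacobian dF/dv
        v -= f / df                    # Newton update (exact in one step here: F is linear)
    return v

# Charge a 1 uF capacitor through 1 kOhm toward 5 V for 10 time constants.
v = 0.0
for _ in range(1000):
    v = step(v, dt=1e-5, v_src=5.0, R=1e3, C=1e-6)
```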

  13. Xyce parallel electronic simulator users' guide, Version 6.0.1.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase (a message-passing parallel implementation), which allows it to run efficiently on a wide range of computing platforms, including serial, shared-memory, and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  14. Xyce parallel electronic simulator users guide, version 6.0.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase (a message-passing parallel implementation), which allows it to run efficiently on a wide range of computing platforms, including serial, shared-memory, and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  15. A compositional reservoir simulator on distributed memory parallel computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rame, M.; Delshad, M.

    1995-12-31

    This paper presents the application of distributed memory parallel computers to field scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general purpose, highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/860 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. A portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes porting to new parallel platforms straightforward. Results of the distributed memory computing performance of the parallel simulator are presented for field scale applications such as tracer floods and polymer floods. A comparison of the wall-clock times for the same problems on a vector supercomputer is also presented.
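The domain-decomposition idea described above (split the grid across processors, extend each subdomain with ghost cells, exchange boundary data, then apply a stencil) can be sketched in serial form. This is a stand-in for the MPI version, with an even load-balancing split and a 3-point stencil.

```python
# Minimal serial sketch of domain decomposition with ghost data for stencil updates.
# Illustrative only; the paper's simulator distributes subdomains over MPI processes.

def decompose(n_cells, n_procs):
    """Split n_cells as evenly as possible; returns (start, stop) per processor."""
    base, extra = divmod(n_cells, n_procs)
    bounds, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds

def jacobi_step(field, bounds):
    """One 3-point averaging sweep done subdomain-by-subdomain; neighbor values act as ghosts."""
    new = field[:]
    for (lo, hi) in bounds:
        for i in range(max(lo, 1), min(hi, len(field) - 1)):
            new[i] = 0.5 * (field[i - 1] + field[i + 1])
    return new

bounds = decompose(100, 8)
field = [0.0] * 100
field[0], field[-1] = 1.0, 1.0   # fixed boundary values
for _ in range(50):
    field = jacobi_step(field, bounds)
```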

  16. Performance evaluation of parallel electric field tunnel field-effect transistor by a distributed-element circuit model

    NASA Astrophysics Data System (ADS)

    Morita, Yukinori; Mori, Takahiro; Migita, Shinji; Mizubayashi, Wataru; Tanabe, Akihito; Fukuda, Koichi; Matsukawa, Takashi; Endo, Kazuhiko; O'uchi, Shin-ichi; Liu, Yongxun; Masahara, Meishoku; Ota, Hiroyuki

    2014-12-01

    The performance of parallel electric field tunnel field-effect transistors (TFETs), in which band-to-band tunneling (BTBT) is initiated in line with the gate electric field, was evaluated. The TFET was fabricated by inserting an epitaxially grown parallel-plate tunnel capacitor between heavily doped source wells and gate insulators. Analysis using a distributed-element circuit model indicated a limit on the drain current caused by the self-voltage-drop effect in the ultrathin channel layer.
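The self-voltage-drop effect can be illustrated with a distributed-element sketch: the channel is split into segments, each injecting tunnel current that depends on the local potential, which in turn sags from the IR drop of the accumulated current. All device parameters below are hypothetical, not from the paper.

```python
# Distributed-element sketch of the self-voltage-drop limit: tunnel injection per
# segment decreases as the local channel potential sags through the series resistance.
# Hypothetical parameters; not the authors' model or device values.

def drain_current(n_seg, r_seg, i0, v_drive, alpha=1.0, iters=200):
    """Fixed-point solve for the self-consistent local potentials and total current."""
    v = [v_drive] * n_seg
    for _ in range(iters):
        inj = [i0 * max(vk, 0.0) ** alpha for vk in v]   # local tunnel injection
        total, v_new = 0.0, []
        for k in range(n_seg):
            total += inj[k]
            v_new.append(v_drive - r_seg * total)        # IR drop of collected current
        v = v_new
    return sum(i0 * max(vk, 0.0) ** alpha for vk in v)

thin = drain_current(n_seg=50, r_seg=5.0, i0=1e-4, v_drive=1.0)   # more resistive channel
thick = drain_current(n_seg=50, r_seg=0.5, i0=1e-4, v_drive=1.0)  # less resistive channel
# The more resistive (thinner) channel saturates at a lower total drain current.
```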

  17. Recent advances in PDF modeling of turbulent reacting flows

    NASA Technical Reports Server (NTRS)

    Leonard, Andrew D.; Dai, F.

    1995-01-01

    This viewgraph presentation concludes that a Monte Carlo probability density function (PDF) solution successfully couples with an existing finite volume code; PDF solution method applied to turbulent reacting flows shows good agreement with data; and PDF methods must be run on parallel machines for practical use.

  18. 1H-Indole-3-carbaldehyde

    PubMed Central

    Dileep, C. S.; Abdoh, M. M. M.; Chakravarthy, M. P.; Mohana, K. N.; Sridhar, M. A.

    2012-01-01

    In the title compound, C9H7NO, the benzene ring forms a dihedral angle of 3.98 (12)° with the pyrrole ring. In the crystal, N–H⋯O hydrogen bonds link the molecules into chains that run parallel to [02-1]. PMID:23284457

  19. Parallel noise barrier prediction procedure : report 2 user's manual revision 1

    DOT National Transportation Integrated Search

    1987-11-01

    This report defines the parameters which are used to input the data required to run Program Barrier and BarrierX on a microcomputer such as an IBM PC or compatible. Directions for setting up and operating a working disk are presented. Examples of inp...

  20. Forces and mechanical energy fluctuations during diagonal stride roller skiing; running on wheels?

    PubMed

    Kehler, Alyse L; Hajkova, Eliska; Holmberg, Hans-Christer; Kram, Rodger

    2014-11-01

    Mechanical energy can be conserved during terrestrial locomotion in two ways: the inverted pendulum mechanism for walking and the spring-mass mechanism for running. Here, we investigated whether diagonal stride cross-country roller skiing (DIA) utilizes similar mechanisms. Based on previous studies, we hypothesized that running and DIA would share similar phase relationships and magnitudes of kinetic energy (KE) and gravitational potential energy (GPE) fluctuations, indicating elastic energy storage and return, as if roller skiing is like 'running on wheels'. Experienced skiers (N=9) walked and ran at 1.25 and 3 m s⁻¹, respectively, and roller skied with DIA at both speeds on a level dual-belt treadmill that recorded perpendicular and parallel forces. We calculated the KE and GPE of the center of mass from the force recordings. As expected, the KE and GPE fluctuated with an out-of-phase pattern during walking and an in-phase pattern during running. Unlike walking, during DIA, the KE and GPE fluctuations were in phase, as they are in running. However, during the glide phase, KE was dissipated as frictional heat and could not be stored elastically in the tendons, as in running. Elastic energy storage and return epitomize running and thus we reject our hypothesis. Diagonal stride cross-country skiing is a biomechanically unique movement that only superficially resembles walking or running. © 2014. Published by The Company of Biologists Ltd.
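Recovering center-of-mass KE and GPE from treadmill force recordings, as the study describes, amounts to integrating (F - weight)/m for velocity and integrating again for height. A sketch with synthetic sinusoidal force data (hypothetical numbers, not the study's recordings):

```python
# Sketch: center-of-mass KE and GPE time series from perpendicular/parallel force
# samples via simple Euler integration. Synthetic forces, not the study's data.
import math

def com_energies(f_perp, f_par, mass, dt, v0_par, g=9.81):
    """Return lists of KE and GPE over time from force samples."""
    v_par, v_perp, h = v0_par, 0.0, 0.0
    ke, gpe = [], []
    for fp, fl in zip(f_perp, f_par):
        v_perp += (fp - mass * g) / mass * dt   # vertical acceleration
        v_par += fl / mass * dt                 # fore-aft acceleration
        h += v_perp * dt
        ke.append(0.5 * mass * (v_par**2 + v_perp**2))
        gpe.append(mass * g * h)
    return ke, gpe

# Synthetic sinusoidal loading around body weight at a 2 Hz stride frequency.
mass, dt, n = 70.0, 0.001, 1000
f_perp = [mass * 9.81 + 200.0 * math.sin(2 * math.pi * 2.0 * i * dt) for i in range(n)]
f_par = [50.0 * math.cos(2 * math.pi * 2.0 * i * dt) for i in range(n)]
ke, gpe = com_energies(f_perp, f_par, mass, dt, v0_par=3.0)
```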

  1. Diffractive Hyperbola of a Skin Layer

    NASA Astrophysics Data System (ADS)

    Yakubov, V. P.; Vaiman, E. V.; Shipilov, S. È.; Prasath, A. K.

    2018-03-01

    Based on an analysis of the physics of the phase transition from the quasistatic field to the running-wave field of elementary electric and magnetic dipoles located in absorbing media, it is concluded that the skin layer is formed at the boundary of this phase transition. The possibility of obtaining the diffractive hyperbola of the skin layer, and of its subsequent application to the sensing of objects in strongly absorbing media, is considered.

  2. Hybrid composite laminates reinforced with Kevlar/carbon/glass woven fabrics for ballistic impact testing.

    PubMed

    Randjbaran, Elias; Zahari, Rizal; Jalil, Nawal Aswan Abdul; Majid, Dayang Laila Abang Abdul

    2014-01-01

    The current study reports a facile method to investigate the effects of the stacking sequence of hybrid composite layers on ballistic energy absorption by running ballistic tests under high-velocity impact conditions. The velocity and absorbed energy were calculated accordingly. The specimens were fabricated from Kevlar, carbon, and glass woven fabrics and resin and were experimentally investigated under impact conditions. All the specimens possessed equal mass, shape, and density; nevertheless, the layers were ordered in different stacking sequences. After running the ballistic test under the same conditions, the final velocities of the cylindrical AISI 4340 steel pellet showed how much energy was absorbed by the samples. The energy absorbed by each sample during ballistic impact was calculated; accordingly, suitable ballistic-impact-resistant materials can be identified by conducting the test. These results can be studied further in order to characterise the material properties of the different layers.

  3. RTS2: a powerful robotic observatory manager

    NASA Astrophysics Data System (ADS)

    Kubánek, Petr; Jelínek, Martin; Vítek, Stanislav; de Ugarte Postigo, Antonio; Nekola, Martin; French, John

    2006-06-01

    RTS2, or Remote Telescope System, 2nd Version, is an integrated package for remote telescope control under the Linux operating system. It is designed to run in fully autonomous mode: picking targets from a database table, storing image metadata to the database, processing images and storing their WCS coordinates in the database, and offering Virtual-Observatory-enabled access to them. It is currently running on various telescope setups worldwide. For control of devices from various manufacturers we developed an abstract device layer, enabling control of all possible combinations of mounts, CCDs, photometers, and roof and cupola controllers. We describe the evolution of RTS2 from the Python-based RTS to the C and later C++ based RTS2, focusing on the problems we faced during development. The internal structure of RTS2, focusing on the object layering that is used to uniformly control various devices and provide a uniform reporting layer, is also discussed.

  4. cellGPU: Massively parallel simulations of dynamic vertex models

    NASA Astrophysics Data System (ADS)

    Sussman, Daniel M.

    2017-10-01

    Vertex models represent confluent tissue by polygonal or polyhedral tilings of space, with the individual cells interacting via force laws that depend on both the geometry of the cells and the topology of the tessellation. This dependence on the connectivity of the cellular network introduces several complications to performing molecular-dynamics-like simulations of vertex models, and in particular makes parallelizing the simulations difficult. cellGPU addresses this difficulty and lays the foundation for massively parallelized, GPU-based simulations of these models. This article discusses its implementation for a pair of two-dimensional models, and compares the typical performance that can be expected when running cellGPU entirely on the CPU versus on a range of commercial and server-grade graphics cards. By implementing the calculation of topological changes and forces on cells in a highly parallelizable fashion, cellGPU enables researchers to simulate time- and length-scales previously inaccessible via existing single-threaded CPU implementations.
    Program files doi: http://dx.doi.org/10.17632/6j2cj29t3r.1
    Licensing provisions: MIT
    Programming language: CUDA/C++
    Nature of problem: Simulations of off-lattice "vertex models" of cells, in which the interaction forces depend on both the geometry and the topology of the cellular aggregate.
    Solution method: Highly parallelized GPU-accelerated dynamical simulations in which the force calculations and the topological features can be handled on either the CPU or GPU.
    Additional comments: The code is hosted at https://gitlab.com/dmsussman/cellGPU, with documentation additionally maintained at http://dmsussman.gitlab.io/cellGPUdocumentation
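The standard 2D vertex-model energy used by codes of this kind penalizes deviations of each cell's area and perimeter from preferred values, E = Σ [K_A (A - A0)² + K_P (P - P0)²]. A plain-Python sketch of that functional follows (not cellGPU's CUDA implementation; parameter values are illustrative).

```python
# Sketch of the standard 2D vertex-model energy for polygonal cells, computed from
# vertex coordinates. Plain Python, not cellGPU's GPU code; parameters illustrative.
import math

def polygon_area_perimeter(verts):
    area, perim = 0.0, 0.0
    n = len(verts)
    for i in range(n):
        (x1, y1), (x2, y2) = verts[i], verts[(i + 1) % n]
        area += x1 * y2 - x2 * y1                 # shoelace formula
        perim += math.hypot(x2 - x1, y2 - y1)
    return 0.5 * abs(area), perim

def vertex_model_energy(cells, k_a=1.0, a0=1.0, k_p=1.0, p0=3.8):
    e = 0.0
    for verts in cells:
        a, p = polygon_area_perimeter(verts)
        e += k_a * (a - a0) ** 2 + k_p * (p - p0) ** 2
    return e

unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]    # area 1, perimeter 4
e = vertex_model_energy([unit_square])
```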

  5. A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ma, Kwan-Liu

    Most of today’s visualization libraries and applications are based on what is known as the visualization pipeline. In the visualization pipeline model, algorithms are encapsulated as “filtering” components with inputs and outputs. These components can be combined by connecting the outputs of one filter to the inputs of another filter. The visualization pipeline model is popular because it provides a convenient abstraction that allows users to combine algorithms in powerful ways. Unfortunately, the visualization pipeline cannot run effectively on exascale computers. Experts agree that the exascale machine will comprise processors that contain many cores. Furthermore, physical limitations will prevent data movement in and out of the chip (that is, between main memory and the processing cores) from keeping pace with improvements in overall compute performance. To use these processors to their fullest capability, it is essential to carefully consider memory access. This is where the visualization pipeline fails. Each filtering component in the visualization library is expected to take a data set in its entirety, perform some computation across all of the elements, and output the complete results. The process of iterating over all elements must be repeated in each filter, which is one of the worst possible ways to traverse memory when trying to maximize the number of executions per memory access. This project investigates a new type of visualization framework that exhibits the pervasive parallelism necessary to run on exascale machines. Our framework achieves this by defining algorithms in terms of functors, which are localized, stateless operations. Functors can be composited in much the same way as filters in the visualization pipeline, but their design allows them to run concurrently on massive numbers of lightweight threads. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for efficient computation on an exascale computer. This project concludes with a functional prototype containing pervasively parallel algorithms that perform demonstratively well on many-core processors. These algorithms are fundamental for performing data analysis and visualization at extreme scale.
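The functor idea above can be sketched in a few lines: stateless per-element operations are fused into a single traversal, so the data is touched once rather than once per filter. The operation names below are hypothetical illustrations.

```python
# Sketch of functor composition: fuse stateless per-element operations into one
# pass over the data, instead of each "filter" traversing the whole array separately.
# In a parallel framework each element would map to a lightweight thread.

def compose(*functors):
    """Fuse stateless per-element operations into a single traversal."""
    def fused(x):
        for f in functors:
            x = f(x)
        return x
    return fused

scale = lambda x: 2.0 * x
clamp = lambda x: min(max(x, 0.0), 1.0)
gamma = lambda x: x ** 0.5

pipeline = compose(scale, clamp, gamma)
data = [-0.2, 0.1, 0.3, 0.9]
result = [pipeline(x) for x in data]   # one memory pass applies all three operations
```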

  6. Interacting tilt and kink instabilities in repelling current channels

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keppens, R.; Porth, O.; Xia, C., E-mail: rony.keppens@wis.kuleuven.be

    2014-11-01

    We present a numerical study in resistive magnetohydrodynamics (MHD) where the initial equilibrium configuration contains adjacent, oppositely directed, parallel current channels. Since oppositely directed current channels repel, the equilibrium is liable to an ideal magnetohydrodynamic tilt instability. This tilt evolution, previously studied in planar settings, involves two magnetic islands or flux ropes, which on Alfvénic timescales undergo a combined rotation and separation. This in turn leads to the creation of (near) singular current layers, posing severe challenges to numerical approaches. Using our open-source grid-adaptive MPI-AMRVAC software, we revisit the planar evolution case in compressible MHD, as well as its extension to two-and-a-half-dimensional (2.5D) and full three-dimensional (3D) scenarios. As long as the third dimension can be ignored, pure tilt evolutions result that are hardly affected by out-of-plane magnetic field components. In all 2.5D runs, our simulations do show secondary tearing-type disruptions throughout the near-singular current sheets in the far nonlinear saturation regime. In full 3D runs, both current channels can be liable to additional ideal kink deformations. We discuss the effects of having both tilt and kink instabilities acting simultaneously in the violent, reconnection-dominated evolution. In 3D, both the tilt and the kink instabilities can be stabilized by tension forces. As a concrete space plasma application, we argue that interacting tilt-kink instabilities in repelling current channels provide a novel route to initiate solar coronal mass ejections, distinctly different from the currently favored pure kink or torus instability routes.

  7. Origin of Lamellar Magnetism (Invited)

    NASA Astrophysics Data System (ADS)

    McEnroe, S. A.; Robinson, P.; Fabian, K.; Harrison, R. J.

    2010-12-01

    The theory of lamellar magnetism arose through the search for the origin of the strong and extremely stable remanent magnetization (MDF>100 mT) recorded in igneous and metamorphic rocks containing ilmenite with exsolution lamellae of hematite, or hematite with exsolution lamellae of ilmenite. Properties of rocks producing major remanent magnetic anomalies could not be explained by paramagnetic (PM) ilmenite or canted antiferromagnetic (CAF) hematite alone. Monte Carlo modeling of chemical and magnetic interactions in such intergrowths at high temperature indicated the presence of "contact layers" one cation layer thick at (001) interfaces of the two phases. Contact layers, with chemical composition different from layers in the adjacent phases, provide partial relief of ionic charge imbalance at interfaces, and can be common, not only in magnetic minerals. In rhombohedral Fe-Ti oxides, magnetic moments of 2 Fe2+Fe3+ contact layers (2 × 4.5 µB) on both sides of a lamella are balanced by the unbalanced magnetic moment of 1 Fe3+ hematite layer (1 × 5 µB), to produce a net uncompensated ferrimagnetic "lamellar moment" of 4 µB. Bulk lamellar moment is not proportional to the amount of magnetic oxide, but to the quantity of magnetically "in-phase" lamellar interfaces, with greater abundance and smaller thickness of lamellae, extending down to 1-2 nm. The proportion of "magnetically in-phase" lamellae relates to the orientation of (001) interfaces to the magnetizing field during exsolution, and is hence highest in samples with a strong lattice-preferred orientation of (001) parallel to the field during exsolution. The nature of contact layers, ~0.23 nm thick, with Fe2+Fe3+ charge ordering postulated by the Monte Carlo models, was confirmed by bond-valence and DFT calculations, and their presence confirmed by Mössbauer measurements. 
Hysteresis experiments on hematite with nanoscale ilmenite at temperatures below 57 K, where ilmenite becomes AF, demonstrate magnetic exchange bias produced by strong coupling across phase interfaces. Interface coupling, with nominal magnetic moments perpendicular and parallel to (001), is facilitated by magnetic moments in hematite near interfaces that are a few degrees out of the (001) plane, proved by neutron diffraction experiments. When a ~b.y.-old sample, with a highly stable NRM, is ZF cooled below 57 K, it shows bimodal exchange bias, indicating the presence of two lamellar populations that are magnetically "out-of-phase", and incidentally proving the existence of lamellar magnetism. Lamellar magnetism may enhance the strength and stability of remanence in samples with magnetite or maghemite lamellae in pure hematite, or magnetite lamellae in ilmenite, where coarse magnetite or maghemite alone would be multi-domain. Here the "contact layers" should be a complex hybrid of 2/3-filled rhombohedral layers parallel to (001) and 3/4-filled cubic octahedral layers parallel to (111), with a common octahedral orientation confirmed by TEM observations. Here, because of different layer populations, the calculated lamellar moment may be higher than in the purely rhombohedral example.

  8. Wheel-type magnetic refrigerator

    DOEpatents

    Barclay, John A.

    1983-01-01

    The disclosure is directed to a wheel-type magnetic refrigerator capable of cooling over a large temperature range. Ferromagnetic or paramagnetic porous materials are layered circumferentially according to their Curie temperature. The innermost layer has the lowest Curie temperature and the outermost layer has the highest Curie temperature. The wheel is rotated through a magnetic field perpendicular to the axis of the wheel and parallel to its direction of rotation. A fluid is pumped through portions of the layers using inner and outer manifolds to achieve refrigeration of a thermal load.

  9. Wheel-type magnetic refrigerator

    DOEpatents

    Barclay, J.A.

    1982-01-20

    The disclosure is directed to a wheel-type magnetic refrigerator capable of cooling over a large temperature range. Ferromagnetic or paramagnetic porous materials are layered circumferentially according to their Curie temperature. The innermost layer has the lowest Curie temperature and the outermost layer has the highest Curie temperature. The wheel is rotated through a magnetic field perpendicular to the axis of the wheel and parallel to its direction of rotation. A fluid is pumped through portions of the layers using inner and outer manifolds to achieve refrigeration of a thermal load.

  10. FairMQ for Online Reconstruction - An example on P̄ANDA test beam data

    NASA Astrophysics Data System (ADS)

    Stockmanns, Tobias; PANDA Collaboration

    2017-10-01

    One of the large challenges of future particle physics experiments is the trend to run without a first-level hardware trigger. The typical data rates easily exceed hundreds of GBytes/s, which is far too much to be stored permanently for an offline analysis. Therefore a strong data reduction has to be done by selecting only those data which are physically interesting. This implies that all detector data are read out and have to be processed at the same rate as they are produced. Several different hardware approaches, from FPGAs and GPUs to multicore CPUs and mixtures of these systems, are under study. Common to all of them is the need to process the data in massively parallel systems. One very convenient way to realize parallel systems on heterogeneous hardware is the use of message-queue-based multiprocessing. One package that allows the development of such applications is the FairMQ module in the FairRoot simulation framework developed at GSI. FairRoot is used by several different experiments at and outside GSI, including the P̄ANDA experiment. FairMQ is an abstraction layer for message-queue-based applications; it has, up to now, two implementations: ZeroMQ and nanomsg. For the P̄ANDA experiment, FairMQ is under test in two different ways: on the one hand, for online processing of test beam data from prototypes of P̄ANDA sub-detectors, and, in a more generalized way, on time-based simulated data of the complete detector system. The first test, on test beam data, is presented in this paper.
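The message-queue pattern described above can be sketched generically: a producer "device" streams event messages over a queue to a selector "device" that keeps only interesting ones, achieving data reduction. This sketch uses Python threads and queue.Queue, not FairMQ's actual API; the event payloads are hypothetical.

```python
# Generic message-queue processing pipeline in the spirit of the approach above:
# producer device -> queue -> selector device. Plain-Python sketch, not FairMQ's API.
import queue
import threading

def producer(out_q, n_events):
    for i in range(n_events):
        out_q.put({"event": i, "adc": (i * 37) % 100})  # fake detector payload
    out_q.put(None)                                      # end-of-stream marker

def selector(in_q, out_q, threshold):
    while (msg := in_q.get()) is not None:
        if msg["adc"] >= threshold:      # strong data reduction: keep "interesting" events
            out_q.put(msg)
    out_q.put(None)

raw, selected = queue.Queue(), queue.Queue()
threads = [threading.Thread(target=producer, args=(raw, 100)),
           threading.Thread(target=selector, args=(raw, selected, 50))]
for t in threads:
    t.start()
kept = []
while (msg := selected.get()) is not None:
    kept.append(msg)
for t in threads:
    t.join()
```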

  11. A Classification of Subaqueous Density Flows Based on Transformations From Proximal to Distal Regions

    NASA Astrophysics Data System (ADS)

    Hermidas, Navid; Eggenhuisen, Joris; Luthi, Stefan; Silva Jacinto, Ricardo; Toth, Ferenc; Pohl, Florian

    2017-04-01

    Transformations of a subaqueous density flow from proximal to distal regions are investigated. A classification of these transformations based on the state of the free shear and boundary layers and the existence of a plug layer during the transition from a debris flow to a turbidity current is presented. A connection between the deposit emplaced by the flow and the relevant flow type is drawn through the results obtained from a series of laboratory flume experiments. These were performed using 9%, 15%, and 21% sediment mixture concentrations composed of sand, silt, clay, and tap water, on varying bed slopes of 6°, 8°, and 9.5°, and with discharge rates of 10 m³/h and 15 m³/h. Stress-controlled rheometry experiments were performed on the mixtures to obtain apparent viscosity data. A classification was developed based on the imposed flow conditions, where a cohesive flow may fall within one of five distinct flow types: 1) a cohesive plug flow (PF) with laminar free shear and boundary layers, 2) a top transitional plug flow (TTPF) containing a turbulent free shear layer, a plug layer, and a laminar boundary layer, 3) a complete transitional plug flow (CTPF) consisting of turbulent free shear and boundary layers and a plug, 4) a transitional turbidity current (TTC) with a turbulent free shear layer and a laminar boundary layer, and 5) a completely turbulent turbidity current (TC). During the experiments, flow type PF resulted in en masse deposition of a thick uniform ungraded muddy sand mixture, which was emplaced once the yield stress overcame the gravitational forces within the tail region of the flow. Flow type TTPF resulted in deposition of a thin ungraded basal clean sand layer during the run. This layer was covered by a muddy sand deposit from the tail. Flow type TTC did not deposit any sediment during the run. A uniform muddy sand mixture was emplaced by the tail of the flow. Flow type TC resulted in deposition of a poorly sorted massive bottom sand layer. 
This layer was overlain by either a muddy sand mixture or a sand and silt planar lamination. Flow type CTPF was not observed during the experiments. Furthermore, it was observed that flows which are in transition from a TTC to a TTPF result in a thin bottom clean sand layer covered by a banded transitional interval. This was overlain by a muddy sand layer and a very thin clean sand layer, resulting from traction by dilute turbulent wake. In all cases a mud cap was emplaced on top of the deposit after the runs were terminated.
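The five-way classification in the abstract is a simple decision over three flow attributes (free shear layer state, boundary layer state, presence of a plug), which can be transcribed directly:

```python
# Direct transcription of the classification scheme above: map the state of the free
# shear layer, the boundary layer, and the presence of a plug to the five flow types.

def classify_flow(shear_turbulent, boundary_turbulent, has_plug):
    if has_plug:
        if shear_turbulent and boundary_turbulent:
            return "CTPF"   # complete transitional plug flow
        if shear_turbulent:
            return "TTPF"   # top transitional plug flow
        if not boundary_turbulent:
            return "PF"     # cohesive plug flow (laminar throughout)
    else:
        if shear_turbulent and boundary_turbulent:
            return "TC"     # completely turbulent turbidity current
        if shear_turbulent:
            return "TTC"    # transitional turbidity current
    return "unclassified"
```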

  12. Reaction rates of graphite with ozone measured by etch decoration

    NASA Technical Reports Server (NTRS)

    Hennig, G. R.; Montet, G. L.

    1968-01-01

    The etch-decoration technique for detecting vacancies in graphite has been used to determine the reaction rates of graphite with ozone in the directions parallel and perpendicular to the layer planes. It consists essentially of peeling single-atom layers off graphite crystals without affecting the remainder of the crystal.

  13. Sex-related differences in the wheel-running activity of mice decline with increasing age.

    PubMed

    Bartling, Babett; Al-Robaiy, Samiya; Lehnich, Holger; Binder, Leonore; Hiebl, Bernhard; Simm, Andreas

    2017-01-01

    Laboratory mice of both sexes having free access to running wheels are commonly used to study mechanisms underlying the beneficial effects of physical exercise on health and aging in humans. However, comparative wheel-running activity profiles of male and female mice over a long period, during which increasing age plays an additional role, are unknown. Therefore, we permanently recorded the wheel-running activity (i.e., total distance, median velocity, time of breaks) of female and male mice until 9 months of age. Our records indicated higher wheel-running distances for females than males, which were highest in 2-month-old mice. This was mainly achieved by higher running velocities of the females and not by longer running times. However, the sex-related differences declined in parallel with the age-associated reduction in wheel-running activities. Female mice also showed more variance between the weekly running distances than males, which was recorded most often for females 4-6 months old but not older. Additional records of 24-month-old mice of both sexes indicated highly reduced wheel-running activities at old age. Surprisingly, this reduction at old age resulted mainly from lower running velocities and not from shorter running times. Old mice also differed in their course of night activity, which peaked later compared with younger mice. In summary, we demonstrated the influence of sex on the age-dependent activity profile of mice, which contrasts somewhat with humans; this has to be considered when transferring exercise-mediated mechanisms from mouse to human. Copyright © 2016. Published by Elsevier Inc.

  14. CMUTs with high-K atomic layer deposition dielectric material insulation layer.

    PubMed

    Xu, Toby; Tekes, Coskun; Degertekin, F

    2014-12-01

    Use of high-κ dielectric, atomic layer deposition (ALD) materials as an insulation layer material for capacitive micromachined ultrasonic transducers (CMUTs) is investigated. The effect of insulation layer material and thickness on CMUT performance is evaluated using a simple parallel plate model. The model shows that both a high dielectric constant and high electrical breakdown strength are important for the dielectric material, and that significant performance improvement can be achieved, especially as the vacuum gap thickness is reduced. In particular, ALD hafnium oxide (HfO2) is evaluated and used as an improvement over plasma-enhanced chemical vapor deposition (PECVD) silicon nitride (SixNy) for CMUTs fabricated by a low-temperature, complementary metal oxide semiconductor transistor-compatible, sacrificial release method. Relevant properties of ALD HfO2, such as dielectric constant and breakdown strength, are characterized to further guide CMUT design. Experiments are performed on parallel fabricated test CMUTs with 50-nm gap and 16.5-MHz center frequency to measure and compare pressure output and receive sensitivity for 200-nm PECVD SixNy and 100-nm HfO2 insulation layers. Results for this particular design show a 6-dB improvement in receiver output with the collapse voltage reduced by one-half; while in transmit mode, half the input voltage is needed to achieve the same maximum output pressure.
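
    The parallel plate argument can be made concrete: electrically, an insulator of thickness t and relative permittivity eps_r adds only t/eps_r to the gap. A minimal sketch, assuming typical permittivities of about 7 for PECVD SixNy and 16 for ALD HfO2 (values not given in the abstract):

    ```python
    def effective_gap_nm(vacuum_gap_nm, insulator_nm, eps_r):
        """Parallel-plate model: the insulation layer contributes its
        physical thickness divided by its relative permittivity."""
        return vacuum_gap_nm + insulator_nm / eps_r

    # Assumed typical permittivities for illustration (not from the abstract):
    EPS_SIXNY, EPS_HFO2 = 7.0, 16.0

    gap_sixny = effective_gap_nm(50.0, 200.0, EPS_SIXNY)  # 200-nm PECVD SixNy layer
    gap_hfo2 = effective_gap_nm(50.0, 100.0, EPS_HFO2)    # 100-nm ALD HfO2 layer
    # In this model the collapse (pull-in) voltage scales with
    # effective_gap ** 1.5, so the thinner electrical gap of the HfO2
    # design lowers the collapse voltage markedly.
    ```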

  15. Linear and nonlinear stability of the Blasius boundary layer

    NASA Technical Reports Server (NTRS)

    Bertolotti, F. P.; Herbert, TH.; Spalart, P. R.

    1992-01-01

    Two new techniques for the study of the linear and nonlinear instability in growing boundary layers are presented. The first technique employs partial differential equations of parabolic type exploiting the slow change of the mean flow, disturbance velocity profiles, wavelengths, and growth rates in the streamwise direction. The second technique solves the Navier-Stokes equation for spatially evolving disturbances using buffer zones adjacent to the inflow and outflow boundaries. Results of both techniques are in excellent agreement. The linear and nonlinear development of Tollmien-Schlichting (TS) waves in the Blasius boundary layer is investigated with both techniques and with a local procedure based on a system of ordinary differential equations. The results are compared with previous work and the effects of nonparallelism and nonlinearity are clarified. The effect of nonparallelism is confirmed to be weak and, consequently, not responsible for the discrepancies between measurements and theoretical results for parallel flow.
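
    For reference, the Blasius mean flow underlying these stability analyses satisfies f''' + (1/2) f f'' = 0 with f(0) = f'(0) = 0 and f' tending to 1 far from the wall. A standard shooting method (a textbook sketch, not any of the paper's solvers) recovers the classical wall-shear value f''(0) of about 0.332:

    ```python
    def blasius_rhs(y):
        # y = (f, f', f''); Blasius equation gives f''' = -0.5 * f * f''
        f, fp, fpp = y
        return (fp, fpp, -0.5 * f * fpp)

    def free_stream_slope(fpp0, eta_max=10.0, n=2000):
        """Integrate the Blasius ODE by RK4 with guessed wall shear
        f''(0) = fpp0 and return f'(eta_max), which should tend to 1."""
        h = eta_max / n
        y = (0.0, 0.0, fpp0)
        for _ in range(n):
            k1 = blasius_rhs(y)
            k2 = blasius_rhs(tuple(y[i] + 0.5 * h * k1[i] for i in range(3)))
            k3 = blasius_rhs(tuple(y[i] + 0.5 * h * k2[i] for i in range(3)))
            k4 = blasius_rhs(tuple(y[i] + h * k3[i] for i in range(3)))
            y = tuple(y[i] + h / 6.0 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
                      for i in range(3))
        return y[1]

    # Bisect on the wall shear until the far-field condition f' -> 1 is met;
    # f'(eta_max) increases monotonically with the guessed f''(0).
    lo, hi = 0.1, 1.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if free_stream_slope(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    wall_shear = 0.5 * (lo + hi)  # classical value is approximately 0.332
    ```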

  16. HADY-I, a FORTRAN program for the compressible stability analysis of three-dimensional boundary layers. [on swept and tapered wings

    NASA Technical Reports Server (NTRS)

    El-Hady, N. M.

    1981-01-01

    A computer program HADY-I for calculating the linear incompressible or compressible stability characteristics of the laminar boundary layer on swept and tapered wings is described. The eigenvalue problem and its adjoint, arising from the linearized disturbance equations with the appropriate boundary conditions, are solved numerically using a combination of a Newton-Raphson iterative scheme and a variable step size integrator based on the Runge-Kutta-Fehlberg fifth-order formulas. The integrator is used in conjunction with a modified Gram-Schmidt orthonormalization procedure. The computer program HADY-I calculates the growth rates of crossflow or streamwise Tollmien-Schlichting instabilities. It also calculates the group velocities of these disturbances. It is restricted to parallel stability calculations, where the boundary layer (meanflow) is assumed to be parallel. The meanflow solution is an input to the program.
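
    The modified Gram-Schmidt orthonormalization mentioned above can be illustrated with a minimal sketch in plain Python (our illustration, not the HADY-I code):

    ```python
    def modified_gram_schmidt(vectors):
        """Orthonormalize a list of linearly independent vectors.
        Each new vector is deflated against the basis built so far using
        its already-updated components, which is numerically more stable
        than classical Gram-Schmidt."""
        basis = []
        for v in vectors:
            w = list(v)
            for q in basis:
                # Project the *current* (already deflated) w, not the original v.
                proj = sum(wi * qi for wi, qi in zip(w, q))
                w = [wi - proj * qi for wi, qi in zip(w, q)]
            norm = sum(wi * wi for wi in w) ** 0.5
            basis.append([wi / norm for wi in w])
        return basis
    ```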

  17. rfpipe: Radio interferometric transient search pipeline

    NASA Astrophysics Data System (ADS)

    Law, Casey J.

    2017-10-01

    rfpipe supports Python-based analysis of radio interferometric data (especially from the Very Large Array) and searches for fast radio transients. It extends the rtpipe library (ascl:1706.002) with new approaches to parallelization, acceleration, and more portable data products. rfpipe can run in standalone mode or be run in a cluster environment.

  18. A Configuration Framework and Implementation for the Least Privilege Separation Kernel

    DTIC Science & Technology

    2010-12-01

    The Altova Web site states that virtualization software, Parallels for Mac and Wine, is required for running it on MacOS and RedHat Linux...

  19. Parallel Algorithm Solves Coupled Differential Equations

    NASA Technical Reports Server (NTRS)

    Hayashi, A.

    1987-01-01

    Numerical methods adapted to concurrent processing. Algorithm solves set of coupled partial differential equations by numerical integration. Adapted to run on hypercube computer, algorithm separates problem into smaller problems solved concurrently. Increase in computing speed with concurrent processing over that achievable with conventional sequential processing appreciable, especially for large problems.
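
    The decomposition idea, splitting the grid into chunks that each update from the previous sweep's values and share only edge ("halo") data with neighbors, can be sketched with a 1-D Jacobi relaxation (an illustration of the general approach, not the paper's algorithm):

    ```python
    def serial_jacobi(u, steps):
        """Jacobi relaxation of a 1-D Laplace problem with fixed end values."""
        for _ in range(steps):
            u = ([u[0]]
                 + [0.5 * (u[i - 1] + u[i + 1]) for i in range(1, len(u) - 1)]
                 + [u[-1]])
        return u

    def partitioned_jacobi(u, steps, nparts=4):
        """Same iteration with the interior split into nparts chunks.
        Each chunk reads only the previous sweep's values, so the chunks
        could run concurrently on separate processors; the neighboring
        values at each chunk edge are the 'halo' data a message-passing
        version would exchange every step."""
        interior = list(range(1, len(u) - 1))
        size = -(-len(interior) // nparts)  # ceiling division
        chunks = [interior[k:k + size] for k in range(0, len(interior), size)]
        for _ in range(steps):
            new = u[:]
            for chunk in chunks:  # independent: no chunk reads another's new values
                for i in chunk:
                    new[i] = 0.5 * (u[i - 1] + u[i + 1])
            u = new
        return u

    u0 = [1.0] + [0.0] * 9 + [2.0]      # fixed boundary values 1.0 and 2.0
    left = serial_jacobi(u0, 60)
    right = partitioned_jacobi(u0, 60)  # identical result, chunk by chunk
    ```

    Because Jacobi updates depend only on the previous sweep, the partitioned version reproduces the serial result exactly, which is what makes the method attractive for concurrent processing.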

  20. Internationalising Professional Skill Development: Are the Rich Getting Richer?

    ERIC Educational Resources Information Center

    Soontiens, Werner

    2004-01-01

    Internationalisation of education, and more specifically tertiary education, all over the world has contributed to a significant overhaul in student composition. Parallel to this runs the need for graduates to leave university with a range of professional skills. In response to this, universities actively encourage the development of such skills…
