Parallel computing for probabilistic fatigue analysis
NASA Technical Reports Server (NTRS)
Sues, Robert H.; Lua, Yuan J.; Smith, Mark D.
1993-01-01
This paper presents the results of Phase I research to investigate the most effective parallel processing software strategies and hardware configurations for probabilistic structural analysis. We investigate the efficiency of both shared- and distributed-memory architectures via a probabilistic fatigue life analysis problem. We also present a parallel programming approach, the virtual shared-memory paradigm, that is applicable across both types of hardware. Using this approach, problems can be solved on a variety of parallel configurations, including networks of single- or multiprocessor workstations. We conclude that it is possible to effectively parallelize probabilistic fatigue analysis codes; however, special strategies will be needed to achieve large-scale parallelism, to keep large numbers of processors busy, and to treat problems with the large memory requirements encountered in practice. We also conclude that distributed-memory architectures are preferable to shared-memory architectures for achieving large-scale parallelism; however, the currently emerging hybrid-memory architectures will likely prove optimal in the future.
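Probabilistic fatigue analysis parallelizes naturally because each random sample is independent. The sketch below illustrates that strategy only; the S-N relation, parameter values, and function names are invented for illustration and are not from the paper.

```python
# Hypothetical sketch: embarrassingly parallel Monte Carlo fatigue-life
# estimation, in the spirit of distributing independent samples across
# processors. The Basquin-type S-N model and all constants are made up.
import random
from multiprocessing import Pool

def fatigue_life_sample(seed):
    """Draw random material/load parameters and return cycles to failure
    from a simple Basquin-type relation N = (sigma_f / stress)**(1/b)."""
    rng = random.Random(seed)
    stress = rng.gauss(300.0, 30.0)    # applied stress amplitude, MPa
    sigma_f = rng.gauss(900.0, 45.0)   # fatigue strength coefficient, MPa
    b = 0.09                           # fatigue strength exponent
    return (sigma_f / max(stress, 1.0)) ** (1.0 / b)

def failure_probability(n_samples, n_target, workers=8):
    """Estimate P(life < n_target) by farming independent samples out to a
    process pool; samples are independent, so speedup is limited mainly by
    startup and result-collection overhead."""
    with Pool(workers) as pool:
        lives = pool.map(fatigue_life_sample, range(n_samples))
    return sum(life < n_target for life in lives) / n_samples

if __name__ == "__main__":
    print(failure_probability(100_000, n_target=2.0e5))
```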
Large-scale Parallel Unstructured Mesh Computations for 3D High-lift Analysis
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.; Pirzadeh, S.
1999-01-01
A complete "geometry to drag-polar" analysis capability for the three-dimensional high-lift configurations is described. The approach is based on the use of unstructured meshes in order to enable rapid turnaround for complicated geometries that arise in high-lift configurations. Special attention is devoted to creating a capability for enabling analyses on highly resolved grids. Unstructured meshes of several million vertices are initially generated on a work-station, and subsequently refined on a supercomputer. The flow is solved on these refined meshes on large parallel computers using an unstructured agglomeration multigrid algorithm. Good prediction of lift and drag throughout the range of incidences is demonstrated on a transport take-off configuration using up to 24.7 million grid points. The feasibility of using this approach in a production environment on existing parallel machines is demonstrated, as well as the scalability of the solver on machines using up to 1450 processors.
OpenMP parallelization of a gridded SWAT (SWATG)
NASA Astrophysics Data System (ADS)
Zhang, Ying; Hou, Jinliang; Cao, Yongpan; Gu, Juan; Huang, Chunlin
2017-12-01
Large-scale, long-term, high-spatial-resolution simulation is a common challenge in environmental modeling. A gridded Hydrologic Response Unit (HRU)-based Soil and Water Assessment Tool (SWATG), which integrates a grid modeling scheme with different spatial representations, faces exactly this challenge: long run times limit its application to very high resolution, large-scale watershed modeling. The OpenMP (Open Multi-Processing) parallel application interface is integrated with SWATG (the result is called SWATGP) to accelerate grid modeling at the HRU level. This parallel implementation takes better advantage of the computational power of a shared-memory computer system. We conducted two experiments at multiple temporal and spatial scales of hydrological modeling using SWATG and SWATGP on a high-end server. At 500-m resolution, SWATGP was found to be up to nine times faster than SWATG in modeling a roughly 2000 km2 watershed with one CPU in a 15-thread configuration. The results demonstrate that parallel models save considerable time relative to traditional sequential simulation runs. Parallel computation of environmental models is beneficial for model applications, especially at large spatial and temporal scales and at high resolutions. The proposed SWATGP model is thus a promising tool for large-scale, high-resolution water resources research and management, in addition to offering data fusion and model coupling ability.
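The core idea is that HRUs can be processed independently within a time step, so the per-HRU loop parallelizes directly. SWAT is Fortran and the paper uses OpenMP; the analogue below uses a Python process pool instead, and the toy curve-number runoff function is an illustrative stand-in, not SWATG code.

```python
# Illustrative analogue of an OpenMP "parallel do" over gridded HRUs,
# expressed with a Python process pool. hru_runoff() is a made-up toy.
from multiprocessing import Pool

def hru_runoff(hru):
    """Toy per-HRU daily runoff (SCS curve-number style)."""
    precip, cn = hru
    s = 25400.0 / cn - 254.0              # potential retention (mm)
    if precip <= 0.2 * s:
        return 0.0
    return (precip - 0.2 * s) ** 2 / (precip + 0.8 * s)

def simulate_day(hrus, workers=15):      # 15 workers echoes the paper's setup
    with Pool(workers) as pool:
        return pool.map(hru_runoff, hrus)

if __name__ == "__main__":
    hrus = [(50.0, 75.0)] * 10_000       # (daily precip mm, curve number)
    print(sum(simulate_day(hrus)))
```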
Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations
NASA Astrophysics Data System (ADS)
Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe; Gagliardi, Laura; de Jong, Wibe A.
2017-11-01
A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can thus be performed. The new parallel MCSCF implementation is tested on the chromium trimer with an active space of 20 electrons in 20 orbitals, a calculation that can now be performed routinely. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals for the pentacene systems were performed, and a single CI iteration with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. The chromium tetramer corresponds to a CI expansion of nearly one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted to date.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chin, George; Marquez, Andres; Choudhury, Sutanay
2012-09-01
Triadic analysis encompasses a useful set of graph mining methods centered on the concept of a triad: a subgraph of three nodes and the configuration of directed edges across them. Such methods are often applied in the social sciences as well as many other diverse fields. Triadic methods commonly operate on a triad census, which counts the number of triads of every possible edge configuration in a graph. Like other graph algorithms, triadic census algorithms do not scale well when graphs reach tens of millions to billions of nodes. To enable the triadic analysis of large-scale graphs, we developed and optimized a triad census algorithm to execute efficiently on shared-memory architectures. We retrace the development and evolution of this parallel triad census algorithm. Over the course of several versions, we continually adapted the code's data structures and program logic to expose more opportunities to exploit parallelism on shared memory that would translate into improved computational performance. We recall the critical steps and modifications that occurred during code development and optimization. Furthermore, we compare the performance of the triad census algorithm versions on three specific systems: a Cray XMT, an HP Superdome, and an AMD multi-core NUMA machine. These three systems have shared-memory architectures but markedly different hardware capabilities for managing parallelism.
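To make the census concrete, here is a deliberately simplified sketch that buckets every 3-node subset by how many of its six possible directed edges are present. A real census (as in the work above) distinguishes all 16 triad isomorphism classes and must avoid this cubic enumeration, which is precisely what fails to scale.

```python
# Simplified triad census sketch: bucket each node triple by the number of
# directed edges present among the six possible ordered pairs.
from itertools import combinations, permutations
from collections import Counter

def triad_census_by_edge_count(nodes, edges):
    edge_set = set(edges)                       # directed (u, v) pairs
    census = Counter()
    for triple in combinations(nodes, 3):
        k = sum((u, v) in edge_set for u, v in permutations(triple, 2))
        census[k] += 1                          # bucket by edge count 0..6
    return census

if __name__ == "__main__":
    nodes = range(5)
    edges = [(0, 1), (1, 2), (2, 0), (3, 4)]
    print(triad_census_by_edge_count(nodes, edges))
```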
Parallel family trees for transfer matrices in the Potts model
NASA Astrophysics Data System (ADS)
Navarro, Cristobal A.; Canfora, Fabrizio; Hitschfeld, Nancy; Navarro, Gonzalo
2015-02-01
The computational cost of transfer matrix methods for the Potts model is related to the question: in how many ways can two layers of a lattice be connected? Answering this question leads to the generation of a combinatorial set of lattice configurations. This set defines the configuration space of the problem, and the smaller it is, the faster the transfer matrix can be computed. The configuration space of generic (q, v) transfer matrix methods for strips is on the order of the Catalan numbers, which grow asymptotically as O(4^m), where m is the width of the strip. Other transfer matrix methods with a smaller configuration space do exist, but they make assumptions about the temperature or the number of spin states, or restrict the structure of the lattice. In this paper we propose a parallel algorithm that uses a sub-Catalan configuration space of O(3^m) to build the generic (q, v) transfer matrix in a compressed form. The improvement is achieved by grouping the original set of Catalan configurations into a forest of family trees, in such a way that the solution to the problem is computed by solving the root node of each family. As a result, the algorithm becomes exponentially faster than the Catalan approach while remaining highly parallel. The resulting matrix is stored in a compressed form using O(3^m × 4^m) space, making numerical evaluation and decompression faster than evaluating the matrix in its O(4^m × 4^m) uncompressed form. Experimental results for different sizes of strip lattices show that the parallel family trees (PFT) strategy indeed runs exponentially faster than the Catalan Parallel Method (CPM), especially when dealing with dense transfer matrices. In terms of parallel performance, we report strong-scaling speedups of up to 5.7× when running on an 8-core shared-memory machine and 28× on a 32-core cluster. The best balance of speedup and efficiency on the multi-core machine was achieved when using p = 4 processors, while for the cluster scenario it was in the range p ∈ [8, 10]. Because of the parallel capabilities of the algorithm, a large-scale execution of the parallel family trees strategy on a supercomputer could contribute to the study of wider strip lattices.
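A quick arithmetic check of the complexity claim above: the m-th Catalan number grows like 4^m / (m^{3/2}√π), so reducing the configuration space to O(3^m) wins by a factor of roughly (4/3)^m divided by a polynomial, which is modest at small widths but exponential asymptotically.

```python
# Compare the generic Catalan configuration space with a 3^m space for a
# few strip widths m; the ratio grows like (4/3)^m up to polynomial factors.
from math import comb

for m in (8, 16, 24, 32):
    catalan = comb(2 * m, m) // (m + 1)
    print(f"m={m:2d}  Catalan={catalan:>22,d}  3^m={3**m:>18,d}  "
          f"ratio={catalan / 3**m:10.2f}")
```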
Taming parallel I/O complexity with auto-tuning
Behzad, Babak; Luu, Huong Vu Thanh; Huchette, Joseph; ...
2013-11-17
We present an auto-tuning system for optimizing I/O performance of HDF5 applications and demonstrate its value across platforms, applications, and at scale. The system uses a genetic algorithm to search a large space of tunable parameters and to identify effective settings at all layers of the parallel I/O stack. The parameter settings are applied transparently by the auto-tuning system via dynamically intercepted HDF5 calls. To validate our auto-tuning system, we applied it to three I/O benchmarks (VPIC, VORPAL, and GCRM) that replicate the I/O activity of their respective applications. We tested the system with different weak-scaling configurations (128, 2048, and 4096 CPU cores) that generate 30 GB to 1 TB of data, and executed these configurations on diverse HPC platforms (Cray XE6, IBM BG/P, and Dell Cluster). In all cases, the auto-tuning framework identified tunable parameters that substantially improved write performance over default system settings. In conclusion, we consistently demonstrate I/O write speedups between 2x and 100x for test configurations.
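The search itself is a standard genetic algorithm over a discrete parameter space. The sketch below shows that loop under stated assumptions: the parameter names mirror common Lustre/MPI-IO/HDF5 tunables but are not taken from the paper, and measure_bandwidth() is a placeholder for actually running the instrumented benchmark and timing its writes.

```python
# Minimal GA sketch over an I/O tuning space; fitness is a mock function.
import random

SPACE = {
    "stripe_count": [4, 8, 16, 32, 64],
    "stripe_size_mb": [1, 4, 16, 64],
    "cb_nodes": [8, 16, 32, 64],
    "alignment_kb": [64, 256, 1024],
}

def measure_bandwidth(cfg):
    # Placeholder fitness (GB/s); the real system injects the settings via
    # intercepted HDF5 calls and times the resulting writes.
    return ((cfg["stripe_count"] * cfg["stripe_size_mb"]) ** 0.5
            / (1 + abs(cfg["cb_nodes"] - 32) / 32))

def random_cfg():
    return {k: random.choice(v) for k, v in SPACE.items()}

def crossover(a, b):
    return {k: random.choice((a[k], b[k])) for k in SPACE}

def mutate(cfg, rate=0.2):
    return {k: random.choice(SPACE[k]) if random.random() < rate else v
            for k, v in cfg.items()}

def tune(generations=10, pop_size=16, elite=4):
    pop = [random_cfg() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=measure_bandwidth, reverse=True)
        parents = pop[:elite]
        pop = parents + [mutate(crossover(*random.sample(parents, 2)))
                         for _ in range(pop_size - elite)]
    return max(pop, key=measure_bandwidth)

print(tune())
```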
A method for data handling numerical results in parallel OpenFOAM simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anton, Alin; Muntean, Sebastian
Parallel computational fluid dynamics simulations produce vast amounts of numerical result data. This paper introduces a method for reducing the size of the data by replaying the interprocessor traffic. The results are recovered only in certain regions of interest configured by the user. A known test case is used for several mesh partitioning scenarios using the OpenFOAM® toolkit [1]. The space savings obtained with classic algorithms remain constant for more than 60 GB of floating point data. Our method is most efficient on large simulation meshes and is much better suited for compressing large-scale simulation results than the regular algorithms.
Trace: a high-throughput tomographic reconstruction engine for large-scale datasets
Bicer, Tekin; Gursoy, Doga; Andrade, Vincent De; ...
2017-01-28
Here, synchrotron light source and detector technologies enable scientists to perform advanced experiments. These scientific instruments and experiments produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used data acquisition techniques at light sources is computed tomography, which can generate tens of GB/s depending on the x-ray range. A large-scale tomographic dataset, such as a mouse brain, may require hours of computation time on a medium-sized workstation. In this paper, we present Trace, a data-intensive computing middleware we developed for the implementation and parallelization of iterative tomographic reconstruction algorithms. Trace provides fine-grained reconstruction of tomography datasets using both (thread-level) shared-memory and (process-level) distributed-memory parallelization. Trace utilizes a special data structure called the replicated reconstruction object to maximize application performance. We also present the optimizations we have done on the replicated reconstruction objects and evaluate them using a shale and a mouse brain sinogram. Our experimental evaluations show that the applied optimizations and parallelization techniques can provide 158x speedup (using 32 compute nodes) over a single-core configuration, which decreases the reconstruction time of a sinogram (with 4501 projections and 22400 detector resolution) from 12.5 hours to less than 5 minutes per iteration.
Portable parallel stochastic optimization for the design of aeropropulsion components
NASA Technical Reports Server (NTRS)
Sues, Robert H.; Rhodes, G. S.
1994-01-01
This report presents the results of Phase 1 research to develop a methodology for performing large-scale Multi-disciplinary Stochastic Optimization (MSO) for the design of aerospace systems ranging from aeropropulsion components to complete aircraft configurations. The current research recognizes that such design optimization problems are computationally expensive, and require the use of either massively parallel or multiple-processor computers. The methodology also recognizes that many operational and performance parameters are uncertain, and that uncertainty must be considered explicitly to achieve optimum performance and cost. The objective of this Phase 1 research was to initiate the development of an MSO methodology that is portable to a wide variety of hardware platforms, while achieving efficient, large-scale parallelism when multiple processors are available. The first effort in the project was a literature review of available computer hardware, as well as of portable, parallel programming environments. The second effort was to implement the MSO methodology for an example problem using the portable parallel programming environment Parallel Virtual Machine (PVM). The third and final effort was to demonstrate the example on a variety of computers, including a distributed-memory multiprocessor, a distributed-memory network of workstations, and a single-processor workstation. Results indicate that the MSO methodology can be applied effectively to large-scale aerospace design problems. Nearly perfect linear speedup was demonstrated for computation of optimization sensitivity coefficients on both a 128-node distributed-memory multiprocessor (the Intel iPSC/860) and a network of workstations (speedups of almost 19 times achieved for 20 workstations). Very high parallel efficiencies (75 percent for 31 processors and 60 percent for 50 processors) were also achieved for computation of aerodynamic influence coefficients on the Intel. Finally, the multi-level parallelization strategy that will be needed for large-scale MSO problems was demonstrated to be highly efficient. The same parallel code instructions were used on both platforms, demonstrating portability. There are many applications for which MSO can be applied, including NASA's High-Speed Civil Transport and advanced propulsion systems. The use of MSO will reduce design and development time and testing costs dramatically.
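The quoted figures connect through the standard definition of parallel efficiency, speedup divided by processor count; the lines below just verify that arithmetic.

```python
# Efficiency = speedup / p: 19x on 20 workstations is 95% efficient, and
# 75% on 31 processors (or 60% on 50) implies speedups of ~23x and 30x.
def efficiency(speedup, p):
    return speedup / p

print(efficiency(19, 20))          # ~0.95
print(0.75 * 31, 0.60 * 50)        # implied speedups: 23.25 and 30.0
```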
A model for optimizing file access patterns using spatio-temporal parallelism
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boonthanome, Nouanesengsy; Patchett, John; Geveci, Berk
2013-01-01
For many years now, I/O read time has been recognized as the primary bottleneck for parallel visualization and analysis of large-scale data. In this paper, we introduce a model that can estimate the read time for a file stored in a parallel filesystem when given the file access pattern. Read times ultimately depend on how the file is stored and the access pattern used to read the file. The file access pattern will be dictated by the type of parallel decomposition used. We employ spatio-temporal parallelism, which combines both spatial and temporal parallelism, to provide greater flexibility to possible file access patterns. Using our model, we were able to configure the spatio-temporal parallelism to design optimized read access patterns that resulted in a speedup factor of approximately 400 over traditional file access patterns.
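A cost model of this kind can be as simple as charging a latency term for each non-contiguous jump plus a bandwidth term for the bytes moved. The sketch below is a hedged illustration of that shape; the coefficients are invented, whereas the paper calibrates its model against the actual parallel filesystem.

```python
# Toy read-time model: seek-like cost per non-contiguous jump plus a
# streaming cost proportional to bytes read. Coefficients are made up.
def estimated_read_time(requests, latency_s=0.01, bandwidth_bps=2e9):
    """requests: list of (offset, size_bytes) produced by the chosen
    spatio-temporal decomposition."""
    time, prev_end = 0.0, None
    for offset, size in sorted(requests):
        if prev_end is None or offset != prev_end:
            time += latency_s            # non-contiguous jump costs a "seek"
        time += size / bandwidth_bps     # streaming transfer cost
        prev_end = offset + size
    return time

# A contiguous pattern beats a scattered one of equal volume:
contiguous = [(i * 2**20, 2**20) for i in range(256)]
scattered = [(i * 2**24, 2**20) for i in range(256)]
print(estimated_read_time(contiguous), estimated_read_time(scattered))
```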
Trace: a high-throughput tomographic reconstruction engine for large-scale datasets.
Bicer, Tekin; Gürsoy, Doğa; Andrade, Vincent De; Kettimuthu, Rajkumar; Scullin, William; Carlo, Francesco De; Foster, Ian T
2017-01-01
Modern synchrotron light sources and detectors produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used imaging techniques that generates data at tens of gigabytes per second is computed tomography (CT). Although CT experiments result in rapid data generation, the analysis and reconstruction of the collected data may require hours or even days of computation time with a medium-sized workstation, which hinders the scientific progress that relies on the results of analysis. We present Trace, a data-intensive computing engine that we have developed to enable high-performance implementation of iterative tomographic reconstruction algorithms for parallel computers. Trace provides fine-grained reconstruction of tomography datasets using both (thread-level) shared memory and (process-level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations that we apply to the replicated reconstruction objects and evaluate them using tomography datasets collected at the Advanced Photon Source. Our experimental evaluations show that our optimizations and parallelization techniques can provide 158× speedup using 32 compute nodes (384 cores) over a single-core configuration and decrease the end-to-end processing time of a large sinogram (with 4501 × 1 × 22,400 dimensions) from 12.5 h to <5 min per iteration. The proposed tomographic reconstruction engine can efficiently process large-scale tomographic data using many compute nodes and minimize reconstruction times.
Distributed-Memory Computing With the Langley Aerothermodynamic Upwind Relaxation Algorithm (LAURA)
NASA Technical Reports Server (NTRS)
Riley, Christopher J.; Cheatwood, F. McNeil
1997-01-01
The Langley Aerothermodynamic Upwind Relaxation Algorithm (LAURA), a Navier-Stokes solver, has been modified for use in a parallel, distributed-memory environment using the Message-Passing Interface (MPI) standard. A standard domain decomposition strategy is used in which the computational domain is divided into subdomains with each subdomain assigned to a processor. Performance is examined on dedicated parallel machines and a network of desktop workstations. The effect of domain decomposition and frequency of boundary updates on performance and convergence is also examined for several realistic configurations and conditions typical of large-scale computational fluid dynamic analysis.
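The decomposition pattern described above is the classic slab-plus-ghost-cell exchange. The sketch below shows it with mpi4py in one dimension; it is not LAURA's actual code (which is Fortran), and the averaging sweep is a stand-in for the real relaxation operator.

```python
# Domain-decomposition sketch: each rank owns a slab plus ghost cells and
# exchanges boundary layers with neighbors each step. Run under mpiexec.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

nx = 64                                  # interior cells per rank
u = np.full(nx + 2, float(rank))         # u[0] and u[-1] are ghost cells
left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for step in range(100):
    # Exchange boundary values; doing this less often (as the paper
    # studies) trades communication cost against convergence rate.
    comm.Sendrecv(u[1:2], dest=left, recvbuf=u[-1:], source=right)
    comm.Sendrecv(u[-2:-1], dest=right, recvbuf=u[0:1], source=left)
    u[1:-1] = 0.5 * (u[:-2] + u[2:])     # stand-in for the relaxation sweep

print(rank, u[1], u[-2])
```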
SNAVA-A real-time multi-FPGA multi-model spiking neural network simulation architecture.
Sripad, Athul; Sanchez, Giovanny; Zapata, Mireya; Pirrone, Vito; Dorta, Taho; Cambria, Salvatore; Marti, Albert; Krishnamourthy, Karthikeyan; Madrenas, Jordi
2018-01-01
The Spiking Neural Networks (SNN) for Versatile Applications (SNAVA) simulation platform is a scalable and programmable parallel architecture that supports real-time, large-scale, multi-model SNN computation. This parallel architecture is implemented in modern Field-Programmable Gate Array (FPGA) devices to provide high-performance execution and the flexibility to support large-scale SNN models. Flexibility is defined in terms of programmability, which allows easy synapse and neuron implementation. This has been achieved by using special-purpose Processing Elements (PEs) for computing SNNs, and by analyzing and customizing the instruction set according to the processing needs to achieve maximum performance with minimum resources. The parallel architecture is interfaced with customized Graphical User Interfaces (GUIs) to configure the SNN's connectivity, to compile the neuron-synapse model, and to monitor SNN activity. Our contribution is a tool that allows SNNs to be prototyped faster than on CPU/GPU architectures but significantly more cheaply than fabricating a customized neuromorphic chip. This could be potentially valuable to the computational neuroscience and neuromorphic engineering communities.
Load Balancing Strategies for Multi-Block Overset Grid Applications
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Biswas, Rupak; Lopez-Benitez, Noe; Biegel, Bryan (Technical Monitor)
2002-01-01
The multi-block overset grid method is a powerful technique for high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process uses a grid system that discretizes the problem domain by using separately generated but overlapping structured grids that periodically update and exchange boundary information through interpolation. For efficient high-performance computation of large-scale realistic applications using this methodology, the individual grids must be properly partitioned among the parallel processors. Overall performance, therefore, largely depends on the quality of load balancing. In this paper, we present three different load balancing strategies for overset grids and analyze their effects on the parallel efficiency of a Navier-Stokes CFD application running on an SGI Origin2000 machine.
NASA Technical Reports Server (NTRS)
Keppenne, Christian L.; Rienecker, Michele M.; Koblinsky, Chester (Technical Monitor)
2001-01-01
A multivariate ensemble Kalman filter (MvEnKF) implemented on a massively parallel computer architecture has been developed for the Poseidon ocean circulation model and tested with a Pacific Basin model configuration. There are about two million prognostic state-vector variables. Parallelism for the data assimilation step is achieved by regionalization of the background-error covariances that are calculated from the phase-space distribution of the ensemble. Each processing element (PE) collects elements of a matrix measurement functional from nearby PEs. To avoid the introduction of spurious long-range covariances associated with finite ensemble sizes, the background-error covariances are given compact support by means of a Hadamard (element-by-element) product with a three-dimensional canonical correlation function. The methodology and the MvEnKF configuration are discussed. It is shown that the regionalization of the background covariances has a negligible impact on the quality of the analyses. The parallel algorithm is very efficient for large numbers of observations but does not scale well beyond 100 PEs at the current model resolution. On a platform with distributed memory, memory rather than speed is the limiting factor.
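The compact-support trick is easy to see in miniature: the ensemble-estimated covariance matrix is multiplied element-wise by a correlation taper that decays to zero with distance. The NumPy sketch below uses a simple linear taper where the paper uses a canonical 3D correlation function; dimensions and scales are illustrative.

```python
# Hadamard-product covariance localization: suppress spurious long-range
# covariances that a finite ensemble produces. All sizes are toy values.
import numpy as np

n_state, n_ens, length_scale = 200, 20, 15.0
rng = np.random.default_rng(0)
ensemble = rng.standard_normal((n_state, n_ens))

anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
P_raw = anomalies @ anomalies.T / (n_ens - 1)     # rank-deficient, noisy

# Compactly supported taper (a Gaspari-Cohn-like function plays the same
# role in real systems as this simple linear cutoff).
dist = np.abs(np.arange(n_state)[:, None] - np.arange(n_state)[None, :])
taper = np.clip(1.0 - dist / (2.0 * length_scale), 0.0, None)

P_localized = P_raw * taper                       # Hadamard product
print(P_raw[0, 100], P_localized[0, 100])         # far pair -> exactly 0
```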
Population Annealing Monte Carlo for Frustrated Systems
NASA Astrophysics Data System (ADS)
Amey, Christopher; Machta, Jonathan
Population annealing is a sequential Monte Carlo algorithm that efficiently simulates equilibrium systems with rough free energy landscapes such as spin glasses and glassy fluids. A large population of configurations is initially thermalized at high temperature and then cooled to low temperature according to an annealing schedule. The population is kept in thermal equilibrium at every annealing step via resampling configurations according to their Boltzmann weights. Population annealing is comparable to parallel tempering in terms of efficiency, but has several distinct and useful features. In this talk I will give an introduction to population annealing and present recent progress in understanding its equilibration properties and optimizing it for spin glasses. Results from large-scale population annealing simulations for the Ising spin glass in 3D and 4D will be presented. NSF Grant DMR-1507506.
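One temperature step of the algorithm described above amounts to reweighting and resampling: when the inverse temperature moves from beta to beta', each configuration gets an expected copy number proportional to exp(-(beta' - beta)E). The sketch below shows only that step, with a toy 1D Ising population; a full implementation would follow each resampling with MCMC equilibration sweeps.

```python
# One population-annealing resampling step (illustrative, not a release).
import numpy as np

def anneal_step(configs, energies, beta, beta_new, rng):
    # Shift by the minimum energy for numerical stability; the copy
    # numbers are invariant to this shift after normalization.
    w = np.exp(-(beta_new - beta) * (energies - energies.min()))
    mean_copies = w / w.mean()               # keeps population size ~fixed
    n_copies = rng.poisson(mean_copies)      # stochastic resampling
    idx = np.repeat(np.arange(len(configs)), n_copies)
    return configs[idx], energies[idx]

rng = np.random.default_rng(1)
configs = rng.choice([-1, 1], size=(10_000, 32))             # toy spins
energies = -np.sum(configs[:, :-1] * configs[:, 1:], axis=1) # 1D Ising chain
configs, energies = anneal_step(configs, energies, 0.5, 0.6, rng)
print(len(configs), energies.mean())
```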
NASA Astrophysics Data System (ADS)
Shao, Meiyue; Aktulga, H. Metin; Yang, Chao; Ng, Esmond G.; Maris, Pieter; Vary, James P.
2018-01-01
We describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. The use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. We also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.
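The paper's solver is a custom parallel code, but the central idea, replacing Lanczos with a preconditioned block iteration started from good guesses, can be tried in miniature with SciPy's LOBPCG. The Hamiltonian, preconditioner, and sizes below are illustrative assumptions, not the nuclear CI problem.

```python
# Miniature preconditioned block eigensolver experiment with LOBPCG.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lobpcg

n, k = 2000, 5
rng = np.random.default_rng(0)
H = sp.random(n, n, density=1e-3, random_state=0)
H = (H + H.T) + sp.diags(np.arange(1.0, n + 1.0))  # symmetric, diag-dominant

# Diagonal approximate inverse, standing in for the "effective
# preconditioner" the abstract refers to.
M = sp.diags(1.0 / H.diagonal())

X = rng.standard_normal((n, k))    # in practice: physics-informed guesses
vals, vecs = lobpcg(H, X, M=M, tol=1e-8, maxiter=200, largest=False)
print(np.sort(vals))               # lowest eigenvalues of the block
```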
3D magnetic field configuration of small-scale reconnection events in the solar plasma atmosphere
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shimizu, T.
2015-10-15
The outer solar atmosphere, i.e., the corona and the chromosphere, is replete with small energy-release events, which are accompanied by transient brightening and jet-like ejections. These events are considered to be magnetic reconnection events in the solar plasma, and their dynamics have been studied using recent advanced observations from the Hinode spacecraft and other observatories in space and on the ground. These events occur at different locations in the solar atmosphere and vary in their morphology and amount of released energy. The magnetic field configurations of these reconnection events are inferred from observations of magnetic fields at the photospheric level. Observations suggest that these magnetic configurations can be classified into two groups. In the first group, two anti-parallel magnetic fields reconnect to each other, yielding a 2D emerging flux configuration. In the second group, helical or twisted magnetic flux tubes are parallel or at a relative angle to each other. Reconnection can occur only between anti-parallel components of the magnetic flux tubes and may be referred to as component reconnection. The latter configuration type may be more important for the larger class of small-scale reconnection events. The two types of magnetic configurations can be compared to counter-helicity and co-helicity configurations, respectively, in laboratory plasma collision experiments.
Multilevel decomposition of complete vehicle configuration in a parallel computing environment
NASA Technical Reports Server (NTRS)
Bhatt, Vinay; Ragsdell, K. M.
1989-01-01
This research summarizes various approaches to multilevel decomposition to solve large structural problems. A linear decomposition scheme based on the Sobieski algorithm is selected as a vehicle for automated synthesis of a complete vehicle configuration in a parallel processing environment. The research is in a developmental state. Preliminary numerical results are presented for several example problems.
Onset of a Large Ejective Solar Eruption from a Typical Coronal-jet-base Field Configuration
NASA Astrophysics Data System (ADS)
Joshi, Navin Chandra; Sterling, Alphonse C.; Moore, Ronald L.; Magara, Tetsuya; Moon, Yong-Jae
2017-08-01
Utilizing multiwavelength observations and magnetic field data from the Solar Dynamics Observatory (SDO)/Atmospheric Imaging Assembly (AIA), SDO/Helioseismic and Magnetic Imager (HMI), the Geostationary Operational Environmental Satellite (GOES), and RHESSI, we investigate a large-scale ejective solar eruption of 2014 December 18 from active region NOAA 12241. This event produced a distinctive “three-ribbon” flare, having two parallel ribbons corresponding to the ribbons of a standard two-ribbon flare, and a larger-scale third quasi-circular ribbon offset from the other two. There are two components to this eruptive event. First, a flux rope forms above a strong-field polarity inversion line and erupts and grows as the parallel ribbons turn on, grow, and spread apart from that polarity inversion line; this evolution is consistent with the mechanism of tether-cutting reconnection for eruptions. Second, the eruption of the arcade that has the erupting flux rope in its core undergoes magnetic reconnection at the null point of a fan dome that envelops the erupting arcade, resulting in formation of the quasi-circular ribbon; this is consistent with the breakout reconnection mechanism for eruptions. We find that the parallel ribbons begin well before (∼12 minutes) the onset of the circular ribbon, indicating that tether-cutting reconnection (or a non-ideal MHD instability) initiated this event, rather than breakout reconnection. The overall setup for this large-scale eruption (diameter of the circular ribbon ∼10^5 km) is analogous to that of coronal jets (base size ∼10^4 km), many of which, according to recent findings, result from eruptions of small-scale “minifilaments.” Thus these findings confirm that eruptions of sheared-core magnetic arcades seated in fan-spine null-point magnetic topology happen on a wide range of size scales on the Sun.
The Influence of Electrode and Channel Configurations on Flow Battery Performance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Darling, RM; Perry, ML
2014-05-21
Flow batteries with flow-through porous electrodes are compared to cells with porous electrodes adjacent to either parallel or interdigitated channels. Resistances and pressure drops are measured for different configurations to augment the electrochemical data. Cell tests are done with an electrolyte containing VO²⁺ and VO₂⁺ in sulfuric acid that is circulated through both anode and cathode from a single reservoir. Performance is found to depend sensitively on the combination of electrode and flow field. Theoretical explanations for this dependence are provided. Scale-up of flow-through and interdigitated designs to large active areas is also discussed.
Architectural Implications for Spatial Object Association Algorithms
Kumar, Vijay S.; Kurc, Tahsin; Saltz, Joel; Abdulla, Ghaleb; Kohn, Scott R.; Matarazzo, Celeste
2013-01-01
Spatial object association, also referred to as crossmatch of spatial datasets, is the problem of identifying and comparing objects in two or more datasets based on their positions in a common spatial coordinate system. In this work, we evaluate two crossmatch algorithms that are used for astronomical sky surveys, on the following database system architecture configurations: (1) Netezza Performance Server®, a parallel database system with active disk style processing capabilities, (2) MySQL Cluster, a high-throughput network database system, and (3) a hybrid configuration consisting of a collection of independent database system instances with data replication support. Our evaluation provides insights about how architectural characteristics of these systems affect the performance of the spatial crossmatch algorithms. We conducted our study using real use-case scenarios borrowed from a large-scale astronomy application known as the Large Synoptic Survey Telescope (LSST). PMID:25692244
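Independent of the database architectures compared above, the crossmatch kernel itself is a nearest-neighbor search within a positional tolerance. The sketch below shows it with a k-d tree on flat x/y coordinates for brevity; real sky crossmatch works on the celestial sphere, and the catalogs here are synthetic.

```python
# Positional crossmatch sketch: match objects across two catalogs within
# a tolerance using a k-d tree.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
cat_a = rng.uniform(0, 100, size=(50_000, 2))
cat_b = cat_a + rng.normal(0, 0.01, size=cat_a.shape)  # jittered copies

tree = cKDTree(cat_b)
dist, idx = tree.query(cat_a, k=1, distance_upper_bound=0.05)
matched = dist < 0.05            # unmatched queries come back as inf
print(matched.sum(), "of", len(cat_a), "objects matched")
```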
Miao Meng; Kiani, Mehdi
2016-08-01
In order to achieve efficient wireless power transmission (WPT) to biomedical implants with millimeter (mm) dimensions, ultrasonic WPT links have recently been proposed. Operating both transmitter (Tx) and receiver (Rx) ultrasonic transducers at their resonance frequency (fr) is key in improving power transmission efficiency (PTE). In this paper, different resonance configurations for Tx and Rx transducers, including series and parallel resonance, have been studied to help the designers of ultrasonic WPT links to choose the optimal resonance configuration for Tx and Rx that maximizes PTE. The geometries for disk-shaped transducers of four different sets of links, operating at series-series, series-parallel, parallel-series, and parallel-parallel resonance configurations in Tx and Rx, have been found through finite-element method (FEM) simulation tools for operation at fr of 1.4 MHz. Our simulation results suggest that operating the Tx transducer with parallel resonance increases PTE, while the resonance configuration of the mm-sized Rx transducer highly depends on the load resistance, Rl. For applications that involve large Rl in the order of tens of kΩ, a parallel resonance for a mm-sized Rx leads to higher PTE, while series resonance is preferred for Rl in the order of several kΩ and below.
NASA Astrophysics Data System (ADS)
Teddy, Livian; Hardiman, Gagoek; Nuroji; Tudjono, Sri
2017-12-01
Indonesia is prone to earthquakes that may cause casualties and damage to buildings. Fatalities and injuries are largely caused not by the earthquake itself but by building collapse. The collapse of a building results from its behaviour under the earthquake, which depends on many factors, such as architectural design, the geometric configuration of structural elements in horizontal and vertical planes, earthquake zone, geographical location (distance to the earthquake center), soil type, material quality, and construction quality. One of the geometric configurations that may lead to collapse is the irregular configuration of a non-parallel system. In accordance with FEMA-451B, an irregular configuration of a non-parallel system exists if the vertical lateral-force-resisting elements are neither parallel nor symmetric with respect to the main orthogonal axes of the earthquake-resisting system. Such a configuration may lead to torsion, diagonal translation, and local damage to buildings. This does not mean that non-parallel irregular configurations must be avoided in architectural design; however, the designer must know the consequences of earthquake behaviour for buildings with irregular configurations of non-parallel systems. The present research aims to identify earthquake behaviour in architectural geometry with an irregular configuration of a non-parallel system. The research was quantitative, using a simulation-based experimental method. It comprised 5 models, for which architectural and structural data were input and analyzed using SAP2000 to evaluate performance, and ETABS 2015 to determine the eccentricity that occurred. The output of the software analysis was tabulated, graphed, compared, and analyzed against relevant theories. In strong earthquake zones, buildings that wholly form an irregular configuration of a non-parallel system should be avoided. If a building with parts containing such an irregular configuration is unavoidable, it should be made more rigid by forming a triangle module and applying the appropriate formula. Good collaboration is needed between architects and structural experts in creating earthquake architecture.
Parallel implementation of geometrical shock dynamics for two dimensional converging shock waves
NASA Astrophysics Data System (ADS)
Qiu, Shi; Liu, Kuang; Eliasson, Veronica
2016-10-01
Geometrical shock dynamics (GSD) theory is an appealing method for predicting shock motion in the sense that it is more computationally efficient than solving the traditional Euler equations, especially for converging shock waves. However, for solving and optimizing large-scale configurations, the main bottleneck is computational cost. Among the existing numerical GSD schemes, only one has been implemented on parallel computers, with the purpose of analyzing detonation waves. To extend the computational advantage of GSD theory to more general applications such as converging shock waves, a numerical implementation using a spatial decomposition method has been coupled with a front tracking approach on parallel computers. In addition, an efficient tridiagonal system solver for massively parallel computers has been applied to resolve the most expensive function in this implementation, resulting in an efficiency of 0.93 while using 32 HPCC cores. Moreover, symmetric boundary conditions have been developed to further reduce the computational cost, achieving a speedup of 19.26 for a 12-sided polygonal converging shock.
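For reference, the standard serial method for the tridiagonal systems named above is the Thomas algorithm, shown below. The paper's solver partitions the system across cores; this serial version is the baseline being parallelized, not their implementation.

```python
# Thomas algorithm: O(n) solve of a tridiagonal system, but inherently
# sequential in this form (hence the need for a parallel variant).
import numpy as np

def thomas(a, b, c, d):
    """a = sub-, b = main, c = super-diagonal, d = right-hand side.
    a[0] and c[-1] are unused."""
    n = len(b)
    cp, dp = np.empty(n), np.empty(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                     # forward elimination
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):            # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

n = 10   # discrete Poisson test: -x[i-1] + 2x[i] - x[i+1] = 1
print(thomas(np.full(n, -1.0), np.full(n, 2.0), np.full(n, -1.0), np.ones(n)))
```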
NASA Technical Reports Server (NTRS)
Krasteva, Denitza T.
1998-01-01
Multidisciplinary design optimization (MDO) for large-scale engineering problems poses many challenges (e.g., the design of an efficient concurrent paradigm for global optimization based on disciplinary analyses, expensive computations over vast data sets, etc.). This work focuses on the application of distributed schemes for massively parallel architectures to MDO problems, as a tool for reducing computation time and solving larger problems. The specific problem considered here is configuration optimization of a high-speed civil transport (HSCT), and the efficient parallelization of the embedded paradigm for reasonable design space identification. Two distributed dynamic load balancing techniques (random polling and global round robin with message combining) and two necessary termination detection schemes (global task count and token passing) were implemented and evaluated in terms of effectiveness and scalability to large problem sizes and a thousand processors. The effect of certain parameters on execution time was also inspected. Empirical results demonstrated stable performance and effectiveness for all schemes, and the parametric study showed that the selected algorithmic parameters have a negligible effect on performance.
Advances in Parallelization for Large Scale Oct-Tree Mesh Generation
NASA Technical Reports Server (NTRS)
O'Connell, Matthew; Karman, Steve L.
2015-01-01
Despite great advancements in the parallelization of numerical simulation codes over the last 20 years, it is still common to perform grid generation in serial. Generating large-scale grids in serial often requires using special "grid generation" compute machines that can have more than ten times the memory of average machines. While some parallel mesh generation techniques have been proposed, generating very large meshes for LES or aeroacoustic simulations is still a challenging problem. An automated method for the parallel generation of very large-scale off-body hierarchical meshes is presented here. This work enables large-scale parallel generation of off-body meshes by using a novel combination of parallel grid generation techniques and a hybrid "top down" and "bottom up" oct-tree method. Meshes are generated using hardware commonly found in parallel compute clusters. The capability to generate very large meshes is demonstrated by the generation of off-body meshes surrounding complex aerospace geometries. Results are shown including a one-billion-cell mesh generated around a Predator Unmanned Aerial Vehicle geometry, which was generated on 64 processors in under 45 minutes.
Advanced Electric Distribution, Switching, and Conversion Technology for Power Control
NASA Technical Reports Server (NTRS)
Soltis, James V.
1998-01-01
The Electrical Power Control Unit currently under development by Sundstrand Aerospace for use on the Fluids Combustion Facility of the International Space Station is the precursor of modular power distribution and conversion concepts for future spacecraft and aircraft applications. This unit combines modular current-limiting flexible remote power controllers and paralleled power converters into one package. Each unit includes three 1-kW, current-limiting power converter modules designed for a variable-ratio load sharing capability. The flexible remote power controllers can be used in parallel to match load requirements and can be programmed for an initial ON or OFF state on powerup. The unit contains an integral cold plate. The modularity and hybridization of the Electrical Power Control Unit sets the course for future spacecraft electrical power systems, both large and small. In such systems, the basic hybridized converter and flexible remote power controller building blocks could be configured to match power distribution and conversion capabilities to load requirements. In addition, the flexible remote power controllers could be configured in assemblies to feed multiple individual loads and could be used in parallel to meet the specific current requirements of each of those loads. Ultimately, the Electrical Power Control Unit design concept could evolve to a common switch module hybrid, or family of hybrids, for both converter and switchgear applications. By assembling hybrids of a common current rating and voltage class in parallel, researchers could readily adapt these units for multiple applications. The Electrical Power Control Unit concept has the potential to be scaled to larger and smaller ratings for both small and large spacecraft and for aircraft where high-power density, remote power controllers or power converters are required and a common replacement part is desired for multiples of a base current rating.
Micro-Macro Simulation of Viscoelastic Fluids in Three Dimensions
NASA Astrophysics Data System (ADS)
Rüttgers, Alexander; Griebel, Michael
2012-11-01
The development of the chemical industry resulted in various complex fluids that cannot be correctly described by classical fluid mechanics. For instance, this includes paint, engine oils with polymeric additives and toothpaste. We currently perform multiscale viscoelastic flow simulations for which we have coupled our three-dimensional Navier-Stokes solver NaSt3dGPF with the stochastic Brownian configuration field method on the micro-scale. In this method, we represent a viscoelastic fluid as a dumbbell system immersed in a three-dimensional Newtonian liquid which leads to a six-dimensional problem in space. The approach requires large computational resources and therefore depends on an efficient parallelisation strategy. Our flow solver is parallelised with a domain decomposition approach using MPI. It shows excellent scale-up results for up to 128 processors. In this talk, we present simulation results for viscoelastic fluids in square-square contractions due to their relevance for many engineering applications such as extrusion. Another aspect of the talk is the parallel implementation in NaSt3dGPF and the parallel scale-up and speed-up behaviour.
TomoMiner and TomoMinerCloud: A software platform for large-scale subtomogram structural analysis
Frazier, Zachary; Xu, Min; Alber, Frank
2017-01-01
Cryo-electron tomography (cryoET) captures the 3D electron density distribution of macromolecular complexes in close-to-native state. With the rapid advance of cryoET acquisition technologies, it is possible to generate large numbers (>100,000) of subtomograms, each containing a macromolecular complex. Often, these subtomograms represent a heterogeneous sample due to variations in the structure and composition of a complex in situ, or because the particles are a mixture of different complexes. In this case subtomograms must be classified. However, classification of large numbers of subtomograms is a time-intensive task and often a limiting bottleneck. This paper introduces an open-source software platform, TomoMiner, for large-scale subtomogram classification, template matching, subtomogram averaging, and alignment. Its scalable and robust parallel processing allows efficient classification of tens to hundreds of thousands of subtomograms. Additionally, TomoMiner provides a pre-configured TomoMinerCloud computing service permitting users without sufficient computing resources instant access to TomoMiner's high-performance features. PMID:28552576
Fang, Yuling; Chen, Qingkui; Xiong, Neal N; Zhao, Deyu; Wang, Jingjuan
2017-08-04
This paper aims to develop a low-cost, high-performance and high-reliability computing system to process large-scale data using common data mining algorithms in the Internet of Things (IoT) computing environment. Considering the characteristics of IoT data processing, similar to mainstream high performance computing, we use a GPU (Graphics Processing Unit) cluster to achieve better IoT services. Firstly, we present an energy consumption calculation method (ECCM) based on WSNs. Then, using the CUDA (Compute Unified Device Architecture) Programming model, we propose a Two-level Parallel Optimization Model (TLPOM) which exploits reasonable resource planning and common compiler optimization techniques to obtain the best blocks and threads configuration considering the resource constraints of each node. The key to this part is dynamic coupling Thread-Level Parallelism (TLP) and Instruction-Level Parallelism (ILP) to improve the performance of the algorithms without additional energy consumption. Finally, combining the ECCM and the TLPOM, we use the Reliable GPU Cluster Architecture (RGCA) to obtain a high-reliability computing system considering the nodes' diversity, algorithm characteristics, etc. The results show that the performance of the algorithms significantly increased by 34.1%, 33.96% and 24.07% for Fermi, Kepler and Maxwell on average with TLPOM and the RGCA ensures that our IoT computing system provides low-cost and high-reliability services.
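The "best blocks and threads configuration" step can be pictured as an occupancy calculation: pick the block size that keeps each multiprocessor full given per-thread register and per-block shared-memory budgets. The sketch below is a toy version of that reasoning; the hardware limits approximate a Kepler-class GPU and are assumptions, not values from the paper.

```python
# Toy resource-planning step: choose a block size maximizing occupancy
# under register and shared-memory limits (illustrative numbers only).
def best_block_size(regs_per_thread, smem_per_block,
                    max_threads_sm=2048, regs_sm=65536, smem_sm=49152,
                    max_blocks_sm=16, candidates=(128, 192, 256, 384, 512)):
    def occupancy(block):
        by_threads = max_threads_sm // block
        by_regs = regs_sm // (regs_per_thread * block)
        by_smem = (smem_sm // smem_per_block) if smem_per_block else max_blocks_sm
        blocks = min(by_threads, by_regs, by_smem, max_blocks_sm)
        return blocks * block / max_threads_sm
    return max(candidates, key=occupancy), {b: occupancy(b) for b in candidates}

best, table = best_block_size(regs_per_thread=32, smem_per_block=4096)
print(best, table)
```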
High Performance Input/Output for Parallel Computer Systems
NASA Technical Reports Server (NTRS)
Ligon, W. B.
1996-01-01
The goal of our project is to study the I/O characteristics of parallel applications used in Earth science data processing systems such as Regional Data Centers (RDCs) or EOSDIS. Our approach is to study the runtime behavior of typical programs and the effect of key parameters of the I/O subsystem, both under simulation and with direct experimentation on parallel systems. Our three-year activity has focused on two items: developing a test bed that facilitates experimentation with parallel I/O, and studying representative programs from the Earth science data processing application domain. The Parallel Virtual File System (PVFS) has been developed for use on a number of platforms, including the Tiger Parallel Architecture Workbench (TPAW) simulator, the Intel Paragon, a cluster of DEC Alpha workstations, and the Beowulf system (at CESDIS). PVFS provides considerable flexibility in configuring I/O in a UNIX-like environment, and access to key performance parameters facilitates experimentation. We have studied several key applications from levels 1, 2 and 3 of the typical RDC processing scenario, including instrument calibration and navigation, image classification, and numerical modeling codes. We have also considered large-scale scientific database codes used to organize image data.
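As an illustration of the kind of parallel I/O experiment a PVFS-style file system supports, here is a hedged sketch using MPI-IO via mpi4py; the file name and sizes are hypothetical, and this is not code from the project.

```python
# Hedged sketch: each rank writes its own contiguous slice of a shared
# file with a single collective MPI-IO call. Path and sizes are made up.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

n_local = 1_000_000                       # doubles per process
data = np.full(n_local, rank, dtype="d")

fh = MPI.File.Open(comm, "testfile.dat",
                   MPI.MODE_CREATE | MPI.MODE_WRONLY)
offset = rank * n_local * data.itemsize   # byte offset of this rank's slice
fh.Write_at_all(offset, data)             # collective write
fh.Close()
```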
Frequency-encoded photonic qubits for scalable quantum information processing
Lukens, Joseph M.; Lougovski, Pavel
2016-12-21
Among the objectives for large-scale quantum computation is the quantum interconnect: a device that uses photons to interface qubits that otherwise could not interact. However, the current approaches require photons indistinguishable in frequency—a major challenge for systems experiencing different local environments or of different physical compositions altogether. Here, we develop an entirely new platform that actually exploits such frequency mismatch for processing quantum information. Labeled “spectral linear optical quantum computation” (spectral LOQC), our protocol offers favorable linear scaling of optical resources and enjoys an unprecedented degree of parallelism, as an arbitrary N-qubit quantum gate may be performed in parallel on multiple N-qubit sets in the same linear optical device. Here, not only does spectral LOQC offer new potential for optical interconnects, but it also brings the ubiquitous technology of high-speed fiber optics to bear on photonic quantum information, making wavelength-configurable and robust optical quantum systems within reach.
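To illustrate the parallelism claim only (this is not the spectral LOQC protocol), the following sketch applies one single-qubit unitary to several disjoint frequency-bin pairs at once as a block-diagonal transform; all quantities are invented for demonstration.

```python
# Illustrative only: "one gate, many qubit sets in parallel" shows up
# mathematically as the same 2x2 block repeated down the diagonal of
# one big unitary acting on all frequency-bin amplitudes at once.
import numpy as np

U = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard-like gate

n_pairs = 4                                    # 4 qubits in 8 bins
rng = np.random.default_rng(0)
amps = rng.standard_normal(2 * n_pairs) + 1j * rng.standard_normal(2 * n_pairs)
amps /= np.linalg.norm(amps)

big_U = np.kron(np.eye(n_pairs), U)            # one device = one unitary
out = big_U @ amps
print(np.allclose(np.linalg.norm(out), 1.0))   # unitarity preserved
```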
Programming in a proposed 9X distributed Ada
NASA Technical Reports Server (NTRS)
Waldrop, Raymond S.; Volz, Richard A.; Goldsack, Stephen J.
1990-01-01
The proposed Ada 9X constructs for distribution were studied. The goal was to select suitable test cases to help in the evaluation of the proposed constructs. The examples were to be considered according to the following requirements: real-time operation; fault tolerance at several different levels; demonstration of both distributed and massively parallel operation; reflection of realistic NASA programs; illustration of the issues of configuration, compilation, linking, and loading; indication of the consequences of using the proposed revisions for large-scale programs; and coverage of the spectrum of communication patterns such as predictable, bursty, small, and large messages. The first month was spent identifying possible examples and judging their suitability for the project.
Design and analysis of a global sub-mesoscale and tidal dynamics admitting virtual ocean.
NASA Astrophysics Data System (ADS)
Menemenlis, D.; Hill, C. N.
2016-02-01
We will describe the techniques used to realize a global kilometer-scale ocean model configuration that includes representations of sea ice and tidal excitation, and spans scales from planetary gyres to internal tides. A simulation using this model configuration provides a virtual ocean that admits some sub-mesoscale dynamics and tidal energetics not normally represented in global calculations. This extends simulated ocean behavior beyond broadly quasi-geostrophic flows and provides a preliminary example of a next-generation computational approach to explicitly probing the interactions between instabilities that are usually parameterized and the dominant energetic scales in the ocean. From previous process studies we have ascertained that this can lead to a qualitative improvement in the realism of many significant processes, including geostrophic eddy dynamics, shelf-break exchange, and topographic mixing. Computationally, we exploit high degrees of parallelism in both numerical evaluation and in recording model state to persistent disk storage. Together this allows us to compute and record a full three-dimensional model trajectory at hourly frequency for a time period of 5 months with less than 9 million core hours of parallel computer time, using the present-generation NASA Ames Research Center facilities. We have used this capability to create a 5-month trajectory archive, sampled at high spatial and temporal frequency, for an ocean configuration that is initialized from a realistic data-assimilated state and driven with reanalysis surface forcing from ECMWF. The resulting database of model state provides a novel virtual laboratory for exploring coupling across scales in the ocean, and for testing ideas on the relationship between small-scale fluxes and the large-scale state. The computation is complemented by counterpart computations that are coarsened two and four times, respectively. In this presentation we will review the computational and numerical technologies employed and show how the high spatio-temporal frequency archive of model state can provide a new and promising tool for researching richer ocean dynamics at scale. We will also outline how computations of this nature could be combined with next-generation computer hardware plans to help inform important climate process questions.
NASA Astrophysics Data System (ADS)
Xu, Chuanfu; Deng, Xiaogang; Zhang, Lilun; Fang, Jianbin; Wang, Guangxue; Jiang, Yi; Cao, Wei; Che, Yonggang; Wang, Yongxian; Wang, Zhenghua; Liu, Wei; Cheng, Xinghua
2014-12-01
Programming and optimizing complex, real-world CFD codes on current many-core accelerated HPC systems is very challenging, especially when coordinating CPUs and accelerators to fully tap the potential of heterogeneous systems. In this paper, with a tri-level hybrid and heterogeneous programming model using MPI + OpenMP + CUDA, we port and optimize our high-order multi-block structured CFD software HOSTA on the GPU-accelerated TianHe-1A supercomputer. HOSTA adopts two self-developed high-order compact finite difference schemes, WCNS and HDCS, that can simulate flows with complex geometries. We present a dual-level parallelization scheme for efficient multi-block computation on GPUs and perform particular kernel optimizations for high-order CFD schemes. The GPU-only approach achieves a speedup of about 1.3 when comparing one Tesla M2050 GPU with two Xeon X5670 CPUs. To achieve a greater speedup, we let the CPU and GPU collaborate in HOSTA instead of using a naive GPU-only approach, and present a novel scheme to balance the loads between the store-poor GPU and the store-rich CPU. Taking CPU and GPU load balance into account, we improve the maximum simulation problem size per TianHe-1A node for HOSTA by 2.3×; meanwhile, the collaborative approach improves performance by around 45% compared to the GPU-only approach. Further, to scale HOSTA on TianHe-1A, we propose a gather/scatter optimization to minimize PCI-e data transfer times for the ghost and singularity data of 3D grid blocks, and overlap the collaborative computation and communication as far as possible using advanced CUDA and MPI features. Scalability tests show that HOSTA can achieve a parallel efficiency of above 60% on 1024 TianHe-1A nodes. With our method, we have successfully simulated an EET high-lift airfoil configuration containing 800M cells and China's large civil airplane configuration containing 150M cells. To the best of our knowledge, these are the largest-scale CPU-GPU collaborative simulations that solve realistic CFD problems with both complex configurations and high-order schemes.
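A minimal sketch of the CPU/GPU load-balancing idea, under stated assumptions (this is not HOSTA's actual scheme): give each device a share of cells proportional to its measured throughput, capped by what fits in device memory, echoing the store-poor GPU versus store-rich CPU tradeoff.

```python
# Hedged sketch of CPU/GPU collaborative load balancing. Rates and
# memory capacity below are illustrative, not measurements from HOSTA.
def split_cells(total_cells, gpu_rate, cpu_rate, gpu_mem_cells):
    """Return (gpu_cells, cpu_cells) for one heterogeneous node."""
    gpu_share = int(total_cells * gpu_rate / (gpu_rate + cpu_rate))
    gpu_cells = min(gpu_share, gpu_mem_cells)  # the store-poor GPU caps out
    return gpu_cells, total_cells - gpu_cells

# e.g. GPU 1.3x faster than the node's two CPUs, but with room for
# only 60% of the node's cells in device memory
gpu, cpu = split_cells(total_cells=10_000_000, gpu_rate=1.3, cpu_rate=1.0,
                       gpu_mem_cells=6_000_000)
print(gpu, cpu)  # -> 5652173 4347827
```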
Design for Run-Time Monitor on Cloud Computing
NASA Astrophysics Data System (ADS)
Kang, Mikyung; Kang, Dong-In; Yun, Mira; Park, Gyung-Leen; Lee, Junghoon
Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet, as well as the underlying infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design a Run-Time Monitor (RTM), a system software component that monitors application behavior at run-time, analyzes the collected information, and optimizes resources on cloud computing. RTM monitors application software through library instrumentation, and the underlying hardware through performance counters, optimizing its computing configuration based on the analyzed data.
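The following is a minimal sketch of a run-time monitor loop in the spirit of RTM, not the paper's implementation; psutil and the adaptation thresholds are assumptions made for illustration.

```python
# Sketch: periodically sample resource usage and adapt a hypothetical
# service's worker count. Thresholds and psutil are assumptions.
import psutil

workers = 4
MIN_W, MAX_W = 1, 16

def adapt(cpu_pct, workers):
    if cpu_pct > 90 and workers > MIN_W:    # saturated: shed load
        return workers - 1
    if cpu_pct < 50 and workers < MAX_W:    # headroom: grow the service
        return workers + 1
    return workers

for _ in range(10):                         # monitor for ~10 s
    cpu = psutil.cpu_percent(interval=1.0)  # averaged over the interval
    workers = adapt(cpu, workers)
    print(f"cpu={cpu:5.1f}% workers={workers}")
```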
Neural networks within multi-core optic fibers
Cohen, Eyal; Malka, Dror; Shemer, Amir; Shahmoon, Asaf; Zalevsky, Zeev; London, Michael
2016-01-01
Hardware implementation of artificial neural networks facilitates real-time parallel processing of massive data sets. Optical neural networks offer low-volume 3D connectivity together with large bandwidth and minimal heat production, in contrast to electronic implementations. Here, we present a conceptual design for in-fiber optical neural networks. Neurons and synapses are realized as individual silica cores in a multi-core fiber. Optical signals are transferred transversely between cores by means of optical coupling. Pump-driven amplification in erbium-doped cores mimics synaptic interactions. We simulated three-layered feed-forward neural networks and explored their capabilities. Simulations suggest that networks can differentiate between given inputs depending on specific configurations of amplification; this implies classification and learning capabilities. Finally, we experimentally tested our basic neuronal elements using fibers, couplers, and amplifiers, and demonstrated that this configuration implements a neuron-like function. Therefore, devices similar to our proposed multi-core fiber could potentially serve as building blocks for future large-scale small-volume optical artificial neural networks. PMID:27383911
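A conceptual sketch of the simulated in-fiber network (illustrative matrices, not the paper's model): "synapses" are transverse coupling coefficients between cores, and "amplification" is a per-core gain standing in for the erbium-doped cores.

```python
# Toy three-layer feed-forward pass where coupling and gain matrices
# are random placeholders; saturation stands in for neuronal nonlinearity.
import numpy as np

rng = np.random.default_rng(0)

def layer(signal, coupling, gain):
    # light couples transversely between cores, then is amplified
    return np.tanh(gain * (coupling @ signal))

n = 8                                            # cores per layer
couple = [rng.uniform(0, 0.3, (n, n)) for _ in range(3)]
gain = [rng.uniform(0.5, 2.0, n) for _ in range(3)]

x = rng.uniform(0, 1, n)                         # input light intensities
for C, g in zip(couple, gain):
    x = layer(x, C, g)
print(x)
```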
Low profile, highly configurable, current sharing paralleled wide band gap power device power module
McPherson, Brice; Killeen, Peter D.; Lostetter, Alex; Shaw, Robert; Passmore, Brandon; Hornberger, Jared; Berry, Tony M
2016-08-23
A power module with multiple equalized parallel power paths supporting multiple parallel bare-die power devices, constructed with low-inductance equalized current paths for even current sharing and clean switching events. Wide, low-profile power contacts provide low inductance and short current paths, and a large conductor cross-sectional area provides high current-carrying capacity. An internal gate and source Kelvin interconnection substrate is provided, with individual ballast resistors and simple bolted construction. Gate drive connectors are provided on either the left or right side of the module. The module is configurable as half-bridge, full-bridge, common-source, and common-drain topologies.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reimberg, Paulo; Bernardeau, Francis; Pitrou, Cyril, E-mail: paulo.flose-reimberg@cea.fr, E-mail: francis.bernardeau@cea.fr, E-mail: pitrou@iap.fr
Redshift-space distortions are generally considered in the plane-parallel limit, where the angular separation between the two sources can be neglected. Given that galaxy catalogues now cover large fractions of the sky, it becomes necessary to consider them in a formalism which takes into account wide-angle separations. In this article we derive an operational formula for the matter correlators in the Newtonian limit to be used in actual data sets. In order to describe the geometrical nature of the wide-angle RSD effect in Fourier space, we extend the formalism developed in configuration space to Fourier space without relying on a plane-parallel approximation, but under the extra assumption of no bias evolution. We then recover the plane-parallel limit not only in configuration space where the geometry is simpler, but also in Fourier space, and we exhibit the first corrections that should be included in large surveys as a perturbative expansion over the plane-parallel results. We finally compare our results to the existing literature, and show explicitly how they are related.
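For reference, the plane-parallel limit that any wide-angle formalism must recover is conventionally written as follows (standard Kaiser-limit notation, not taken from this article; b is the linear bias, f the growth rate, and \mu the line-of-sight cosine):

```latex
% Plane-parallel (Kaiser) limit: redshift-space power spectrum and the
% Legendre multipole expansion of the correlation function.
P_s(k,\mu) = \bigl(b + f\mu^2\bigr)^2 P(k),
\qquad
\xi_s(s,\mu) = \sum_{\ell=0,2,4} \xi_\ell(s)\,\mathcal{P}_\ell(\mu)
```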
GAPD: a GPU-accelerated atom-based polychromatic diffraction simulation code.
E, J C; Wang, L; Chen, S; Zhang, Y Y; Luo, S N
2018-03-01
GAPD, a graphics-processing-unit (GPU)-accelerated atom-based polychromatic diffraction simulation code for direct, kinematics-based simulations of X-ray/electron diffraction of large-scale atomic systems with mono-/polychromatic beams and arbitrary plane detector geometries, is presented. This code implements GPU parallel computation via both real- and reciprocal-space decompositions. With GAPD, direct simulations are performed of the reciprocal lattice nodes of ultralarge systems (∼5 billion atoms) and of the diffraction patterns of single-crystal and polycrystalline configurations with mono- and polychromatic X-ray beams (including synchrotron undulator sources), and validation, benchmark and application cases are presented.
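The kinematic direct sum that such a code parallelizes can be sketched as follows (plain numpy for clarity; the toy system and constant scattering factor are assumptions):

```python
# Kinematic diffraction: I(q) = |sum_j f_j exp(i q . r_j)|^2 over all
# atoms. A GPU code parallelizes this double loop over q and atoms.
import numpy as np

rng = np.random.default_rng(1)
positions = rng.uniform(0, 50.0, (2000, 3))  # toy atomic system (angstroms)
F_ATOM = 1.0                                 # constant scattering factor

def intensity(q_vectors, positions):
    phases = q_vectors @ positions.T         # (n_q, n_atoms) phase matrix
    amplitude = (F_ATOM * np.exp(1j * phases)).sum(axis=1)
    return np.abs(amplitude) ** 2

q = np.stack([np.linspace(0.5, 6.0, 200),
              np.zeros(200), np.zeros(200)], axis=1)  # scan along qx
print(intensity(q, positions)[:5])
```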
Methods and apparatus of analyzing electrical power grid data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hafen, Ryan P.; Critchlow, Terence J.; Gibson, Tara D.
Apparatus and methods of processing large-scale data regarding an electrical power grid are described. According to one aspect, a method of processing large-scale data regarding an electrical power grid includes accessing a large-scale data set comprising information regarding an electrical power grid; processing data of the large-scale data set to identify a filter which is configured to remove erroneous data from the large-scale data set; using the filter, removing erroneous data from the large-scale data set; and after the removing, processing data of the large-scale data set to identify an event detector which is configured to identify events of interest in the large-scale data set.
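A loose illustration of the filter-then-detect pipeline the patent describes, with an invented range filter and excursion detector, since the patent does not specify them:

```python
# Toy pipeline: remove implausible grid-frequency samples, then flag
# events of interest. All thresholds and data are invented.
import numpy as np

rng = np.random.default_rng(2)
freq = 60.0 + 0.01 * rng.standard_normal(10_000)  # frequency stream (Hz)
freq[::997] = 0.0                                 # injected bad readings

def erroneous(x, lo=59.5, hi=60.5):
    return (x < lo) | (x > hi)                    # simple range filter

clean = freq[~erroneous(freq)]

# event detector: large sample-to-sample frequency excursions
events = np.flatnonzero(np.abs(np.diff(clean)) > 0.03)
print(len(clean), len(events))
```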
An Analysis of Performance Enhancement Techniques for Overset Grid Applications
NASA Technical Reports Server (NTRS)
Djomehri, J. J.; Biswas, R.; Potsdam, M.; Strawn, R. C.; Biegel, Bryan (Technical Monitor)
2002-01-01
The overset grid methodology has significantly reduced time-to-solution of high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process resolves the geometrical complexity of the problem domain by using separately generated but overlapping structured discretization grids that periodically exchange information through interpolation. However, high performance computations of such large-scale realistic applications must be handled efficiently on state-of-the-art parallel supercomputers. This paper analyzes the effects of various performance enhancement techniques on the parallel efficiency of an overset grid Navier-Stokes CFD application running on an SGI Origin2000 machine. Specifically, the role of asynchronous communication, grid splitting, and grid grouping strategies are presented and discussed. Results indicate that performance depends critically on the level of latency hiding and the quality of load balancing across the processors.
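One common way to realize grid grouping for load balance is a greedy longest-processing-time heuristic; the sketch below is illustrative and not necessarily the paper's strategy (grid sizes are made up):

```python
# LPT heuristic: always give the next-largest overset grid to the
# least-loaded processor, so per-processor work stays balanced.
import heapq

grid_sizes = [900, 750, 620, 400, 380, 300, 220, 150, 90, 60]  # kilo-points
n_procs = 4

heap = [(0, p) for p in range(n_procs)]   # min-heap of (load, processor)
groups = {p: [] for p in range(n_procs)}
for size in sorted(grid_sizes, reverse=True):
    load, p = heapq.heappop(heap)
    groups[p].append(size)
    heapq.heappush(heap, (load + size, p))

for p, g in groups.items():
    print(p, sum(g), g)
```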
Zhou, Ruhong
2004-05-01
A highly parallel replica exchange method (REM), coupled with a newly developed molecular dynamics algorithm, particle-particle particle-mesh Ewald (P3ME)/RESPA, has been proposed for efficient sampling of the protein folding free energy landscape. The algorithm is then applied to two separate protein systems, a beta-hairpin and the designed protein Trp-cage. The all-atom OPLSAA force field with an explicit solvent model is used for both protein folding simulations. Up to 64 replicas of solvated protein systems are simulated in parallel over a wide range of temperatures. The combined trajectories in temperature and configurational space allow a replica to overcome free energy barriers present at low temperatures. These large-scale simulations reveal detailed results on folding mechanisms, intermediate-state structures, thermodynamic properties, and the temperature dependences for both protein systems.
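The core replica-exchange move can be sketched as follows (standard REM acceptance rule, not the paper's full P3ME/RESPA machinery; the units and example energies are assumptions):

```python
# Metropolis criterion for swapping configurations between two
# temperature replicas in standard replica exchange.
import math
import random

K_B = 0.0019872  # kcal/(mol K), typical force-field units (assumption)

def swap_accepted(E_i, T_i, E_j, T_j):
    """Accept exchange of replicas i and j with Metropolis probability."""
    delta = (1.0 / (K_B * T_i) - 1.0 / (K_B * T_j)) * (E_j - E_i)
    return delta <= 0 or random.random() < math.exp(-delta)

# e.g. a hotter replica that found a lower-energy basin swaps down easily
print(swap_accepted(E_i=-1200.0, T_i=300.0, E_j=-1215.0, T_j=320.0))
```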
Development and Applications of a Modular Parallel Process for Large Scale Fluid/Structures Problems
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.; Kwak, Dochan (Technical Monitor)
2002-01-01
A modular process that can efficiently solve large-scale multidisciplinary problems using massively parallel supercomputers is presented. The process integrates disciplines with diverse physical characteristics while retaining the efficiency of the individual disciplines. Computational domain independence of the individual disciplines is maintained using a meta-programming approach. The process integrates disciplines without affecting the combined performance. Results are demonstrated for large-scale aerospace problems on several supercomputers. The super scalability and portability of the approach are demonstrated on several parallel computers.
Development and Applications of a Modular Parallel Process for Large Scale Fluid/Structures Problems
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.; Byun, Chansup; Kwak, Dochan (Technical Monitor)
2001-01-01
A modular process that can efficiently solve large-scale multidisciplinary problems using massively parallel supercomputers is presented. The process integrates disciplines with diverse physical characteristics while retaining the efficiency of the individual disciplines. Computational domain independence of the individual disciplines is maintained using a meta-programming approach. The process integrates disciplines without affecting the combined performance. Results are demonstrated for large-scale aerospace problems on several supercomputers. The super scalability and portability of the approach are demonstrated on several parallel computers.
Strategies for Energy Efficient Resource Management of Hybrid Programming Models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Dong; Supinski, Bronis de; Schulz, Martin
2013-01-01
Many scientific applications are programmed using hybrid programming models that use both message-passing and shared-memory, due to the increasing prevalence of large-scale systems with multicore, multisocket nodes. Previous work has shown that energy efficiency can be improved using software-controlled execution schemes that consider both the programming model and the power-aware execution capabilities of the system. However, such approaches have focused on identifying optimal resource utilization for one programming model, either shared-memory or message-passing, in isolation. The potential solution space, thus the challenge, increases substantially when optimizing hybrid models since the possible resource configurations increase exponentially. Nonetheless, with the accelerating adoption of hybrid programming models, we increasingly need improved energy efficiency in hybrid parallel applications on large-scale systems. In this work, we present new software-controlled execution schemes that consider the effects of dynamic concurrency throttling (DCT) and dynamic voltage and frequency scaling (DVFS) in the context of hybrid programming models. Specifically, we present predictive models and novel algorithms based on statistical analysis that anticipate application power and time requirements under different concurrency and frequency configurations. We apply our models and methods to the NPB MZ benchmarks and selected applications from the ASC Sequoia codes. Overall, we achieve substantial energy savings (8.74% on average and up to 13.8%) with some performance gain (up to 7.5%) or negligible performance loss.
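A toy sketch of the underlying configuration search (the paper's predictive models are statistical; the analytic models below are invented stand-ins): pick the (threads, frequency) pair that minimizes predicted energy, i.e. predicted power times predicted time.

```python
# Invented stand-in models: Amdahl-style time and a CMOS-like power law.
def predicted_time(threads, freq_ghz, serial_frac=0.1):
    speedup = 1.0 / (serial_frac + (1 - serial_frac) / threads)
    return 100.0 / (speedup * freq_ghz)           # seconds (toy baseline)

def predicted_power(threads, freq_ghz):
    return 20.0 + 5.0 * threads * freq_ghz ** 2   # watts (toy model)

configs = [(t, f) for t in (1, 2, 4, 8, 16) for f in (1.2, 1.6, 2.0, 2.4)]
best = min(configs, key=lambda c: predicted_time(*c) * predicted_power(*c))
print(best, predicted_time(*best) * predicted_power(*best), "joules")
```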
NASA Technical Reports Server (NTRS)
Sims, J. F.; Hamilton, T.
1972-01-01
Experimental aerodynamic investigations were conducted in the NASA/MSFC 14-inch trisonic wind tunnel during March 1972 on a 0.003366-scale model of a solid rocket motor version of the space shuttle ascent configuration. The configuration consisted of a parallel burn solid rocket motor booster on an external H-O centerline tank orbiter. Six-component aerodynamic force and moment data were recorded over an angle of attack range from -10 to 10 deg at zero degrees sideslip and over a sideslip range from -10 to 10 deg at 0, +6, and -6 deg angle of attack. Mach numbers ranged from 0.6 to 4.96. The performance and stability characteristics of the complete ascent configuration and build-up, and the effects of variations in tank diameter, orbiter incidence, fairings, and positioning of the solid rocket motors and tank fins were determined.
2015-08-01
Shifted Periodic Boundary Conditions in the Large-Scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) Software, by N Scott Weingarten (Weapons and Materials Research Directorate, ARL) and James P Larentzos (Engility). Approved for public release.
NASA Astrophysics Data System (ADS)
Jiang, Zhou-Ting; Zhang, Lin-Xi; Sun, Ting-Ting; Wu, Tai-Quan
2009-10-01
The character of forming long-range contacts deeply affects the three-dimensional structure of globular proteins. Because the 20 types of amino acids and the 4 categories of globular proteins differ in their ability to form long-range contacts, the corresponding statistical properties are thoroughly discussed in this paper. Two parameters, NC and ND, are defined to delimit the valid residues in detail. The relationship between hydrophobicity scales and the valid-residue percentage of each amino acid is given in the present work, and linear functions are obtained in our statistical results. It is concluded that the hydrophobicity scale defined by chemical derivatives of the amino acids and the nonpolar phase of large unilamellar vesicle membranes is the most effective technique for characterising the hydrophobic behavior of amino acid residues. Meanwhile, the residue percentage Pi and the sequential residue length Li of a certain protein i are calculated under different conditions. The statistical results show that the average values of both Pi and Li are lowest for all-α proteins among these 4 classes of globular proteins, indicating that all-α proteins are hardly capable of forming long-range contacts one by one along their linear amino acid sequences. All-β proteins have a higher tendency to form long-range contacts along their primary sequences, related to the secondary configurations, i.e. the parallel and anti-parallel configurations of β sheets. The investigation of the interior properties of globular proteins gives us the connection between the three-dimensional structure and its primary sequence data or secondary configurations, and helps us to understand protein structure and the folding process.
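A hedged sketch of counting long-range contacts (the paper's exact NC/ND definitions are not reproduced; the distance cutoff, sequence-separation threshold, and toy chain are assumptions):

```python
# A contact is "long-range" when two residues are close in space but
# far apart along the sequence; thresholds here are illustrative.
import numpy as np

rng = np.random.default_rng(3)
coords = np.cumsum(rng.normal(0, 2.0, (150, 3)), axis=0)  # toy C-alpha trace

def long_range_contacts(coords, cutoff=8.0, min_separation=12):
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    i, j = np.triu_indices(len(coords), k=min_separation)
    return int((d[i, j] < cutoff).sum())

print(long_range_contacts(coords))
```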
NASA Astrophysics Data System (ADS)
Yan, Hui; Wang, K. G.; Jones, Jim E.
2016-06-01
A parallel algorithm for large-scale three-dimensional phase-field simulations of phase coarsening is developed and implemented on high-performance architectures. From the large-scale simulations, a new coarsening kinetics in the ultrahigh volume fraction regime is found. The parallel implementation is capable of harnessing the greater computing power available from high-performance architectures. The parallelized code enables an increase in the three-dimensional simulation system size up to a 512³ grid cube. Through the parallelized code, practical runtimes can be achieved for three-dimensional large-scale simulations, and the statistical significance of the results from these high-resolution parallel simulations is greatly improved over that obtainable from serial simulations. A detailed performance analysis of speed-up and scalability is presented, showing good scalability which improves with increasing problem size. In addition, a model for the prediction of runtime is developed, which shows good agreement with actual run times from numerical tests.
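The heart of such a parallel phase-field code is a domain-decomposed stencil sweep with ghost-layer exchange; the sketch below is an illustrative Allen-Cahn-like step with mpi4py, not the paper's model:

```python
# Slab decomposition in z: each rank owns one slab plus two ghost
# layers and swaps those layers with its neighbors before every sweep.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
up, down = (rank + 1) % size, (rank - 1) % size   # periodic in z

nz, n = 16, 64                       # local slab depth and lateral size
phi = np.random.rand(nz + 2, n, n)   # slab plus two ghost layers

for step in range(10):
    comm.Sendrecv(phi[-2], dest=up,   recvbuf=phi[0],  source=down)
    comm.Sendrecv(phi[1],  dest=down, recvbuf=phi[-1], source=up)
    inner = phi[1:-1, 1:-1, 1:-1]    # lateral boundaries held fixed here
    lap = (phi[:-2, 1:-1, 1:-1] + phi[2:, 1:-1, 1:-1]
           + phi[1:-1, :-2, 1:-1] + phi[1:-1, 2:, 1:-1]
           + phi[1:-1, 1:-1, :-2] + phi[1:-1, 1:-1, 2:]
           - 6.0 * inner)
    phi[1:-1, 1:-1, 1:-1] += 0.1 * (lap - inner**3 + inner)  # toy dynamics
```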
A parallel orbital-updating based plane-wave basis method for electronic structure calculations
NASA Astrophysics Data System (ADS)
Pan, Yan; Dai, Xiaoying; de Gironcoli, Stefano; Gong, Xin-Gao; Rignanese, Gian-Marco; Zhou, Aihui
2017-11-01
Motivated by the recently proposed parallel orbital-updating approach in the real-space method [1], we propose a parallel orbital-updating based plane-wave basis method for electronic structure calculations, i.e., for solving the corresponding eigenvalue problems. In addition, we propose two new modified parallel orbital-updating methods. Compared to traditional plane-wave methods, our methods allow for two-level parallelization, which is particularly interesting for large-scale parallelization. Numerical experiments show that these new methods are more reliable and efficient for large-scale calculations on modern supercomputers.
Evolution of the Busbar Structure in Large-Scale Aluminum Reduction Cells
NASA Astrophysics Data System (ADS)
Zhang, Hongliang; Liang, Jinding; Li, Jie; Sun, Kena; Xiao, Jin
2017-02-01
Studies of the magnetic field and magneto-hydro-dynamics are regarded as the foundation for the development of large-scale aluminum reduction cells; however, because the busbar configuration directly determines the magnetic compensation, the key practical issue is the configuration of the busbar. As the line current has increased from 160 kA to 600 kA, the configuration of the busbar has become more complex. To summarize and explore the evolution of busbar configurations in aluminum reduction cells, this paper reviews various representative large-scale pre-baked aluminum reduction cell busbar structures, such as end-to-end potlines, side-by-side potlines, and external compensation currents. The advantages and disadvantages in magnetic distribution and technical specifications are also introduced separately, especially for the configurations of the mainstream 400-kA potlines. Finally, development trends in busbar structure configuration are discussed, based on the recent successful applications of super-scale cell busbar structures in China (500-600 kA).
Aquilante, Francesco; Autschbach, Jochen; Carlson, Rebecca K; Chibotaru, Liviu F; Delcey, Mickaël G; De Vico, Luca; Fdez Galván, Ignacio; Ferré, Nicolas; Frutos, Luis Manuel; Gagliardi, Laura; Garavelli, Marco; Giussani, Angelo; Hoyer, Chad E; Li Manni, Giovanni; Lischka, Hans; Ma, Dongxia; Malmqvist, Per Åke; Müller, Thomas; Nenov, Artur; Olivucci, Massimo; Pedersen, Thomas Bondo; Peng, Daoling; Plasser, Felix; Pritchard, Ben; Reiher, Markus; Rivalta, Ivan; Schapiro, Igor; Segarra-Martí, Javier; Stenrup, Michael; Truhlar, Donald G; Ungur, Liviu; Valentini, Alessio; Vancoillie, Steven; Veryazov, Valera; Vysotskiy, Victor P; Weingart, Oliver; Zapata, Felipe; Lindh, Roland
2016-02-15
In this report, we summarize and describe the recent unique updates and additions to the Molcas quantum chemistry program suite as contained in release version 8. These updates include natural and spin orbitals for studies of magnetic properties, local and linear scaling methods for the Douglas-Kroll-Hess transformation, the generalized active space concept in MCSCF methods, a combination of multiconfigurational wave functions with density functional theory in the MC-PDFT method, additional methods for computation of magnetic properties, methods for diabatization, analytical gradients of state average complete active space SCF in association with density fitting, methods for constrained fragment optimization, large-scale parallel multireference configuration interaction including analytic gradients via the interface to the Columbus package, and approximations of the CASPT2 method to be used for computations of large systems. In addition, the report includes the description of a computational machinery for nonlinear optical spectroscopy through an interface to the QM/MM package Cobramm. Further, a module to run molecular dynamics simulations is added, two surface hopping algorithms are included to enable nonadiabatic calculations, and the DQ method for diabatization is added. Finally, we report on the subject of improvements with respect to alternative file options and parallelization. © 2015 Wiley Periodicals, Inc.
An approach to enhance pnetCDF performance in ...
Data intensive simulations are often limited by their I/O (input/output) performance, and "novel" techniques need to be developed in order to overcome this limitation. The software package pnetCDF (parallel network Common Data Form), which works with parallel file systems, was developed to address this issue by providing parallel I/O capability. This study examines the performance of an application-level data aggregation approach which performs data aggregation along either row or column dimension of MPI (Message Passing Interface) processes on a spatially decomposed domain, and then applies the pnetCDF parallel I/O paradigm. The test was done with three different domain sizes which represent small, moderately large, and large data domains, using a small-scale Community Multiscale Air Quality model (CMAQ) mock-up code. The examination includes comparing I/O performance with traditional serial I/O technique, straight application of pnetCDF, and the data aggregation along row and column dimension before applying pnetCDF. After the comparison, "optimal" I/O configurations of this application-level data aggregation approach were quantified. Data aggregation along the row dimension (pnetCDFcr) works better than along the column dimension (pnetCDFcc) although it may perform slightly worse than the straight pnetCDF method with a small number of processors. When the number of processors becomes larger, pnetCDFcr outperforms pnetCDF significantly. If the number of proces
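The row-wise aggregation idea can be sketched with mpi4py as follows, under stated assumptions (process-grid width, data sizes); the actual CMAQ mock-up and pnetCDF calls are not reproduced, so the row leaders just print instead of writing:

```python
# Gather each row of the process grid onto one leader rank, so only
# the (much smaller) set of leaders participates in the parallel write.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
NCOLS = 4                                  # process-grid width (assumption)
row, col = divmod(rank, NCOLS)

row_comm = comm.Split(color=row, key=col)  # one communicator per row

local = np.full(1000, rank, dtype="d")     # this rank's subdomain slice
gathered = None
if row_comm.Get_rank() == 0:
    gathered = np.empty(1000 * row_comm.Get_size(), dtype="d")
row_comm.Gather(local, gathered, root=0)

if row_comm.Get_rank() == 0:
    # a real code would issue the pnetCDF call here
    print(f"row {row}: writing {gathered.size} values")
```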
Performance Enhancement Strategies for Multi-Block Overset Grid CFD Applications
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Biswas, Rupak
2003-01-01
The overset grid methodology has significantly reduced time-to-solution of high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process resolves the geometrical complexity of the problem domain by using separately generated but overlapping structured discretization grids that periodically exchange information through interpolation. However, high performance computations of such large-scale realistic applications must be handled efficiently on state-of-the-art parallel supercomputers. This paper analyzes the effects of various performance enhancement strategies on the parallel efficiency of an overset grid Navier-Stokes CFD application running on an SGI Origin2000 machine. Specifically, the roles of asynchronous communication, grid splitting, and grid grouping strategies are presented and discussed. Details of a sophisticated graph partitioning technique for grid grouping are also provided. Results indicate that performance depends critically on the level of latency hiding and the quality of load balancing across the processors.
ls1 mardyn: The Massively Parallel Molecular Dynamics Code for Large Systems.
Niethammer, Christoph; Becker, Stefan; Bernreuther, Martin; Buchholz, Martin; Eckhardt, Wolfgang; Heinecke, Alexander; Werth, Stephan; Bungartz, Hans-Joachim; Glass, Colin W; Hasse, Hans; Vrabec, Jadran; Horsch, Martin
2014-10-14
The molecular dynamics simulation code ls1 mardyn is presented. It is a highly scalable code, optimized for massively parallel execution on supercomputing architectures and currently holds the world record for the largest molecular simulation with over four trillion particles. It enables the application of pair potentials to length and time scales that were previously out of scope for molecular dynamics simulation. With an efficient dynamic load balancing scheme, it delivers high scalability even for challenging heterogeneous configurations. Presently, multicenter rigid potential models based on Lennard-Jones sites, point charges, and higher-order polarities are supported. Due to its modular design, ls1 mardyn can be extended to new physical models, methods, and algorithms, allowing future users to tailor it to suit their respective needs. Possible applications include scenarios with complex geometries, such as fluids at interfaces, as well as nonequilibrium molecular dynamics simulation of heat and mass transfer.
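For background, the Lennard-Jones site-site potential underlying such codes is the standard 12-6 form; the parameters below are generic argon-like values, not taken from the paper:

```python
# Standard 12-6 Lennard-Jones potential between two interaction sites.
import numpy as np

EPS = 0.996    # kJ/mol (illustrative)
SIGMA = 0.340  # nm (illustrative)

def lj_energy(r):
    """Lennard-Jones energy for site-site distance r (nm)."""
    sr6 = (SIGMA / r) ** 6
    return 4.0 * EPS * (sr6 ** 2 - sr6)

r = np.linspace(0.3, 1.0, 8)
print(lj_energy(r))    # attractive well near 2**(1/6) * SIGMA
```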
Reversible Parallel Discrete-Event Execution of Large-scale Epidemic Outbreak Models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Perumalla, Kalyan S; Seal, Sudip K
2010-01-01
The spatial scale, runtime speed and behavioral detail of epidemic outbreak simulations together require the use of large-scale parallel processing. In this paper, an optimistic parallel discrete event execution of a reaction-diffusion simulation model of epidemic outbreaks is presented, with an implementation over the µsik simulator. Rollback support is achieved with the development of a novel reversible model that combines reverse computation with a small amount of incremental state saving. Parallel speedup and other runtime performance metrics of the simulation are tested on a small (8,192-core) Blue Gene/P system, while scalability is demonstrated on 65,536 cores of a large Cray XT5 system. Scenarios representing large population sizes (up to several hundred million individuals in the largest case) are exercised.
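Reverse computation can be sketched generically as events that know how to undo themselves, with a small amount of incremental state saving for the irreversible parts (this toy event is not the paper's epidemic model):

```python
# Each event carries an exact inverse; only the irreversible piece
# (here, the RNG state) is saved incrementally for rollback.
import random

class InfectEvent:
    def __init__(self, region, n):
        self.region, self.n = region, n
        self.saved_rng_state = None            # incremental state saving

    def execute(self, state, rng):
        self.saved_rng_state = rng.getstate()  # save the irreversible part
        state[self.region] += self.n           # reversible increment
        return state

    def reverse(self, state, rng):
        state[self.region] -= self.n           # exact inverse of execute
        rng.setstate(self.saved_rng_state)     # restore the saved part
        return state

rng = random.Random(42)
state = {"A": 100}
ev = InfectEvent("A", 7)
ev.execute(state, rng)
ev.reverse(state, rng)
print(state)   # rollback restores {'A': 100}
```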
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cui, Li-Ling; School of Science, Hunan University of Technology, Zhuzhou 412007; Yang, Bing-Chu, E-mail: bingchuyang@csu.edu.cn
2014-07-21
Spin-dependent transport properties of nanodevices constructed from an iron-phthalocyanine (FePc) molecule sandwiched between two zigzag graphene nanoribbon electrodes are studied using first-principles quantum transport calculations. The effects of the symmetry and spin configuration of the electrodes have been taken into account. It is found that large magnetoresistance, large spin polarization, dual spin-filtering, and negative differential resistance (NDR) can coexist in these devices. Our results show that the 5Z-FePc system conducts well in both the parallel (P) and anti-parallel (AP) configurations. For the 6Z-FePc-P system, a spin-filtering effect and large spin polarization can be found. Dual spin-filtering and NDR can also be seen in 6Z-FePc-AP. Our studies indicate that the dual spin-filtering effect depends on the orbital symmetry of the energy bands and the spin mismatch of the electrodes. All these effects open up possibilities for applications in spin-valve, spin-filter, and effective spin-diode devices.
NASA Astrophysics Data System (ADS)
Oya, Yoko; Sakai, Nami; Lefloch, Bertrand; López-Sepulcre, Ana; Watanabe, Yoshimasa; Ceccarelli, Cecilia; Yamamoto, Satoshi
2015-10-01
Subarcsecond-resolution images of the rotational line emissions of CS and c-C3H2 obtained toward the low-mass protostar IRAS 04368+2557 in L1527 with the Atacama Large Millimeter/submillimeter Array are investigated to constrain the orientation of the outflow/envelope system. The distribution of CS consists of an envelope component extending from north to south and a faint butterfly-shaped outflow component. The kinematic structure of the envelope is well reproduced by a simple ballistic model of an infalling rotating envelope. Although the envelope has a nearly edge-on configuration, we find that the western side of the envelope faces the observer. This configuration is opposite to the direction of the large-scale (~10⁴ AU) outflow suggested previously from the 12CO (J = 3-2) observation, and to the morphology of the infrared reflection near the protostar (~200 AU). The latter discrepancy could originate from high extinction by the outflow cavity on the western side, or may indicate that the outflow axis is not parallel to the rotation axis of the envelope. Position-velocity diagrams show the accelerated outflow cavity wall, and its kinematic structure on the 2000 AU scale is explained by a standard parabolic model with the inclination angle derived from the analysis of the envelope. The different orientation of the outflow between the small and large scales implies the possibility of precession of the outflow axis. The shape and velocity of the outflow in the vicinity of the protostar are compared with those of other protostars.
A hybrid algorithm for parallel molecular dynamics simulations
NASA Astrophysics Data System (ADS)
Mangiardi, Chris M.; Meyer, R.
2017-10-01
This article describes algorithms for the hybrid parallelization and SIMD vectorization of molecular dynamics simulations with short-range forces. The parallelization method combines domain decomposition with a thread-based parallelization approach. The goal of the work is to enable efficient simulations of very large (tens of millions of atoms) and inhomogeneous systems on many-core processors with hundreds or thousands of cores and SIMD units with large vector sizes. In order to test the efficiency of the method, simulations of a variety of configurations with up to 74 million atoms have been performed. Results are shown that were obtained on multi-core systems with Sandy Bridge and Haswell processors as well as systems with Xeon Phi many-core processors.
Design and Development of a Run-Time Monitor for Multi-Core Architectures in Cloud Computing
Kang, Mikyung; Kang, Dong-In; Crago, Stephen P.; Park, Gyung-Leen; Lee, Junghoon
2011-01-01
Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet, as well as the underlying infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM), a system software component that monitors application behavior at run-time, analyzes the collected information, and optimizes cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation, and the underlying hardware through a performance counter, optimizing its computing configuration based on the analyzed data. PMID:22163811
Factorization in large-scale many-body calculations
Johnson, Calvin W.; Ormand, W. Erich; Krastev, Plamen G.
2013-08-07
One approach for solving interacting many-fermion systems is the configuration-interaction method, also sometimes called the interacting shell model, where one finds eigenvalues of the Hamiltonian in a many-body basis of Slater determinants (antisymmetrized products of single-particle wavefunctions). The resulting Hamiltonian matrix is typically very sparse, but for large systems the nonzero matrix elements can nonetheless require terabytes or more of storage. An alternate algorithm, applicable to a broad class of systems with symmetry, in our case rotational invariance, is to exactly factorize both the basis and the interaction using additive/multiplicative quantum numbers; such an algorithm recreates the many-body matrix elements on the fly and can reduce the storage requirements by an order of magnitude or more. Here, we discuss factorization in general and introduce a novel, generalized factorization method, essentially a ‘double-factorization’ which speeds up basis generation and set-up of required arrays. Although we emphasize techniques, we also place factorization in the context of a specific (unpublished) configuration-interaction code, BIGSTICK, which runs both on serial and parallel machines, and discuss the savings in memory due to factorization.
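A toy illustration of factorization by an additive quantum number, much simplified relative to BIGSTICK (the sector lists are hypothetical): the many-body dimension is assembled from proton-neutron sector pairs whose Jz values sum to the target M, so the full basis never needs to be enumerated or stored.

```python
# Factorized basis counting: pair up sectors by an additive quantum
# number instead of enumerating every Slater determinant.
from itertools import product

# hypothetical sector lists: (Jz, number of states in that sector)
proton_sectors = [(-2, 3), (-1, 5), (0, 7), (1, 5), (2, 3)]
neutron_sectors = [(-1, 4), (0, 6), (1, 4)]

M_TARGET = 0
dim = 0
for (jp, n_p), (jn, n_n) in product(proton_sectors, neutron_sectors):
    if jp + jn == M_TARGET:          # additive quantum number must match
        dim += n_p * n_n             # the basis factorizes sector by sector
print("many-body dimension:", dim)
```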
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O. (Editor); Housner, Jerrold M. (Editor)
1993-01-01
Computing speed is leaping forward by several orders of magnitude each decade. Engineers and scientists gathered at a NASA Langley symposium to discuss these exciting trends as they apply to parallel computational methods for large-scale structural analysis and design. Among the topics discussed were: large-scale static analysis; dynamic, transient, and thermal analysis; domain decomposition (substructuring); and nonlinear and numerical methods.
NASA Technical Reports Server (NTRS)
Sims, F.
1972-01-01
Experimental aerodynamic investigations were conducted in the NASA/MSFC 14-inch trisonic wind tunnel during April 1972 on a 0.004-scale model of a solid rocket motor version of the space shuttle ascent configuration. The configuration consisted of a parallel burn solid rocket motor booster on an external H-O centerline tank orbiter. Six-component aerodynamic force and moment data were recorded over an angle of attack range from -10 deg to +10 deg at zero degrees sideslip and over a sideslip range from -10 deg to +10 deg at zero degrees angle of attack. Mach numbers ranged from 0.6 to 4.96. The purpose of the test was to determine the performance and stability characteristics of the complete ascent configuration and buildup, and to determine the effects of variations in H-O tank and SRM nose shaping, orbiter incidence and position, and position of the solid rocket motors.
Hydrodynamic interaction between two trapped swimming model micro-organisms.
Matas Navarro, R; Pagonabarraga, I
2010-09-01
We present a theoretical study of the behaviour of two active particles under the action of harmonic traps kept at a fixed distance from each other. We classify the steady configurations the squirmers develop as a function of their self-propelling velocity and the active stresses the swimmers induce around them. We have further analyzed the stability of such configurations, and have found that the ratio between the self-propelling velocity and the apolar flow generated through active stresses determines whether collinear parallel squirmers or perpendicularly swimming particles moving away from each other are stable. There is therefore a close connection between the stable configurations and the active mechanisms leading to particle self-propulsion. The trap potential does not affect the stability of the configurations; it only modifies some of their relevant time scales. We have also observed the development of characteristic frequencies which should be observable. Finally, we show that the hydrodynamic flows induced by the active particles may be relevant even when their time scale is orders of magnitude smaller than the other characteristic time scales present, and may destabilize the stable configurations.
Large magnetoresistance in oxide based ferromagnet/superconductor spin switches.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pena, V.; Nemes, N.; Visani, C.
2006-01-01
We report large magnetoresistance (in excess of 1000%) in ferromagnet/superconductor/ferromagnet structures made of La0.7Ca0.3MnO3 and YBa2Cu3O7 in the current-in-plane (CIP) geometry. This magnetoresistance has many of the ingredients of the giant magnetoresistance of metallic superlattices: it is independent of the angle between current and magnetic field, depends on the relative orientation of the magnetization in the ferromagnetic layers, and takes very large values. The origin is enhanced scattering at the F/S interface in the antiparallel configuration of the magnetizations. Furthermore, we examine the dependence of the magnetoresistance effect on the thickness of the superconducting layer, and show that the magnetoresistance dies out for thicknesses in excess of 30 nm, setting a length scale for the diffusion of spin-polarized quasiparticles.
Comparison between four dissimilar solar panel configurations
NASA Astrophysics Data System (ADS)
Suleiman, K.; Ali, U. A.; Yusuf, Ibrahim; Koko, A. D.; Bala, S. I.
2017-12-01
Several studies on photovoltaic systems have focused on how they operate and the energy required to operate them. Little attention has been paid to their configurations, the modeling of mean time to system failure, availability, cost-benefit analysis, and comparisons of parallel and series-parallel designs. In this research work, four system configurations were studied. Configuration I consists of two sub-components arranged in parallel with 24 V each, configuration II consists of four sub-components arranged logically in parallel with 12 V each, configuration III consists of four sub-components arranged in series-parallel with 8 V each, and configuration IV has six sub-components with 6 V each arranged in series-parallel. A comparative analysis was made using the Chapman-Kolmogorov method. Explicit expressions for the mean time to system failure, the steady-state availability, and the cost-benefit ratio were derived for the comparison. A ranking method was used to determine the optimal configuration of the systems. The results of the analytical and numerical solutions for system availability and mean time to system failure show that configuration I is the optimal configuration.
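The flavor of the comparison can be sketched as follows, with illustrative failure and repair rates rather than the paper's data, and one plausible reading of the series-parallel layouts:

```python
# Steady-state availability of parallel and series-parallel arrangements
# of independent, identical, repairable units (all rates invented).
LAM, MU = 0.01, 0.5                 # per-hour failure and repair rates
a = MU / (LAM + MU)                 # availability of one unit

def parallel(n, a=a):
    """System is up if any of n parallel units is up."""
    return 1 - (1 - a) ** n

def series_parallel(branches, per_branch, a=a):
    """Parallel branches, each a series chain of units."""
    branch_up = a ** per_branch
    return 1 - (1 - branch_up) ** branches

print("I   (2 parallel):           ", parallel(2))
print("II  (4 parallel):           ", parallel(4))
print("III (2x2 series-parallel):  ", series_parallel(2, 2))
print("IV  (2x3 series-parallel):  ", series_parallel(2, 3))
```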
The Diversity of School Organizational Configurations
ERIC Educational Resources Information Center
Lee, Linda C.
2013-01-01
School reform on a large scale has largely been unsuccessful. Approaches designed to document and understand the variety of organizational conditions that comprise our school systems are needed so that reforms can be tailored and results scaled. Therefore, this article develops a configurational framework that allows a systematic analysis of many…
NASA Astrophysics Data System (ADS)
Brogi, Cosimo; Huisman, Johan Alexander; Kaufmann, Manuela Sarah; von Hebel, Christian; van der Kruk, Jan; Vereecken, Harry
2017-04-01
Soil subsurface structures can play a key role in crop performance, especially during water stress periods. Geophysical techniques like electromagnetic induction (EMI) have been shown to be able to provide information about dominant shallow subsurface features. However, previous work with EMI has typically not reached beyond the field scale. The objective of this study is to use large-scale multi-configuration EMI to characterize patterns of soil structural organization (layering and texture) and the associated impact on crop vegetation at the km² scale. For this, we carried out an intensive measurement campaign and collected high-spatial-resolution multi-configuration EMI data on an agricultural area of approx. 1 km² (102 ha) near Selhausen (North Rhine-Westphalia, Germany) with a maximum depth of investigation of around 2.5 m. We measured using two EMI instruments simultaneously, with a total of nine coil configurations. The instruments were placed inside polyethylene sleds that were pulled by an all-terrain vehicle along parallel lines with a spacing of 2 to 2.5 m. The driving speed was between 5 and 7 km h⁻¹, and we used a 0.2 s sampling interval to obtain an in-line resolution of approximately 0.3 m. The survey area consists of almost 50 different fields managed in different ways. The EMI measurements were collected between April and December 2016, within a few days after the harvest of each field. After data acquisition, the EMI data were automatically filtered, temperature-corrected, and interpolated onto a common grid. The resulting EMI maps allowed us to identify three main areas with different subsurface heterogeneities. The differences between these areas are likely related to the late Quaternary geological history (Pleistocene and Holocene) of the area, which resulted in spatially variable soil texture and layering that has a strong impact on spatio-temporal soil water content variability. The high-resolution surveys also allowed us to identify small-scale geomorphological structures as well as anthropogenic interventions, such as soil management and drainage networks established in the last 150 years. To identify areas with similar subsurface structures at high spatial resolution, we applied multiband image classification using the nine coil configurations as bands of a single image. We compared both supervised and unsupervised classification and obtained promising preliminary results showing a good degree of conformity between EMI supervised classification maps and observed patterns in plant productivity.
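A sketch of the unsupervised variant of this multiband classification (synthetic data; scikit-learn is an assumed dependency): stack the nine coil configurations as bands of one image and cluster pixels by their apparent-conductivity signatures.

```python
# Cluster each pixel's 9-band apparent-conductivity signature into
# zones of similar subsurface structure. Data here are synthetic.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
h, w, bands = 200, 200, 9
ec_maps = rng.normal(20, 5, (h, w, bands))        # mS/m, synthetic

pixels = ec_maps.reshape(-1, bands)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(pixels)
class_map = labels.reshape(h, w)                  # subsurface-zone map
print(np.bincount(labels))
```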
Wall Modeled Large Eddy Simulation of Airfoil Trailing Edge Noise
NASA Astrophysics Data System (ADS)
Kocheemoolayil, Joseph; Lele, Sanjiva
2014-11-01
Large eddy simulation (LES) of airfoil trailing edge noise has largely been restricted to low Reynolds numbers due to prohibitive computational cost. Wall modeled LES (WMLES) is a computationally cheaper alternative that makes full-scale Reynolds numbers relevant to large wind turbines accessible. A systematic investigation of trailing edge noise prediction using WMLES is conducted. Detailed comparisons are made with experimental data. The stress boundary condition from a wall model does not constrain the fluctuating velocity to vanish at the wall. This limitation has profound implications for trailing edge noise prediction. The simulation over-predicts the intensity of fluctuating wall pressure and far-field noise. An improved wall model formulation that minimizes the over-prediction of fluctuating wall pressure is proposed and carefully validated. The flow configurations chosen for the study are from the workshop on benchmark problems for airframe noise computations. The large eddy simulation database is used to examine the adequacy of scaling laws that quantify the dependence of trailing edge noise on Mach number, Reynolds number and angle of attack. Simplifying assumptions invoked in engineering approaches towards predicting trailing edge noise are critically evaluated. We gratefully acknowledge financial support from GE Global Research and thank Cascade Technologies Inc. for providing access to their massively-parallel large eddy simulation framework.
NASA Astrophysics Data System (ADS)
Stewart, Cameron; Najjar, Fady; Stewart, D. Scott; Bdzil, John
2012-11-01
Modern engineered high explosive (HE) materials can consist of a matrix of solid, inert particles embedded in an HE charge. When this charge is detonated, intense shock waves are generated. As these intense shocks interact with the inert particles, large deformations occur in the particles while the incident shock diffracts around the particle interface. We will present results from a series of 3-D DNS of an intense shock interacting with unit-cube configurations of inert particles embedded in nitromethane. The LLNL multi-physics massively parallel hydrodynamics code ALE3D is used to carry out high-resolution (4 million nodes) simulations. Three representative unit-cube configurations are considered: primitive cubic, face-centered cubic and body-centered cubic, for two particle material types of varying impedance ratios. Previous work has only looked at in-line particle configurations. We investigate the time evolution of the unit cell configurations, the vorticity generated by the shock interaction, and the velocity and acceleration of the particles until they reach the quasi-steady regime. LLNL-ABS-567694. CSS was supported by a summer internship through the HEDP program at LLNL. FMN's work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
Jackin, Boaz Jessie; Watanabe, Shinpei; Ootsu, Kanemitsu; Ohkawa, Takeshi; Yokota, Takashi; Hayasaki, Yoshio; Yatagai, Toyohiko; Baba, Takanobu
2018-04-20
A parallel computation method for large-size Fresnel computer-generated holograms (CGHs) is reported. The method was introduced by us in an earlier report as a technique for calculating Fourier CGHs from 2D object data. In this paper we extend the method to compute Fresnel CGHs from 3D object data. The scale of the computation problem is also expanded to 2 gigapixels, making it closer to real application requirements. The significant feature of the reported method is its ability to avoid communication overhead and thereby fully utilize the computing power of parallel devices. The method exhibits three layers of parallelism that suit small- to large-scale parallel computing machines. Simulation and optical experiments were conducted to demonstrate the workability and to evaluate the efficiency of the proposed technique. A twofold improvement in computation speed has been achieved compared to the conventional method on a 16-node cluster (one GPU per node) utilizing only one layer of parallelism. A 20-fold improvement in computation speed is estimated when utilizing two layers of parallelism on a very large-scale parallel machine with 16 nodes, where each node has 16 GPUs.
Parallel Clustering Algorithm for Large-Scale Biological Data Sets
Wang, Minchao; Zhang, Wu; Ding, Wang; Dai, Dongbo; Zhang, Huiran; Xie, Hao; Chen, Luonan; Guo, Yike; Xie, Jiang
2014-01-01
Background The recent explosion of biological data poses a great challenge to traditional clustering algorithms. With the increasing scale of data sets, much larger memory and longer runtime are required for cluster identification problems. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied in biological research. However, its time and space complexity become a great bottleneck when handling large-scale data sets. Moreover, the similarity matrix, whose construction takes a long runtime, is required before running the affinity propagation algorithm, since the algorithm clusters data sets based on the similarities between data pairs. Methods Two types of parallel architectures are proposed in this paper to accelerate the similarity matrix construction procedure and the affinity propagation algorithm. The shared-memory architecture is used to construct the similarity matrix, and the distributed system is used for the affinity propagation algorithm, because of its large memory size and great computing capacity. An appropriate scheme for data partition and reduction is designed in our method in order to minimize the global communication cost among processes. Results A speedup of 100 is gained with 128 cores. The runtime is reduced from several hours to a few seconds, which indicates that the parallel algorithm is capable of handling large-scale data sets effectively. The parallel affinity propagation also achieves good performance when clustering large-scale gene data (microarray) and detecting families in large protein superfamilies. PMID:24705246
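The two-stage design described above lends itself to a compact illustration. The sketch below (Python, not the authors' code) builds the similarity matrix in parallel on shared memory and then runs affinity propagation on the precomputed similarities; the data set, worker count, and use of scikit-learn's AffinityPropagation are assumptions made purely for illustration.

```python
import numpy as np
from multiprocessing import Pool
from sklearn.cluster import AffinityPropagation

DATA = np.random.rand(500, 20)  # placeholder for a biological data set

def similarity_rows(rows):
    # negative squared Euclidean distance, the usual AP similarity
    diff = DATA[rows, None, :] - DATA[None, :, :]
    return -(diff ** 2).sum(axis=2)

if __name__ == "__main__":
    # stage 1: shared-memory parallel construction of the similarity matrix
    # (workers inherit DATA via fork on Linux; no copies are made)
    chunks = np.array_split(np.arange(len(DATA)), 8)
    with Pool(8) as pool:
        S = np.vstack(pool.map(similarity_rows, chunks))
    # stage 2: affinity propagation on the precomputed similarities
    labels = AffinityPropagation(affinity="precomputed",
                                 random_state=0).fit_predict(S)
    print("clusters found:", len(set(labels)))
```

The fork-based worker pool mirrors the paper's shared-memory stage; the distributed-memory stage for affinity propagation itself is not reproduced here.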
Lee, Jae H.; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T.; Seo, Youngho
2014-01-01
The primary goal of this project is to implement the iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximization (MLEM) used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to-program software platform that can handle large amounts of data in parallel. GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing the MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge in computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains, with the goal of eventually making it usable in a clinical setting. PMID:27081299
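For readers unfamiliar with MLEM, here is a minimal dense-matrix sketch of the update the abstract refers to; the Spark/GraphX implementation distributes exactly these matrix-vector products across a cluster, which this toy version does not attempt.

```python
import numpy as np

def mlem(A, y, n_iter=50, eps=1e-12):
    """Dense MLEM update: x <- x * A^T(y / Ax) / (A^T 1)."""
    x = np.ones(A.shape[1])            # uniform initial image estimate
    sens = A.T @ np.ones(A.shape[0])   # sensitivity image, A^T 1
    for _ in range(n_iter):
        ratio = y / np.maximum(A @ x, eps)  # measured vs. forward projection
        x *= (A.T @ ratio) / np.maximum(sens, eps)
    return x

A = np.random.rand(200, 100)   # toy nonnegative system matrix
x_true = np.random.rand(100)
y = A @ x_true                 # noiseless toy projections
print(np.linalg.norm(mlem(A, y) - x_true))  # error shrinks with iterations
```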
NASA Technical Reports Server (NTRS)
Fairbank, W. M.; Everitt, C. W. F.; Debra, D. B.
1977-01-01
A satellite configuration having two gyroscopes with axes parallel to the boresight of a telescope and two at right angles to the telescope, approximately parallel and perpendicular to the earth's axis, is proposed for measuring geodetic precessions due to the earth's motion about the sun, higher order geodetic terms calculated from the earth's quadrupole mass moment (0.010 arc-sec/year in a 400 nautical mile polar orbit), and deflection by the sun of the starlight signal for the reference telescope. Data from the experiment also contain large periodic signals due to the annual and orbital aberrations of starlight, which are useful in providing a built-in reference signal of known amplitude for scaling the relativity signals, and should yield a singularly precise measurement of the parallax of the reference star. The development of the gyroscope and its readout system is discussed, as well as signal integration, drag-free control, and attitude control.
Study of a hybrid multispectral processor
NASA Technical Reports Server (NTRS)
Marshall, R. E.; Kriegler, F. J.
1973-01-01
A hybrid processor is described offering enough handling capacity and speed to process efficiently the large quantities of multispectral data that can be gathered by scanner systems such as MSDS, SKYLAB, ERTS, and ERIM M-7. Combinations of general-purpose and special-purpose hybrid computers were examined to include both analog and digital types as well as all-digital configurations. The current trend toward lower costs for medium-scale digital circuitry suggests that the all-digital approach may offer the better solution within the time frame of the next few years. The study recommends and defines such a hybrid digital computing system in which both special-purpose and general-purpose digital computers would be employed. The tasks of recognizing surface objects would be performed in a parallel, pipeline digital system while the tasks of control and monitoring would be handled by a medium-scale minicomputer system. A program to design and construct a small, prototype, all-digital system has been started.
Applications of Parallel Process HiMAP for Large Scale Multidisciplinary Problems
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.; Potsdam, Mark; Rodriguez, David; Kwak, Dochay (Technical Monitor)
2000-01-01
HiMAP is a three level parallel middleware that can be interfaced to a large scale global design environment for code independent, multidisciplinary analysis using high fidelity equations. Aerospace technology needs are rapidly changing. Computational tools compatible with the requirements of national programs such as space transportation are needed. Conventional computation tools are inadequate for modern aerospace design needs. Advanced, modular computational tools are needed, such as those that incorporate the technology of massively parallel processors (MPP).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Heidelberg, S T; Fitzgerald, K J; Richmond, G H
2006-01-24
There has been substantial development of the Lustre parallel filesystem prior to the configuration described below for this milestone. The initial Lustre filesystems that were deployed were directly connected to the cluster interconnect, i.e. Quadrics Elan3. That is, the clients (OSSes) and Meta-data Servers (MDS) were all directly connected to the cluster's internal high speed interconnect. This configuration serves a single cluster very well, but does not provide sharing of the filesystem among clusters. LLNL funded the development of high-efficiency "portals router" code by CFS (the company that develops Lustre) to enable us to move the Lustre servers to a GigE-connected network configuration, thus making it possible to connect to the servers from several clusters. With portals routing available, here is what changes: (1) another storage-only cluster is deployed to front the Lustre storage devices (these become the Lustre OSSes and MDS), (2) this "Lustre cluster" is attached via GigE connections to a large GigE switch/router cloud, (3) a small number of compute-cluster nodes are designated as "gateway" or "portal router" nodes, and (4) the portals router nodes are GigE-connected to the switch/router cloud. The Lustre configuration is then changed to reflect the new network paths. A typical example of this is a compute cluster and a related visualization cluster: the compute cluster produces the data (writes it to the Lustre filesystem), and the visualization cluster consumes some of the data (reads it from the Lustre filesystem). This process can be expanded by aggregating several collections of Lustre backend storage resources into one or more "centralized" Lustre filesystems, and then arranging to have several "client" clusters mount these centralized filesystems. The "client clusters" can be any combination of compute, visualization, archiving, or other types of cluster. This milestone demonstrates the operation and performance of a scaled-down version of such a large, centralized, shared Lustre filesystem concept.
Improving parallel I/O autotuning with performance modeling
Behzad, Babak; Byna, Surendra; Wild, Stefan M.; ...
2014-01-01
Various layers of the parallel I/O subsystem offer tunable parameters for improving I/O performance on large-scale computers. However, searching through a large parameter space is challenging. We are working towards an autotuning framework for determining the parallel I/O parameters that can achieve good I/O performance for different data write patterns. In this paper, we characterize parallel I/O and discuss the development of predictive models for use in effectively reducing the parameter space. Furthermore, applying our technique to tuning an I/O kernel derived from a large-scale simulation code shows that the search time can be reduced from 12 hours to 2 hours, while achieving a 54X I/O performance speedup.
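As a hedged illustration of model-based parameter-space pruning (the parameter names stripe_count, stripe_size, and cb_nodes are invented here, not the paper's actual tuning space), one can fit a cheap regression on a handful of benchmarked configurations and shortlist the most promising candidates for real runs:

```python
import numpy as np

rng = np.random.default_rng(0)
# invented tuning space: (stripe_count, stripe_size, cb_nodes) candidates
configs = rng.integers(1, 64, size=(200, 3)).astype(float)
bench = rng.choice(200, size=30, replace=False)   # a few measured runs
# toy "measured bandwidth" for the benchmarked configurations
bw = configs[bench] @ np.array([2.0, 0.5, 1.0]) + rng.normal(0, 5, 30)

# fit a linear predictive model on the measured subset
X = np.column_stack([np.ones(30), configs[bench]])
coef, *_ = np.linalg.lstsq(X, bw, rcond=None)

# predict over the full space and shortlist the top candidates to benchmark
pred = np.column_stack([np.ones(200), configs]) @ coef
shortlist = configs[np.argsort(pred)[-10:]]
print(shortlist)
```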
Hypercube matrix computation task
NASA Technical Reports Server (NTRS)
Calalo, R.; Imbriale, W.; Liewer, P.; Lyons, J.; Manshadi, F.; Patterson, J.
1987-01-01
The Hypercube Matrix Computation (Year 1986-1987) task investigated the applicability of a parallel computing architecture to the solution of large-scale electromagnetic scattering problems. Two existing electromagnetic scattering codes were selected for conversion to the Mark III Hypercube concurrent computing environment. They were selected so that the underlying numerical algorithms utilized would be different, thereby providing a more thorough evaluation of the appropriateness of the parallel environment for these types of problems. The first code was a frequency domain method of moments solution, NEC-2, developed at Lawrence Livermore National Laboratory. The second code was a time domain finite difference solution of Maxwell's equations to solve for the scattered fields. Once the codes were implemented on the hypercube and verified to obtain correct solutions by comparing the results with those from sequential runs, several measures were used to evaluate the performance of the two codes. First, a comparison was provided of the problem size possible on the hypercube, with 128 megabytes of memory for a 32-node configuration, with that available in a typical sequential user environment of 4 to 8 megabytes. Then, the performance of the codes was analyzed for the computational speedup attained by the parallel architecture.
NASA Technical Reports Server (NTRS)
Banks, Daniel W.; Laflin, Brenda E. Gile; Kemmerly, Guy T.; Campbell, Bryan A.
1999-01-01
The paper identifies speed, agility, human interface, generation of sensitivity information, task decomposition, and data transmission (including storage) as important attributes for a computer environment to have in order to support engineering design effectively. It is argued that when examined in terms of these attributes the presently available environment can be shown to be inadequate. A radical improvement is needed, and it may be achieved by combining new methods that have recently emerged from multidisciplinary design optimisation (MDO) with massively parallel processing computer technology. The caveat is that, for successful use of that technology in engineering computing, new paradigms for computing will have to be developed - specifically, innovative algorithms that are intrinsically parallel so that their performance scales up linearly with the number of processors. It may be speculated that the idea of simulating a complex behaviour by interaction of a large number of very simple models may be an inspiration for the above algorithms; the cellular automata are an example. Because of the long lead time needed to develop and mature new paradigms, development should begin now, even though the widespread availability of massively parallel processing is still a few years away.
Polymer scaling and dynamics in steady-state sedimentation at infinite Péclet number.
Lehtola, V; Punkkinen, O; Ala-Nissila, T
2007-11-01
We consider the static and dynamical behavior of a flexible polymer chain under steady-state sedimentation using analytic arguments and computer simulations. The model system comprises a single coarse-grained polymer chain of $N$ segments, which resides in a Newtonian fluid as described by the Navier-Stokes equations. The chain is driven into nonequilibrium steady state by gravity acting on each segment. The equations of motion for the segments and the Navier-Stokes equations are solved simultaneously using an immersed boundary method, where thermal fluctuations are neglected. To characterize the chain conformation, we consider its radius of gyration $R_G(N)$. We find that the presence of gravity explicitly breaks the spatial symmetry, leading to anisotropic scaling of the components of $R_G$ with $N$ along the direction of gravity, $R_{G,\parallel}$, and perpendicular to it, $R_{G,\perp}$. We numerically estimate the corresponding anisotropic scaling exponents $\nu_\parallel \approx 0.79$ and $\nu_\perp \approx 0.45$, which differ significantly from the equilibrium scaling exponent $\nu_e = 0.588$ in three dimensions. This indicates that on the average, the chain becomes elongated along the sedimentation direction for large enough $N$. We present a generalization of the Flory scaling argument, which is in good agreement with the numerical results. It also reveals an explicit dependence of the scaling exponents on the Reynolds number. To study the dynamics of the chain, we compute its effective diffusion coefficient $D(N)$, which does not contain Brownian motion. For the range of values of $N$ used here, we find that both the parallel and perpendicular components of $D$ increase with the chain length $N$, in contrast to the case of thermal diffusion in equilibrium. This is caused by the fluid-driven fluctuations in the internal configuration of the polymer that are magnified as polymer size becomes larger.
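Scaling exponents of this kind are typically extracted from the slope of a log-log fit of $R_G$ against $N$; a small sketch with synthetic data (the exponents from the abstract are planted purely for illustration):

```python
import numpy as np

N = np.array([16, 32, 64, 128, 256], dtype=float)
rg_par = 0.8 * N ** 0.79    # synthetic R_G,parallel ~ N^0.79
rg_perp = 0.9 * N ** 0.45   # synthetic R_G,perp ~ N^0.45

# scaling exponent = slope of log R_G vs. log N
nu_par = np.polyfit(np.log(N), np.log(rg_par), 1)[0]
nu_perp = np.polyfit(np.log(N), np.log(rg_perp), 1)[0]
print(f"nu_parallel ~ {nu_par:.2f}, nu_perpendicular ~ {nu_perp:.2f}")
```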
Modulated heat pulse propagation and partial transport barriers in chaotic magnetic fields
del-Castillo-Negrete, Diego; Blazevski, Daniel
2016-04-01
Direct numerical simulations of the time dependent parallel heat transport equation modeling heat pulses driven by power modulation in 3-dimensional chaotic magnetic fields are presented. The numerical method is based on the Fourier formulation of a Lagrangian-Green's function method that provides an accurate and efficient technique for the solution of the parallel heat transport equation in the presence of harmonic power modulation. The numerical results presented provide conclusive evidence that even in the absence of magnetic flux surfaces, chaotic magnetic field configurations with intermediate levels of stochasticity exhibit transport barriers to modulated heat pulse propagation. In particular, high-order islands and remnants of destroyed flux surfaces (Cantori) act as partial barriers that slow down or even stop the propagation of heat waves at places where the magnetic field connection length exhibits a strong gradient. The key parameter is $\gamma = \sqrt{\omega / (2\chi_\parallel)}$, which determines the length scale, $1/\gamma$, of the heat wave penetration along the magnetic field line. For large perturbation frequencies, $\omega \gg 1$, or small parallel thermal conductivities, $\chi_\parallel \ll 1$, parallel heat transport is strongly damped and the magnetic field partial barriers act as robust barriers where the heat wave amplitude vanishes and its phase speed slows down to a halt. On the other hand, in the limit of small $\gamma$, parallel heat transport is largely unimpeded, global transport is observed, and the radial amplitude and phase speed of the heat wave remain finite. Results on modulated heat pulse propagation in fully stochastic fields and across magnetic islands are also presented. In qualitative agreement with recent experiments in LHD and DIII-D, it is shown that the elliptic (O) and hyperbolic (X) points of magnetic islands have a direct impact on the spatio-temporal dependence of the amplitude and the time delay of modulated heat pulses.
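A quick numerical illustration of the role of $\gamma$ (dimensionless sample values assumed, not taken from the paper): large $\omega$ or small $\chi_\parallel$ shortens the penetration length $1/\gamma$, which is the strongly damped regime described above.

```python
import numpy as np

def penetration_length(omega, chi_par):
    """1/gamma with gamma = sqrt(omega / (2 * chi_parallel))."""
    return 1.0 / np.sqrt(omega / (2.0 * chi_par))

# dimensionless sample values, for illustration only
for omega, chi in [(0.1, 10.0), (1.0, 1.0), (10.0, 0.1)]:
    print(f"omega={omega}, chi_par={chi}: 1/gamma = "
          f"{penetration_length(omega, chi):.2f}")
```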
A Metascalable Computing Framework for Large Spatiotemporal-Scale Atomistic Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nomura, K; Seymour, R; Wang, W
2009-02-17
A metascalable (or 'design once, scale on new architectures') parallel computing framework has been developed for large spatiotemporal-scale atomistic simulations of materials based on spatiotemporal data locality principles, which is expected to scale on emerging multipetaflops architectures. The framework consists of: (1) an embedded divide-and-conquer (EDC) algorithmic framework based on spatial locality to design linear-scaling algorithms for high complexity problems; (2) a space-time-ensemble parallel (STEP) approach based on temporal locality to predict long-time dynamics, while introducing multiple parallelization axes; and (3) a tunable hierarchical cellular decomposition (HCD) parallelization framework to map these O(N) algorithms onto a multicore cluster based on a hybrid implementation combining message passing and critical section-free multithreading. The EDC-STEP-HCD framework exposes maximal concurrency and data locality, thereby achieving: (1) inter-node parallel efficiency well over 0.95 for 218 billion-atom molecular-dynamics and 1.68 trillion electronic-degrees-of-freedom quantum-mechanical simulations on 212,992 IBM BlueGene/L processors (superscalability); (2) high intra-node, multithreading parallel efficiency (nanoscalability); and (3) nearly perfect time/ensemble parallel efficiency (eon-scalability). The spatiotemporal scale covered by MD simulation on a sustained petaflops computer per day (i.e. petaflops · day of computing) is estimated as NT = 2.14 (e.g. N = 2.14 million atoms for T = 1 microsecond).
NASA Astrophysics Data System (ADS)
Xu, Jincheng; Liu, Wei; Wang, Jin; Liu, Linong; Zhang, Jianfeng
2018-02-01
De-absorption pre-stack time migration (QPSTM) compensates for the absorption and dispersion of seismic waves by introducing an effective Q parameter, thereby making it an effective tool for 3D, high-resolution imaging of seismic data. Although the optimal aperture obtained via stationary-phase migration reduces the computational cost of 3D QPSTM and yields 3D stationary-phase QPSTM, the associated computational efficiency is still the main problem in the processing of 3D, high-resolution images for real large-scale seismic data. In the current paper, we propose a division method for large-scale, 3D seismic data to optimize the performance of stationary-phase QPSTM on clusters of graphics processing units (GPUs). We then design an imaging-point parallel strategy to achieve optimal parallel computing performance, and adopt an asynchronous double buffering scheme for multi-stream GPU/CPU parallel computing. Moreover, several key optimization strategies for computation and storage based on the compute unified device architecture (CUDA) were adopted to accelerate the 3D stationary-phase QPSTM algorithm. Compared with the initial GPU code, the implementation of the key optimization steps, including thread optimization, shared memory optimization, register optimization and special function units (SFU), greatly improved the efficiency. A numerical example employing real large-scale, 3D seismic data showed that our scheme is nearly 80 times faster than the CPU-QPSTM algorithm. Our GPU/CPU heterogeneous parallel computing framework significantly reduces the computational cost and facilitates 3D high-resolution imaging for large-scale seismic data.
NASA Technical Reports Server (NTRS)
Dods, J. B., Jr.; Hanly, R. D.; Efting, J. H.
1975-01-01
Shadowgraphs of five space shuttle launch configurations are presented. The model was a 4 percent-scale space shuttle vehicle, tested in the 11- by 11-foot Transonic Wind Tunnel at Ames Research Center. The Mach number was varied from 0.8 to 1.4 with three angles of sideslip (0 deg, 5 deg, and -5 deg) used in conjunction with three angles of attack (4 deg, -4 deg, and 0 deg). The model configurations included both series-burn and parallel-burn configurations, two canopy configurations, two positions of the orbiter nose relative to the HO tank nose, and two HO tank nose-cone angles (15 deg and 20 deg). The data consist entirely of shadowgraph photographs.
2013-08-01
[Extraction residue from a report; recoverable content: the work models dislocations in the energetic molecular crystal RDX using the Large-Scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) molecular dynamics code, with dispersion and electrostatic interactions described by the SB potential for HMX/RDX (constants given in table 1 of the report).]
Grid-Enabled Quantitative Analysis of Breast Cancer
2010-10-01
[Abstract fragments; recoverable content: the research centers on large-scale, multi-modality computerized image analysis; its central hypothesis concerns large-scale image analysis for breast cancer; a pilot study was designed utilizing large-scale parallel Grid computing, harnessing nationwide infrastructure for medical image analysis.]
[Parallel virtual reality visualization of extreme large medical datasets].
Tang, Min
2010-04-01
On the basis of a brief description of grid computing, the essence and critical techniques of parallel visualization of extremely large medical datasets are discussed in connection with the intranets and common-configuration computers of hospitals. Several kernel techniques are introduced, including the hardware structure, software framework, load balancing, and virtual reality visualization. The Maximum Intensity Projection algorithm is parallelized using a common PC cluster. In the virtual reality world, three-dimensional models can be rotated, zoomed, translated, and cut interactively and conveniently through the control panel built on the Virtual Reality Modeling Language (VRML). Experimental results demonstrate that this method provides promising, real-time results and can play the role of a good assistant in making clinical diagnoses.
NASA Astrophysics Data System (ADS)
Mal, Priyanath; Bera, G.; Turpu, G. R.; Srivastava, Sunil K.; Das, Pradip
2018-05-01
We present a study of the structural and vibrational properties of the topological insulator GeBi4Te7. A modified Bridgman technique is employed to synthesize single crystals with relatively large crystalline faces. Sharp (0 0 l) reflections confirm the high crystallinity of the single crystal. We have performed temperature dependent Raman measurements in geometries both parallel and perpendicular to the crystallographic c axis. In the parallel configuration we observed seven Raman modes, whereas in the perpendicular geometry only four of these are identified. The appearance and disappearance of Raman modes with different intensities in the parallel and perpendicular measurements are attributed to mode polarization. A progressive blue shift is observed with decreasing temperature, reflecting the increase in internal stress.
Ergül, Özgür
2011-11-01
Fast and accurate solutions of large-scale electromagnetics problems involving homogeneous dielectric objects are considered. Problems are formulated with the electric and magnetic current combined-field integral equation and discretized with the Rao-Wilton-Glisson functions. Solutions are performed iteratively by using the multilevel fast multipole algorithm (MLFMA). For the solution of large-scale problems discretized with millions of unknowns, MLFMA is parallelized on distributed-memory architectures using a rigorous technique, namely, the hierarchical partitioning strategy. Efficiency and accuracy of the developed implementation are demonstrated on very large problems involving as many as 100 million unknowns.
Random number generators for large-scale parallel Monte Carlo simulations on FPGA
NASA Astrophysics Data System (ADS)
Lin, Y.; Wang, F.; Liu, B.
2018-05-01
Through parallelization, field programmable gate arrays (FPGAs) can achieve unprecedented speeds in large-scale parallel Monte Carlo (LPMC) simulations. FPGAs present both new constraints and new opportunities for the implementation of random number generators (RNGs), which are key elements of any Monte Carlo (MC) simulation system. Using empirical and application-based tests, this study evaluates all four RNGs used in previous FPGA-based MC studies, together with newly proposed FPGA implementations of two well-known high-quality RNGs that are suitable for LPMC studies on FPGA. One of the newly proposed FPGA implementations, a parallel version of the additive lagged Fibonacci generator (Parallel ALFG), is found to be the best among the evaluated RNGs in fulfilling the needs of LPMC simulations on FPGA.
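For reference, here is a minimal serial sketch of an additive lagged Fibonacci generator, $x_n = (x_{n-s} + x_{n-r}) \bmod 2^m$ with the classic lags (24, 55); the parallel FPGA version evaluated in the paper is not reproduced, and the seeding scheme below is an assumption for illustration.

```python
import random

class ALFG:
    """Additive lagged Fibonacci generator over a circular buffer."""
    def __init__(self, seed, s=24, r=55, m=32):
        self.s, self.r = s, r
        self.mask = (1 << m) - 1
        rnd = random.Random(seed)
        # seed the lag table; odd entries help keep the period maximal
        self.state = [rnd.getrandbits(m) | 1 for _ in range(r)]
        self.i = 0

    def next(self):
        j = self.i % self.r              # slot holding x_{n-r}
        k = (self.i - self.s) % self.r   # slot holding x_{n-s}
        val = (self.state[j] + self.state[k]) & self.mask
        self.state[j] = val              # overwrite oldest entry with x_n
        self.i += 1
        return val

g = ALFG(seed=42)
print([g.next() for _ in range(5)])
```

A parallel variant in the spirit of the paper would give each generator instance an independently seeded lag table so the streams evolve without communication.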
Parallel Simulation of Unsteady Turbulent Flames
NASA Technical Reports Server (NTRS)
Menon, Suresh
1996-01-01
Time-accurate simulation of turbulent flames in high Reynolds number flows is a challenging task since both fluid dynamics and combustion must be modeled accurately. To numerically simulate this phenomenon, very large computer resources (both time and memory) are required. Although current vector supercomputers are capable of providing adequate resources for simulations of this nature, their high cost and limited availability make practical use of such machines less than satisfactory. At the same time, the explicit time integration algorithms used in unsteady flow simulations often possess a very high degree of parallelism, making them very amenable to efficient implementation on large-scale parallel computers. Under these circumstances, distributed memory parallel computers offer an excellent near-term solution for greatly increased computational speed and memory, at a cost that may render unsteady simulations of the type discussed above more feasible and affordable. This paper discusses the study of unsteady turbulent flames using a simulation algorithm that is capable of retaining high parallel efficiency on distributed memory parallel architectures. Numerical studies are carried out using large-eddy simulation (LES). In LES, the scales larger than the grid are computed using a time- and space-accurate scheme, while the unresolved small scales are modeled using eddy viscosity based subgrid models. This is acceptable for the moment/energy closure since the small scales primarily provide a dissipative mechanism for the energy transferred from the large scales. However, for combustion to occur, the species must first undergo mixing at the small scales and then come into molecular contact. Therefore, global models cannot be used. Recently, a new model for turbulent combustion was developed, in which the combustion is modeled within the subgrid (small scales) using a methodology that simulates the mixing, the molecular transport, and the chemical kinetics within each LES grid cell. Finite-rate kinetics can be included without any closure, and this approach actually provides a means to predict the turbulent rates and the turbulent flame speed. The subgrid combustion model requires resolution of the local time scales associated with small-scale mixing, molecular diffusion and chemical kinetics; therefore, within each grid cell, a significant amount of computation must be carried out before the large-scale (LES-resolved) effects are incorporated. This approach is thus uniquely suited for parallel processing and has been implemented on various systems such as the Intel Paragon, IBM SP-2, Cray T3D and SGI Power Challenge (PC) using the system-independent Message Passing Interface (MPI) compiler. In this paper, timing data on these machines is reported along with some characteristic results.
The Parallel System for Integrating Impact Models and Sectors (pSIMS)
NASA Technical Reports Server (NTRS)
Elliott, Joshua; Kelly, David; Chryssanthacopoulos, James; Glotter, Michael; Jhunjhnuwala, Kanika; Best, Neil; Wilde, Michael; Foster, Ian
2014-01-01
We present a framework for massively parallel climate impact simulations: the parallel System for Integrating Impact Models and Sectors (pSIMS). This framework comprises a) tools for ingesting and converting large amounts of data to a versatile datatype based on a common geospatial grid; b) tools for translating this datatype into custom formats for site-based models; c) a scalable parallel framework for performing large ensemble simulations, using any one of a number of different impacts models, on clusters, supercomputers, distributed grids, or clouds; d) tools and data standards for reformatting outputs to common datatypes for analysis and visualization; and e) methodologies for aggregating these datatypes to arbitrary spatial scales such as administrative and environmental demarcations. By automating many time-consuming and error-prone aspects of large-scale climate impacts studies, pSIMS accelerates computational research, encourages model intercomparison, and enhances reproducibility of simulation results. We present the pSIMS design and use example assessments to demonstrate its multi-model, multi-scale, and multi-sector versatility.
NASA Astrophysics Data System (ADS)
Hartmann, Alfred; Redfield, Steve
1989-04-01
This paper discusses the design of large-scale (1000 × 1000) optical crossbar switching networks for use in parallel processing supercomputers. Alternative design sketches for an optical crossbar switching network are presented using free-space optical transmission with either a beam spreading/masking model or a beam steering model for internodal communications. The performance of alternative multiple access channel communications protocols (unslotted and slotted ALOHA, and carrier sense multiple access (CSMA)) is compared with the performance of the classic arbitrated bus crossbar of conventional electronic parallel computing. These comparisons indicate an almost inverse relationship between ease of implementation and speed of operation. Practical issues of optical system design are addressed, and an optically addressed, composite spatial light modulator design is presented for fabrication at arbitrarily large scale. The wide range of switch architecture, communications protocol, optical systems design, device fabrication, and system performance problems presented by these design sketches poses a serious challenge to the practical exploitation of highly parallel optical interconnects in advanced computer designs.
Computations on Wings With Full-Span Oscillating Control Surfaces Using Navier-Stokes Equations
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.
2013-01-01
A dual-level parallel procedure is presented for computing large databases to support aerospace vehicle design. This procedure has been developed as a single Unix script within the Parallel Batch Submission environment utilizing MPIexec and runs MPI based analysis software. It has been developed to provide a process for aerospace designers to generate data for large numbers of cases with the highest possible fidelity and reasonable wall clock time. A single job submission environment has been created to avoid keeping track of multiple jobs and the associated system administration overhead. The process has been demonstrated for computing large databases for the design of typical aerospace configurations, a launch vehicle and a rotorcraft.
Cryogenic liquid-level detector
NASA Technical Reports Server (NTRS)
Hamlet, J.
1978-01-01
Detector is designed for quick assembly, fast response, and good performance under vibratory stress. Its basic parallel-plate open configuration can be adapted to any length and allows its calibration scale factor to be predicted accurately. When compared with discrete level sensors, continuous reading sensor was found to be superior if there is sloshing, boiling, or other disturbance.
Efficient parallelization of analytic bond-order potentials for large-scale atomistic simulations
NASA Astrophysics Data System (ADS)
Teijeiro, C.; Hammerschmidt, T.; Drautz, R.; Sutmann, G.
2016-07-01
Analytic bond-order potentials (BOPs) provide a way to compute atomistic properties with controllable accuracy. For large-scale computations of heterogeneous compounds at the atomistic level, both the computational efficiency and memory demand of BOP implementations have to be optimized. Since the evaluation of BOPs is a local operation within a finite environment, the parallelization concepts known from short-range interacting particle simulations can be applied to improve the performance of these simulations. In this work, several efficient parallelization methods for BOPs that use three-dimensional domain decomposition schemes are described. The schemes are implemented into the bond-order potential code BOPfox, and their performance is measured in a series of benchmarks. Systems of up to several millions of atoms are simulated on a high performance computing system, and parallel scaling is demonstrated for up to thousands of processors.
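The locality argument above is the same one behind linked-cell methods for short-range interactions; a hedged sketch of the serial core follows (the three-dimensional domain decomposition and MPI halo exchange of an actual BOP code such as BOPfox are omitted, and all names are illustrative):

```python
import numpy as np

def build_cells(pos, box, cutoff):
    """Bin atoms into cells of edge >= cutoff, so each atom's finite
    environment is found by scanning only the 27 neighbouring cells."""
    ncell = np.maximum((box // cutoff).astype(int), 1)  # cells per dimension
    idx = np.minimum((pos / (box / ncell)).astype(int), ncell - 1)
    cells = {}
    for atom, (i, j, k) in enumerate(idx):
        cells.setdefault((int(i), int(j), int(k)), []).append(atom)
    return cells, ncell

pos = np.random.rand(1000, 3) * 20.0            # toy atoms in a 20^3 box
cells, ncell = build_cells(pos, np.array([20.0] * 3), cutoff=2.5)
print(len(cells), "occupied cells, grid =", ncell)
```

In a domain-decomposed run, each rank would own a block of these cells and exchange only the boundary layer of atoms with its neighbours, which is what keeps the communication volume proportional to surface area rather than volume.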
Fiber optic plantar pressure/shear sensor
NASA Astrophysics Data System (ADS)
Soetanto, William; Nguyen, Ngoc T.; Wang, Wei-Chih
2011-04-01
A full-scale foot pressure/shear sensor that has been developed to help diagnose the cause of ulcer formation in diabetic patients is presented. The design involves a tactile sensor array using intersecting optical fibers embedded in a soft elastomer. The basic configuration incorporates a mesh comprised of two sets of parallel optical fiber planes; the planes are configured so that the parallel rows of fiber in the top and bottom planes are perpendicular to each other. Three-dimensional information is determined by measuring the loss of light from each of the waveguides to map the overall pressure distribution and the shifting of the layers relative to each other. In this paper we present the latest development of the fiber optic plantar pressure/shear sensor, which can measure normal pressures from 19.09 kPa up to 1000 kPa.
Xu, Jiajiong; Tang, Wei; Ma, Jun; Wang, Hong
2017-07-01
Drinking water treatment processes remove undesirable chemicals and microorganisms from source water, which is vital to public health protection. The purpose of this study was to investigate the effects of treatment processes and configuration on the microbiome by comparing microbial community shifts in two series of different treatment processes operated in parallel within a full-scale drinking water treatment plant (DWTP) in Southeast China. Illumina sequencing of 16S rRNA genes of water samples demonstrated little effect of coagulation/sedimentation and pre-oxidation steps on bacterial communities, in contrast to dramatic and concurrent microbial community shifts during ozonation, granular activated carbon treatment, sand filtration, and disinfection for both series. A large number of unique operational taxonomic units (OTUs) at these four treatment steps further illustrated their strong shaping power towards the drinking water microbial communities. Interestingly, multidimensional scaling analysis revealed tight clustering of biofilm samples collected from different treatment steps, with Nitrospira, the nitrite-oxidizing bacteria, noted at higher relative abundances in biofilm compared to water samples. Overall, this study provides a snapshot of step-to-step microbial evolvement in multi-step drinking water treatment systems, and the results provide insight to control and manipulation of the drinking water microbiome via optimization of DWTP design and operation.
NASA Astrophysics Data System (ADS)
Nijssen, B.; Hamman, J.; Bohn, T. J.
2015-12-01
The Variable Infiltration Capacity (VIC) model is a macro-scale semi-distributed hydrologic model. VIC development began in the early 1990s and it has been used extensively, applied from basin to global scales. VIC has been applied in many use cases, including the construction of hydrologic data sets, trend analysis, data evaluation and assimilation, forecasting, coupled climate modeling, and climate change impact analysis. Ongoing applications of the VIC model include the University of Washington's drought monitor and forecast systems, and NASA's land data assimilation systems. The development of VIC version 5.0 focused on reconfiguring the legacy VIC source code to support a wider range of modern modeling applications. The VIC source code has been moved to a public GitHub repository to encourage participation by the model development community at large. The reconfiguration has separated the physical core of the model from the driver, which is responsible for memory allocation, pre- and post-processing, and I/O. VIC 5.0 includes four drivers that use the same physical model core: classic, image, CESM, and Python. The classic driver supports legacy VIC configurations and runs in the traditional time-before-space configuration. The image driver includes a space-before-time configuration, netCDF I/O, and uses MPI for parallel processing. This configuration facilitates the direct coupling of streamflow routing, reservoir, and irrigation processes within VIC. The image driver is the foundation of the CESM driver, which couples VIC to CESM's CPL7 and a prognostic atmosphere. Finally, we have added a Python driver that provides access to the functions and datatypes of VIC's physical core from a Python interface. This presentation demonstrates how reconfiguring legacy source code extends the life and applicability of a research model.
COMPARISON OF PARALLEL AND SERIES HYBRID POWERTRAINS FOR TRANSIT BUS APPLICATION
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gao, Zhiming; Daw, C Stuart; Smith, David E
2016-01-01
The fuel economy and emissions of both conventional and hybrid buses equipped with emissions aftertreatment were evaluated via computational simulation for six representative city bus drive cycles. Both series and parallel configurations for the hybrid case were studied. The simulation results indicate that series hybrid buses have the greatest overall advantage in fuel economy. The series and parallel hybrid buses were predicted to produce similar CO and HC tailpipe emissions but were also predicted to have reduced NOx tailpipe emissions compared to the conventional bus in higher speed cycles. For the New York bus cycle (NYBC), which has the lowest average speed among the cycles evaluated, the series bus tailpipe emissions were somewhat higher than they were for the conventional bus, while the parallel hybrid bus had significantly lower tailpipe emissions. All three bus powertrains were found to require periodic active DPF regeneration to maintain PM control. Plug-in operation of series hybrid buses appears to offer significant fuel economy benefits and is easily employed due to the relatively large battery capacity that is typical of the series hybrid configuration.
In vivo verification of particle therapy: how Compton camera configurations affect 3D image quality
NASA Astrophysics Data System (ADS)
Mackin, D.; Draeger, E.; Peterson, S.; Polf, J.; Beddar, S.
2017-05-01
The steep dose gradients enabled by the Bragg peaks of particle therapy beams are a double-edged sword. They enable highly conformal dose distributions, but even small deviations from the planned beam range can cause overdosing of healthy tissue or under-dosing of the tumour. To reduce this risk, particle therapy treatment plans include margins large enough to account for all the sources of range uncertainty, which include patient setup errors, patient anatomy changes, and CT number to stopping power ratios. Any system that could verify the beam range in vivo would allow reduced margins and more conformal dose distributions. Toward our goal of developing such a system based on Compton camera (CC) imaging, we studied how three configurations (single camera, parallel opposed, and orthogonal) affect the quality of the 3D images. We found that single CC and parallel opposed configurations produced superior images in 2D. The increase in parallax produced by an orthogonal CC configuration was shown to be beneficial in producing artefact-free 3D images.
Ma, Li; Runesha, H Birali; Dvorkin, Daniel; Garbe, John R; Da, Yang
2008-01-01
Background Genome-wide association studies (GWAS) using single nucleotide polymorphism (SNP) markers provide opportunities to detect epistatic SNPs associated with quantitative traits and to detect the exact mode of an epistasis effect. Computational difficulty is the main bottleneck for epistasis testing in large scale GWAS. Results The EPISNPmpi and EPISNP computer programs were developed for testing single-locus and epistatic SNP effects on quantitative traits in GWAS, including tests of three single-locus effects for each SNP (SNP genotypic effect, additive and dominance effects) and five epistasis effects for each pair of SNPs (two-locus interaction, additive × additive, additive × dominance, dominance × additive, and dominance × dominance) based on the extended Kempthorne model. EPISNPmpi is the parallel computing program for epistasis testing in large scale GWAS; it achieved excellent scalability for large scale analysis and portability across various parallel computing platforms. EPISNP is the serial computing program, based on the EPISNPmpi code, for epistasis testing in small scale GWAS using commonly available operating systems and computer hardware. Three serial computing utility programs were developed for graphical viewing of test results and epistasis networks, and for estimating CPU time and disk space requirements. Conclusion The EPISNPmpi parallel computing program provides an effective computing tool for epistasis testing in large scale GWAS, and the EPISNP serial computing programs are convenient tools for epistasis analysis in small scale GWAS using commonly available computer hardware. PMID:18644146
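As a simplified illustration of a two-locus interaction test in the spirit of (but not identical to) the extended Kempthorne model used above, one can compare a main-effects-only genotype model with a full two-locus model via an F-test; everything below is a toy sketch, not the EPISNP statistics.

```python
import numpy as np
from scipy import stats

def dummies(g):
    # indicator coding for genotypes 0/1/2 at one locus
    return np.column_stack([(g == 1).astype(float), (g == 2).astype(float)])

def interaction_F(snp1, snp2, trait):
    d1, d2 = dummies(snp1), dummies(snp2)
    X_main = np.column_stack([np.ones(len(trait)), d1, d2])
    inter = np.column_stack([d1[:, i] * d2[:, j]
                             for i in range(2) for j in range(2)])
    X_full = np.column_stack([X_main, inter])
    rss = lambda X: np.sum(
        (trait - X @ np.linalg.lstsq(X, trait, rcond=None)[0]) ** 2)
    df1, df2 = 4, len(trait) - X_full.shape[1]   # 4 interaction parameters
    F = ((rss(X_main) - rss(X_full)) / df1) / (rss(X_full) / df2)
    return F, stats.f.sf(F, df1, df2)

rng = np.random.default_rng(1)
g1, g2 = rng.integers(0, 3, 2000), rng.integers(0, 3, 2000)
y = 0.5 * (g1 == 2) * (g2 == 2) + rng.normal(size=2000)  # planted epistasis
print(interaction_F(g1, g2, y))   # large F, small p for the planted pair
```

A parallel scan in the style of EPISNPmpi would simply distribute the SNP pairs across ranks, since each test is independent.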
2012-10-01
[Extraction residue from a report; recoverable content: simulations were performed with the open-source code Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS, http://lammps.sandia.gov); the commercial force-field parameters are proprietary and cannot be ported to the LAMMPS simulation code; molecular dynamics simulations were run at atomistic resolution. Abbreviations: IBI, iterative Boltzmann inversion; MAPS, Materials Processes and Simulations.]
A Programmable and Configurable Mixed-Mode FPAA SoC
2016-03-17
Shah, Sahil; Kim, Sihwan; Adil, Farhan; Hasler, Jennifer; George, Suma; Collins, Michelle; Richard... The authors present a Floating-Gate (FG) based, System-on-Chip (SoC) large-scale Field-Programmable Analog Array IC that integrates divergent concepts... Keywords: Floating-Gate, SoC, Command Word Classification.
Local and non-local deficits in amblyopia: acuity and spatial interactions.
Bonneh, Yoram S; Sagi, Dov; Polat, Uri
2004-12-01
Amblyopic vision is thought to be limited by abnormal long-range spatial interactions, but their exact mode of action and relationship to the main amblyopic deficit in visual acuity are largely unknown. We studied this relationship in a group (N=59) of anisometropic (N=21) and strabismic (or combined, N=38) subjects, using (1) a single and multi-pattern (crowded) computerized static Tumbling-E test with scaled spacing of two pattern widths (TeVA), in addition to an optotype (ETDRS chart) acuity test (VA), and (2) contrast detection of Gabor patches with lateral flankers (lateral masking) along the horizontal and vertical axes as well as in collinear and parallel configurations. By correlating the different measures of visual acuity and contrast suppression, we found that (1) the VA of the strabismic subjects could be decomposed into two uncorrelated components measured in TeVA: acuity for isolated patterns and acuity reduction due to flanking patterns. The latter comprised over 60% of the VA magnitude on average and accounted for over 50% of its variance. In contrast, only a slight reduction in acuity was found in the anisometropic subjects, and the acuity for a single pattern could account for 70% of the VA variance. (2) The lateral suppression (contrast threshold elevation) in a parallel configuration along the horizontal axis was correlated with the VA (R2=0.7), as well as with the crowding effect (TeVA elevation, R2=0.5) for the strabismic group. Some correlation with the VA was also found for the collinear configuration in the anisometropic group, but less suppression and no correlation were found for all the vertical configurations in all the groups. The results indicate the existence of a specific non-local component of the strabismic deficit, in addition to the local acuity deficit present in all amblyopia types. This deficit might reflect long-range lateral inhibition or, alternatively, an inaccurate and scattered top-down attentional selection mechanism.
High-speed volume measurement system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lane, Michael H.; Doyle, Jr., James L.; Brinkman, Michael J.
2018-01-30
Disclosed is a volume sensor having a first axis, a second axis, and a third axis, each axis including a laser source configured to emit a beam; a parallel beam generating assembly configured to receive the beam and split the beam into a first parallel beam and a second parallel beam; a beam-collimating assembly configured to receive the first parallel beam and the second parallel beam and output a first beam sheet and a second beam sheet, the first beam sheet and the second beam sheet being configured to traverse the object aperture; a first collecting lens and a second collecting lens; and a first photodetector and a second photodetector, the first photodetector and the second photodetector configured to output an electrical signal proportional to the object; wherein the first axis, the second axis, and the third axis are arranged at an angular offset with respect to each other.
Visual analysis of inter-process communication for large-scale parallel computing.
Muelder, Chris; Gygi, Francois; Ma, Kwan-Liu
2009-01-01
In serial computation, program profiling is often helpful for optimization of key sections of code. When moving to parallel computation, not only does the code execution need to be considered but also the communication between the different processes, which can induce delays that are detrimental to performance. As the number of processes increases, so does the impact of the communication delays on performance. For large-scale parallel applications, it is critical to understand how the communication impacts performance in order to make the code more efficient. There are several tools available for visualizing program execution and communications on parallel systems. These tools generally provide either views which statistically summarize the entire program execution or process-centric views. However, process-centric visualizations do not scale well as the number of processes gets very large. In particular, the most common representation of parallel processes is a Gantt chart with a row for each process. As the number of processes increases, these charts can become difficult to work with and can even exceed screen resolution. We propose a new visualization approach that affords more scalability and then demonstrate it on systems running with up to 16,384 processes.
Configuration affects parallel stent grafting results.
Tanious, Adam; Wooster, Mathew; Armstrong, Paul A; Zwiebel, Bruce; Grundy, Shane; Back, Martin R; Shames, Murray L
2018-05-01
A number of adjunctive "off-the-shelf" procedures have been described to treat complex aortic diseases. Our goal was to evaluate parallel stent graft configurations and to determine an optimal formula for these procedures. This is a retrospective review of all patients at a single medical center treated with parallel stent grafts from January 2010 to September 2015. Outcomes were evaluated on the basis of parallel graft orientation, type, and main body device. Primary end points included parallel stent graft compromise and overall endovascular aneurysm repair (EVAR) compromise. There were 78 patients treated with a total of 144 parallel stents for a variety of pathologic processes. There was a significant correlation between main body oversizing and snorkel compromise (P = .0195) and overall procedural complication (P = .0019) but not with endoleak rates. Patients were organized into the following oversizing groups for further analysis: 0% to 10%, 10% to 20%, and >20%. Those oversized into the 0% to 10% group had the highest rate of overall EVAR complication (73%; P = .0003). There were no significant correlations between any one particular configuration and overall procedural complication. There was also no significant correlation between total number of parallel stents employed and overall complication. Composite EVAR configuration had no significant correlation with individual snorkel compromise, endoleak, or overall EVAR or procedural complication. The configuration most prone to individual snorkel compromise and overall EVAR complication was a four-stent configuration with two stents in an antegrade position and two stents in a retrograde position (60% complication rate). The configuration most prone to endoleak was one or two stents in retrograde position (33% endoleak rate), followed by three stents in an all-antegrade position (25%). There was a significant correlation between individual stent configuration and stent compromise (P = .0385), with 31.25% of retrograde stents having any complication. Parallel stent grafting offers an off-the-shelf option to treat a variety of aortic diseases. There is an increased risk of parallel stent and overall EVAR compromise with <10% main body oversizing. Thirty-day mortality is increased when more than one parallel stent is placed. Antegrade configurations are preferred to any retrograde configuration, with optimal oversizing >20%. Copyright © 2017 Society for Vascular Surgery. Published by Elsevier Inc. All rights reserved.
Discrete Event Modeling and Massively Parallel Execution of Epidemic Outbreak Phenomena
DOE Office of Scientific and Technical Information (OSTI.GOV)
Perumalla, Kalyan S; Seal, Sudip K
2011-01-01
In complex phenomena such as epidemiological outbreaks, the intensity of inherent feedback effects and the significant role of transients in the dynamics make simulation the only effective method for proactive, reactive or post-facto analysis. The spatial scale, runtime speed, and behavioral detail needed in detailed simulations of epidemic outbreaks make it necessary to use large-scale parallel processing. Here, an optimistic parallel execution of a new discrete event formulation of a reaction-diffusion simulation model of epidemic propagation is presented to facilitate dramatically increasing the fidelity and speed by which epidemiological simulations can be performed. Rollback support needed during optimistic parallel execution is achieved by combining reverse computation with a small amount of incremental state saving. Parallel speedup of over 5,500 and other runtime performance metrics of the system are observed with weak-scaling execution on a small (8,192-core) Blue Gene/P system, while scalability with a weak-scaling speedup of over 10,000 is demonstrated on 65,536 cores of a large Cray XT5 system. Scenarios representing large population sizes, exceeding several hundreds of millions of individuals in the largest cases, are successfully exercised to verify model scalability.
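The reverse-computation idea is easy to illustrate: each event type gets a reverse handler that algebraically undoes its state change, and incremental state saving is reserved for destructive updates that cannot be inverted. A minimal sketch with invented event types:

```python
class OutbreakState:
    """Toy logical-process state with forward/reverse event handlers."""
    def __init__(self):
        self.infected = 0
        self.saved = []               # incremental state-saving stack

    def infect_fwd(self, n):
        self.infected += n            # reversible by subtraction: save nothing

    def infect_rev(self, n):
        self.infected -= n

    def reset_fwd(self):
        self.saved.append(self.infected)  # destructive update: save old value
        self.infected = 0

    def reset_rev(self):
        self.infected = self.saved.pop()

s = OutbreakState()
s.infect_fwd(5); s.reset_fwd(); s.infect_fwd(3)
# optimistic rollback of the last three events, most recent first
s.infect_rev(3); s.reset_rev(); s.infect_rev(5)
print(s.infected)  # 0: state restored without full checkpoints
```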
Parallel evolutionary computation in bioinformatics applications.
Pinho, Jorge; Sobral, João Luis; Rocha, Miguel
2013-05-01
A large number of optimization problems within the field of Bioinformatics require methods able to handle their inherent complexity (e.g. NP-hard problems) and also demand increased computational effort. In this context, the use of parallel architectures is a necessity. In this work, we propose ParJECoLi, a Java-based library that offers a large set of metaheuristic methods (such as Evolutionary Algorithms) and also addresses the issue of their efficient execution on a wide range of parallel architectures. The proposed approach focuses on ease of use, making the adaptation to distinct parallel environments (multicore, cluster, grid) transparent to the user. Indeed, this work shows how the development of the optimization library can proceed independently of its adaptation for several architectures, making use of Aspect-Oriented Programming. The pluggable nature of the parallelism-related modules allows the user to easily configure the environment, adding parallelism modules to the base source code when needed. The performance of the platform is validated with two case studies within biological model optimization. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
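The pattern such a library automates can be sketched in a few lines (shown in Python rather than the library's Java, with a toy objective and invented parameters): only the fitness evaluations are farmed out to a worker pool, leaving the evolutionary loop itself untouched.

```python
import numpy as np
from multiprocessing import Pool

def fitness(x):
    # toy objective: maximize -sum(x^2), optimum at the origin
    return -np.sum(x ** 2)

def evolve(pop, n_gen=20, n_workers=4):
    with Pool(n_workers) as pool:
        for _ in range(n_gen):
            scores = pool.map(fitness, list(pop))       # parallel evaluation
            parents = pop[np.argsort(scores)[-10:]]     # truncation selection
            pop = parents[np.random.randint(0, 10, len(pop))] \
                  + np.random.normal(0.0, 0.1, pop.shape)   # mutate offspring
    return pop

if __name__ == "__main__":
    pop = np.random.uniform(-5, 5, (40, 3))
    print(evolve(pop).mean(axis=0))  # drifts toward the optimum at 0
```

Swapping the worker pool for a cluster- or grid-backed executor changes only the evaluation line, which is the transparency the abstract describes.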
Aircraft Engine Noise Scattering By Fuselage and Wings: A Computational Approach
NASA Technical Reports Server (NTRS)
Stanescu, D.; Hussaini, M. Y.; Farassat, F.
2003-01-01
The paper presents a time-domain method for computation of sound radiation from aircraft engine sources to the far-field. The effects of nonuniform flow around the aircraft and scattering of sound by fuselage and wings are accounted for in the formulation. The approach is based on the discretization of the inviscid flow equations through a collocation form of the Discontinuous Galerkin spectral element method. An isoparametric representation of the underlying geometry is used in order to take full advantage of the spectral accuracy of the method. Large-scale computations are made possible by a parallel implementation based on message passing. Results obtained for radiation from an axisymmetric nacelle alone are compared with those obtained when the same nacelle is installed in a generic configuration, with and without a wing.
A review on battery thermal management in electric vehicle application
NASA Astrophysics Data System (ADS)
Xia, Guodong; Cao, Lei; Bi, Guanglong
2017-11-01
The global issues of energy crisis and air pollution have offered a great opportunity to develop electric vehicles. However, so far, the cycle life of power batteries, environmental adaptability, driving range, and charging time remain far from the level of traditional vehicles with internal combustion engines. Effective battery thermal management (BTM) is absolutely essential to relieve this situation. This paper reviews the existing literature at two levels: the cell level and the battery module level. For a single battery, specific attention is paid to three important processes: heat generation, heat transport, and heat dissipation. For large format cells, multi-scale multi-dimensional coupled models have been developed. This facilitates the investigation of factors, such as local irreversible heat generation, thermal resistance, and current distribution, that account for the intrinsic temperature gradients existing in a cell. For battery modules based on air and liquid cooling, series, series-parallel, and parallel cooling configurations are discussed. Liquid cooling strategies, especially direct liquid cooling strategies, are reviewed; they may advance the battery thermal management system to a new generation.
Numerical weather prediction in low latitudes
NASA Technical Reports Server (NTRS)
Krishnamurti, T. N.
1985-01-01
Based on the results of a number of numerical prediction experiments, the differential heating between land and ocean is an important and critical factor in the investigation of phenomena such as the onset of monsoons over the Indian subcontinent. The pre-onset period during the month of May shows a rather persistent flow field in the monsoon region. At low levels the circulation exhibits anticyclonic excursions over the Arabian Sea, flowing essentially parallel to the west coast of India from the north. Over the Indian subcontinent the major feature is a shallow heat low over northern India. As the heat sources commence a rapid northwestward movement toward the southern edge of the Tibetan Plateau, an interesting configuration of the large-scale divergent circulation occurs. A favorable configuration for a rapid exchange of energy from the divergent to the rotational kinetic energy develops. Strong low-level monsoonal circulations evolve and, attendant with them, the onset of monsoon rains occurs. In order to test this observational sequence, a series of short-range numerical prediction experiments was initiated to define the initial heat sources.
Penas, David R; González, Patricia; Egea, Jose A; Doallo, Ramón; Banga, Julio R
2017-01-21
The development of large-scale kinetic models is one of the current key issues in computational systems biology and bioinformatics. Here we consider the problem of parameter estimation in nonlinear dynamic models. Global optimization methods can be used to solve this type of problem, but the associated computational cost is very large. Moreover, many of these methods need the tuning of a number of adjustable search parameters, requiring a number of initial exploratory runs and therefore further increasing the computation times. Here we present a novel parallel method, self-adaptive cooperative enhanced scatter search (saCeSS), to accelerate the solution of this class of problems. The method is based on the scatter search optimization metaheuristic and incorporates several key new mechanisms: (i) asynchronous cooperation between parallel processes, (ii) coarse- and fine-grained parallelism, and (iii) self-tuning strategies. The performance and robustness of saCeSS is illustrated by solving a set of challenging parameter estimation problems, including medium and large-scale kinetic models of the bacterium E. coli, baker's yeast S. cerevisiae, the vinegar fly D. melanogaster, Chinese Hamster Ovary cells, and a generic signal transduction network. The results consistently show that saCeSS is a robust and efficient method, allowing very significant reductions of computation times with respect to several previous state-of-the-art methods (from days to minutes, in several cases), even when only a small number of processors is used. The new parallel cooperative method presented here allows the solution of medium and large-scale parameter estimation problems in reasonable computation times and with small hardware requirements. Further, the method includes self-tuning mechanisms which facilitate its use by non-experts. We believe that this new method can play a key role in the development of large-scale and even whole-cell dynamic models.
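The asynchronous cooperation idea can be illustrated with a toy island model: parallel searchers publish improvements to their peers and adopt better incoming solutions without ever synchronizing. The Python sketch below (threads, a Rosenbrock stand-in objective, invented names) mimics only mechanism (i) above; it is not the saCeSS code.

import threading, queue, random

def rosenbrock(x):
    # Toy objective standing in for a kinetic-model calibration cost.
    return sum(100 * (x[i + 1] - x[i] ** 2) ** 2 + (1 - x[i]) ** 2
               for i in range(len(x) - 1))

def searcher(results, mailbox, peers, iters=2000):
    x = [random.uniform(-2, 2) for _ in range(4)]
    fx = rosenbrock(x)
    for _ in range(iters):
        y = [xi + random.gauss(0, 0.1) for xi in x]  # local stochastic move
        fy = rosenbrock(y)
        if fy < fx:
            x, fx = y, fy
            for p in peers:           # asynchronous broadcast of the improvement
                p.put((fx, x))
        try:                          # adopt a better incoming solution, if any
            f_in, x_in = mailbox.get_nowait()
            if f_in < fx:
                x, fx = list(x_in), f_in
        except queue.Empty:
            pass
    results.append((fx, x))

mailboxes = [queue.Queue() for _ in range(4)]
results = []
threads = [threading.Thread(target=searcher,
                            args=(results, mailboxes[i],
                                  [m for j, m in enumerate(mailboxes) if j != i]))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(min(results)[0])  # best objective value across the cooperating searchers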
NASA Astrophysics Data System (ADS)
Suryanarayana, Phanish; Pratapa, Phanisri P.; Sharma, Abhiraj; Pask, John E.
2018-03-01
We present SQDFT: a large-scale parallel implementation of the Spectral Quadrature (SQ) method for O(N) Kohn-Sham Density Functional Theory (DFT) calculations at high temperature. Specifically, we develop an efficient and scalable finite-difference implementation of the infinite-cell Clenshaw-Curtis SQ approach, in which results for the infinite crystal are obtained by expressing quantities of interest as bilinear forms or sums of bilinear forms, that are then approximated by spatially localized Clenshaw-Curtis quadrature rules. We demonstrate the accuracy of SQDFT by showing systematic convergence of energies and atomic forces with respect to SQ parameters to reference diagonalization results, and convergence with discretization to established planewave results, for both metallic and insulating systems. We further demonstrate that SQDFT achieves excellent strong and weak parallel scaling on computer systems consisting of tens of thousands of processors, with near perfect O(N) scaling with system size and wall times as low as a few seconds per self-consistent field iteration. Finally, we verify the accuracy of SQDFT in large-scale quantum molecular dynamics simulations of aluminum at high temperature.
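Since the SQ method rests on Clenshaw-Curtis quadrature, a minimal version of the basic rule may help fix ideas: nodes are Chebyshev extreme points and weights come from exact integration of the cosine expansion. The sketch below is a NumPy port of the classic clencurt routine (Trefethen, Spectral Methods in MATLAB); the spatially localized, bilinear-form machinery of SQDFT itself is far beyond this illustration.

import numpy as np

def clencurt(n):
    # Clenshaw-Curtis nodes and weights for integration on [-1, 1].
    theta = np.pi * np.arange(n + 1) / n
    x = np.cos(theta)
    w = np.zeros(n + 1)
    v = np.ones(n - 1)
    if n % 2 == 0:
        w[0] = w[n] = 1.0 / (n ** 2 - 1)
        for k in range(1, n // 2):
            v -= 2.0 * np.cos(2 * k * theta[1:n]) / (4 * k ** 2 - 1)
        v -= np.cos(n * theta[1:n]) / (n ** 2 - 1)
    else:
        w[0] = w[n] = 1.0 / n ** 2
        for k in range(1, (n - 1) // 2 + 1):
            v -= 2.0 * np.cos(2 * k * theta[1:n]) / (4 * k ** 2 - 1)
    w[1:n] = 2.0 * v / n
    return x, w

x, w = clencurt(16)
print(w @ np.exp(x))            # quadrature estimate of the integral of exp on [-1, 1]
print(np.exp(1) - np.exp(-1))   # exact value, ~2.3504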
NASA Astrophysics Data System (ADS)
Kenway, Gaetan K. W.
This thesis presents new tools and techniques developed to address the challenging problem of high-fidelity aerostructural optimization with respect to large numbers of design variables. A new mesh-movement scheme is developed that is both computationally efficient and sufficiently robust to accommodate large geometric design changes and aerostructural deformations. A fully coupled Newton-Krylov method is presented that accelerates the convergence of aerostructural systems, provides a 20% performance improvement over the traditional nonlinear block Gauss-Seidel approach, and can handle more flexible structures. A coupled adjoint method is used that efficiently computes derivatives for a gradient-based optimization algorithm. The implementation uses only machine-accurate derivative techniques and is verified to yield fully consistent derivatives by comparing against the complex-step method. The fully coupled large-scale coupled adjoint solution method is shown to have 30% better performance than the segregated approach. The parallel scalability of the coupled adjoint technique is demonstrated on an Euler Computational Fluid Dynamics (CFD) model with more than 80 million state variables coupled to a detailed structural finite-element model of the wing with more than 1 million degrees of freedom. Multi-point high-fidelity aerostructural optimizations of a long-range wide-body, transonic transport aircraft configuration are performed using the developed techniques. The aerostructural analysis employs Euler CFD with a 2 million cell mesh and a structural finite element model with 300,000 DOF. Two design optimization problems are solved: one where takeoff gross weight is minimized, and another where fuel burn is minimized. Each optimization uses a multi-point formulation with 5 cruise conditions and 2 maneuver conditions. The optimization problems have 476 design variables, and optimal results are obtained within 36 hours of wall time using 435 processors. The TOGW minimization results in a 4.2% reduction in TOGW with a 6.6% fuel burn reduction, while the fuel burn optimization results in an 11.2% fuel burn reduction with no change to the takeoff gross weight.
Exergy analysis of large-scale helium liquefiers: Evaluating design trade-offs
NASA Astrophysics Data System (ADS)
Thomas, Rijo Jacob; Ghosh, Parthasarathi; Chowdhury, Kanchan
2014-01-01
It is known that greater heat exchanger area, a larger number of expanders with higher efficiency, and more involved configurations with multi-pressure compression systems increase the plant efficiency of a helium liquefier. However, they involve higher capital investment and larger size. Using the simulation software Aspen HYSYS v7.0 and exergy analysis as the tool of analysis, the authors have attempted to identify various trade-offs while selecting the number of stages, the pressure levels in the compressor, the cold-end configuration, the heat exchanger surface area, the maximum allowable pressure drop in heat exchangers, the efficiency of expanders, the parallel/series connection of expanders, etc. Use of more efficient cold ends reduces the number of refrigeration stages and the size of the plant. For achieving reliability along with performance, a configuration with a combination of expander and Joule-Thomson valve is found to be a better choice for the cold end. Use of a multi-pressure system is relevant only when the number of refrigeration stages is more than 5. Arrangement of expanders in series reduces the number of expanders as well as the heat exchanger size at a slight expense of plant efficiency. A superior heat exchanger (having less pressure drop per unit heat transfer area) results in only a 5% increase of plant performance even when it has a 100% larger heat exchanger surface area.
NASA Astrophysics Data System (ADS)
Pascoe, Stephen; Iwi, Alan; Kershaw, Philip; Stephens, Ag; Lawrence, Bryan
2014-05-01
The advent of large-scale data and the consequential analysis problems have led to two new challenges for the research community: how to share such data to get the maximum value and how to carry out efficient analysis. Solving both challenges requires a form of parallelisation: the first is social parallelisation (involving trust and information sharing), the second data parallelisation (involving new algorithms and tools). The JASMIN infrastructure supports both kinds of parallelism by providing a multi-tenant environment with petabyte-scale storage, VM provisioning, and batch cluster facilities. The JASMIN Analysis Platform (JAP) is an analysis software layer for JASMIN which emphasises ease of transition from a researcher's local environment to JASMIN. JAP brings together tools traditionally used by multiple communities and configures them to work together, enabling users to move analysis from their local environment to JASMIN without rewriting code. JAP also provides facilities to exploit JASMIN's parallel capabilities whilst maintaining a familiar analysis environment wherever possible. Modern open-source analysis tools typically have multiple dependent packages, increasing the installation burden on system administrators. When one considers a suite of tools, often with both common and conflicting dependencies, analysis pipelines can become locked to a particular installation simply because of the effort required to reconstruct the dependency tree. JAP addresses this problem by providing a consistent suite of RPMs compatible with RedHat Enterprise Linux and CentOS 6.4. Researchers can install JAP locally, either as RPMs or through a pre-built VM image, giving them the confidence that moving analysis to JASMIN will not disrupt their environment. Analysis parallelisation is in its infancy in the climate sciences, with few tools capable of exploiting any parallel environment beyond manual scripting of the use of multiple processors. JAP begins to bridge this gap through a variety of higher-level tools for parallelisation and job scheduling, such as IPython-parallel and MPI support for interactive analysis languages. We find that enabling even simple parallelisation of workflows, together with the state-of-the-art I/O performance of JASMIN storage, provides many users with the large increases in efficiency they need to scale their analyses to contemporary data volumes and tackle new, previously inaccessible problems.
NASA Technical Reports Server (NTRS)
Schlundt, D. W.
1976-01-01
The installed performance degradation of a swivel nozzle thrust deflector system, observed at increased vectoring angles during a large-scale test program, was investigated and improved. Small-scale models were used to generate performance data for analyzing selected swivel nozzle configurations. A single-swivel nozzle design model with five different nozzle configurations and a twin-swivel nozzle design model, scaled to 0.15 size of the large-scale test hardware, were statically tested at low exhaust pressure ratios of 1.4, 1.3, 1.2, and 1.1 and vectored at four nozzle positions from 0 deg cruise through the 90 deg vertical position used for the VTOL mode.
Integration experiences and performance studies of a COTS parallel archive system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Hsing-bung; Scott, Cody; Grider, Gary
2010-01-01
Current and future Archive Storage Systems have been asked to (a) scale to very high bandwidths, (b) scale in metadata performance, (c) support policy-based hierarchical storage management capability, (d) scale in supporting changing needs of very large data sets, (e) support standard interfaces, and (f) utilize commercial-off-the-shelf (COTS) hardware. Parallel file systems have been asked to do the same thing but at one or more orders of magnitude faster in performance. Archive systems continue to move closer to file systems in their design due to the need for speed and bandwidth, especially in metadata searching speeds, for example through more caching and less robust semantics. Currently the number of extremely scalable parallel archive solutions is very small, especially those that will move a single large striped parallel disk file onto many tapes in parallel. We believe that a hybrid storage approach of using COTS components and innovative software technology can bring new capabilities into a production environment for the HPC community much faster than the approach of creating and maintaining a complete end-to-end unique parallel archive software solution. In this paper, we relay our experience of integrating a global parallel file system and a standard backup/archive product with a very small amount of additional code to provide a scalable, parallel archive. Our solution has a high degree of overlap with current parallel archive products, including (a) doing parallel movement to/from tape for a single large parallel file, (b) hierarchical storage management, (c) ILM features, (d) high-volume (non-single parallel file) archives for backup/archive/content management, and (e) leveraging all free file movement tools in Linux such as copy, move, ls, tar, etc. We have successfully applied our working COTS Parallel Archive System to the current world's first petaflop/s computing system, LANL's Roadrunner, and demonstrated its capability to address requirements of future archival storage systems.
Integration experiments and performance studies of a COTS parallel archive system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Hsing-bung; Scott, Cody; Grider, Gary
2010-06-16
Current and future Archive Storage Systems have been asked to (a) scale to very high bandwidths, (b) scale in metadata performance, (c) support policy-based hierarchical storage management capability, (d) scale in supporting changing needs of very large data sets, (e) support standard interfaces, and (f) utilize commercial-off-the-shelf (COTS) hardware. Parallel file systems have been asked to do the same thing but at one or more orders of magnitude faster in performance. Archive systems continue to move closer to file systems in their design due to the need for speed and bandwidth, especially in metadata searching speeds, for example through more caching and less robust semantics. Currently the number of extremely scalable parallel archive solutions is very small, especially those that will move a single large striped parallel disk file onto many tapes in parallel. We believe that a hybrid storage approach of using COTS components and innovative software technology can bring new capabilities into a production environment for the HPC community much faster than the approach of creating and maintaining a complete end-to-end unique parallel archive software solution. In this paper, we relay our experience of integrating a global parallel file system and a standard backup/archive product with a very small amount of additional code to provide a scalable, parallel archive. Our solution has a high degree of overlap with current parallel archive products, including (a) doing parallel movement to/from tape for a single large parallel file, (b) hierarchical storage management, (c) ILM features, (d) high-volume (non-single parallel file) archives for backup/archive/content management, and (e) leveraging all free file movement tools in Linux such as copy, move, ls, tar, etc. We have successfully applied our working COTS Parallel Archive System to the current world's first petaflop/s computing system, LANL's Roadrunner machine, and demonstrated its capability to address requirements of future archival storage systems.
Global Swath and Gridded Data Tiling
NASA Technical Reports Server (NTRS)
Thompson, Charles K.
2012-01-01
This software generates cylindrically projected "tiles" of swath-based or gridded satellite data for the purpose of dynamically generating high-resolution global images covering various time periods, scaling ranges, and colors. It reconstructs a global image given a set of tiles covering a particular time range, scaling values, and a color table. The program is configurable in terms of tile size, spatial resolution, format of input data, location of input data (local or distributed), number of processes run in parallel, and data conditioning.
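A minimal sketch of the tiling arithmetic may clarify the idea: in a simple cylindrical (equirectangular) projection, a tile index is just integer division of pixel coordinates, and reassembly pastes tiles back into a global grid. The tile size, resolution, and fill behavior below are assumptions for illustration, not values taken from the tool.

import numpy as np

TILE = 256          # assumed tile size in pixels (configurable in the tool)
DEG_PER_PX = 0.05   # assumed spatial resolution in degrees per pixel

def tile_index(lat, lon):
    # Map a lat/lon (degrees) to the (row, col) of its containing tile.
    px = int((lon + 180.0) / DEG_PER_PX)
    py = int((90.0 - lat) / DEG_PER_PX)
    return py // TILE, px // TILE

def assemble(tiles, n_rows, n_cols):
    # Reconstruct a global image from a {(row, col): 2-D array} tile dict;
    # missing tiles stay as fill values (NaN here).
    out = np.full((n_rows * TILE, n_cols * TILE), np.nan)
    for (r, c), data in tiles.items():
        out[r * TILE:(r + 1) * TILE, c * TILE:(c + 1) * TILE] = data
    return out

print(tile_index(34.2, -118.17))  # an example point in California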
Doll, J.; Dupuis, P.; Nyquist, P.
2017-02-08
Parallel tempering, or replica exchange, is a popular method for simulating complex systems. The idea is to run parallel simulations at different temperatures, and at a given swap rate exchange configurations between the parallel simulations. From the perspective of large deviations it is optimal to let the swap rate tend to infinity, and it is possible to construct a corresponding simulation scheme, known as infinite swapping. In this paper we propose a novel use of large deviations for empirical measures for a more detailed analysis of the infinite swapping limit in the setting of continuous-time jump Markov processes. Using the large deviations rate function and associated stochastic control problems we consider a diagnostic based on temperature assignments, which can be easily computed during a simulation. We show that the convergence of this diagnostic to its a priori known limit is a necessary condition for the convergence of infinite swapping. The rate function is also used to investigate the impact of asymmetries in the underlying potential landscape, and where in the state space poor sampling is most likely to occur.
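For readers unfamiliar with the swap step that infinite swapping accelerates, the standard replica-exchange acceptance rule is compact enough to state in a few lines. The sketch below is generic textbook parallel tempering, not the paper's jump-Markov construction.

import math, random

def attempt_swap(beta_i, beta_j, energy_i, energy_j):
    # Metropolis criterion for exchanging configurations between replicas at
    # inverse temperatures beta_i and beta_j: accept with probability
    # min(1, exp((beta_i - beta_j) * (energy_i - energy_j))).
    log_acc = (beta_i - beta_j) * (energy_i - energy_j)
    return random.random() < math.exp(min(0.0, log_acc))

# A hot replica (small beta) crosses energy barriers easily; accepted swaps
# hand those configurations down to colder replicas. Infinite swapping is the
# limit in which such exchanges occur infinitely fast.
print(attempt_swap(1.0, 0.5, -3.2, -1.1))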
Scale dependence of the alignment between strain rate and rotation in turbulent shear flow
NASA Astrophysics Data System (ADS)
Fiscaletti, D.; Elsinga, G. E.; Attili, A.; Bisetti, F.; Buxton, O. R. H.
2016-10-01
The scale dependence of the statistical alignment tendencies of the eigenvectors of the strain-rate tensor e_i with the vorticity vector ω is examined in the self-preserving region of a planar turbulent mixing layer. Data from a direct numerical simulation are filtered at various length scales and the probability density functions of the magnitude of the alignment cosines between the two unit vectors, |e_i · ω̂|, are examined. It is observed that the alignment tendencies are insensitive to the concurrent large-scale velocity fluctuations, but are quantitatively affected by the nature of the concurrent large-scale velocity-gradient fluctuations. It is confirmed that the small-scale (local) vorticity vector is preferentially aligned in parallel with the large-scale (background) extensive strain-rate eigenvector e_1, in contrast to the global tendency for ω to be aligned in parallel with the intermediate strain-rate eigenvector [Hamlington et al., Phys. Fluids 20, 111703 (2008), 10.1063/1.3021055]. When only data from regions of the flow that exhibit strong swirling are included, the so-called high-enstrophy worms, the alignment tendencies are exaggerated with respect to the global picture. These findings support the notion that the production of enstrophy, responsible for a net cascade of turbulent kinetic energy from large scales to small scales, is driven by vorticity stretching due to the preferential parallel alignment between ω and the nonlocal e_1, and that the strongly swirling worms are kinematically significant to this process.
Linear static structural and vibration analysis on high-performance computers
NASA Technical Reports Server (NTRS)
Baddourah, M. A.; Storaasli, O. O.; Bostic, S. W.
1993-01-01
Parallel computers offer the opportunity to significantly reduce the computation time necessary to analyze large-scale aerospace structures. This paper presents algorithms developed for and implemented on massively parallel computers, hereafter referred to as Scalable High-Performance Computers (SHPC), for the most computationally intensive tasks involved in structural analysis, namely, generation and assembly of system matrices, solution of systems of equations, and calculation of the eigenvalues and eigenvectors. Results on SHPC are presented for large-scale structural problems (i.e., models for the High-Speed Civil Transport). The goal of this research is to develop a new, efficient technique which extends structural analysis to SHPC and makes large-scale structural analyses tractable.
NASA Technical Reports Server (NTRS)
Shivers, J. P.; Mclemore, H. C.; Coe, P. L., Jr.
1976-01-01
Tests have been conducted in a full scale tunnel to determine the low speed aerodynamic characteristics of a large scale advanced arrow wing supersonic transport configuration with engines mounted above the wing for upper surface blowing. Tests were made over an angle of attack range of -10 deg to 32 deg, sideslip angles of + or - 5 deg, and a Reynolds number range of 3,530,000 to 7,330,000. Configuration variables included trailing edge flap deflection, engine jet nozzle angle, engine thrust coefficient, engine out operation, and asymmetrical trailing edge boundary layer control for providing roll trim. Downwash measurements at the tail were obtained for different thrust coefficients, tail heights, and at two fuselage stations.
NASA Astrophysics Data System (ADS)
Separovic, Leo; Husain, Syed Zahid; Yu, Wei
2015-09-01
Internal variability (IV) in dynamical downscaling with limited-area models (LAMs) represents a source of error inherent to the downscaled fields, which originates from the sensitive dependence of the models on arbitrarily small modifications. If IV is large it may impose the need for probabilistic verification of the downscaled information. Atmospheric spectral nudging (ASN) can reduce IV in LAMs as it constrains the large-scale components of LAM fields in the interior of the computational domain and thus prevents any considerable penetration of sensitively dependent deviations into the range of large scales. Using initial-condition ensembles, the present study quantifies the impact of ASN on IV in LAM simulations in the range of fine scales that are not controlled by spectral nudging. Four simulation configurations that all include strong ASN but differ in the nudging settings are considered. In the fifth configuration, grid nudging of land surface variables toward high-resolution surface analyses is applied. The results show that the IV at scales larger than 300 km can be suppressed by selecting an appropriate ASN setup. At scales between 300 and 30 km, however, in all configurations, the hourly near-surface temperature, humidity, and winds are only partly reproducible. Nudging the land surface variables is found to have the potential to significantly reduce IV, particularly for fine-scale temperature and humidity. On the other hand, hourly precipitation accumulations at these scales are generally irreproducible in all configurations, and a probabilistic approach to downscaling is therefore recommended.
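The core of spectral nudging is easy to state in one dimension: transform the LAM field, relax only the retained low wavenumbers toward the driving field, and transform back, leaving the fine scales untouched. The sketch below is an idealized 1-D NumPy illustration with invented parameters, not the ASN implementation used in the study.

import numpy as np

def spectral_nudge(u_lam, u_driver, k_max, alpha):
    # Relax only wavenumbers |k| <= k_max of the LAM field toward the driving
    # field; all smaller scales are left free to develop on their own.
    u_hat = np.fft.rfft(u_lam)
    d_hat = np.fft.rfft(u_driver)
    u_hat[:k_max + 1] += alpha * (d_hat[:k_max + 1] - u_hat[:k_max + 1])
    return np.fft.irfft(u_hat, n=u_lam.size)

x = np.linspace(0, 2 * np.pi, 128, endpoint=False)
driver = np.cos(2 * x)                              # prescribed large-scale state
lam = np.cos(2 * x + 0.5) + 0.3 * np.sin(25 * x)    # drifted large scale + fine scale
nudged = spectral_nudge(lam, driver, k_max=4, alpha=0.5)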
Displacement and deformation measurement for large structures by camera network
NASA Astrophysics Data System (ADS)
Shang, Yang; Yu, Qifeng; Yang, Zhen; Xu, Zhiqiang; Zhang, Xiaohu
2014-03-01
A displacement and deformation measurement method for large structures based on a series-parallel connection camera network is presented. Taking the dynamic monitoring of a large-scale crane in lifting operation as an example, a series-parallel connection camera network is designed, and the displacement and deformation measurement method using this network is studied. The movement range of the crane body is small, while that of the crane arm is large. The displacement of the crane body, the displacement of the crane arm relative to the body, and the deformation of the arm are measured. Compared with a pure series or parallel connection camera network, the designed series-parallel connection camera network can measure not only the movement and displacement of a large structure but also the relative movement and deformation of parts of interest of the large structure, with a relatively simple optical measurement system.
Real-time object tracking based on scale-invariant features employing bio-inspired hardware.
Yasukawa, Shinsuke; Okuno, Hirotsugu; Ishii, Kazuo; Yagi, Tetsuya
2016-09-01
We developed a vision sensor system that performs a scale-invariant feature transform (SIFT) in real time. To apply the SIFT algorithm efficiently, we focus on a two-fold process performed by the visual system: whole-image parallel filtering and frequency-band parallel processing. The vision sensor system comprises an active pixel sensor, a metal-oxide semiconductor (MOS)-based resistive network, a field-programmable gate array (FPGA), and a digital computer. We employed the MOS-based resistive network for instantaneous spatial filtering and a configurable filter size. The FPGA is used to pipeline process the frequency-band signals. The proposed system was evaluated by tracking the feature points detected on an object in a video. Copyright © 2016 Elsevier Ltd. All rights reserved.
NASA Technical Reports Server (NTRS)
Nguyen, D. T.; Watson, Willie R. (Technical Monitor)
2005-01-01
The overall objectives of this research work are to formulate and validate efficient parallel algorithms, and to efficiently design and implement computer software, for solving large-scale acoustic problems arising from the unified frameworks of finite element procedures. The adopted parallel Finite Element (FE) Domain Decomposition (DD) procedures should take full advantage of the multiple processing capabilities offered by most modern high-performance computing platforms for efficient parallel computation. To achieve this objective, the formulation needs to integrate efficient sparse (and dense) assembly techniques, hybrid (or mixed) direct and iterative equation solvers, proper preconditioning strategies, unrolling strategies, and effective processor communication schemes. Finally, the numerical performance of the developed parallel finite element procedures will be evaluated by solving a series of structural and acoustic (symmetric and unsymmetric) problems on different computing platforms. Comparisons with existing commercial and/or public-domain software are also included, whenever possible.
Parallelization of MRCI based on hole-particle symmetry.
Suo, Bing; Zhai, Gaohong; Wang, Yubin; Wen, Zhenyi; Hu, Xiangqian; Li, Lemin
2005-01-15
The parallel implementation of a multireference configuration interaction program based on hole-particle symmetry is described. The platform for the parallelization is an Intel-architecture cluster consisting of 12 nodes, each of which is equipped with two 2.4-GHz XEON processors, 3 GB of memory, and a 36-GB disk, connected by a Gigabit Ethernet Switch. The dependence of speedup on molecular symmetries and task granularities is discussed. Test calculations show that the speedup factor per doubling of the number of nodes is about 1.9 (for C1 and Cs), 1.65 (for C2v), and 1.55 (for D2h). The largest calculation performed on this cluster involves 5.6 × 10^8 CSFs.
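To put those scaling factors in perspective: if doubling the node count multiplies throughput by s, then 2^k nodes give a speedup of s^k and a parallel efficiency of (s/2)^k. The few lines below just evaluate that arithmetic for the quoted factors.

# Illustrative arithmetic only; s is the quoted per-doubling speedup factor.
for symmetry, s in [("C1/Cs", 1.9), ("C2v", 1.65), ("D2h", 1.55)]:
    k = 3  # three doublings, i.e., 8 nodes relative to 1
    print(f"{symmetry}: speedup on 8 nodes ~ {s ** k:.2f}, "
          f"efficiency ~ {(s / 2) ** k:.2f}")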
Accelerating large-scale protein structure alignments with graphics processing units
2012-01-01
Background: Large-scale protein structure alignment, an indispensable tool for structural bioinformatics, poses a tremendous challenge on computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons. Findings: We present ppsAlign, a parallel protein structure Alignment framework designed and optimized to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, ppsAlign could take many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated ppsAlign on an NVIDIA Tesla C2050 GPU card, and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH. Conclusions: ppsAlign is a high-performance protein structure alignment tool designed to tackle the computational complexity issues from protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using the massive parallel computing power of GPUs. PMID:22357132
Multi-thread parallel algorithm for reconstructing 3D large-scale porous structures
NASA Astrophysics Data System (ADS)
Ju, Yang; Huang, Yaohui; Zheng, Jiangtao; Qian, Xu; Xie, Heping; Zhao, Xi
2017-04-01
Geomaterials inherently contain many discontinuous, multi-scale, geometrically irregular pores, forming a complex porous structure that governs their mechanical and transport properties. The development of an efficient reconstruction method for representing porous structures can significantly contribute toward providing a better understanding of the governing effects of porous structures on the properties of porous materials. In order to improve the efficiency of reconstructing large-scale porous structures, a multi-thread parallel scheme was incorporated into the simulated annealing reconstruction method. In the method, four correlation functions, which include the two-point probability function, the linear-path functions for the pore phase and the solid phase, and the fractal system function for the solid phase, were employed for better reproduction of the complex well-connected porous structures. In addition, a random sphere packing method and a self-developed pre-conditioning method were incorporated to cast the initial reconstructed model and select independent interchanging pairs for parallel multi-thread calculation, respectively. The accuracy of the proposed algorithm was evaluated by examining the similarity between the reconstructed structure and a prototype in terms of their geometrical, topological, and mechanical properties. Comparisons of the reconstruction efficiency of porous models with various scales indicated that the parallel multi-thread scheme significantly shortened the execution time for reconstruction of a large-scale well-connected porous model compared to a sequential single-thread procedure.
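As a bare-bones illustration of the underlying annealing loop, the sketch below estimates one of the four correlation functions named above (the two-point probability function, along one axis) and performs a single Metropolis pixel-swap move against a target. It is a toy Python reduction of the method; the paper's algorithm uses all four functions plus parallel, independent interchange pairs.

import numpy as np

def two_point_probability(img, r):
    # S2(r): probability that two points separated by r along the x axis
    # (with periodic wrap) both lie in the pore phase (pixels equal to 1).
    return np.mean(img * np.roll(img, r, axis=1))

def anneal_step(img, target, rs, temperature, rng):
    # One simulated-annealing move: swap a random pixel pair, then accept or
    # undo with the Metropolis rule on the correlation-function mismatch.
    cost = lambda im: sum((two_point_probability(im, r) - target[r]) ** 2 for r in rs)
    before = cost(img)
    (i1, j1), (i2, j2) = rng.integers(0, img.shape[0], (2, 2))  # assumes square image
    img[i1, j1], img[i2, j2] = img[i2, j2], img[i1, j1]
    if rng.random() >= np.exp(min(0.0, -(cost(img) - before) / temperature)):
        img[i1, j1], img[i2, j2] = img[i2, j2], img[i1, j1]  # reject: undo the swap
    return img

rng = np.random.default_rng(0)
img = (rng.random((64, 64)) < 0.3).astype(float)  # random start, 30% porosity
target = {r: 0.09 for r in (1, 2, 4)}             # toy target values
anneal_step(img, target, (1, 2, 4), temperature=1e-4, rng=rng)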
A nonrecursive order N preconditioned conjugate gradient: Range space formulation of MDOF dynamics
NASA Technical Reports Server (NTRS)
Kurdila, Andrew J.
1990-01-01
While excellent progress has been made in deriving algorithms that are efficient for certain combinations of system topologies and concurrent multiprocessing hardware, several issues must be resolved to incorporate transient simulation in the control design process for large space structures. Specifically, strategies must be developed that are applicable to systems with numerous degrees of freedom. In addition, the algorithms must have a growth potential in that they must also be amenable to implementation on forthcoming parallel system architectures. For mechanical system simulation, this fact implies that algorithms are required that induce parallelism on a fine scale, suitable for the emerging class of highly parallel processors; and transient simulation methods must be automatically load balancing for a wider collection of system topologies and hardware configurations. These problems are addressed by employing a combination range space/preconditioned conjugate gradient formulation of multi-degree-of-freedom dynamics. The method described has several advantages. In a sequential computing environment, the method has the features that: by employing regular ordering of the system connectivity graph, an extremely efficient preconditioner can be derived from the 'range space metric', as opposed to the system coefficient matrix; because of the effectiveness of the preconditioner, preliminary studies indicate that the method can achieve performance rates that depend linearly upon the number of substructures, hence the title 'Order N'; and the method is non-assembling. Furthermore, the approach is promising as a potential parallel processing algorithm in that the method exhibits a fine parallel granularity suitable for a wide collection of combinations of physical system topologies/computer architectures; and the method is easily load balanced among processors, and does not rely upon system topology to induce parallelism.
Comparison of Parallel and Series Hybrid Power Trains for Transit Bus Applications
Gao, Zhiming; Daw, C. Stuart; Smith, David E.; ...
2016-08-01
The fuel economy and emissions of conventional and hybrid buses equipped with emissions aftertreatment were evaluated via computational simulation for six representative city bus drive cycles. Both series and parallel configurations for the hybrid case were studied. The simulation results indicated that series hybrid buses have the greatest overall advantage in fuel economy. The series and parallel hybrid buses were predicted to produce similar carbon monoxide and hydrocarbon tailpipe emissions but were also predicted to have reduced tailpipe emissions of nitrogen oxides compared with the conventional bus in higher-speed cycles. For the New York bus cycle, which has the lowest average speed among the cycles evaluated, the series bus tailpipe emissions were somewhat higher than they were for the conventional bus; the parallel hybrid bus had significantly lower tailpipe emissions. All three bus power trains were found to require periodic active diesel particulate filter regeneration to maintain control of particulate matter. Finally, plug-in operation of series hybrid buses appears to offer significant fuel economy benefits and is easily employed because of the relatively large battery capacity that is typical of the series hybrid configuration.
Research on precision grinding technology of large scale and ultra thin optics
NASA Astrophysics Data System (ADS)
Zhou, Lian; Wei, Qiancai; Li, Jie; Chen, Xianhua; Zhang, Qinghua
2018-03-01
The flatness and parallelism errors of large-scale and ultra-thin optics have an important influence on the subsequent polishing efficiency and accuracy. In order to realize high-precision grinding of those ductile elements, a low-deformation vacuum chuck was first designed to clamp the optics with high supporting rigidity over the full aperture. The optics was then surface-ground under vacuum adsorption. After machining, the vacuum system was turned off, and the form error of the optics was measured on-machine with a displacement sensor after elastic restitution. The flatness was converged to high accuracy by compensation machining, whose trajectories were generated from the measurement result. To obtain high parallelism, the optics was turned over and compensation-ground using the form error of the vacuum chuck. Finally, a grinding experiment on a large-scale and ultra-thin fused silica optic with an aperture of 430 mm × 430 mm × 10 mm was performed. The best P-V flatness of the optic was below 3 μm, and the parallelism was below 3″. This machining technique has been applied in the batch grinding of large-scale and ultra-thin optics.
Streaming parallel GPU acceleration of large-scale filter-based spiking neural networks.
Slażyński, Leszek; Bohte, Sander
2012-01-01
The arrival of graphics processing (GPU) cards suitable for massively parallel computing promises affordable large-scale neural network simulation previously only available at supercomputing facilities. While the raw numbers suggest that GPUs may outperform CPUs by at least an order of magnitude, the challenge is to develop fine-grained parallel algorithms to fully exploit the particulars of GPUs. Computation in a neural network is inherently parallel and thus a natural match for GPU architectures: given inputs, the internal state for each neuron can be updated in parallel. We show that for filter-based spiking neurons, like the Spike Response Model, the additive nature of membrane potential dynamics enables additional update parallelism. This also reduces the accumulation of numerical errors when using single-precision computation, the native precision of GPUs. We further show that optimizing simulation algorithms and data structures for the GPU's architecture has a large pay-off: for example, matching iterative neural updating to the memory architecture of the GPU speeds up this simulation step by a factor of three to five. With such optimizations, we can simulate plausible spiking neural networks of up to 50,000 neurons in better than real time, processing over 35 million spiking events per second.
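The additive update that enables the parallelism described above can be shown in a few vectorized lines: every membrane state decays independently, and incoming spikes contribute additively, so each neuron's update is independent of the others. This NumPy sketch uses a single exponential filter as a stand-in for the Spike Response Model's kernels; all parameters are invented.

import numpy as np

def step(potential, decay, weights, spiked):
    # One fully parallel update of filter-based neurons: every membrane state
    # decays independently, then spikes add their postsynaptic contributions.
    potential *= decay
    potential += weights[:, spiked].sum(axis=1)
    return potential

n = 1000
rng = np.random.default_rng(1)
v = np.zeros(n)                                   # membrane potentials
w = rng.normal(0.0, 0.1, (n, n))                  # synaptic weights
spiked = np.flatnonzero(rng.random(n) < 0.02)     # indices of neurons that spiked
v = step(v, decay=np.exp(-1.0 / 20.0), weights=w, spiked=spiked)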
Implementation of highly parallel and large scale GW calculations within the OpenAtom software
NASA Astrophysics Data System (ADS)
Ismail-Beigi, Sohrab
The need to describe electronic excitations with better accuracy than provided by band structures produced by Density Functional Theory (DFT) has been a long-term enterprise for the computational condensed matter and materials theory communities. In some cases, appropriate theoretical frameworks have existed for some time but have been difficult to apply widely due to computational cost. For example, the GW approximation incorporates a great deal of important non-local and dynamical electronic interaction effects but has been too computationally expensive for routine use in large materials simulations. OpenAtom is an open-source massively parallel ab initio density functional software package based on plane waves and pseudopotentials (http://charm.cs.uiuc.edu/OpenAtom/) that takes advantage of the Charm++ parallel framework. At present, it is developed via a three-way collaboration, funded by an NSF SI2-SSI grant (ACI-1339804), between Yale (Ismail-Beigi), IBM T. J. Watson (Glenn Martyna) and the University of Illinois at Urbana-Champaign (Laxmikant Kale). We will describe the project and our current approach towards implementing large-scale GW calculations with OpenAtom. Potential applications of large-scale parallel GW software for problems involving electronic excitations in semiconductor and/or metal oxide systems will also be pointed out.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Suryanarayana, Phanish; Pratapa, Phanisri P.; Sharma, Abhiraj
We present SQDFT: a large-scale parallel implementation of the Spectral Quadrature (SQ) method for O(N) Kohn-Sham Density Functional Theory (DFT) calculations at high temperature. Specifically, we develop an efficient and scalable finite-difference implementation of the infinite-cell Clenshaw-Curtis SQ approach, in which results for the infinite crystal are obtained by expressing quantities of interest as bilinear forms or sums of bilinear forms, that are then approximated by spatially localized Clenshaw-Curtis quadrature rules. We demonstrate the accuracy of SQDFT by showing systematic convergence of energies and atomic forces with respect to SQ parameters to reference diagonalization results, and convergence with discretization to established planewave results, for both metallic and insulating systems. We further demonstrate that SQDFT achieves excellent strong and weak parallel scaling on computer systems consisting of tens of thousands of processors, with near perfect O(N) scaling with system size and wall times as low as a few seconds per self-consistent field iteration. Finally, we verify the accuracy of SQDFT in large-scale quantum molecular dynamics simulations of aluminum at high temperature.
NASA Technical Reports Server (NTRS)
1983-01-01
The Flat Plate Solar Array Project focuses on advancing technologies relevant to the design and construction of megawatt-level central station systems. Photovoltaic modules and arrays for flat plate central station or other large-scale electric power production facilities require the establishment of a technical base that resolves design issues and results in practical and cost-effective configurations. Design, qualification, and maintenance issues related to central station arrays, derived from the engineering and operating experiences of early applications and parallel laboratory research activities, are investigated. Technical issues are examined from the viewpoint of the utility engineer, architect/engineer, and laboratory researcher. Topics on optimum source circuit designs, module insulation design for high system voltages, array safety, structural interface design, measurements, and array operation and maintenance are discussed.
Aircraft Engine Noise Scattering by Fuselage and Wings: A Computational Approach
NASA Technical Reports Server (NTRS)
Farassat, F.; Stanescu, D.; Hussaini, M. Y.
2003-01-01
The paper presents a time-domain method for computation of sound radiation from aircraft engine sources to the far field. The effects of non-uniform flow around the aircraft and scattering of sound by fuselage and wings are accounted for in the formulation. The approach is based on the discretization of the inviscid flow equations through a collocation form of the discontinuous Galerkin spectral element method. An isoparametric representation of the underlying geometry is used in order to take full advantage of the spectral accuracy of the method. Large-scale computations are made possible by a parallel implementation based on message passing. Results obtained for radiation from an axisymmetric nacelle alone are compared with those obtained when the same nacelle is installed in a generic configuration, with and without a wing. © 2002 Elsevier Science Ltd. All rights reserved.
Grid-Enabled Quantitative Analysis of Breast Cancer
2009-10-01
large-scale, multi-modality computerized image analysis. The central hypothesis of this research is that large-scale image analysis for breast cancer ... pilot study to utilize large-scale parallel Grid computing to harness the nationwide cluster infrastructure for optimization of medical image ... analysis parameters. Additionally, we investigated the use of cutting-edge data analysis/mining techniques as applied to Ultrasound, FFDM, and DCE-MRI Breast
Large boron-epoxy filament-wound pressure vessels
NASA Technical Reports Server (NTRS)
Jensen, W. M.; Bailey, R. L.; Knoell, A. C.
1973-01-01
The advanced composite material used to fabricate the pressure vessel is prepreg (partially cured) tape consisting of continuous, parallel boron filaments in an epoxy resin matrix. To fabricate the chamber, the tape is wound on a form which must be removable after the composite has been cured. The configuration of the boron-epoxy composite pressure vessel was determined by a computer program.
Bale, S D; Mozer, F S
2007-05-18
Large parallel (
Juliano, Pablo; Temmel, Sandra; Rout, Manoj; Swiergon, Piotr; Mawson, Raymond; Knoerzer, Kai
2013-01-01
Recent research has shown that high-frequency ultrasound (0.4-3 MHz) can enhance milkfat separation in small-scale systems able to treat only a few milliliters of sample. In this work, the effect of ultrasonic standing waves on milkfat creaming was studied in a 6 L reactor, and the influence of different frequencies and transducer configurations in direct contact with the fluid was investigated. A recombined coarse milk emulsion with fat globules stained with oil-red-O dye was selected for the separation trials. Runs were performed with one or two transducers placed in vertical (parallel or perpendicular) and horizontal positions (at the reactor base) at 0.4, 1, and/or 2 MHz (specific energy 8.5 ± 0.6 kJ/kg per transducer). Creaming behavior was assessed by measuring the thickness of the separated cream layer. Other methods supporting this assessment included the measurement of fat content, backscattering, particle size distribution, and microscopy of samples taken at the bottom and top of the reactor. The most efficient creaming was found after treatment at 0.4 MHz in single and double vertical transducer configurations. Among these configurations, a higher separation rate was obtained when sonicating at 0.4 MHz in a vertical perpendicular double-transducer setup. The horizontal transducer configuration promoted creaming at 2 MHz only. An increase in fat globule size was observed when creaming occurred. This research highlights the potential for enhanced separation of milkfat in larger-scale systems, or emulsion splitting in general, using selected transducer configurations in contact with a dairy emulsion. Copyright © 2012 Elsevier B.V. All rights reserved.
A distributed parallel storage architecture and its potential application within EOSDIS
NASA Technical Reports Server (NTRS)
Johnston, William E.; Tierney, Brian; Feuquay, Jay; Butzer, Tony
1994-01-01
We describe the architecture, implementation, and use of a scalable, high-performance, distributed-parallel data storage system developed in the ARPA-funded MAGIC gigabit testbed. A collection of wide-area distributed disk servers operates in parallel to provide logical block-level access to large data sets. Operated primarily as a network-based cache, the architecture supports cooperation among independently owned resources to provide fast, large-scale, on-demand storage to support data handling, simulation, and computation.
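The block-level parallelism idea reduces to striping a logical data set across several servers and fetching blocks concurrently, reassembling them in order. Everything in the sketch below (server names, the read_block stub) is invented for illustration; the MAGIC system's actual protocol is not shown.

from concurrent.futures import ThreadPoolExecutor

BLOCK = 64 * 1024  # assumed logical block size

def read_block(server, block_no):
    # Placeholder for a network read from one disk server; returns block bytes.
    return bytes(BLOCK)

def parallel_read(servers, n_blocks):
    # Block i lives on server i % len(servers); fetch every block concurrently
    # and reassemble the logical extent in order.
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        futures = [pool.submit(read_block, servers[i % len(servers)], i)
                   for i in range(n_blocks)]
        return b"".join(f.result() for f in futures)

data = parallel_read(["srvA", "srvB", "srvC", "srvD"], n_blocks=8)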
Spectral enstrophy budget in a shear-less flow with turbulent/non-turbulent interface
NASA Astrophysics Data System (ADS)
Cimarelli, Andrea; Cocconi, Giacomo; Frohnapfel, Bettina; De Angelis, Elisabetta
2015-12-01
A numerical analysis of the interaction between decaying shear-free turbulence and quiescent fluid is performed by means of global statistical budgets of enstrophy, both at the single-point and two-point levels. The single-point enstrophy budget allows us to recognize three physically relevant layers: a bulk turbulent region, an inhomogeneous turbulent layer, and an interfacial layer. Within these layers, enstrophy is produced, transferred, and finally destroyed while leading to a propagation of the turbulent front. These processes do not only depend on the position in the flow field but are also strongly scale dependent. In order to tackle this multi-dimensional behaviour of enstrophy in the space of scales and in physical space, we analyse the spectral enstrophy budget equation. The picture consists of an inviscid spatial cascade of enstrophy from large to small scales parallel to the interface, moving towards the interface. At the interface, this phenomenon breaks down, giving way to an anisotropic cascade where large-scale structures exhibit a cascade process only normal to the interface, thus reducing their thickness while retaining their lengths parallel to the interface. The observed behaviour could be relevant for both the theoretical and the modelling approaches to flows with interacting turbulent/non-turbulent regions. The scale properties of the turbulent propagation mechanisms highlight that the inviscid turbulent transport is a large-scale phenomenon. On the contrary, the viscous diffusion, commonly associated with small-scale mechanisms, exhibits a much richer physics involving small lengths normal to the interface but, at the same time, large scales parallel to the interface.
Traffic Simulations on Parallel Computers Using Domain Decomposition Techniques
DOT National Transportation Integrated Search
1995-01-01
Large-scale simulations of Intelligent Transportation Systems (ITS) can only be achieved by using the computing resources offered by parallel computing architectures. Domain decomposition techniques are proposed which allow the performance of traffic...
NASA Astrophysics Data System (ADS)
Denneulin, T.; Wollschläger, N.; Everhardt, A. S.; Farokhipoor, S.; Noheda, B.; Snoeck, E.; Hÿtch, M.
2018-05-01
Lead zirconate titanate samples are used for their piezoelectric and ferroelectric properties in various types of micro-devices. Epitaxial layers of tetragonal perovskites have a tendency to relax by forming ferroelastic domains. The accommodation of the a/c/a/c polydomain structure on a flat substrate leads to nanoscale deformation gradients which locally influence the polarization through the flexoelectric effect. Here, we investigated the deformation fields in epitaxial layers of Pb(Zr0.2Ti0.8)O3 grown on SrTiO3 substrates using transmission electron microscopy (TEM). We found that the deformation gradients depend on the inclination of the domain walls of the successive domains relative to the substrate interface, and we describe three different a/c/a domain configurations: one configuration with parallel a-domains and two configurations with perpendicular a-domains (V-shaped and hat-shaped). In the parallel configuration, the c-domains contain horizontal and vertical gradients of out-of-plane deformation. In the V-shaped and hat-shaped configurations, the c-domains exhibit a bending deformation field with vertical gradients of in-plane deformation. Each of these configurations is expected to have a different influence on the polarization and thus the local properties of the film. The deformation gradients were measured using dark-field electron holography, a TEM technique which offers good sensitivity (0.1%) and a large field of view (hundreds of nanometers). The measurements are compared with finite element simulations.
Approximate kernel competitive learning.
Wu, Jian-Sheng; Zheng, Wei-Shi; Lai, Jian-Huang
2015-03-01
Kernel competitive learning has been successfully used to achieve robust clustering. However, kernel competitive learning (KCL) is not scalable for large-scale data processing, because (1) it has to calculate and store the full kernel matrix, which is too large to be computed and kept in memory, and (2) it cannot be computed in parallel. In this paper we develop a framework of approximate kernel competitive learning for processing large-scale datasets. The proposed framework consists of two parts. First, it derives an approximate kernel competitive learning (AKCL) method, which learns kernel competitive learning in a subspace via sampling. We provide solid theoretical analysis on why the proposed approximation modelling works for kernel competitive learning, and furthermore, we show that the computational complexity of AKCL is largely reduced. Second, we propose a pseudo-parallelled approximate kernel competitive learning (PAKCL) method based on a set-based kernel competitive learning strategy, which overcomes the obstacle of using parallel programming in kernel competitive learning and significantly accelerates approximate kernel competitive learning for large-scale clustering. The empirical evaluation on publicly available datasets shows that the proposed AKCL and PAKCL perform comparably to KCL, with a large reduction in computational cost. Also, the proposed methods achieve more effective clustering performance in terms of clustering precision against related approximate clustering approaches. Copyright © 2014 Elsevier Ltd. All rights reserved.
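The "learning in a subspace via sampling" idea can be approximated with a Nystrom-style construction: map the data through kernel evaluations against a small landmark sample, then run ordinary competitive learning in that low-dimensional feature space. The sketch below is a generic stand-in under that assumption, not the AKCL algorithm itself; all parameter choices are invented.

import numpy as np

def rbf(a, b, gamma):
    # Gaussian (RBF) kernel matrix between row sets a and b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_features(X, landmarks, gamma):
    # Map data into an approximate kernel feature space spanned by a small
    # landmark sample, so competitive learning can run in a low dimension.
    W = rbf(landmarks, landmarks, gamma)
    U, s, _ = np.linalg.svd(W)
    return rbf(X, landmarks, gamma) @ (U / np.sqrt(np.maximum(s, 1e-12)))

def competitive_learning(Z, k, lr=0.05, epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    protos = Z[rng.choice(len(Z), k, replace=False)].copy()
    for _ in range(epochs):
        for z in Z[rng.permutation(len(Z))]:
            j = np.argmin(((protos - z) ** 2).sum(1))  # winner takes all
            protos[j] += lr * (z - protos[j])          # move winner toward z
    return protos

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.3, (100, 2)), rng.normal(3, 0.3, (100, 2))])
landmarks = X[rng.choice(len(X), 20, replace=False)]
protos = competitive_learning(nystrom_features(X, landmarks, gamma=1.0), k=2)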
Anisotropic magnetotail equilibrium and convection
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hau, L.N.
This paper reports on self-consistent two-dimensional equilibria with anisotropic plasma pressure for the Earth's magnetotail. These configurations are obtained by numerically solving the generalized Grad-Shafranov equation, describing anisotropic plasmas with p∥ ≠ p⊥, including the Earth's dipolar field. Consistency between these new equilibria and the assumption of steady-state, sunward convection, described by the double-adiabatic laws, is examined. As for the case of isotropic pressure [Erickson and Wolf, 1980], there exists a discrepancy between typical quiet-time magnetic field models and the assumption of steady-state double-adiabatic lossless plasma sheet convection. However, unlike that case, this inconsistency cannot be removed by the presence of a weak equatorial normal magnetic field strength in the near tail region: magnetic field configurations of this type produce unreasonably large pressure anisotropies, p∥ > p⊥, in the plasma sheet. 16 refs., 5 figs.
New sample cell configuration for wide-frequency dielectric spectroscopy: DC to radio frequencies.
Nakanishi, Masahiro; Sasaki, Yasutaka; Nozaki, Ryusuke
2010-12-01
A new configuration for the sample cell to be used in broadband dielectric spectroscopy is presented. A coaxial structure with a parallel plate capacitor (outward parallel plate cell: OPPC) has made it possible to extend the frequency range significantly in comparison with the conventional configuration. In the proposed configuration, stray inductance is significantly decreased; consequently, the upper bound of the frequency range is improved by two orders of magnitude over the upper limit of a conventional parallel plate capacitor (1 MHz). Furthermore, the value of capacitance is kept high by using a parallel plate configuration. Therefore, the precision of the capacitance measurement in the lower frequency range remains sufficiently high. Finally, the OPPC can cover a wide frequency range (100 Hz-1 GHz) with an appropriate admittance measuring apparatus such as an impedance or network analyzer. The OPPC and the conventional dielectric cell are compared by examining the frequency dependence of the complex permittivity for several polar liquids and polymeric films.
Quantum transport modelling of silicon nanobeams using heterogeneous computing scheme
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harb, M., E-mail: harbm@physics.mcgill.ca; Michaud-Rioux, V., E-mail: vincentm@physics.mcgill.ca; Guo, H., E-mail: guo@physics.mcgill.ca
We report the development of a powerful method for quantum transport calculations of nanowire/nanobeam structures with large cross sectional area. Our approach to quantum transport is based on Green's functions and tight-binding potentials. A linear algebraic formulation allows us to harness the massively parallel nature of Graphics Processing Units (GPUs), and our implementation is based on a heterogeneous parallel computing scheme with traditional processors and GPUs working together. Using our software tool, the electronic and quantum transport properties of silicon nanobeams with a realistic cross sectional area of ∼22.7 nm² and a length of ∼81.5 nm, comprising 105 000 Si atoms and 24 000 passivating H atoms in the scattering region, are investigated. The method also allows us to perform significant averaging over impurity configurations; all possible configurations were considered in the case of single impurities. Furthermore, the effect of the position and number of vacancy defects on the transport properties was considered. It is found that the configurations with the vacancies lying closer to the local density of states (LDOS) maxima have lower transmission functions than the configurations with the vacancies located at LDOS minima or far away from LDOS maxima, suggesting both a qualitative method to tune or estimate optimal impurity configurations as well as a physical picture that accounts for device variability. Finally, we provide performance benchmarks for structures as large as ∼42.5 nm² cross section and ∼81.5 nm length.
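As a toy illustration of the Green's-function approach, the sketch below computes the Landauer transmission of a small one-dimensional tight-binding chain with wide-band-limit lead self-energies. The Hamiltonian, coupling strength, and system size are assumptions for illustration; the paper's implementation targets far larger 3D structures with GPU-accelerated linear algebra.

```python
# Hedged sketch: Landauer transmission T(E) = Tr[Gamma_L G Gamma_R G^+]
# for a toy 1D nearest-neighbour chain (illustrative only).
import numpy as np

def transmission(E, n=20, t=-1.0, eta=1e-6):
    # Device Hamiltonian: chain of n sites with hopping t.
    H = t * (np.eye(n, k=1) + np.eye(n, k=-1))
    # Wide-band-limit lead self-energies on the two end sites (assumed).
    sigma = -0.5j * abs(t)
    SigL = np.zeros((n, n), complex); SigL[0, 0] = sigma
    SigR = np.zeros((n, n), complex); SigR[-1, -1] = sigma
    # Retarded Green's function of the open device.
    G = np.linalg.inv((E + 1j * eta) * np.eye(n) - H - SigL - SigR)
    GamL = 1j * (SigL - SigL.conj().T)
    GamR = 1j * (SigR - SigR.conj().T)
    return np.trace(GamL @ G @ GamR @ G.conj().T).real

print(transmission(0.0))   # near-ballistic transmission at the band centre
```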
Calculating Potential Energy Curves with Quantum Monte Carlo
NASA Astrophysics Data System (ADS)
Powell, Andrew D.; Dawes, Richard
2014-06-01
Quantum Monte Carlo (QMC) is a computational technique that can be applied to the electronic Schrödinger equation for molecules. QMC methods such as Variational Monte Carlo (VMC) and Diffusion Monte Carlo (DMC) have demonstrated the capability of capturing large fractions of the correlation energy, thus suggesting their possible use for high-accuracy quantum chemistry calculations. QMC methods scale particularly well with respect to parallelization making them an attractive consideration in anticipation of next-generation computing architectures which will involve massive parallelization with millions of cores. Due to the statistical nature of the approach, in contrast to standard quantum chemistry methods, uncertainties (error-bars) are associated with each calculated energy. This study focuses on the cost, feasibility and practical application of calculating potential energy curves for small molecules with QMC methods. Trial wave functions were constructed with the multi-configurational self-consistent field (MCSCF) method from GAMESS-US.[1] The CASINO Monte Carlo quantum chemistry package [2] was used for all of the DMC calculations. An overview of our progress in this direction will be given. References: M. W. Schmidt et al. J. Comput. Chem. 14, 1347 (1993). R. J. Needs et al. J. Phys.: Condensed Matter 22, 023201 (2010).
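The statistical error bars mentioned above can be seen even in a toy Variational Monte Carlo run. The sketch below samples |psi|² for the hydrogen atom with a Metropolis walk using the trial wavefunction psi = exp(-alpha*r), and reports the mean local energy with its standard error; it is a pedagogical stand-in, not the GAMESS-US/CASINO workflow used in the study, and it ignores autocorrelation in the error estimate.

```python
# Hedged sketch: minimal VMC for the hydrogen atom, psi = exp(-alpha*r).
import numpy as np

def vmc_hydrogen(alpha=0.9, nsteps=20000, step=0.5, seed=1):
    rng = np.random.default_rng(seed)
    r = np.array([0.5, 0.5, 0.5])
    energies = []
    for i in range(nsteps):
        trial = r + step * rng.uniform(-1, 1, 3)
        # Metropolis acceptance on |psi|^2 = exp(-2*alpha*|r|).
        if rng.random() < np.exp(-2 * alpha * (np.linalg.norm(trial)
                                               - np.linalg.norm(r))):
            r = trial
        if i > nsteps // 10:                  # discard equilibration
            d = np.linalg.norm(r)
            # Local energy for this trial function: -a^2/2 + (a-1)/r.
            energies.append(-0.5 * alpha**2 + (alpha - 1.0) / d)
    e = np.array(energies)
    return e.mean(), e.std() / np.sqrt(len(e))   # energy and error bar

mean, err = vmc_hydrogen()
print(f"E = {mean:.4f} +/- {err:.4f} hartree")   # exact: -0.5 at alpha=1
```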
Simulation of 2D Kinetic Effects in Plasmas using the Grid Based Continuum Code LOKI
NASA Astrophysics Data System (ADS)
Banks, Jeffrey; Berger, Richard; Chapman, Tom; Brunner, Stephan
2016-10-01
Kinetic simulation of multi-dimensional plasma waves through direct discretization of the Vlasov equation is a useful tool to study many physical interactions and is particularly attractive for situations where minimal fluctuation levels are desired, for instance, when measuring growth rates of plasma wave instabilities. However, direct discretization of phase space can be computationally expensive, and as a result there are few examples of published results using Vlasov codes in more than a single configuration space dimension. In an effort to fill this gap we have developed the Eulerian-based kinetic code LOKI that evolves the Vlasov-Poisson system in 2+2-dimensional phase space. The code is designed to reduce the cost of phase-space computation by using fully 4th order accurate conservative finite differencing, while retaining excellent parallel scalability that efficiently uses large scale computing resources. In this poster I will discuss the algorithms used in the code as well as some aspects of their parallel implementation using MPI. I will also overview simulation results of basic plasma wave instabilities relevant to laser plasma interaction, which have been obtained using the code.
Parallel magnetic field suppresses dissipation in superconducting nanostrips
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Yong-Lei; Glatz, Andreas; Kimmel, Gregory J.
The motion of Abrikosov vortices in type-II superconductors results in a finite resistance in the presence of an applied electric current. Elimination or reduction of the resistance via immobilization of vortices is the "holy grail" of superconductivity research. Common wisdom dictates that an increase in the magnetic field escalates the loss of energy since the number of vortices increases. Here we show that this is no longer true if the magnetic field and the current are applied parallel to each other. Our experimental studies on the resistive behavior of a superconducting Mo0.79Ge0.21 nanostrip reveal the emergence of a dissipative state with increasing magnetic field, followed by a pronounced resistance drop, signifying a reentrance to the superconducting state. Large-scale simulations of the 3D time-dependent Ginzburg-Landau model indicate that the intermediate resistive state is due to an unwinding of twisted vortices. When the magnetic field increases, this instability is suppressed due to a better accommodation of the vortex lattice to the pinning configuration. Our findings show that magnetic field and geometrical confinement can suppress the dissipation induced by vortex motion and thus radically improve the performance of superconducting materials.
Prediction of rarefied micro-nozzle flows using the SPARTA library
NASA Astrophysics Data System (ADS)
Deschenes, Timothy R.; Grot, Jonathan
2016-11-01
The accurate numerical prediction of gas flows within micro-nozzles can help evaluate the performance and enable the design of optimal configurations for micro-propulsion systems. Viscous effects within the large boundary layers can have a strong impact on the nozzle performance. Furthermore, the variation in collision length scales from continuum to rarefied precludes the use of continuum-based computational fluid dynamics. In this paper, we describe the application of a massively parallel direct simulation Monte Carlo (DSMC) library to predict the steady-state and transient flow through a micro-nozzle. The nozzle's geometric configuration is described in a highly flexible manner to allow for the modification of the geometry in a systematic fashion. The transient simulation highlights a strong shock structure that forms within the converging portion of the nozzle when the expanded gas interacts with the nozzle walls. This structure has a strong impact on the buildup of the gas in the nozzle and affects the boundary layer thickness beyond the throat in the diverging section of the nozzle. Future work will look to examine the transient thrust and integrate this simulation capability into a web-based rarefied gas dynamics prediction software, which is currently under development.
Large Scale Software Building with CMake in ATLAS
NASA Astrophysics Data System (ADS)
Elmsheuser, J.; Krasznahorkay, A.; Obreshkov, E.; Undrus, A.; ATLAS Collaboration
2017-10-01
The offline software of the ATLAS experiment at the Large Hadron Collider (LHC) serves as the platform for detector data reconstruction, simulation and analysis. It is also used in the detector’s trigger system to select LHC collision events during data taking. The ATLAS offline software consists of several million lines of C++ and Python code organized in a modular design of more than 2000 specialized packages. Because of different workflows, many stable numbered releases are in parallel production use. To accommodate specific workflow requests, software patches with modified libraries are distributed on top of existing software releases on a daily basis. The different ATLAS software applications also require a flexible build system that strongly supports unit and integration tests. Within the last year this build system was migrated to CMake. A CMake configuration has been developed that allows one to easily set up and build the above mentioned software packages. This also makes it possible to develop and test new and modified packages on top of existing releases. The system also allows one to detect and execute partial rebuilds of the release based on single package changes. The build system makes use of CPack for building RPM packages out of the software releases, and CTest for running unit and integration tests. We report on the migration and integration of the ATLAS software to CMake and show working examples of this large scale project in production.
Parallel Visualization of Large-Scale Aerodynamics Calculations: A Case Study on the Cray T3E
NASA Technical Reports Server (NTRS)
Ma, Kwan-Liu; Crockett, Thomas W.
1999-01-01
This paper reports the performance of a parallel volume rendering algorithm for visualizing a large-scale, unstructured-grid dataset produced by a three-dimensional aerodynamics simulation. This dataset, containing over 18 million tetrahedra, allows us to extend our performance results to a problem which is more than 30 times larger than the one we examined previously. This high resolution dataset also allows us to see fine, three-dimensional features in the flow field. All our tests were performed on the Silicon Graphics Inc. (SGI)/Cray T3E operated by NASA's Goddard Space Flight Center. Using 511 processors, a rendering rate of almost 9 million tetrahedra/second was achieved with a parallel overhead of 26%.
A hybrid parallel framework for the cellular Potts model simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jiang, Yi; He, Kejing; Dong, Shoubin
2009-01-01
The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, and cannot be used for large scale complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming PDE solving, cell division, and cell reaction operations are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP systems using OpenMP. Because the Monte Carlo lattice update is much faster than the PDE solving and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large scale simulation (∼10⁸ sites) of the complex collective behavior of numerous cells (∼10⁶).
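The MPI+OpenMP split described above can be mimicked in a short sketch: ranks own lattice blocks and exchange boundaries with MPI, while threads sweep strips within a block (NumPy kernels release the GIL, so the threads genuinely overlap). The block size, strip count, and the trivial stand-in for the Monte Carlo update are assumptions; this shows the pattern, not the authors' CPM code.

```python
# Hedged sketch of the hybrid distributed/shared-memory pattern,
# assuming mpi4py for the inter-node half and threads for the
# intra-node half.
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

lattice = np.random.default_rng(rank).integers(0, 5, size=(256, 256))

def update_strip(rows):
    # Placeholder for the Monte Carlo lattice update on one strip.
    lo, hi = rows
    lattice[lo:hi] = (lattice[lo:hi] + 1) % 5

with ThreadPoolExecutor(max_workers=4) as pool:     # the "OpenMP" half
    list(pool.map(update_strip, [(i, i + 64) for i in range(0, 256, 64)]))

# The "MPI" half: exchange boundary rows around a periodic ring of ranks.
up, down = (rank - 1) % size, (rank + 1) % size
halo = comm.sendrecv(lattice[0].copy(), dest=up, source=down)
```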
NASA Astrophysics Data System (ADS)
Balaji, V.; Benson, Rusty; Wyman, Bruce; Held, Isaac
2016-10-01
Climate models represent a large variety of processes on a variety of timescales and space scales, a canonical example of multi-physics multi-scale modeling. Current hardware trends, such as Graphics Processing Units (GPUs) and Many Integrated Core (MIC) chips, are based on, at best, marginal increases in clock speed, coupled with vast increases in concurrency, particularly at the fine grain. Multi-physics codes face particular challenges in achieving fine-grained concurrency, as different physics and dynamics components have different computational profiles, and universal solutions are hard to come by. We propose here one approach for multi-physics codes. These codes are typically structured as components interacting via software frameworks. The component structure of a typical Earth system model consists of a hierarchical and recursive tree of components, each representing a different climate process or dynamical system. This recursive structure generally encompasses a modest level of concurrency at the highest level (e.g., atmosphere and ocean on different processor sets) with serial organization underneath. We propose to extend concurrency much further by running more and more lower- and higher-level components in parallel with each other. Each component can further be parallelized on the fine grain, potentially offering a major increase in the scalability of Earth system models. We present here first results from this approach, called coarse-grained component concurrency, or CCC. Within the Geophysical Fluid Dynamics Laboratory (GFDL) Flexible Modeling System (FMS), the atmospheric radiative transfer component has been configured to run in parallel with a composite component consisting of every other atmospheric component, including the atmospheric dynamics and all other atmospheric physics components. We will explore the algorithmic challenges involved in such an approach, and present results from such simulations. Plans to achieve even greater levels of coarse-grained concurrency by extending this approach within other components, such as the ocean, will be discussed.
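A minimal sketch of the coarse-grained component concurrency pattern, assuming mpi4py: the world communicator is split so that a "radiation" group and a "dynamics" group run concurrently on disjoint ranks, each with its own communicator. The component functions are hypothetical stand-ins, and the quarter/three-quarter rank split is an arbitrary assumption, not the FMS configuration.

```python
# Hedged sketch: two model components on disjoint processor sets.
from mpi4py import MPI

def compute_radiative_transfer(comm):
    # Hypothetical stand-in for the radiation component.
    return comm.allreduce(1.0)

def step_dynamics_and_physics(comm):
    # Hypothetical stand-in for the composite of all other components.
    return comm.allreduce(2.0)

world = MPI.COMM_WORLD
# Assumed split: roughly a quarter of the ranks run radiation.
is_radiation = world.rank < max(1, world.size // 4)
comp = world.Split(color=0 if is_radiation else 1, key=world.rank)

# Both components now advance concurrently on their own processor sets.
if is_radiation:
    result = compute_radiative_transfer(comp)
else:
    result = step_dynamics_and_physics(comp)

world.Barrier()   # coupling fields would be exchanged at this point
```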
The EMCC / DARPA Massively Parallel Electromagnetic Scattering Project
NASA Technical Reports Server (NTRS)
Woo, Alex C.; Hill, Kueichien C.
1996-01-01
The Electromagnetic Code Consortium (EMCC) was sponsored by the Advanced Research Projects Agency (ARPA) to demonstrate the effectiveness of massively parallel computing in large scale radar signature predictions. The EMCC/ARPA project consisted of three parts.
Jeon, Eun-Ki; Jung, Ji-Min; Ryu, So-Ri; Baek, Kitae
2015-10-01
The applicability of an in situ electrokinetic process with a parallel electrode configuration was evaluated to treat an As-, Cu-, and Pb-contaminated paddy rice field at full scale (width, 17 m; length, 12.2 m; depth, 1.6 m). A constant voltage of 100 V was supplied and electrodes were spaced 2 m apart. Most of the As, Cu, and Pb was bound to Fe oxide, and the major clay minerals in the test site were kaolinite and muscovite. The electrokinetic system removed 48.7, 48.9, and 54.5% of As, Cu, and Pb, respectively, from the soil during 24 weeks. The removal of metals in the first layer (0-0.4 m) was higher than that in the other three layers because it was not influenced by groundwater fluctuation. Fractionation analysis showed that mainly As and Pb bound to amorphous Fe and Al oxides decreased, and energy consumption was 1.2 kWh/m³. The standard deviation of metal concentration in the soil was much higher compared to a hexagonal electrode configuration because of a smaller electrically active area; however, the parallel electrode configuration removed similar amounts of metals compared to the hexagonal system. From these results, it was concluded that the electrokinetic process could be effective at remediating As-, Cu-, and Pb-contaminated paddy rice fields in situ.
Efficiency and flexibility using implicit methods within atmosphere dycores
NASA Astrophysics Data System (ADS)
Evans, K. J.; Archibald, R.; Norman, M. R.; Gardner, D. J.; Woodward, C. S.; Worley, P.; Taylor, M.
2016-12-01
A suite of explicit and implicit methods are evaluated for a range of configurations of the shallow water dynamical core within the spectral-element Community Atmosphere Model (CAM-SE) to explore their relative computational performance. The configurations are designed to explore the attributes of each method under different but relevant model usage scenarios including varied spectral order within an element, static regional refinement, and scaling to large problem sizes. The limitations and benefits of using explicit versus implicit, with different discretizations and parameters, are discussed in light of trade-offs such as MPI communication, memory, and inherent efficiency bottlenecks. For the regionally refined shallow water configurations, the implicit BDF2 method is about the same efficiency as an explicit Runge-Kutta method, without including a preconditioner. Performance of the implicit methods with the residual function executed on a GPU is also presented; there is speedup for the residual relative to a CPU, but overwhelming transfer costs motivate moving more of the solver to the device. Given the performance behavior of implicit methods within the shallow water dynamical core, the recommendation for future work using implicit solvers is conditional based on scale separation and the stiffness of the problem. The strong growth of linear iterations with increasing resolution or time step size is the main bottleneck to computational efficiency. Within the hydrostatic dynamical core of CAM-SE, we present results utilizing approximate block factorization preconditioners implemented using the Trilinos library of solvers. They reduce the cost of linear system solves and improve parallel scalability. We provide a summary of the remaining efficiency considerations within the preconditioner and utilization of the GPU, as well as a discussion about the benefits of a time stepping method that provides converged and stable solutions for a much wider range of time step sizes. As more complex model components, for example new physics and aerosols, are connected in the model, having flexibility in the time stepping will enable more options for combining and resolving multiple scales of behavior.
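The broad shape of one implicit BDF2 step solved with a matrix-free Newton-Krylov iteration (finite-difference Jacobian-vector products fed to GMRES) can be sketched as follows. The toy right-hand side and all tolerances are assumptions; the CAM-SE work uses its own residual, preconditioners from Trilinos, and far larger systems.

```python
# Hedged sketch: one BDF2 step via Jacobian-free Newton-Krylov.
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def f(u):
    # Toy stiff tendency standing in for the shallow water right-hand side.
    return -50.0 * (u - np.cos(u))

def bdf2_step(u_n, u_nm1, dt):
    # BDF2: (3u - 4u_n + u_{n-1}) / (2 dt) = f(u); find the root of r(u).
    def residual(u):
        return 1.5 * u - 2.0 * u_n + 0.5 * u_nm1 - dt * f(u)
    u = u_n.copy()
    for _ in range(10):                       # Newton iterations
        r = residual(u)
        if np.linalg.norm(r) < 1e-10:
            break
        eps = 1e-7
        # Matrix-free Jacobian-vector product by finite differences.
        J = LinearOperator((u.size, u.size), dtype=float,
                           matvec=lambda v: (residual(u + eps * v) - r) / eps)
        du, _ = gmres(J, -r)                  # Krylov solve for the update
        u = u + du
    return u

print(bdf2_step(np.full(4, 0.5), np.full(4, 0.4), dt=0.1))
```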
PLASMA TURBULENCE AND KINETIC INSTABILITIES AT ION SCALES IN THE EXPANDING SOLAR WIND
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hellinger, Petr; Trávnícek, Pavel M.; Matteini, Lorenzo
The relationship between a decaying strong turbulence and kinetic instabilities in a slowly expanding plasma is investigated using two-dimensional (2D) hybrid expanding box simulations. We impose an initial ambient magnetic field perpendicular to the simulation box, and we start with a spectrum of large-scale, linearly polarized, random-phase Alfvénic fluctuations that have energy equipartition between kinetic and magnetic fluctuations and vanishing correlation between the two fields. A turbulent cascade rapidly develops; magnetic field fluctuations exhibit a power-law spectrum at large scales and a steeper spectrum at ion scales. The turbulent cascade leads to an overall anisotropic proton heating, protons are heated in the perpendicular direction, and, initially, also in the parallel direction. The imposed expansion leads to generation of a large parallel proton temperature anisotropy which is at later stages partly reduced by turbulence. The turbulent heating is not sufficient to overcome the expansion-driven perpendicular cooling and the system eventually drives the oblique firehose instability in a form of localized nonlinear wave packets which efficiently reduce the parallel temperature anisotropy. This work demonstrates that kinetic instabilities may coexist with strong plasma turbulence even in a constrained 2D regime.
Hybrid Parallelism for Volume Rendering on Large-, Multi-, and Many-Core Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Howison, Mark; Bethel, E. Wes; Childs, Hank
2012-01-01
With the computing industry trending towards multi- and many-core processors, we study how a standard visualization algorithm, ray-casting volume rendering, can benefit from a hybrid parallelism approach. Hybrid parallelism provides the best of both worlds: using distributed-memory parallelism across a large number of nodes increases available FLOPs and memory, while exploiting shared-memory parallelism among the cores within each node ensures that each node performs its portion of the larger calculation as efficiently as possible. We demonstrate results from weak and strong scaling studies, at levels of concurrency ranging up to 216,000, and with datasets as large as 12.2 trillion cells. The greatest benefit from hybrid parallelism lies in the communication portion of the algorithm, the dominant cost at higher levels of concurrency. We show that reducing the number of participants with a hybrid approach significantly improves performance.
Constructing Neuronal Network Models in Massively Parallel Environments.
Ippen, Tammo; Eppler, Jochen M; Plesser, Hans E; Diesmann, Markus
2017-01-01
Recent advances in the development of data structures to represent spiking neuron network models enable us to exploit the complete memory of petascale computers for a single brain-scale network simulation. In this work, we investigate how well we can exploit the computing power of such supercomputers for the creation of neuronal networks. Using an established benchmark, we divide the runtime of simulation code into the phase of network construction and the phase during which the dynamical state is advanced in time. We find that on multi-core compute nodes network creation scales well with process-parallel code but exhibits a prohibitively large memory consumption. Thread-parallel network creation, in contrast, exhibits speedup only up to a small number of threads but has little overhead in terms of memory. We further observe that the algorithms creating instances of model neurons and their connections scale well for networks of ten thousand neurons, but do not show the same speedup for networks of millions of neurons. Our work uncovers that the lack of scaling of thread-parallel network creation is due to inadequate memory allocation strategies and demonstrates that thread-optimized memory allocators recover excellent scaling. An analysis of the loop order used for network construction reveals that more complex tests on the locality of operations significantly improve scaling and reduce runtime by allowing construction algorithms to step through large networks more efficiently than in existing code. The combination of these techniques increases performance by an order of magnitude and harnesses the increasingly parallel compute power of the compute nodes in high-performance clusters and supercomputers.
The implementation of an aeronautical CFD flow code onto distributed memory parallel systems
NASA Astrophysics Data System (ADS)
Ierotheou, C. S.; Forsey, C. R.; Leatham, M.
2000-04-01
The parallelization of an industrially important in-house computational fluid dynamics (CFD) code for calculating the airflow over complex aircraft configurations using the Euler or Navier-Stokes equations is presented. The code discussed is the flow solver module of the SAUNA CFD suite. This suite uses a novel grid system that may include block-structured hexahedral or pyramidal grids, unstructured tetrahedral grids or a hybrid combination of both. To assist in the rapid convergence to a solution, a number of convergence acceleration techniques are employed including implicit residual smoothing and a multigrid full approximation storage scheme (FAS). Key features of the parallelization approach are the use of domain decomposition and encapsulated message passing to enable the execution in parallel using a single programme multiple data (SPMD) paradigm. In the case where a hybrid grid is used, a unified grid partitioning scheme is employed to define the decomposition of the mesh. The parallel code has been tested using both structured and hybrid grids on a number of different distributed memory parallel systems and is now routinely used to perform industrial scale aeronautical simulations.
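A minimal sketch of the SPMD domain-decomposition pattern with encapsulated message passing, assuming mpi4py (not the SAUNA code): each rank owns a slab of a structured grid plus ghost rows, a single routine hides the halo exchange, and every rank then runs the identical update on its own slab. The slab size and smoothing update are illustrative assumptions.

```python
# Hedged sketch: SPMD slab decomposition with encapsulated messaging.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

u = np.zeros((34, 64))        # 32 owned rows plus 2 ghost rows
u[1:-1] = rank                # each rank initialises its own slab

def exchange_halos(u):
    # Encapsulated message passing: fill ghost rows from the neighbours.
    if rank - 1 >= 0:
        comm.Sendrecv(u[1], dest=rank - 1, recvbuf=u[0], source=rank - 1)
    if rank + 1 < size:
        comm.Sendrecv(u[-2], dest=rank + 1, recvbuf=u[-1], source=rank + 1)

exchange_halos(u)
# Every rank applies the identical update to its own rows (SPMD).
u[1:-1] = 0.25 * (u[:-2] + 2.0 * u[1:-1] + u[2:])
```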
Misra, Sanchit; Pamnany, Kiran; Aluru, Srinivas
2015-01-01
Construction of whole-genome networks from large-scale gene expression data is an important problem in systems biology. While several techniques have been developed, most cannot handle network reconstruction at the whole-genome scale, and the few that can, require large clusters. In this paper, we present a solution on the Intel Xeon Phi coprocessor, taking advantage of its multi-level parallelism including many x86-based cores, multiple threads per core, and vector processing units. We also present a solution on the Intel® Xeon® processor. Our solution is based on TINGe, a fast parallel network reconstruction technique that uses mutual information and permutation testing for assessing statistical significance. We demonstrate the first ever inference of a plant whole genome regulatory network on a single chip by constructing a 15,575 gene network of the plant Arabidopsis thaliana from 3,137 microarray experiments in only 22 minutes. In addition, our optimization for parallelizing mutual information computation on the Intel Xeon Phi coprocessor holds out lessons that are applicable to other domains.
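A small sketch of the mutual-information-with-permutation-testing kernel that such pipelines parallelize, here with a histogram MI estimator and a process pool standing in for the coprocessor threads. The bin count, permutation count, and estimator are assumptions; TINGe's actual estimator and parallelization differ.

```python
# Hedged sketch: histogram mutual information with a parallel
# permutation test for significance.
import numpy as np
from multiprocessing import Pool

def mutual_info(x, y, bins=8):
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px, py = pxy.sum(1, keepdims=True), pxy.sum(0, keepdims=True)
    nz = pxy > 0
    return (pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum()

def null_mi(args):
    # One permutation: shuffle x to break any dependence on y.
    x, y, seed = args
    return mutual_info(np.random.default_rng(seed).permutation(x), y)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    y = x + 0.5 * rng.normal(size=500)
    mi = mutual_info(x, y)
    with Pool(4) as pool:          # permutations run in parallel
        null = pool.map(null_mi, [(x, y, s) for s in range(200)])
    pval = (np.sum(np.array(null) >= mi) + 1) / (len(null) + 1)
    print(f"MI = {mi:.3f}, permutation p = {pval:.4f}")
```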
MMS Observations of Parallel Electric Fields During a Quasi-Perpendicular Bow Shock Crossing
NASA Astrophysics Data System (ADS)
Goodrich, K.; Schwartz, S. J.; Ergun, R.; Wilder, F. D.; Holmes, J.; Burch, J. L.; Gershman, D. J.; Giles, B. L.; Khotyaintsev, Y. V.; Le Contel, O.; Lindqvist, P. A.; Strangeway, R. J.; Russell, C.; Torbert, R. B.
2016-12-01
Previous observations of the terrestrial bow shock have frequently shown large-amplitude fluctuations in the parallel electric field. These parallel electric fields are seen as both nonlinear solitary structures, such as double layers and electron phase-space holes, and short-wavelength waves, which can reach amplitudes greater than 100 mV/m. The Magnetospheric Multi-Scale (MMS) Mission has crossed the Earth's bow shock more than 200 times. The parallel electric field signatures observed in these crossings are seen in very discrete packets and evolve over time scales of less than a second, indicating the presence of a wealth of kinetic-scale activity. The high time resolution of the Fast Particle Instrument (FPI) available on MMS offers greater detail of the kinetic-scale physics that occur at bow shocks than ever before, allowing greater insight into the overall effect of these observed electric fields. We present a characterization of these parallel electric fields found in a single bow shock event and how it reflects the kinetic-scale activity that can occur at the terrestrial bow shock.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cree, Johnathan Vee; Delgado-Frias, Jose
Large scale wireless sensor networks have been proposed for applications ranging from anomaly detection in an environment to vehicle tracking. Many of these applications require the networks to be distributed across a large geographic area while supporting three to five year network lifetimes. In order to support these requirements, large scale wireless sensor networks of duty-cycled devices need a method of efficient and effective autonomous configuration/maintenance. This method should gracefully handle the synchronization tasks of duty-cycled networks. Further, an effective configuration solution needs to recognize that in-network data aggregation and analysis presents significant benefits to wireless sensor networks and should configure the network in a way such that said higher level functions benefit from the logically imposed structure. NOA, the proposed configuration and maintenance protocol, provides a multi-parent hierarchical logical structure for the network that reduces the synchronization workload. It also provides higher level functions with significant inherent benefits, such as but not limited to: removing network divisions that are created by single-parent hierarchies, guarantees for when data will be compared in the hierarchy, and redundancies for communication as well as in-network data aggregation/analysis/storage.
Scalable NIC-based reduction on large-scale clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moody, A.; Fernández, J. C.; Petrini, F.
2003-01-01
Many parallel algorithms require efficient support for reduction collectives. Over the years, researchers have developed optimal reduction algorithms by taking into account system size, data size, and the complexities of reduction operations. However, all of these algorithms have assumed that the reduction processing takes place on the host CPU. Modern Network Interface Cards (NICs) sport programmable processors with substantial memory and thus introduce a fresh variable into the equation. This raises the following interesting challenge: can we take advantage of modern NICs to implement fast reduction operations? In this paper, we take on this challenge in the context of large-scale clusters. Through experiments on the 960-node, 1920-processor ASCI Linux Cluster (ALC) located at the Lawrence Livermore National Laboratory, we show that NIC-based reductions indeed perform with reduced latency and improved consistency over host-based algorithms for the common case, and that these benefits scale as the system grows. In the largest configuration tested (1812 processors), our NIC-based algorithm can sum a single-element vector in 73 µs with 32-bit integers and in 118 µs with 64-bit floating-point numbers. These results represent an improvement, respectively, of 121% and 39% with respect to the production-level MPI library.
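The collective being offloaded is, at heart, a logarithmic-depth tree reduction. The sketch below shows a binomial-tree sum written rank-by-rank in plain Python to expose the communication structure (who combines with whom at each round); NIC firmware would implement the same pattern with messages instead of list indices.

```python
# Hedged sketch: binomial-tree reduction, log2(n) combining rounds.
def tree_reduce(values):
    # values[i] plays the role of rank i's local contribution.
    vals, n = list(values), len(values)
    stride = 1
    while stride < n:
        # At each round, rank r absorbs the partial sum of rank r+stride.
        for r in range(0, n - stride, 2 * stride):
            vals[r] += vals[r + stride]
        stride *= 2
    return vals[0]                # rank 0 ends up holding the total

assert tree_reduce(range(10)) == sum(range(10))
```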
NASA Astrophysics Data System (ADS)
Okita, Shin; Verestek, Wolfgang; Sakane, Shinji; Takaki, Tomohiro; Ohno, Munekazu; Shibuta, Yasushi
2017-09-01
Continuous processes of homogeneous nucleation, solidification and grain growth are spontaneously achieved from an undercooled iron melt without any phenomenological parameter in the molecular dynamics (MD) simulation with 12 million atoms. The nucleation rate at the critical temperature is directly estimated from the atomistic configuration by cluster analysis to be of the order of 10³⁴ m⁻³ s⁻¹. Moreover, time evolution of grain size distribution during grain growth is obtained by the combination of Voronoi and cluster analyses. The grain growth exponent is estimated to be around 0.3 from the geometric average of the grain size distribution. Comprehensive understanding of kinetic properties during continuous processes is achieved in the large-scale MD simulation by utilizing the high parallel efficiency of a graphics processing unit (GPU), which is shedding light on the fundamental aspects of production processes of materials from the atomistic viewpoint.
NASA Astrophysics Data System (ADS)
Fonseca, R. A.; Vieira, J.; Fiuza, F.; Davidson, A.; Tsung, F. S.; Mori, W. B.; Silva, L. O.
2013-12-01
A new generation of laser wakefield accelerators (LWFA), supported by the extreme accelerating fields generated in the interaction of PW-Class lasers and underdense targets, promises the production of high quality electron beams in short distances for multiple applications. Achieving this goal will rely heavily on numerical modelling to further understand the underlying physics and identify optimal regimes, but large scale modelling of these scenarios is computationally heavy and requires the efficient use of state-of-the-art petascale supercomputing systems. We discuss the main difficulties involved in running these simulations and the new developments implemented in the OSIRIS framework to address these issues, ranging from multi-dimensional dynamic load balancing and hybrid distributed/shared memory parallelism to the vectorization of the PIC algorithm. We present the results of the OASCR Joule Metric program on the issue of large scale modelling of LWFA, demonstrating speedups of over 1 order of magnitude on the same hardware. Finally, scalability to over ∼10⁶ cores and sustained performance of over ∼2 PFlops is demonstrated, opening the way for large scale modelling of LWFA scenarios.
A transient FETI methodology for large-scale parallel implicit computations in structural mechanics
NASA Technical Reports Server (NTRS)
Farhat, Charbel; Crivelli, Luis; Roux, Francois-Xavier
1992-01-01
Explicit codes are often used to simulate the nonlinear dynamics of large-scale structural systems, even for low frequency response, because the storage and CPU requirements entailed by the repeated factorizations traditionally found in implicit codes rapidly overwhelm the available computing resources. With the advent of parallel processing, this trend is accelerating because explicit schemes are also easier to parallelize than implicit ones. However, the time step restriction imposed by the Courant stability condition on all explicit schemes cannot yet -- and perhaps will never -- be offset by the speed of parallel hardware. Therefore, it is essential to develop efficient and robust alternatives to direct methods that are also amenable to massively parallel processing because implicit codes using unconditionally stable time-integration algorithms are computationally more efficient when simulating low-frequency dynamics. Here we present a domain decomposition method for implicit schemes that requires significantly less storage than factorization algorithms, that is several times faster than other popular direct and iterative methods, that can be easily implemented on both shared and local memory parallel processors, and that is both computationally and communication-wise efficient. The proposed transient domain decomposition method is an extension of the method of Finite Element Tearing and Interconnecting (FETI) developed by Farhat and Roux for the solution of static problems. Serial and parallel performance results on the CRAY Y-MP/8 and the iPSC-860/128 systems are reported and analyzed for realistic structural dynamics problems. These results establish the superiority of the FETI method over both the serial/parallel conjugate gradient algorithm with diagonal scaling and the serial/parallel direct method, and contrast the computational power of the iPSC-860/128 parallel processor with that of the CRAY Y-MP/8 system.
Recent observations of the formation of filaments
NASA Technical Reports Server (NTRS)
Martin, Sara F.
1986-01-01
Two examples of the formation of small filaments in H alpha are described and illustrated. In both cases, the formation is seen to be the spontaneous appearance of strands of absorbing mass that evolve from no previous structure. The initial development of the filaments appears to consist of the accumulation of these absorptive strands along approximately parallel paths in a channel between large-scale, opposite polarity magnetic fields on either side of the filaments. The strands exhibit continuous changes in shape and degree of absorption which can be due to successive condensations resulting in new strands, mass motions within the strands, and outflow of the mass from the strands. For at least several hours before the formation of both filaments, small-scale fragments of opposite polarity, line-of-sight magnetic flux adjacent to or immediately below the filaments, and at the ends of the filaments, were cancelling. This type of magnetic flux disappearance continued during the development of the filaments and is commonly observed in association with established filaments. Cancellation is interpreted as an important evolutionary change in the magnetic field that can lead to configurations suitable for the formation of filaments.
Performance of parallel computation using CUDA for solving the one-dimensional elasticity equations
NASA Astrophysics Data System (ADS)
Darmawan, J. B. B.; Mungkasi, S.
2017-01-01
In this paper, we investigate the performance of parallel computation in solving the one-dimensional elasticity equations. Elasticity equations arise widely in engineering science, and solving them fast and efficiently is desirable. Therefore, we propose the use of parallel computation. Our parallel computation uses NVIDIA's CUDA. Our results show that parallel computation using CUDA provides a substantial advantage when the computation is large scale.
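A minimal sketch of the kind of CUDA kernel involved, written with Numba's CUDA support so it stays in Python: the 1D elasticity equations in velocity-stress form are advanced with centred differences, one thread per grid point. The material constants, grid size, and time step are assumptions (and a CUDA-capable GPU is required to run it); the paper does not publish its kernels.

```python
# Hedged sketch: 1D elasticity (velocity-stress form) on the GPU.
import numpy as np
from numba import cuda

@cuda.jit
def update_v(v, s, rho, dt, dx):
    # dv/dt = (1/rho) * ds/dx, centred difference in space.
    i = cuda.grid(1)
    if 0 < i < v.shape[0] - 1:
        v[i] += dt / rho * (s[i + 1] - s[i - 1]) / (2.0 * dx)

@cuda.jit
def update_s(v, s, E, dt, dx):
    # ds/dt = E * dv/dx, using the freshly updated velocities.
    i = cuda.grid(1)
    if 0 < i < s.shape[0] - 1:
        s[i] += dt * E * (v[i + 1] - v[i - 1]) / (2.0 * dx)

n = 1 << 16
x = np.arange(n, dtype=np.float64)
v = cuda.to_device(np.zeros(n))
s = cuda.to_device(np.exp(-((x - n / 2) ** 2) / 200.0))  # stress pulse
threads = 256
blocks = (n + threads - 1) // threads
for _ in range(100):
    update_v[blocks, threads](v, s, 1.0, 0.2, 1.0)  # rho=1, dt=0.2, dx=1
    update_s[blocks, threads](v, s, 1.0, 0.2, 1.0)  # E=1
```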
Cloud-based large-scale air traffic flow optimization
NASA Astrophysics Data System (ADS)
Cao, Yi
The ever-increasing traffic demand makes the efficient use of airspace an imperative mission, and this paper presents an effort in response to this call. Firstly, a new aggregate model, called the Link Transmission Model (LTM), is proposed, which models the nationwide traffic as a network of flight routes identified by origin-destination pairs. The traversal time of a flight route is assumed to be the mode of the distribution of historical flight records, and the mode is estimated by using Kernel Density Estimation. As this simplification abstracts away physical trajectory details, the complexity of modeling is drastically decreased, resulting in efficient traffic forecasting. The predictive capability of LTM is validated against recorded traffic data. Secondly, a nationwide traffic flow optimization problem with airport and en route capacity constraints is formulated based on LTM. The optimization problem aims at alleviating traffic congestion with minimal global delays. This problem is intractable due to millions of variables. A dual decomposition method is applied to decompose the large-scale problem so that the subproblems are solvable. However, the whole problem is still computationally expensive to solve, since each subproblem is a smaller integer programming problem that pursues integer solutions. Solving an integer programming problem is known to be far more time-consuming than solving its linear relaxation. In addition, sequential execution on a standalone computer leads to a linear runtime increase as the problem size grows. To address the computational efficiency problem, a parallel computing framework is designed which accommodates concurrent executions via multithreaded programming. The multithreaded version is compared with its monolithic version to show the decreased runtime. Finally, an open-source cloud computing framework, Hadoop MapReduce, is employed for better scalability and reliability. This framework is an "off-the-shelf" parallel computing model that can be used for both offline historical traffic data analysis and online traffic flow optimization. It provides an efficient and robust platform for easy deployment and implementation. A small cloud consisting of five workstations was configured and used to demonstrate the advantages of cloud computing in dealing with large-scale parallelizable traffic problems.
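The dual decomposition step can be sketched compactly: relax the shared capacity constraint with a Lagrange multiplier, let each route solve its own small integer subproblem independently (the parallelizable part), and update the multiplier by a subgradient step. The toy fly-or-delay subproblem and the diminishing step size are assumptions for illustration, not the paper's formulation.

```python
# Hedged sketch: dual decomposition with a subgradient multiplier update.
import numpy as np

def solve_subproblem(cost, lam, usage):
    # Each route chooses: fly now (1), paying the priced capacity usage,
    # or delay (0), paying its own delay cost. A toy integer subproblem.
    return 1 if cost >= lam * usage else 0

def dual_decomposition(costs, usages, capacity, iters=100):
    lam = 0.0
    for k in range(1, iters + 1):
        # Subproblems are independent, hence trivially parallelizable.
        x = [solve_subproblem(c, lam, u) for c, u in zip(costs, usages)]
        load = sum(u for xi, u in zip(x, usages) if xi)
        # Subgradient step on the dual variable, diminishing step size.
        lam = max(0.0, lam + (load - capacity) / k)
    return x, lam

rng = np.random.default_rng(2)
x, lam = dual_decomposition(rng.uniform(1, 5, 50), rng.uniform(0, 2, 50),
                            capacity=30.0)
print(sum(x), "routes fly now; capacity price =", round(lam, 3))
```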
Large Eddy Simulation of Vertical Axis Wind Turbines
NASA Astrophysics Data System (ADS)
Hezaveh, Seyed Hossein
Due to several design advantages and operational characteristics, particularly in offshore farms, vertical axis wind turbines (VAWTs) are being reconsidered as a complementary technology to horizontal axial turbines (HAWTs). However, considerable gaps remain in our understanding of VAWT performance since they have been significantly less studied than HAWTs. This thesis examines the performance of isolated VAWTs based on different design parameters and evaluates their characteristics in large wind farms. An actuator line model (ALM) is implemented in an atmospheric boundary layer large eddy simulation (LES) code, with offline coupling to a high-resolution blade-scale unsteady Reynolds-averaged Navier-Stokes (URANS) model. The LES captures the turbine-to-farm scale dynamics, while the URANS captures the blade-to-turbine scale flow. The simulation results are found to be in good agreement with existing experimental datasets. Subsequently, a parametric study of the flow over an isolated VAWT is carried out by varying solidities, height-to-diameter aspect ratios, and tip speed ratios. The analyses of the wake area and power deficits yield an improved understanding of the evolution of VAWT wakes, which in turn enables a more informed selection of turbine designs for wind farms. One of the most important advantages of VAWTs compared to HAWTs is their potential synergistic interactions that increase their performance when placed in close proximity. Field experiments have confirmed that unlike HAWTs, VAWTs can enhance and increase the total power production when placed near each other. Based on these experiments and using ALM-LES, we also present and test new approaches for VAWT farm configuration. We first design clusters with three turbines then configure farms consisting of clusters of VAWTs rather than individual turbines. The results confirm that by using a cluster design, the average power density of wind farms can be increased by as much as 60% relative to regular arrays. Finally, the thesis conducts an investigation of the influence of farm length (parallel to the wind) to assess the fetch needed for equilibrium to be reached, as well as the origin of the kinetic energy extracted by the turbines.
Myria: Scalable Analytics as a Service
NASA Astrophysics Data System (ADS)
Howe, B.; Halperin, D.; Whitaker, A.
2014-12-01
At the UW eScience Institute, we're working to empower non-experts, especially in the sciences, to write and use data-parallel algorithms. To this end, we are building Myria, a web-based platform for scalable analytics and data-parallel programming. Myria's internal model of computation is the relational algebra extended with iteration, such that every program is inherently data-parallel, just as every query in a database is inherently data-parallel. But unlike databases, iteration is a first class concept, allowing us to express machine learning tasks, graph traversal tasks, and more. Programs can be expressed in a number of languages and can be executed on a number of execution environments, but we emphasize a particular language called MyriaL that supports both imperative and declarative styles and a particular execution engine called MyriaX that uses an in-memory column-oriented representation and asynchronous iteration. We deliver Myria over the web as a service, providing an editor, performance analysis tools, and catalog browsing features in a single environment. We find that this web-based "delivery vector" is critical in reaching non-experts: they are insulated from the irrelevant technical work associated with installation, configuration, and resource management. The MyriaX backend, one of several execution runtimes we support, is a main-memory, column-oriented, RDBMS-on-the-worker system that supports cyclic data flows as a first-class citizen and has been shown to outperform competitive systems on 100-machine cluster sizes. I will describe the Myria system, give a demo, and present some new results in large-scale oceanographic microbiology.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, William Michael; Plimpton, Steven James; Wang, Peng
2010-03-01
LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. LAMMPS has potentials for soft materials (biomolecules, polymers) and solid-state materials (metals, semiconductors) and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale. LAMMPS runs on single processors or in parallel using message-passing techniques and a spatial-decomposition of the simulation domain. The code is designed to be easy to modify or extend with new functionality.
Flow visualization of mast-mounted-sight/main rotor aerodynamic interactions
NASA Technical Reports Server (NTRS)
Ghee, Terence A.; Kelley, Henry L.
1993-01-01
Flow visualization tests were conducted on a 27 percent-scale AH-64 attack helicopter model fitted with various mast-mounted-sight configurations in an attempt to identify the cause of adverse vibration encountered during full-scale flight tests of an Apache/Longbow configuration. The tests were conducted at the NASA Langley Research Center in the 14- by 22-Foot Subsonic Tunnel. A symmetric and an asymmetric mast-mounted sight oriented at several skew angles were tested at forward and rearward flight speeds of 30 and 45 knots. A laser light sheet seeded with vaporized propylene glycol was used to visualize the wake of the sight in planes parallel and perpendicular to the freestream flow. Analysis of the flow visualization data identified the frequency of the wake shed from the sight, the angle of attack at the sight, and the location where the sight wake crossed the rotor plane. Differences in wake structure were observed between the various sight configurations and skew angles. Postulated causes of the adverse vibration found in flight test are given, along with considerations for future tests.
Parallel-vector solution of large-scale structural analysis problems on supercomputers
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.; Nguyen, Duc T.; Agarwal, Tarun K.
1989-01-01
A direct linear equation solution method based on the Choleski factorization procedure is presented which exploits both parallel and vector features of supercomputers. The new equation solver is described, and its performance is evaluated by solving structural analysis problems on three high-performance computers. The method has been implemented using Force, a generic parallel FORTRAN language.
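For reference, the Choleski-based direct solve at the heart of such solvers looks as follows in a serial SciPy sketch on a small symmetric positive-definite system; the paper's contribution is performing the factorization with parallel-vector kernels, which this sketch does not attempt. The tridiagonal test matrix is an assumption standing in for a structural stiffness matrix.

```python
# Hedged sketch: Choleski factorization and triangular solves with SciPy.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

n = 200
A = np.diag(np.full(n, 4.0)) + np.diag(np.full(n - 1, -1.0), 1) \
                             + np.diag(np.full(n - 1, -1.0), -1)
b = np.ones(n)

c, low = cho_factor(A)        # A = L L^T; the factorization dominates cost
x = cho_solve((c, low), b)    # forward and back substitution
assert np.allclose(A @ x, b)
```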
NASA Astrophysics Data System (ADS)
Nakamura, Yuki; Ashi, Juichiro; Morita, Sumito
2016-04-01
To clarify the timing and scale of past submarine landslides is important for understanding their formation processes. The study area is part of the continental slope of the Japan Trench, where a number of large-scale submarine landslide (slump) deposits have been identified in Pliocene and Quaternary formations by analysing METI's 3D seismic data "Sanrikuoki 3D" off the Shimokita Peninsula (Morita et al., 2011). As structural features, swarms of parallel dikes, which are likely dewatering paths formed during the slumping deformation, are observed, and slip directions are basically perpendicular to the parallel dikes. Parallel dikes are therefore a good indicator for the estimation of slip directions. The slip direction of each slide was determined on a one-kilometre grid in the survey area of 40 km x 20 km. The dominant slip direction varies from the Pliocene to the Quaternary in the survey area. Parallel dike structure also makes it possible to distinguish slump deposits from normal deposits on time-slice images. By tracing the outline of the slump deposits at each depth, we identified the general morphology of the overall slump deposits and calculated the volume of the extracted slump deposits so as to estimate the scale of each event. We investigated temporal and spatial variations in the depositional pattern of the slump deposits. Calculating the generation intervals of the slumps, some periodicity is apparent; in particular, large slumps do not occur in succession. Additionally, examining the relationship between the cumulative volume and the generation interval, a certain correlation is observed in the Pliocene and Quaternary. Key words: submarine landslides, 3D seismic data, Shimokita Peninsula
Radiation-MHD Simulations of Pillars and Globules in HII Regions
NASA Astrophysics Data System (ADS)
Mackey, J.
2012-07-01
Implicit and explicit raytracing-photoionisation algorithms have been implemented in the author's radiation-magnetohydrodynamics code. The algorithms are described briefly and their efficiency and parallel scaling are investigated. The implicit algorithm is more efficient for calculations where ionisation fronts have very supersonic velocities, and the explicit algorithm is favoured in the opposite limit because of its better parallel scaling. The implicit method is used to investigate the effects of initially uniform magnetic fields on the formation and evolution of dense pillars and cometary globules at the boundaries of HII regions. It is shown that for weak and medium field strengths an initially perpendicular field is swept into alignment with the pillar during its dynamical evolution, matching magnetic field observations of the ‘Pillars of Creation’ in M16. A strong perpendicular magnetic field remains in its initial configuration and also confines the photoevaporation flow into a bar-shaped, dense, ionised ribbon which partially shields the ionisation front.
Kinetic Alfvén Wave Generation by Large-scale Phase Mixing
NASA Astrophysics Data System (ADS)
Vásconez, C. L.; Pucci, F.; Valentini, F.; Servidio, S.; Matthaeus, W. H.; Malara, F.
2015-12-01
One view of solar wind turbulence is that the observed highly anisotropic fluctuations at spatial scales near the proton inertial length dp may be considered kinetic Alfvén waves (KAWs). In the present paper, we show how phase mixing of large-scale parallel-propagating Alfvén waves is an efficient mechanism for the production of KAWs at wavelengths close to dp and at a large propagation angle with respect to the magnetic field. Magnetohydrodynamic (MHD), Hall magnetohydrodynamic (HMHD), and hybrid Vlasov–Maxwell (HVM) simulations modeling the propagation of Alfvén waves in inhomogeneous plasmas are performed. In the linear regime, the role of dispersive effects is singled out by comparing MHD and HMHD results. Fluctuations produced by phase mixing are identified as KAWs through a comparison of the polarization of magnetic fluctuations and the wave-group velocity with analytical linear predictions. In the nonlinear regime, a comparison of HMHD and HVM simulations allows us to point out the role of kinetic effects in shaping the proton distribution function. We observe the generation of temperature anisotropy with respect to the local magnetic field and the production of field-aligned beams. The regions where the proton distribution function departs strongly from thermal equilibrium are located inside the shear layers where the KAWs are excited, suggesting that the distortions of the proton distribution are driven by a resonant interaction of protons with KAW fluctuations. Our results are relevant in configurations where magnetic-field inhomogeneities are present, as, for example, in the solar corona, where the presence of Alfvén waves has been ascertained.
Enhancing sedimentation by improving flow conditions using parallel retrofit baffles.
He, Cheng; Scott, Eric; Rochfort, Quintin
2015-09-01
In this study, placing parallel-connected baffles in the vicinity of the inlet was proposed to improve hydraulic conditions for enhancing TSS (total suspended solids) removal. The purpose of the retrofit baffle design is to divide the large and fast inflow into smaller and slower flows to increase flow uniformity. This avoids short-circuiting and increases residence time in the sedimentation basin. The newly proposed parallel-connected baffle configuration was assessed in the laboratory by comparing its TSS removal performance and the optimal flow residence time with those from the widely used series-connected baffles. The experimental results showed that the parallel-connected baffles outperformed the series-connected baffles because it could disperse flow faster and in less space by splitting the large inflow into many small branches instead of solely depending on flow internal friction over a longer flow path, as was the case under the series-connected baffles. Being able to dampen faster flow before entering the sedimentation basin is critical to reducing the possibility of disturbing any settled particles, especially under high inflow conditions. Also, for a large sedimentation basin, it may be more economically feasible to deploy the proposed parallel retrofit baffle in the vicinity of the inlet than series-connected baffles throughout the entire settling basin.
web cellHTS2: a web-application for the analysis of high-throughput screening data.
Pelz, Oliver; Gilsdorf, Moritz; Boutros, Michael
2010-04-12
The analysis of high-throughput screening data sets is an expanding field in bioinformatics. High-throughput screens by RNAi generate large primary data sets which need to be analyzed and annotated to identify relevant phenotypic hits. Large-scale RNAi screens are frequently used to identify novel factors that influence a broad range of cellular processes, including signaling pathway activity, cell proliferation, and host cell infection. Here, we present a web-based application utility for the end-to-end analysis of large cell-based screening experiments by cellHTS2. The software guides the user through the configuration steps that are required for the analysis of single or multi-channel experiments. The web-application provides options for various standardization and normalization methods, annotation of data sets and a comprehensive HTML report of the screening data analysis, including a ranked hit list. Sessions can be saved and restored for later re-analysis. The web frontend for the cellHTS2 R/Bioconductor package interacts with it through an R-server implementation that enables highly parallel analysis of screening data sets. web cellHTS2 further provides a file import and configuration module for common file formats. The implemented web-application facilitates the analysis of high-throughput data sets and provides a user-friendly interface. web cellHTS2 is accessible online at http://web-cellHTS2.dkfz.de. A standalone version as a virtual appliance and source code for platforms supporting Java 1.5.0 can be downloaded from the web cellHTS2 page. web cellHTS2 is freely distributed under GPL.
2016-08-10
Large-scale Linear Optimization through Machine Learning: From Theory to Practical System Design and Implementation (AFRL-AFOSR-JP-TR-2016-0073). [Abstract fragment] ...performances on various machine learning tasks and it naturally lends itself to fast parallel implementations. Despite this, very little work has been...
A case for spiking neural network simulation based on configurable multiple-FPGA systems.
Yang, Shufan; Wu, Qiang; Li, Renfa
2011-09-01
Recent neuropsychological research has begun to reveal that neurons encode information in the timing of spikes. Spiking neural network simulations are a flexible and powerful method for investigating the behaviour of neuronal systems. Software simulation of spiking neural networks is unable to rapidly generate output spikes for large-scale neural networks. An alternative approach, hardware implementation of such systems, provides the possibility of generating independent spikes precisely and simultaneously outputting spike waves in real time, provided that the spiking neural network can take full advantage of the inherent parallelism of hardware. We introduce a configurable FPGA-oriented hardware platform for spiking neural network simulation in this work. We aim to use this platform to combine the speed of dedicated hardware with the programmability of software so that it might allow neuroscientists to put together sophisticated computational experiments using their own models. A feed-forward hierarchy network is developed as a case study to describe the operation of biological neural systems (such as orientation selectivity in the visual cortex) and computational models of such systems. This model demonstrates how a feed-forward neural network constructs the circuitry required for orientation selectivity and provides a platform for reaching a deeper understanding of the primate visual system. In the future, larger-scale models based on this framework can be used to replicate the actual architecture of the visual cortex, leading to more detailed predictions and insights into visual perception phenomena.
Denneulin, T; Wollschläger, N; Everhardt, A S; Farokhipoor, S; Noheda, B; Snoeck, E; Hÿtch, M
2018-05-31
Lead zirconate titanate samples are used for their piezoelectric and ferroelectric properties in various types of micro-devices. Epitaxial layers of tetragonal perovskites have a tendency to relax by forming [Formula: see text] ferroelastic domains. The accommodation of the a/c/a/c polydomain structure on a flat substrate leads to nanoscale deformation gradients which locally influence the polarization through the flexoelectric effect. Here, we investigated the deformation fields in epitaxial layers of Pb(Zr0.2Ti0.8)O3 grown on SrTiO3 substrates using transmission electron microscopy (TEM). We found that the deformation gradients depend on the inclination of the domain walls ([Formula: see text] or [Formula: see text] to the substrate interface) of the successive [Formula: see text] domains, and we describe three different a/c/a domain configurations: one configuration with parallel a-domains and two configurations with perpendicular a-domains (V-shaped and hat-[Formula: see text]-shaped). In the parallel configuration, the c-domains contain horizontal and vertical gradients of out-of-plane deformation. In the V-shaped and hat-[Formula: see text]-shaped configurations, the c-domains exhibit a bending deformation field with vertical gradients of in-plane deformation. Each of these configurations is expected to have a different influence on the polarization and thus on the local properties of the film. The deformation gradients were measured using dark-field electron holography, a TEM technique which offers good sensitivity (0.1%) and a large field-of-view (hundreds of nanometers). The measurements are compared with finite element simulations.
Experiences with hypercube operating system instrumentation
NASA Technical Reports Server (NTRS)
Reed, Daniel A.; Rudolph, David C.
1989-01-01
Conceptualizing the interactions among a large number of processors is difficult; this makes it hard both to identify the sources of inefficiency and to determine how a parallel program could be made more efficient. This paper describes an instrumentation system that can trace the execution of distributed-memory parallel programs by recording the occurrence of parallel program events. The resulting event traces can be used to compile summary statistics that provide a global view of program performance. In addition, visualization tools permit the graphic display of event traces. Visual presentation of performance data is particularly useful, indeed necessary, for large-scale parallel computers; the enormous volume of performance data mandates visual display.
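A toy sketch of the event-tracing idea described above (event names, payloads, and the trace format are invented for illustration; the actual system instrumented hypercube operating-system events):

```python
import time
from collections import Counter

class Tracer:
    """Per-node event recorder: each node appends timestamped events;
    traces are merged afterwards for summary statistics or visualization."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.events = []                      # (time, node, kind, payload)

    def record(self, kind, payload=None):
        self.events.append((time.perf_counter(), self.node_id, kind, payload))

def summarize(tracers):
    """Global event counts across all nodes, merged in time order."""
    merged = sorted((e for t in tracers for e in t.events),
                    key=lambda e: e[0])
    return Counter(kind for _, _, kind, _ in merged)

t0, t1 = Tracer(0), Tracer(1)
t0.record("send", {"dest": 1, "bytes": 4096})
t1.record("recv", {"src": 0})
t1.record("compute_begin"); t1.record("compute_end")
print(summarize([t0, t1]))
```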
Liter-scale production of uniform gas bubbles via parallelization of flow-focusing generators.
Jeong, Heon-Ho; Yadavali, Sagar; Issadore, David; Lee, Daeyeon
2017-07-25
Microscale gas bubbles have demonstrated enormous utility as versatile templates for the synthesis of functional materials in medicine, ultra-lightweight materials and acoustic metamaterials. In many of these applications, high uniformity of the size of the gas bubbles is critical to achieve the desired properties and functionality. While microfluidics has been used with success to create gas bubbles with a uniformity not achievable using conventional methods, its inherently low volumetric flow rate has limited its use in most applications. Parallelization of liquid droplet generators, in which many droplet generators are incorporated onto a single chip, has shown great promise for the large scale production of monodisperse liquid emulsion droplets. However, the scale-up of monodisperse gas bubbles using such an approach has remained a challenge because of possible coupling between parallel bubble generators and feedback effects from the downstream channels. In this report, we systematically investigate the effect of factors such as viscosity of the continuous phase, capillary number, and gas pressure, as well as channel uniformity, on the size distribution of gas bubbles in a parallelized microfluidic device. We show that, by optimizing the flow conditions, a device with 400 parallel flow-focusing generators on a footprint of 5 × 5 cm^2 can be used to generate gas bubbles with a coefficient of variation of less than 5% at a production rate of approximately 1 L h^-1. Our results suggest that the optimization of flow conditions using a device with a small number (e.g., 8) of parallel FFGs can facilitate large-scale bubble production.
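As a rough plausibility check of the reported throughput (pure arithmetic; the 100 μm bubble diameter is an assumed value, not taken from the paper):

```python
import math

total_rate_L_per_h = 1.0                          # reported production rate
n_generators = 400                                # flow-focusing generators
print(total_rate_L_per_h * 1000 / n_generators)   # 2.5 mL/h per generator

d = 100e-6                                        # assumed bubble diameter (m)
v_bubble = math.pi / 6 * d**3                     # bubble volume, ~5.2e-13 m^3
total_m3_per_s = total_rate_L_per_h * 1e-3 / 3600
print(round(total_m3_per_s / v_bubble))           # ~5e5 bubbles per second
```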
Ceramic-metal composites prepared via tape casting and melt infiltration methods
NASA Astrophysics Data System (ADS)
Kim, Hyun Jun
Melt infiltration of preforms prepared by tape casting and lamination has been accomplished using a short-time infiltration process that significantly suppresses reaction product formation. For layered materials produced via infiltration of laminated ceramic tapes, of particular interest is the effect that a large change in microstructure has on infiltration, phase formation, and mechanical properties. Hardness of the fine scale composite layers is approximately three times higher than that of the coarse scale layers, due to the greater strength of the fine B4C network. Fractography showed that crack propagation occurred by brittle fracture of the carbide and ductile extension of the metal. Despite large differences in hardness, the fracture mode of the fine and coarse scale microstructures appears identical. Fluid flow modeling for tape casting was conducted with a Newtonian slurry under a parallel blade, and the effect of beveling the blade based on a one-dimensional flow model is shown. The discussion of slurry deformation after the blade exit suggests that the mode of slurry deformation depends on the relative importance of the pressure gradient and wall shear, and that the existence of a zero-shear plane might have a negative effect on particle alignment in the tape. The analysis of the flow under a beveled blade predicts that this configuration is more advantageous than the parallel blade for productivity, while the parallel blade is better for producing uniform particle alignment and thinner tape. Also, the one-dimensional flow model for the beveled blade is shown to be a valid approximation of the fluid behavior below a blade angle of 45 degrees. A flow visualization study of tape casting was conducted with a transparent apparatus and a model slurry. Most investigators have concluded that the shear stress between the doctor blade and moving carrier causes the particle alignment, but, according to the results of the visualization experiment, some degree of particle alignment is already established in the reservoir. The fluid flow concept of tape casting is incorporated with a metal infiltration technique to prepare ceramic-metal composites with tailored porosity and pore orientation. The boron carbide-aluminum system was used to prepare the composites, and its stiffness constants were investigated. The aligned metal ligaments rarely affect the stiffness-constant anisotropy, which appears to be caused by the tape casting operation.
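The one-dimensional model referred to above is, under standard assumptions (steady, Newtonian slurry in a thin gap of height h, carrier moving at speed U, constant pressure gradient dp/dx), the superposition of drag flow and pressure-driven flow; this textbook form, in our notation rather than the author's, also locates the zero-shear plane discussed in the abstract:

\[
u(y) \;=\; U\left(1-\frac{y}{h}\right) \;+\; \frac{1}{2\mu}\frac{dp}{dx}\left(y^{2}-hy\right),
\qquad
\left.\frac{du}{dy}\right|_{y^{*}}=0
\;\Rightarrow\;
y^{*} \;=\; \frac{h}{2}+\frac{\mu U}{h\,(dp/dx)},
\]

with y measured from the moving carrier and μ the slurry viscosity; if y* falls inside the gap, particles crossing that plane experience a sign change in shear, which is consistent with the negative effect on particle alignment mentioned above.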
Shim, Youngseon; Kim, Hyung J; Jung, Younjoon
2012-01-01
Supercapacitors with two single-sheet graphene electrodes in the parallel plate geometry are studied via molecular dynamics (MD) computer simulations. Pure 1-ethyl-3-methylimidazolium tetrafluoroborate (EMI+BF4-) and a 1.1 M solution of EMI+BF4- in acetonitrile are considered as prototypes of room-temperature ionic liquids (RTILs) and organic electrolytes. Electrolyte structure, charge density and the associated electric potential are investigated by varying the charges and separation of the two electrodes. Multiple charge layers formed in the electrolytes in the vicinity of the electrodes are found to screen the electrode surface charge almost completely. As a result, the supercapacitors show nearly ideal electric double-layer behavior, i.e., the electric potential exhibits an essentially flat plateau in the entire electrolyte region except for sharp changes in screening zones very close to the electrodes. Due to its small size and large charge separation, BF4- is considerably more efficient in shielding electrode charges than EMI+. In the case of the acetonitrile solution, acetonitrile also plays an important role by aligning its dipoles near the electrodes; however, the overall screening mainly arises from the ions. Because of the disparity in shielding efficiency between cations and anions, the capacitance of the positively-charged anode is significantly larger than that of the negatively-charged cathode. Therefore, the total cell capacitance in the parallel plate configuration is primarily governed by the cathode. Ion conductivity obtained via the Green-Kubo (GK) method is found to be largely independent of the electrode surface charge. Interestingly, EMI+BF4- shows higher GK ion conductivity than the 1.1 M acetonitrile solution between two parallel plate electrodes.
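The Green-Kubo route mentioned above obtains the ionic conductivity from the equilibrium autocorrelation of the charge current; in its standard form (notation ours),

\[
\sigma \;=\; \frac{1}{3\,V\,k_{B}T}\int_{0}^{\infty}\big\langle \mathbf{J}(t)\cdot\mathbf{J}(0)\big\rangle\,dt,
\qquad
\mathbf{J}(t)=\sum_{i} q_{i}\,\mathbf{v}_{i}(t),
\]

where V is the electrolyte volume, q_i and v_i are the ionic charges and velocities, and the average is taken over equilibrium trajectories.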
System and method for continuous solids slurry depressurization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Leininger, Thomas Frederick; Steele, Raymond Douglas; Yen, Hsien-Chin William
A continuous slag processing system includes a rotating parallel disc pump coupled to a motor and a brake. The rotating parallel disc pump includes opposing discs coupled to a shaft, an outlet configured to continuously receive a fluid at a first pressure, and an inlet configured to continuously discharge the fluid at a second pressure less than the first pressure. The rotating parallel disc pump is configurable in a reverse-acting pump mode and a letdown turbine mode. The motor is configured to drive the opposing discs about the shaft and against a flow of the fluid to control a difference between the first pressure and the second pressure in the reverse-acting pump mode. The brake is configured to resist rotation of the opposing discs about the shaft to control the difference between the first pressure and the second pressure in the letdown turbine mode.
Users manual for program NYQUIST: Liquid rocket nyquist plots developed for use on a PC computer
NASA Astrophysics Data System (ADS)
Armstrong, Wilbur C.
1992-06-01
The piping in a liquid rocket can assume complex configurations due to multiple tanks, multiple engines, and structures that must be piped around. The capability to handle some of these complex configurations has been incorporated into the NYQUIST code, as has the capability to modify the input on line. The configurations allowed include multiple tanks, multiple engines, and the splitting of a pipe into unequal segments going to different (or the same) engines. The program handles the following element types: straight pipes, bends, inline accumulators, tuned stub accumulators, Helmholtz resonators, parallel resonators, pumps, split pipes, multiple tanks, and multiple engines. The code is too large to compile as one program using Microsoft FORTRAN 5; therefore, it was broken into two segments, NYQUIST1.FOR and NYQUIST2.FOR, which are compiled separately and then linked together. The final run code is not excessively large (approximately 344,000 bytes).
Hyper-Parallel Tempering Monte Carlo Method and Its Applications
NASA Astrophysics Data System (ADS)
Yan, Qiliang; de Pablo, Juan
2000-03-01
A new generalized hyper-parallel tempering Monte Carlo molecular simulation method is presented for the study of complex fluids. The method is particularly useful for simulation of many-molecule complex systems, where rough energy landscapes and inherently long characteristic relaxation times can pose formidable obstacles to effective sampling of relevant regions of configuration space. The method combines several key elements from expanded ensemble formalisms, parallel tempering, open ensemble simulations, configurational bias techniques, and histogram reweighting analysis of results. It is found to significantly accelerate the diffusion of a complex system through phase space. In this presentation, we demonstrate the effectiveness of the new method by implementing it in grand canonical ensembles for a Lennard-Jones fluid, for the restricted primitive model of electrolyte solutions (RPM), and for polymer solutions and blends. Our results indicate that the new algorithm is capable of overcoming the large free energy barriers associated with phase transitions, thereby greatly facilitating the simulation of coexistence properties. It is also shown that the method can be orders of magnitude more efficient than previously available techniques. More importantly, the method is relatively simple and can be incorporated into existing simulation codes with minor effort.
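The core ingredient shared by all parallel-tempering variants is the replica-swap acceptance test; a minimal sketch (canonical-ensemble form only, not the authors' full hyper-parallel scheme with expanded and open ensembles):

```python
import math, random

def attempt_swap(beta_i, E_i, beta_j, E_j):
    """Metropolis test for exchanging configurations between two replicas
    at inverse temperatures beta_i and beta_j. Grand-canonical variants
    would also weight by chemical-potential terms."""
    delta = (beta_i - beta_j) * (E_i - E_j)
    return delta >= 0 or random.random() < math.exp(delta)

# Example: a lower-energy configuration held by the hot replica (beta=0.5)
# is always passed down to the cold replica (beta=1.0).
print(attempt_swap(beta_i=1.0, E_i=-100.0, beta_j=0.5, E_j=-120.0))
```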
Numerical Investigation of Dual-Mode Scramjet Combustor with Large Upstream Interaction
NASA Technical Reports Server (NTRS)
Mohieldin, T. O.; Tiwari, S. N.; Reubush, David E. (Technical Monitor)
2004-01-01
A dual-mode scramjet combustor configuration with significant upstream interaction is investigated numerically. The possibility of scaling the domain to accelerate convergence and reduce the computational time is explored. The supersonic combustor configuration was selected to provide an understanding of key features of upstream interaction and to identify physical and numerical issues relating to the modeling of dual-mode configurations. The numerical analysis was performed with vitiated air at a freestream Mach number of 2.5 using hydrogen as the sonic injectant. Results are presented for two-dimensional models and a three-dimensional jet-to-jet symmetric geometry. Comparisons are made with experimental results. Two-dimensional and three-dimensional results show a substantial oblique shock train reaching upstream of the fuel injectors. Flow characteristics slow numerical convergence, while the upstream interaction slowly increases with further iterations. As the flow field develops, the symmetry assumption breaks down. A large separation zone develops and extends further upstream of the step. This asymmetric flow structure is not seen in the experimental data. Results obtained using a sub-scale domain (both two-dimensional and three-dimensional) qualitatively recover the flow physics obtained from full-scale simulations. All results show that numerical modeling using a scaled geometry provides good agreement with full-scale numerical results and experimental results for this configuration. This study supports the argument that numerical scaling is useful in simulating dual-mode scramjet combustor flowfields and could provide an excellent convergence acceleration technique for dual-mode simulations.
Xu, Jingxiang; Higuchi, Yuji; Ozawa, Nobuki; Sato, Kazuhisa; Hashida, Toshiyuki; Kubo, Momoji
2017-09-20
Ni sintering in the Ni/YSZ porous anode of a solid oxide fuel cell changes the porous structure, leading to degradation. Preventing sintering and degradation during operation is a great challenge. Usually, a sintering molecular dynamics (MD) simulation model consisting of two particles on a substrate is used; however, such a model cannot reflect the effect of the porous structure on sintering. In our previous study, a multi-nanoparticle sintering modeling method with tens of thousands of atoms revealed the effect of the particle framework and porosity on sintering. However, that method cannot reveal the effect of particle size on sintering or the effect of sintering on the change in the porous structure. In the present study, we report a strategy to reveal these effects in the porous structure by using our multi-nanoparticle modeling method and a parallel large-scale multimillion-atom MD simulator. We used this method to investigate the effect of YSZ particle size and tortuosity on sintering and degradation in Ni/YSZ anodes. Our parallel large-scale MD simulation showed that the sintering degree decreased as the YSZ particle size decreased. The gas fuel diffusion path, which reflects the overpotential, was blocked by pore coalescence during sintering. The degradation of gas diffusion performance increased as the YSZ particle size increased. Furthermore, the gas diffusion performance was quantified by a tortuosity parameter, and an optimal YSZ particle size, equal to that of Ni, was found for good diffusion after sintering. These findings cannot be obtained by previous MD sintering studies with tens of thousands of atoms. The present parallel large-scale multimillion-atom MD simulation makes it possible to clarify the effects of particle size and tortuosity on sintering and degradation.
Podolak, Charles J.
2013-01-01
An ensemble of rule-based models was constructed to assess possible future braided river planform configurations for the Toklat River in Denali National Park and Preserve, Alaska. This approach combined an analysis of large-scale influences on stability with several reduced-complexity models to produce the predictions at a practical level for managers concerned about the persistence of bank erosion while acknowledging the great uncertainty in any landscape prediction. First, a model of confluence angles reproduced observed angles of a major confluence, but showed limited susceptibility to a major rearrangement of the channel planform downstream. Second, a probabilistic map of channel locations was created with a two-parameter channel avulsion model. The predicted channel belt location was concentrated in the same area as the current channel belt. Finally, a suite of valley-scale channel and braid plain characteristics were extracted from a light detection and ranging (LiDAR)-derived surface. The characteristics demonstrated large-scale stabilizing topographic influences on channel planform. The combination of independent analyses increased confidence in the conclusion that the Toklat River braided planform is a dynamically stable system due to large and persistent valley-scale influences, and that a range of avulsive perturbations are likely to result in a relatively unchanged planform configuration in the short term.
NASA Technical Reports Server (NTRS)
Le, Guan; Wang, Yongli; Slavin, James A.; Strangeway, Robert J.
2007-01-01
Space Technology 5 (ST5) is a three micro-satellite constellation deployed into a 300 x 4500 km, dawn-dusk, sun-synchronous polar orbit from March 22 to June 21, 2006, for technology validations. In this paper, we present a study of the temporal variability of field-aligned currents using multi-point magnetic field measurements from ST5. The data demonstrate that meso-scale current structures are commonly embedded within large-scale field-aligned current sheets. The meso-scale current structures are very dynamic with highly variable current density and/or polarity on time scales of approximately 10 min. They exhibit large temporal variations during both quiet and disturbed times on such time scales. On the other hand, the data also show that the time scales for the currents to be relatively stable are approximately 1 min for meso-scale currents and approximately 10 min for large-scale current sheets. These temporal features are clearly associated with dynamic variations of their particle carriers (mainly electrons) as they respond to variations of the parallel electric field in the auroral acceleration region. The characteristic time scales for the temporal variability of meso-scale field-aligned currents are found to be consistent with those of the auroral parallel electric field.
Suplatov, Dmitry; Popova, Nina; Zhumatiy, Sergey; Voevodin, Vladimir; Švedas, Vytas
2016-04-01
Rapid expansion of online resources providing access to genomic, structural, and functional information associated with biological macromolecules opens an opportunity to gain a deeper understanding of the mechanisms of biological processes through systematic analysis of large datasets. This, however, requires novel strategies to optimally utilize computer processing power. Some methods in bioinformatics and molecular modeling require extensive computational resources. Other algorithms have fast implementations that take at most several hours to analyze a common input on a modern desktop station; however, due to multiple invocations over a large number of subtasks, the full task requires significant computing power. Therefore, an efficient computational solution to large-scale biological problems requires both a wise parallel implementation of resource-hungry methods and a smart workflow to manage multiple invocations of relatively fast algorithms. In this work, new computer software, mpiWrapper, has been developed to accommodate non-parallel implementations of scientific algorithms within the parallel supercomputing environment. The Message Passing Interface is used to exchange information between nodes. Two specialized threads, one for task management and communication and another for subtask execution, are invoked on each processing unit to avoid deadlock while using blocking calls to MPI. mpiWrapper can be used to launch all conventional Linux applications without the need to modify their original source codes and supports resubmission of subtasks on node failure. We show that this approach can be used to process huge amounts of biological data efficiently by running non-parallel programs in parallel mode on a supercomputer. The C++ source code and documentation are available from http://biokinet.belozersky.msu.ru/mpiWrapper.
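A minimal sketch of the master/worker pattern the abstract describes, written with mpi4py and subprocess rather than the authors' C++ mpiWrapper; the command list, message tags, and single-threaded workers are simplifying assumptions (the real tool dedicates one thread per node to communication and one to execution):

```python
from mpi4py import MPI
import subprocess

TAG_TASK, TAG_DONE, TAG_STOP = 1, 2, 3

def master(comm, commands):
    status, pending, active = MPI.Status(), list(enumerate(commands)), 0
    for rank in range(1, comm.Get_size()):          # seed each worker
        if pending:
            comm.send(pending.pop(), dest=rank, tag=TAG_TASK); active += 1
        else:
            comm.send(None, dest=rank, tag=TAG_STOP)
    while active:                                   # farm out the rest
        idx, rc = comm.recv(source=MPI.ANY_SOURCE, tag=TAG_DONE, status=status)
        if rc != 0:
            pending.append((idx, commands[idx]))    # crude resubmission
        if pending:
            comm.send(pending.pop(), dest=status.Get_source(), tag=TAG_TASK)
        else:
            comm.send(None, dest=status.Get_source(), tag=TAG_STOP); active -= 1

def worker(comm):
    status = MPI.Status()
    while True:
        task = comm.recv(source=0, tag=MPI.ANY_TAG, status=status)
        if status.Get_tag() == TAG_STOP:
            return
        idx, cmd = task
        rc = subprocess.call(cmd, shell=True)       # unmodified Linux program
        comm.send((idx, rc), dest=0, tag=TAG_DONE)

comm = MPI.COMM_WORLD
if comm.Get_rank() == 0:
    master(comm, ["echo task %d" % i for i in range(100)])
else:
    worker(comm)
```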
DOE Office of Scientific and Technical Information (OSTI.GOV)
Omelchenko, Yuri A.
Global interactions of energetic ions with magnetoplasmas and neutral gases lie at the core of many space and laboratory plasma phenomena ranging from solar wind entry into and transport within planetary magnetospheres and exospheres to fast-ion driven instabilities in fusion devices to astrophysics-in-lab experiments. The ability of computational models to properly account for physical effects that underlie such interactions, namely ion kinetic, ion cyclotron, Hall, collisional and ionization processes, is important for the success and planning of experimental research in plasma physics. Understanding the physics of energetic ions, in particular their nonlinear resonance interactions with Alfvén waves, is central to improving the heating performance of magnetically confined plasmas for future energy generation. Fluid models are not adequate for high-beta plasmas as they cannot fully capture ion kinetic and cyclotron physics (e.g., ion behavior in the presence of magnetic nulls, shock structures, plasma interpenetration, etc.). Recent results from global reconnection simulations show that even in an MHD-like regime there may be significant differences between kinetic and MHD simulations. Therefore, kinetic modeling becomes essential for meeting modern-day challenges in plasma physics. The hybrid approximation is an intermediate approximation between the fluid and fully kinetic approximations. It eliminates light waves, removes the electron inertial temporal and spatial scales from the problem, and enables full-orbit ion kinetics. As a result, hybrid codes have become effective tools for exploring ion-scale driven phenomena associated with ion beams, shocks, reconnection and turbulence that control the large-scale behavior of laboratory and space magnetoplasmas. A number of numerical issues, however, make three-dimensional (3D) large-scale hybrid simulations of inhomogeneous magnetized plasmas prohibitively expensive or even impossible. To resolve these difficulties we have developed a novel Event-driven Multiscale Asynchronous Parallel Simulation (EMAPS) technology that replaces time stepping with self-adaptive update events. Local calculations are carried out only on an "as needed" basis. EMAPS (i) guarantees accurate and stable processing of physical variables in time-accurate simulations, and (ii) eliminates unnecessary computation. Applying EMAPS to the hybrid model has resulted in the development of a unique, dimension-independent (compile-time-configurable) parallel code, HYPERS (Hybrid Parallel Event-Resolved Simulator), that scales to hundreds of thousands of parallel processors. HYPERS advances electromagnetic fields and particles asynchronously on time scales determined by local physical laws and mesh properties. To achieve high computational accuracy in complex device geometries, HYPERS employs high-fidelity Cartesian grids with masked conductive cells. The HYPERS model includes multiple ion species and energy- and momentum-conserving ion-ion collisions, and provides a number of approximations for plasma resistivity and vacuum regions. Both local and periodic boundary conditions are allowed. The HYPERS solver preserves zero divergence of the magnetic field. The project has demonstrated HYPERS capabilities on a number of applications of interest to fusion and astrophysical plasma physics, listed below.
1. Theta-pinch formation of FRCs. The formation, spontaneous spin-up, and stability of theta-pinch-formed field-reversed configurations have been studied self-consistently in 3D. The end-to-end hybrid simulations reveal poloidal profiles of implosion-driven fast toroidal plasma rotation and demonstrate three discharge regimes as a function of experimental parameters: the decaying stable configuration, the tilt-unstable configuration, and the nonlinear evolution of a fast-growing tearing mode.
2. FRC collisions with magnetic mirrors. Interactions of fast plasma streams and objects with magnetic obstacles (dipoles, mirrors, etc.) lie at the core of many space and laboratory plasma phenomena ranging from magnetoshells and solar wind interactions with planetary magnetospheres to compact fusion plasmas. HYPERS simulations are compared with data from the MSX experiment (LANL), which focuses on the physics of magnetized collisionless shocks through the acceleration and subsequent stagnation of FRC plasmoids against strong magnetic mirrors and flux-conserving boundaries.
3. Exploding magnetoplasmas. Results from hybrid simulations of two experiments at the LAPD and the Nevada Terawatt Facility are discussed, where short-pulse lasers are used to ablate solid targets to produce plasmas that expand across external magnetic fields. The first simulation recreates flute-like density striations observed at the leading edge of a carbon plasma and predicts an early destruction of the magnetic cavity in agreement with experimental evidence. In the second simulation a polyethylene target is ablated into a mixture of protons and carbon ions. A mechanism is demonstrated that allows protons to penetrate the magnetic field in the form of a collimated flow. The results are compared to experimental data and single-fluid MHD simulations.
The EMAPS framework has the potential for wide application in many other engineering and scientific fields, such as climate models, biological systems, electronic devices, seismic events, and oil reservoir simulators, all of which involve advancing solutions of partial differential equations in time where the rate of activity can vary widely over the spatial domain depending on local space/time phenomena ("events").
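The event-driven idea at the heart of EMAPS, replacing a global time step with per-cell update events drawn from a priority queue, can be sketched as follows (a toy version of ours, not the HYPERS implementation):

```python
import heapq

def run_async(cells, t_end):
    """cells: dict name -> (local_dt_function, update_function).
    The queue always pops the cell whose next update event is earliest,
    so stiff regions take many small steps while quiet regions take few."""
    events = [(0.0, name) for name in cells]
    heapq.heapify(events)
    while events:
        t, name = heapq.heappop(events)
        if t >= t_end:
            break
        local_dt, update = cells[name]
        update(t)                                  # advance this cell only
        heapq.heappush(events, (t + local_dt(t), name))

# Example: a "shock" cell needs 100x smaller steps than a "vacuum" cell.
log = []
cells = {
    "shock":  (lambda t: 0.01, lambda t: log.append("shock")),
    "vacuum": (lambda t: 1.0,  lambda t: log.append("vacuum")),
}
run_async(cells, t_end=2.0)
print(log.count("shock"), "shock updates vs", log.count("vacuum"), "vacuum")
```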
NASA Astrophysics Data System (ADS)
Brodie, K. L.; McNinch, J. E.
2008-12-01
Accurate predictions of shoreline response to storms are contingent upon coastal-morphodynamic models effectively synthesizing the complex evolving relationships between beach topography, sandbar morphology, nearshore bathymetry, underlying geology, and the nearshore wave-field during storm events. Analysis of "pre" and "post" storm data sets has led to a common theory for event response of the nearshore system: pre-storm three-dimensional bar and shoreline configurations shift to two-dimensional, linear forms post-storm. A lack of data during storms has unfortunately left a gap in our knowledge of how the system explicitly changes during the storm event. This work presents daily observations of the beach and nearshore during high-energy storm events over a spatially extensive field site (order of magnitude: 10 km) using Bar and Swash Imaging Radar (BASIR), a mobile X-band radar system. The field site contains a complexity of features including shore-oblique bars and troughs, heterogeneous sediment, and an erosional hotspot. BASIR data provide observations of the evolution of shoreline and bar morphology, as well as nearshore bathymetry, throughout the storm events. Nearshore bathymetry is calculated using a bathymetry inversion from radar-derived wave celerity measurements. Preliminary results show a relatively stable but non-linear shore-parallel bar and a non-linear shoreline with megacusp and embayment features (order of magnitude: 1 km) that are enhanced during the wave events. Both the shoreline and shore-parallel bar undulate at a spatial frequency similar to that of the nearshore shore-oblique bar-field. Large-scale shore-oblique bars and troughs remain relatively static in position and morphology throughout the storm events. The persistence of a three-dimensional shoreline, shore-parallel bar, and large-scale shore-oblique bars and troughs contradicts the idea of event-driven shifts to two-dimensional morphology and suggests that beach and nearshore response to storms may be location specific. We hypothesize that the influence of underlying geology, defined by (1) the introduction of heterogeneous sediment and (2) the possible creation of shore-oblique bars and troughs in the nearshore, may be responsible for the persistence of three-dimensional forms and the associated shoreline hotspots during storm events.
Wong, William W L; Feng, Zeny Z; Thein, Hla-Hla
2016-11-01
Agent-based models (ABMs) are computer simulation models that define interactions among agents and simulate emergent behaviors that arise from the ensemble of local decisions. ABMs have been increasingly used to examine trends in infectious disease epidemiology. However, the main limitation of ABMs is the high computational cost of large-scale simulation. To improve the computational efficiency of large-scale ABM simulations, we built a parallelizable sliding region algorithm (SRA) for ABM and compared it to a nonparallelizable ABM. We developed a complex agent network and performed two simulations to model hepatitis C epidemics based on real demographic data from Saskatchewan, Canada. The first simulation used the SRA, which processed each postal code subregion in turn. The second simulation processed the entire population simultaneously. The parallelizable SRA showed computational time savings with comparable results in a province-wide simulation. Using the same method, the SRA can be generalized to perform a country-wide simulation. Thus, this parallel algorithm makes it possible to use ABMs for large-scale simulation with limited computational resources.
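The parallelizable kernel of the approach, advancing each spatial subregion independently, can be sketched with a process pool (region contents, the postal-code labels, and the one-step dynamics below are placeholders; the actual SRA also slides the region window to handle agents interacting across region boundaries):

```python
from multiprocessing import Pool

def simulate_region(region):
    """Advance one postal-code subregion by one time step.
    Placeholder dynamics: each infected agent infects one susceptible."""
    name, susceptible, infected = region
    new_infections = min(susceptible, infected)
    return (name, susceptible - new_infections, infected + new_infections)

if __name__ == "__main__":
    regions = [("S7K", 1000, 5), ("S4P", 800, 2), ("S6V", 500, 0)]
    with Pool(processes=3) as pool:
        regions = pool.map(simulate_region, regions)  # subregions in parallel
    print(regions)
```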
Transport of cosmic-ray protons in intermittent heliospheric turbulence: Model and simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Alouani-Bibi, Fathallah; Le Roux, Jakobus A., E-mail: fb0006@uah.edu
The transport of charged energetic particles in the presence of strong intermittent heliospheric turbulence is computationally analyzed based on known properties of the interplanetary magnetic field and solar wind plasma at 1 astronomical unit. The turbulence is assumed to be static, composite, and quasi-three-dimensional with a varying energy distribution between a one-dimensional Alfvénic (slab) and a structured two-dimensional component. The spatial fluctuations of the turbulent magnetic field are modeled either as homogeneous with a Gaussian probability distribution function (PDF), or as intermittent on large and small scales with a q-Gaussian PDF. Simulations showed that energetic particle diffusion coefficients both parallel and perpendicular to the background magnetic field are significantly affected by intermittency in the turbulence. This effect is especially strong for parallel transport where for large-scale intermittency results show an extended phase of subdiffusive parallel transport during which cross-field transport diffusion dominates. The effects of intermittency are found to depend on particle rigidity and the fraction of slab energy in the turbulence, yielding a perpendicular to parallel mean free path ratio close to 1 for large-scale intermittency. Investigation of higher order transport moments (kurtosis) indicates that non-Gaussian statistical properties of the intermittent turbulent magnetic field are present in the parallel transport, especially for low rigidity particles at all times.
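The q-Gaussian PDF used here to model intermittent fluctuations has the standard Tsallis form (notation ours); q -> 1 recovers the Gaussian, and q > 1 produces the heavy tails characteristic of intermittency:

\[
f_q(\delta B) \;\propto\; \left[\,1-(1-q)\,\frac{\delta B^{2}}{2\sigma^{2}}\right]^{\frac{1}{1-q}},
\qquad
\lim_{q\to 1} f_q(\delta B) \;\propto\; e^{-\delta B^{2}/2\sigma^{2}}.
\]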
Multi-Modulator for Bandwidth-Efficient Communication
NASA Technical Reports Server (NTRS)
Gray, Andrew; Lee, Dennis; Lay, Norman; Cheetham, Craig; Fong, Wai; Yeh, Pen-Shu; King, Robin; Ghuman, Parminder; Hoy, Scott; Fisher, Dave
2009-01-01
A modulator circuit board has recently been developed to be used in conjunction with a vector modulator to generate any of a large number of modulations for bandwidth-efficient radio transmission of digital data signals at rates that can exceed 100 Mb/s. The modulations include quadrature phase-shift keying (QPSK), offset quadrature phase-shift keying (OQPSK), Gaussian minimum-shift keying (GMSK), and octonary phase-shift keying (8PSK) with square-root raised-cosine pulse shaping. The figure is a greatly simplified block diagram showing the relationship between the modulator board and the rest of the transmitter. The role of the modulator board is to encode the incoming data stream and to shape the resulting pulses, which are fed as inputs to the vector modulator. The combination of encoding and pulse shaping in a given application is chosen to maximize bandwidth efficiency. The modulator board includes gallium arsenide serial-to-parallel converters at its input end. A complementary metal oxide/semiconductor (CMOS) field-programmable gate array (FPGA) performs the coding and modulation computations and utilizes parallel processing in doing so. The results of the parallel computation are combined and converted to pulse waveforms by use of gallium arsenide parallel-to-serial converters integrated with digital-to-analog converters. Without changing the hardware, one can configure the modulator to produce any of the designed combinations of coding and modulation by loading the appropriate bit configuration file into the FPGA.
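To make the encode-map-shape chain concrete, here is a floating-point sketch of Gray-coded 8PSK mapping followed by square-root raised-cosine pulse shaping; it does not reproduce the board's FPGA logic, and the roll-off, oversampling factor, and Gray ordering are assumptions:

```python
import numpy as np

def srrc_taps(beta, sps, span):
    """Square-root raised-cosine taps: beta = roll-off, sps = samples per
    symbol, span = length in symbols; textbook closed form with the two
    removable singularities handled explicitly."""
    t = np.arange(-span * sps // 2, span * sps // 2 + 1) / sps
    h = np.empty_like(t)
    for i, ti in enumerate(t):
        if abs(ti) < 1e-12:
            h[i] = 1 - beta + 4 * beta / np.pi
        elif abs(abs(ti) - 1 / (4 * beta)) < 1e-12:
            h[i] = (beta / np.sqrt(2)) * ((1 + 2/np.pi) * np.sin(np.pi/(4*beta))
                                        + (1 - 2/np.pi) * np.cos(np.pi/(4*beta)))
        else:
            h[i] = (np.sin(np.pi*ti*(1-beta)) + 4*beta*ti*np.cos(np.pi*ti*(1+beta))) \
                   / (np.pi*ti*(1 - (4*beta*ti)**2))
    return h / np.sqrt(np.sum(h**2))              # normalize to unit energy

bits = np.random.randint(0, 2, 300)
tribits = bits.reshape(-1, 3)                     # 3 bits per 8PSK symbol
gray = np.array([0, 1, 3, 2, 7, 6, 4, 5])         # one valid Gray ordering
idx = gray[tribits[:, 0]*4 + tribits[:, 1]*2 + tribits[:, 2]]
symbols = np.exp(1j * 2 * np.pi * idx / 8)        # unit-circle constellation
up = np.zeros(len(symbols) * 8, dtype=complex)
up[::8] = symbols                                 # 8 samples per symbol
baseband = np.convolve(up, srrc_taps(beta=0.35, sps=8, span=10))
```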
Space Technology 5 Multi-Point Observations of Temporal Variability of Field-Aligned Currents
NASA Technical Reports Server (NTRS)
Le, Guan; Wang, Yongli; Slavin, James A.; Strangeway, Robert J.
2008-01-01
Space Technology 5 (ST5) is a three micro-satellite constellation deployed into a 300 x 4500 km, dawn-dusk, sun-synchronous polar orbit from March 22 to June 21, 2006, for technology validations. In this paper, we present a study of the temporal variability of field-aligned currents using multi-point magnetic field measurements from ST5. The data demonstrate that meso-scale current structures are commonly embedded within large-scale field-aligned current sheets. The meso-scale current structures are very dynamic with highly variable current density and/or polarity on time scales of approximately 10 min. They exhibit large temporal variations during both quiet and disturbed times on such time scales. On the other hand, the data also show that the time scales for the currents to be relatively stable are approximately 1 min for meso-scale currents and approximately 10 min for large-scale current sheets. These temporal features are clearly associated with dynamic variations of their particle carriers (mainly electrons) as they respond to variations of the parallel electric field in the auroral acceleration region. The characteristic time scales for the temporal variability of meso-scale field-aligned currents are found to be consistent with those of the auroral parallel electric field.
Performance Assessment of a Large Scale Pulsejet- Driven Ejector System
NASA Technical Reports Server (NTRS)
Paxson, Daniel E.; Litke, Paul J.; Schauer, Frederick R.; Bradley, Royce P.; Hoke, John L.
2006-01-01
Unsteady thrust augmentation was measured on a large scale driver/ejector system. A 72 in. long, 6.5 in. diameter, 100 lb(sub f) pulsejet was tested with a series of straight, cylindrical ejectors of varying length and diameter. A tapered ejector configuration of varying length was also tested. The objectives of the testing were to determine the dimensions of the ejectors which maximize thrust augmentation, and to compare the dimensions and augmentation levels so obtained with those of other, similarly maximized, but smaller scale systems on which much of the recent unsteady ejector thrust augmentation work has been performed. An augmentation level of 1.71 was achieved with the cylindrical ejector configuration and 1.81 with the tapered ejector configuration. These levels are consistent with, but slightly lower than, the highest levels achieved with the smaller systems. The ejector diameter yielding maximum augmentation was 2.46 times the diameter of the pulsejet. This ratio closely matches those of the small scale experiments. For the straight ejector, the length yielding maximum augmentation was 10 times the diameter of the pulsejet. This was also nearly the same as in the small scale experiments. Testing procedures are described, as are the parametric variations in ejector geometry. Results are discussed in terms of their implications for general scaling of pulsed thrust ejector systems.
Regional-scale calculation of the LS factor using parallel processing
NASA Astrophysics Data System (ADS)
Liu, Kai; Tang, Guoan; Jiang, Ling; Zhu, A.-Xing; Yang, Jianyi; Song, Xiaodong
2015-05-01
With the increase of data resolution and the increasing application of USLE over large areas, the existing serial implementation of algorithms for computing the LS factor is becoming a bottleneck. In this paper, a parallel processing model based on the message passing interface (MPI) is presented for the calculation of the LS factor, so that massive datasets at a regional scale can be processed efficiently. The parallel model contains algorithms for calculating flow direction, flow accumulation, drainage network, slope, slope length and the LS factor. According to the existence of data dependence, the algorithms are divided into local algorithms and global algorithms. Parallel strategies are designed according to the characteristics of the algorithms, including a decomposition method that maintains the integrity of the results, an optimized workflow that reduces the time spent exporting unnecessary intermediate data, and a buffer-communication-computation strategy that improves communication efficiency. Experiments on a multi-node system show that the proposed parallel model allows efficient calculation of the LS factor at a regional scale with a massive dataset.
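A minimal sketch of the decomposition idea for the "local" algorithms such as slope: each rank holds a horizontal strip of the DEM plus one ghost row per neighbour, exchanged via MPI (grid contents, cell size, and the periodic neighbour choice are illustrative; the global algorithms such as flow accumulation require different treatment):

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
rows, cols, cell = 64, 64, 25.0                  # strip height, width, cell size (m)
dem = np.random.rand(rows, cols) * 100           # this rank's strip (synthetic)

up, down = (rank - 1) % size, (rank + 1) % size  # periodic for simplicity
ghost_top, ghost_bot = np.empty(cols), np.empty(cols)
comm.Sendrecv(dem[0],  dest=up,   recvbuf=ghost_bot, source=down)
comm.Sendrecv(dem[-1], dest=down, recvbuf=ghost_top, source=up)

padded = np.vstack([ghost_top, dem, ghost_bot])  # add one ghost row each side
gy, gx = np.gradient(padded, cell)
slope_deg = np.degrees(np.arctan(np.hypot(gx, gy)))[1:-1]  # drop ghost rows
print(rank, "mean slope:", round(float(slope_deg.mean()), 2))
```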
A Parallel Pipelined Renderer for the Time-Varying Volume Data
NASA Technical Reports Server (NTRS)
Chiueh, Tzi-Cker; Ma, Kwan-Liu
1997-01-01
This paper presents a strategy for efficiently rendering time-varying volume data sets on a distributed-memory parallel computer. Time-varying volume data take large storage space, and visualizing them requires reading large files continuously or periodically throughout the course of the visualization process. Instead of using all the processors to collectively render one volume at a time, a pipelined rendering process is formed by partitioning processors into groups to render multiple volumes concurrently. In this way, the overall rendering time may be greatly reduced because the pipelined rendering tasks are overlapped with the I/O required to load each volume into a group of processors; moreover, parallelization overhead may be reduced as a result of partitioning the processors. We modify an existing parallel volume renderer to exploit various levels of rendering parallelism and to study how the partitioning of processors may lead to optimal rendering performance. Two factors that are important to the overall execution time are resource utilization efficiency and pipeline startup latency. The optimal partitioning configuration is the one that balances these two factors. Tests on Intel Paragon computers show that in general optimal partitionings do exist for a given rendering task and result in a 40-50% saving in overall rendering time.
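The tradeoff the authors balance can be seen in a generic pipeline timing model (ours, for illustration, not the paper's): if the renderer is organized as G pipeline stages of duration τ each and N volumes stream through,

\[
T_{\text{total}} \;\approx\; \underbrace{(G-1)\,\tau}_{\text{startup latency}} \;+\; N\,\tau,
\qquad
\tau \;=\; \max\!\big(T_{\text{I/O}},\, T_{\text{render}}\big),
\]

so the startup cost is amortized only when N is large relative to G, while each group's smaller processor count raises T_render; the optimal partitioning balances these effects.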
Operation of an aquatic worm reactor suitable for sludge reduction at large scale.
Hendrickx, Tim L G; Elissen, Hellen H J; Temmink, Hardy; Buisman, Cees J N
2011-10-15
Treatment of domestic waste water results in the production of waste sludge, which requires costly further processing. A biological method to reduce the amount of waste sludge and its volume is treatment in an aquatic worm reactor. The potential of such a worm reactor with the oligochaete Lumbriculus variegatus has been shown at small scale. For scaling up purposes, a new configuration of the reactor was designed, in which the worms were positioned horizontally in the carrier material. This was tested in a continuous experiment of 8 weeks where it treated all the waste sludge from a lab-scale activated sludge process. The results showed a higher worm growth rate compared to previous experiments with the old configuration, whilst nutrient release was similar. The new configuration has a low footprint and allows for easy aeration and faeces collection, thereby making it suitable for full scale application.
Multiple Independent File Parallel I/O with HDF5
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, M. C.
2016-07-13
The HDF5 library has supported the I/O requirements of HPC codes at Lawrence Livermore National Labs (LLNL) since the late 90's. In particular, HDF5 used in the Multiple Independent File (MIF) parallel I/O paradigm has supported LLNL codes' scalable I/O requirements and has recently been gainfully used at scales as large as O(10^6) parallel tasks.
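A minimal sketch of the MIF pattern using mpi4py and h5py: the tasks are split into F groups, and within each group a baton is passed so that exactly one task at a time appends to that group's file (file layout and dataset names are illustrative, not LLNL's):

```python
from mpi4py import MPI
import h5py, numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
F = 4                                        # number of files (assume F <= size)
group = rank % F                             # which file this task writes
fname = "mif_part_%03d.h5" % group

prev, nxt = rank - F, rank + F               # neighbours within my group
if prev >= 0:
    comm.recv(source=prev)                   # wait for the baton
mode = "w" if prev < 0 else "a"              # first writer creates the file
with h5py.File(fname, mode) as f:            # independent (serial) HDF5 I/O
    f.create_dataset("domain_%05d/data" % rank, data=np.full(1024, rank))
if nxt < size:
    comm.send(None, dest=nxt)                # pass the baton on
```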
Approaching the exa-scale: a real-world evaluation of rendering extremely large data sets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Patchett, John M; Ahrens, James P; Lo, Li - Ta
2010-10-15
Extremely large scale analysis is becoming increasingly important as supercomputers and their simulations move from petascale to exascale. The lack of dedicated hardware acceleration for rendering on today's supercomputing platforms motivates our detailed evaluation of the possibility of interactive rendering on the supercomputer. In order to facilitate our understanding of rendering on the supercomputing platform, we focus on scalability of rendering algorithms and architecture envisioned for exascale datasets. To understand tradeoffs for dealing with extremely large datasets, we compare three different rendering algorithms for large polygonal data: software-based ray tracing, software-based rasterization, and hardware-accelerated rasterization. We present a case study of strong and weak scaling of rendering extremely large data on both GPU- and CPU-based parallel supercomputers using ParaView, a parallel visualization tool. We use three different data sets: two synthetic and one from a scientific application. At extreme scale, algorithmic rendering choices make a difference and should be considered while approaching exascale computing, visualization, and analysis. We find that software-based ray tracing offers a viable approach for scalable rendering of the projected future massive data sizes.
Locally adaptive parallel temperature accelerated dynamics method
NASA Astrophysics Data System (ADS)
Shim, Yunsic; Amar, Jacques G.
2010-03-01
The recently-developed temperature-accelerated dynamics (TAD) method [M. Sørensen and A.F. Voter, J. Chem. Phys. 112, 9599 (2000)] along with the more recently developed parallel TAD (parTAD) method [Y. Shim et al., Phys. Rev. B 76, 205439 (2007)] allow one to carry out non-equilibrium simulations over extended time and length scales. The basic idea behind TAD is to speed up transitions by carrying out a high-temperature MD simulation and then use the resulting information to obtain event times at the desired low temperature. In a typical implementation, a fixed high temperature T_high is used. However, in general one expects that for each configuration there exists an optimal value of T_high which depends on the particular transition pathways and activation energies for that configuration. Here we present a locally adaptive high-temperature TAD method in which, instead of using a fixed T_high, the high temperature is dynamically adjusted in order to maximize simulation efficiency. Preliminary results of the performance obtained from parTAD simulations of Cu/Cu(100) growth using the locally adaptive T_high method will also be presented.
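The temperature extrapolation underlying TAD is the Arrhenius relation: an event observed at time t_high in the high-temperature run, with activation energy E_a, is replayed at the low temperature at (standard TAD relation, our notation)

\[
t_{\text{low}} \;=\; t_{\text{high}}\,
\exp\!\left[\frac{E_{a}}{k_{B}}\left(\frac{1}{T_{\text{low}}}-\frac{1}{T_{\text{high}}}\right)\right],
\]

and the event with the earliest extrapolated low-temperature time is accepted; the choice of T_high therefore directly controls how quickly the ordering of these extrapolated times converges, which motivates adapting it per configuration.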
Examining Chaotic Convection with Super-Parameterization Ensembles
NASA Astrophysics Data System (ADS)
Jones, Todd R.
This study investigates a variety of features present in a new configuration of the Community Atmosphere Model (CAM) variant SP-CAM 2.0. The new configuration (multiple-parameterization CAM, MP-CAM) changes the manner in which the super-parameterization (SP) concept represents physical tendency feedbacks to the large scale by using the mean of 10 independent two-dimensional cloud-permitting model (CPM) curtains in each global model column instead of the conventional single CPM curtain. The climates of the SP and MP configurations are examined to investigate any significant differences caused by the application of convective physical tendencies that are more deterministic in nature, paying particular attention to extreme precipitation events and large-scale weather systems such as the Madden-Julian Oscillation (MJO). A number of small but significant changes in the mean-state climate are uncovered, and it is found that the new formulation degrades MJO performance. Despite these deficiencies, the ensemble of possible realizations of convective states in the MP configuration allows for analysis of uncertainty in the small-scale solution, lending itself to examination of the weather regimes and physical mechanisms associated with strong, chaotic convection. Methods of quantifying precipitation predictability are explored, and use of the most reliable of these leads to the conclusion that poor precipitation predictability is most directly related to the proximity of the global climate model column state to atmospheric critical points. Secondarily, the predictability is tied to the availability of potential convective energy, the presence of mesoscale convective organization on the CPM grid, and the directive power of the large scale.
Parallel Computing for Probabilistic Response Analysis of High Temperature Composites
NASA Technical Reports Server (NTRS)
Sues, R. H.; Lua, Y. J.; Smith, M. D.
1994-01-01
The objective of this Phase I research was to establish the required software and hardware strategies to achieve large scale parallelism in solving PCM problems. To meet this objective, several investigations were conducted. First, we identified the multiple levels of parallelism in PCM and the computational strategies to exploit these parallelisms. Next, several software and hardware efficiency investigations were conducted. These involved the use of three different parallel programming paradigms and solution of two example problems on both a shared-memory multiprocessor and a distributed-memory network of workstations.
Hierarchical Parallelism in Finite Difference Analysis of Heat Conduction
NASA Technical Reports Server (NTRS)
Padovan, Joseph; Krishna, Lala; Gute, Douglas
1997-01-01
Based on the concept of hierarchical parallelism, this research effort resulted in highly efficient parallel solution strategies for very large scale heat conduction problems. The method of hierarchical parallelism involves partitioning thermal models into several substructured levels wherein an optimal balance among the associated bandwidths is achieved; the details are described in this report. The report is organized into two parts. Part 1 describes the parallel modelling methodology and the associated multilevel direct, iterative and mixed solution schemes. Part 2 establishes both the formal and computational properties of the scheme.
NASA Astrophysics Data System (ADS)
Zerr, Robert Joseph
2011-12-01
The integral transport matrix method (ITMM) has been used as the kernel of new parallel solution methods for the discrete ordinates approximation of the within-group neutron transport equation. The ITMM abandons the repetitive mesh sweeps of the traditional source iteration (SI) scheme in favor of constructing stored operators that account for the direct coupling factors among all the cells and between the cells and boundary surfaces. The main goals of this work were to develop the algorithms that construct these operators and employ them in the solution process, determine the most suitable way to parallelize the entire procedure, and evaluate the behavior and performance of the developed methods for an increasing number of processes. This project compares the effectiveness of the ITMM with the SI scheme parallelized with the Koch-Baker-Alcouffe (KBA) method. The primary parallel solution method involves a decomposition of the domain into smaller spatial sub-domains, each with its own transport matrices, coupled together via interface boundary angular fluxes. Each sub-domain has its own set of ITMM operators and represents an independent transport problem. Multiple iterative parallel solution methods have been investigated, including parallel block Jacobi (PBJ), parallel red/black Gauss-Seidel (PGS), and parallel GMRES (PGMRES). The fastest observed parallel solution method, PGS, was used in a weak scaling comparison with the PARTISN code. Compared to the state-of-the-art SI-KBA with diffusion synthetic acceleration (DSA), this new method without acceleration/preconditioning is not competitive for any problem parameters considered. The best comparisons occur for problems that are difficult for SI DSA, namely highly scattering and optically thick. SI DSA execution time curves are generally steeper than the PGS ones. However, until further testing is performed it cannot be concluded that SI DSA does not outperform the ITMM with PGS even on several thousand or tens of thousands of processors. The PGS method does outperform SI DSA for the periodic heterogeneous layers (PHL) configuration problems. Although this demonstrates a relative strength/weakness between the two methods, the practicality of these problems is much lower, further limiting instances where it would be beneficial to select ITMM over SI DSA. The results strongly indicate a need for a robust, stable, and efficient acceleration method (or preconditioner for PGMRES). The spatial multigrid (SMG) method is currently incomplete in that it does not work for all cases considered and does not effectively improve the convergence rate for all values of the scattering ratio c or cell dimension h. Nevertheless, it does display the desired trend for highly scattering, optically thin problems. That is, it tends to lower the rate of growth of the number of iterations with increasing number of processes, P, while not increasing the number of additional operations per iteration to the extent that the total execution time of the rapidly converging accelerated iterations exceeds that of the slower unaccelerated iterations. A predictive parallel performance model has been developed for the PBJ method. Timing tests were performed such that trend lines could be fitted to the data for the different components and used to estimate the execution times.
Applied to the weak scaling results, the model notably underestimates construction time, but combined with a slight overestimation in iterative solution time, the model predicts total execution time very well for large P. It also does a decent job with the strong scaling results, closely predicting the construction time and time per iteration, especially as P increases. Although not shown to be competitive up to 1,024 processing elements with the current state of the art, the parallelized ITMM exhibits promising scaling trends. Ultimately, compared to the KBA method, the parallelized ITMM may be found to be a very attractive option for transport calculations spatially decomposed over several tens of thousands of processes. Acceleration/preconditioning of the parallelized ITMM once developed will improve the convergence rate and improve its competitiveness. (Abstract shortened by UMI.)
Space Technology 5 (ST-5) Observations of Field-Aligned Currents: Temporal Variability
NASA Technical Reports Server (NTRS)
Le, Guan
2010-01-01
Space Technology 5 (ST-5) is a three micro-satellite constellation deployed into a 300 x 4500 km, dawn-dusk, sun-synchronous polar orbit from March 22 to June 21, 2006, for technology validations. In this paper, we present a study of the temporal variability of field-aligned currents using multi-point magnetic field measurements from ST-5. The data demonstrate that meso-scale current structures are commonly embedded within large-scale field-aligned current sheets. The meso-scale current structures are very dynamic with highly variable current density and/or polarity on time scales of about 10 min. They exhibit large temporal variations during both quiet and disturbed times on such time scales. On the other hand, the data also show that the time scales for the currents to be relatively stable are about 1 min for meso-scale currents and about 10 min for large-scale current sheets. These temporal features are clearly associated with dynamic variations of their particle carriers (mainly electrons) as they respond to variations of the parallel electric field in the auroral acceleration region. The characteristic time scales for the temporal variability of meso-scale field-aligned currents are found to be consistent with those of the auroral parallel electric field.
Laboratory studies of magnetized collisionless flows and shocks using accelerated plasmoids
NASA Astrophysics Data System (ADS)
Weber, T. E.; Smith, R. J.; Hsu, S. C.
2015-11-01
Magnetized collisionless shocks are thought to play a dominant role in the overall partition of energy throughout the universe, but have historically proven difficult to create in the laboratory. The Magnetized Shock Experiment (MSX) at LANL creates conditions similar to those found in both space and astrophysical shocks by accelerating hot (100s of eV during translation), dense (10²²-10²³ m⁻³) Field Reversed Configuration (FRC) plasmoids to high velocities (100s of km/s), resulting in β ~ 1, collisionless plasma flows with sonic and Alfvén Mach numbers of ~10. The FRC subsequently impacts a static target such as a strong parallel or anti-parallel (reconnection-wise) magnetic mirror, a solid obstacle, or a neutral gas cloud to create shocks with characteristic length and time scales that are both large enough to observe yet small enough to fit within the experiment. This enables study of the complex interplay of kinetic and fluid processes that mediate cosmic shocks and can generate non-thermal distributions, produce density and magnetic field enhancements much greater than predicted by fluid theory, and accelerate particles. An overview of the experimental capabilities of MSX will be presented, including diagnostics, selected recent results, and future directions. Supported by the DOE Office of Fusion Energy Sciences under contract DE-AC52-06NA25369.
Overview and recent results of the Magnetized Shock Experiment (MSX)
NASA Astrophysics Data System (ADS)
Weber, T. E.; Smith, R. J.; Hsu, S. C.; Omelchenko, Y.
2015-11-01
Recent machine and diagnostics upgrades to the Magnetized Shock Experiment (MSX) at LANL have enabled unprecedented access to the physical processes arising from stagnating magnetized (β ~ 1), collisionless, highly supersonic (M_S, M_A ~ 10) flows, similar in dimensionless parameters to those found in both space and astrophysical shocks. Hot (100s of eV during translation), dense (10²²-10²³ m⁻³) Field Reversed Configuration (FRC) plasmoids are accelerated to high velocities (100s of km/s) and subsequently impact against a static target such as a strong parallel or anti-parallel (reconnection-wise) magnetic mirror, a solid obstacle, or neutral gas cloud to recreate the physics of interest with characteristic length and time scales that are both large enough to observe yet small enough to fit within the experiment. Long-lived (>50 μs) stagnated plasmas with density enhancement much greater than predicted by fluid theory (>4x) are observed, accompanied by discontinuous plasma structures indicating shocks and jetting (visible emission and interferometry) and copious >1 keV x-ray emission. An overview of the experimental program will be presented, including machine design and capabilities, diagnostics, and an examination of the physical processes that occur during stagnation against a variety of targets. Supported by the DOE Office of Fusion Energy Sciences under contract DE-AC52-06NA25369.
Grid Computing Environment using a Beowulf Cluster
NASA Astrophysics Data System (ADS)
Alanis, Fransisco; Mahmood, Akhtar
2003-10-01
Custom-made Beowulf clusters using PCs are currently replacing expensive supercomputers to carry out complex scientific computations. At the University of Texas - Pan American, we built an 8 Gflops Beowulf Cluster for doing HEP research using RedHat Linux 7.3 and the LAM-MPI middleware. We will describe how we built and configured our Cluster, which we have named the Sphinx Beowulf Cluster. We will describe the results of our cluster benchmark studies and the run-time plots of several parallel application codes that were compiled in C on the cluster using the LAM-XMPI graphical user environment. We will demonstrate a "simple" prototype grid environment, where we will submit and run parallel jobs remotely across multiple cluster nodes over the internet from the presentation room at Texas Tech University. The Sphinx Beowulf Cluster will be used for Monte Carlo grid test-bed studies for the LHC-ATLAS high energy physics experiment. Grid is a new IT concept for the next generation of the "Super Internet" for high-performance computing. The Grid will allow scientists worldwide to view and analyze huge amounts of data flowing from the large-scale experiments in High Energy Physics. The Grid is expected to bring together geographically and organizationally dispersed computational resources, such as CPUs, storage systems, communication systems, and data sources.
NASA Technical Reports Server (NTRS)
Deveikis, W. D.
1983-01-01
External and internal pressure and cold-wall heating-rate distributions were obtained in hypersonic flow on a full-scale heat-sink representation of the space shuttle orbiter wing-elevon-cove configuration in an effort to define effects of flow separation on cove aerothermal environment as a function of cove seal leak area, ramp angle, and free-stream unit Reynolds number. Average free-stream Mach number from all tests was 6.9; average total temperature from all tests was 3360 R; free-stream dynamic pressure ranged from about 2 to 9 psi; and wing angle of attack was 5 deg (flow compression). For transitional and turbulent flow separation, increasing cove leakage progressively increased heating rates in the cove. When ingested mass flow was sufficient to force large reductions in extent of separation, increasing cove leakage reduced heating rates in the cove to those for laminar attached flow. Cove heating-rate distributions calculated with a method that assumed laminar developing channel flow agreed with experimentally obtained distributions within root-mean-square differences that varied between 11 and 36 percent where cove walls were parallel for leak areas of 50 and 100 percent.
Magnetic intermittency of solar wind turbulence in the dissipation range
NASA Astrophysics Data System (ADS)
Pei, Zhongtian; He, Jiansen; Tu, Chuanyi; Marsch, Eckart; Wang, Linghua
2016-04-01
The feature, nature, and fate of intermittency in the dissipation range are interesting topics in solar wind turbulence. We calculate the distribution of flatness for the magnetic field fluctuations as a function of angle and scale. The flatness distribution shows a "butterfly" pattern, with two wings located at angles parallel/anti-parallel to the local mean magnetic field direction and the main body located at angles perpendicular to the local B0. This "butterfly" pattern illustrates that the flatness profile in the (anti-)parallel direction approaches its maximum value at larger scale and drops faster than that in the perpendicular direction. The contours of the probability distribution functions at different scales illustrate a "vase" pattern, clearer in the parallel direction, which confirms the scale variation of flatness and indicates intermittency generation and dissipation. The angular distribution of the structure function in the dissipation range shows an anisotropic pattern. The quasi-mono-fractal scaling of the structure function in the dissipation range is also illustrated and investigated with a mathematical model for inhomogeneous cascading (the extended p-model). Unlike in the inertial range, the extended p-model for the dissipation range yields an approximately uniform fragmentation measure. However, a more complete mathematical and physical model involving both non-uniform cascading and dissipation is needed. The nature of intermittency may be strong structures or large-amplitude fluctuations, which may be tested with magnetic helicity. In one case study, we find that the heating effect, in terms of entropy, seems more obvious for large-amplitude fluctuations than for strong structures.
al3c: high-performance software for parameter inference using Approximate Bayesian Computation.
Stram, Alexander H; Marjoram, Paul; Chen, Gary K
2015-11-01
The development of Approximate Bayesian Computation (ABC) algorithms for parameter inference which are both computationally efficient and scalable in parallel computing environments is an important area of research. Monte Carlo rejection sampling, a fundamental component of ABC algorithms, is trivial to distribute over multiple processors but is inherently inefficient. While the development of algorithms such as ABC Sequential Monte Carlo (ABC-SMC) helps address the inherent inefficiencies of rejection sampling, such approaches are not as easily scaled on multiple processors. As a result, current Bayesian inference software offerings that use ABC-SMC lack the ability to scale in parallel computing environments. We present al3c, a C++ framework for implementing ABC-SMC in parallel. By requiring only that users define essential functions such as the simulation model and prior distribution function, al3c abstracts the user from both the complexities of parallel programming and the details of the ABC-SMC algorithm. By using the al3c framework, the user is able to scale the ABC-SMC algorithm in parallel computing environments for his or her specific application, with minimal programming overhead. al3c is offered as a static binary for Linux and OS X computing environments. The user completes an XML configuration file and C++ plug-in template for the specific application, which are used by al3c to obtain the desired results. Users can download the static binaries, source code, reference documentation and examples (including those in this article) by visiting https://github.com/ahstram/al3c. Contact: astram@usc.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved.
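For readers unfamiliar with the rejection-sampling core that ABC-SMC refines, here is a hedged sketch distributed over local processes with Python's multiprocessing. It is not al3c's C++ API; the prior, simulator, tolerance, and summary statistic are toy stand-ins.

```python
# Hedged sketch of the rejection-sampling core that ABC methods build on,
# distributed over local processes. The model is a toy: infer the mean of
# a Gaussian from a single observed summary statistic.
import numpy as np
from multiprocessing import Pool

OBSERVED = 3.0        # observed summary statistic (made up)
EPSILON = 0.1         # acceptance tolerance (made up)

def sample_prior(rng):
    return rng.uniform(0.0, 10.0)     # prior over the unknown mean

def simulate(theta, rng):
    # Toy model: summary statistic = sample mean of N(theta, 1) draws.
    return rng.normal(theta, 1.0, size=50).mean()

def worker(args):
    seed, n_tries = args
    rng = np.random.default_rng(seed)
    accepted = []
    for _ in range(n_tries):
        theta = sample_prior(rng)
        if abs(simulate(theta, rng) - OBSERVED) < EPSILON:
            accepted.append(theta)
    return accepted

if __name__ == "__main__":
    n_workers, tries_each = 4, 50_000
    with Pool(n_workers) as pool:
        chunks = pool.map(worker, [(s, tries_each) for s in range(n_workers)])
    posterior = np.concatenate([np.asarray(c) for c in chunks])
    print(f"accepted {posterior.size} draws, "
          f"posterior mean ~ {posterior.mean():.3f}")
```

ABC-SMC replaces the fixed tolerance with a decreasing schedule plus importance weights across generations, which introduces the synchronization points that make scaling harder than plain rejection sampling.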
NASA Astrophysics Data System (ADS)
Lohmann, U.; Jahns, J.; Limmer, S.; Fey, D.
2011-01-01
We consider the implementation of a dynamic crossbar interconnect using planar-integrated free-space optics (PIFSO) and a digital micromirror device (DMD). Because of the 3D nature of free-space optics, this approach is able to solve geometrical problems with crossings of the signal paths that occur in waveguide-optical and electrical interconnection, especially for large numbers of connections. The DMD allows one to route the signals dynamically. Due to the large number of individual mirror elements in the DMD, different optical path configurations are possible, thus offering the chance to optimize the network configuration. The optimization is achieved by using an evolutionary algorithm to find the best values for a skewless parallel interconnection. Here, we present results and experimental examples for the use of the PIFSO/DMD setup.
DGDFT: A massively parallel method for large scale density functional theory calculations.
Hu, Wei; Lin, Lin; Yang, Chao
2015-09-28
We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10⁻⁴ Hartree/atom in terms of the error of energy and 6.2 × 10⁻⁴ Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3,500-14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.
A comparison of parallel and diverging screw angles in the stability of locked plate constructs.
Wähnert, D; Windolf, M; Brianza, S; Rothstock, S; Radtke, R; Brighenti, V; Schwieger, K
2011-09-01
We investigated the static and cyclical strength of parallel and angulated locking plate screws using rigid polyurethane foam (0.32 g/cm³) and bovine cancellous bone blocks. Custom-made stainless steel plates with two conically threaded screw holes with different angulations (parallel, 10° and 20° divergent) and 5 mm self-tapping locking screws underwent pull-out and cyclical pull and bending tests. The bovine cancellous blocks were only subjected to static pull-out testing. We also performed finite element analysis for the static pull-out test of the parallel and 20° configurations. In both the foam model and the bovine cancellous bone we found significantly higher pull-out forces for the parallel constructs. In the finite element analysis there was 47% more damage in the 20° divergent constructs than in the parallel configuration. Under cyclical loading, the mean number of cycles to failure was significantly higher for the parallel group, followed by the 10° and 20° divergent configurations. In our laboratory setting we clearly showed the biomechanical disadvantage of a diverging locking screw angle under static and cyclical loading.
NASA Technical Reports Server (NTRS)
Bailey, David (Editor); Barton, John (Editor); Lasinski, Thomas (Editor); Simon, Horst (Editor)
1993-01-01
A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
A scalable PC-based parallel computer for lattice QCD
NASA Astrophysics Data System (ADS)
Fodor, Z.; Katz, S. D.; Pappa, G.
2003-05-01
A PC-based parallel computer for medium/large scale lattice QCD simulations is suggested. The Eötvös University Institute for Theoretical Physics cluster consists of 137 Intel P4-1.7GHz nodes. Gigabit Ethernet cards are used for nearest-neighbor communication in a two-dimensional mesh. The sustained performance for dynamical staggered (Wilson) quarks on large lattices is around 70 (110) GFlops. The exceptional price/performance ratio is below $1/Mflop.
Force sharing in high-power parallel servo-actuators
NASA Technical Reports Server (NTRS)
Neal, T. P.
1974-01-01
The various existing force sharing schemes were examined by conducting a literature survey. A list of potentially applicable concepts was compiled from this survey, and a brief analysis was then made of each concept, which resulted in two competing schemes being selected for in-depth evaluation. A functional design of the equalization logic for the two schemes was undertaken, and a specific space shuttle application was chosen for experimental evaluation. The application was scaled down so that existing hardware could be utilized. Next, an analog computer study was conducted to evaluate the more important characteristics of the two competing force sharing schemes. On the basis of the computer study, a final configuration was selected. A load simulator was then designed to evaluate this configuration on actual hardware.
The International Conference on Vector and Parallel Computing (2nd)
1989-01-17
[Proceedings text garbled in extraction; recoverable items include the session titles "Computation of the SVD of Bidiagonal Matrices" and "Lattice QCD - As a Large Scale Scientific Computation," with lattice QCD benchmarks reported for the IBM 3090 Vector Facility, Cray X-MP, and Cray 2, and a wavefront solver routine identified as a dominant cost.]
KINETIC ALFVÉN WAVE GENERATION BY LARGE-SCALE PHASE MIXING
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vásconez, C. L.; Pucci, F.; Valentini, F.
One view of solar wind turbulence is that the observed highly anisotropic fluctuations at spatial scales near the proton inertial length d_p may be considered as kinetic Alfvén waves (KAWs). In the present paper, we show how phase mixing of large-scale parallel-propagating Alfvén waves is an efficient mechanism for the production of KAWs at wavelengths close to d_p and at a large propagation angle with respect to the magnetic field. Magnetohydrodynamic (MHD), Hall magnetohydrodynamic (HMHD), and hybrid Vlasov-Maxwell (HVM) simulations modeling the propagation of Alfvén waves in inhomogeneous plasmas are performed. In the linear regime, the role of dispersive effects is singled out by comparing MHD and HMHD results. Fluctuations produced by phase mixing are identified as KAWs through a comparison of the polarization of magnetic fluctuations and wave-group velocity with analytical linear predictions. In the nonlinear regime, a comparison of HMHD and HVM simulations allows us to point out the role of kinetic effects in shaping the proton distribution function. We observe the generation of temperature anisotropy with respect to the local magnetic field and the production of field-aligned beams. The regions where the proton distribution function departs strongly from thermal equilibrium are located inside the shear layers, where the KAWs are excited, suggesting that the distortions of the proton distribution are driven by a resonant interaction of protons with KAW fluctuations. Our results are relevant in configurations where magnetic-field inhomogeneities are present, as, for example, in the solar corona, where the presence of Alfvén waves has been ascertained.
Grid-Enabled High Energy Physics Research using a Beowulf Cluster
NASA Astrophysics Data System (ADS)
Mahmood, Akhtar
2005-04-01
At Edinboro University of Pennsylvania, we have built an 8-node, 25 Gflops Beowulf Cluster with 2.5 TB of disk storage space to carry out grid-enabled, data-intensive high energy physics research for the ATLAS experiment via Grid3. We will describe how we built and configured our Cluster, which we have named the Sphinx Beowulf Cluster. We will describe the results of our cluster benchmark studies and the run-time plots of several parallel application codes. Once fully functional, the Cluster will be part of Grid3 [www.ivdgl.org/grid3]. The current ATLAS simulation grid application models the entire physical process, from the proton-proton collisions and the detector's response to the collision debris through the complete reconstruction of the event from analyses of these responses. The end result is a detailed set of data that simulates the real physical collision event inside a particle detector. Grid is the new IT infrastructure for 21st-century science -- a new computing paradigm that is poised to transform the practice of large-scale data-intensive research in science and engineering. The Grid will allow scientists worldwide to view and analyze huge amounts of data flowing from the large-scale experiments in High Energy Physics. The Grid is expected to bring together geographically and organizationally dispersed computational resources, such as CPUs, storage systems, communication systems, and data sources.
Scaling Up Coordinate Descent Algorithms for Large ℓ1 Regularization Problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Scherrer, Chad; Halappanavar, Mahantesh; Tewari, Ambuj
2012-07-03
We present a generic framework for parallel coordinate descent (CD) algorithms that has as special cases the original sequential algorithms of Cyclic CD and Stochastic CD, as well as the recent parallel Shotgun algorithm of Bradley et al. We introduce two novel parallel algorithms that are also special cases---Thread-Greedy CD and Coloring-Based CD---and give performance measurements for an OpenMP implementation of these.
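As a concrete reference point, the following is a minimal sketch of stochastic coordinate descent for the lasso with soft-thresholding updates; Shotgun-style parallelism would apply this same update to several randomly chosen coordinates concurrently. The data, regularization strength, and all names are illustrative.

```python
# Sketch of lasso coordinate descent with soft-thresholding. Shotgun-style
# parallelism would pick several random coordinates and apply this same
# update concurrently; here the updates run sequentially for clarity.
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def stochastic_cd_lasso(X, y, lam, n_updates=20_000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    resid = y - X @ w                     # maintained residual
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_updates):
        j = rng.integers(d)               # Stochastic CD: random coordinate
        rho = X[:, j] @ resid + col_sq[j] * w[j]
        w_new = soft_threshold(rho, lam) / col_sq[j]
        resid += X[:, j] * (w[j] - w_new) # update residual incrementally
        w[j] = w_new
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 50))
    true_w = np.zeros(50)
    true_w[:5] = [3, -2, 1.5, 0, 4]
    y = X @ true_w + 0.1 * rng.normal(size=200)
    w = stochastic_cd_lasso(X, y, lam=10.0)
    print("nonzeros recovered:", np.flatnonzero(np.abs(w) > 0.1))
```

Cyclic CD visits coordinates in order, Stochastic CD samples them (as above), and the paper's Thread-Greedy and Coloring-Based variants differ in how concurrent updates are chosen so they do not conflict.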
A new parallel-vector finite element analysis software on distributed-memory computers
NASA Technical Reports Server (NTRS)
Qin, Jiangning; Nguyen, Duc T.
1993-01-01
A new parallel-vector finite element analysis software package MPFEA (Massively Parallel-vector Finite Element Analysis) is developed for large-scale structural analysis on massively parallel computers with distributed memory. MPFEA is designed for parallel generation and assembly of the global finite element stiffness matrices as well as parallel solution of the simultaneous linear equations, since these are often the major time-consuming parts of a finite element analysis. A block-skyline storage scheme, along with vector-unrolling techniques, is used to enhance the vector performance. Communications among processors are carried out concurrently with arithmetic operations to reduce the total execution time. Numerical results on the Intel iPSC/860 computers (such as the Intel Gamma with 128 processors and the Intel Touchstone Delta with 512 processors) are presented, including an aircraft structure and some very large truss structures, to demonstrate the efficiency and accuracy of MPFEA.
Accelerating Large Scale Image Analyses on Parallel, CPU-GPU Equipped Systems
Teodoro, George; Kurc, Tahsin M.; Pan, Tony; Cooper, Lee A.D.; Kong, Jun; Widener, Patrick; Saltz, Joel H.
2014-01-01
The past decade has witnessed a major paradigm shift in high performance computing with the introduction of accelerators as general purpose processors. These computing devices make available very high parallel computing power at low cost and power consumption, transforming current high performance platforms into heterogeneous CPU-GPU equipped systems. Although the theoretical performance achieved by these hybrid systems is impressive, taking practical advantage of this computing power remains a very challenging problem. Most applications are still deployed to either GPU or CPU, leaving the other resource under- or un-utilized. In this paper, we propose, implement, and evaluate a performance aware scheduling technique along with optimizations to make efficient collaborative use of CPUs and GPUs on a parallel system. In the context of feature computations in large scale image analysis applications, our evaluations show that intelligently co-scheduling CPUs and GPUs can significantly improve performance over GPU-only or multi-core CPU-only approaches. PMID:25419545
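A toy illustration of the performance-aware idea: if each task carries a CPU-time estimate and an expected GPU speedup, a scheduler can feed the GPU the most accelerable work while CPU cores drain the rest. This list-scheduling simulation is an assumption-laden sketch, not the authors' runtime system; all numbers are made up.

```python
# Toy performance-aware co-scheduler: each task has a CPU time estimate and
# a GPU speedup; devices pull work greedily, and the GPU is given the tasks
# that benefit from it most. A simulation of the makespan, nothing more.
import heapq

def co_schedule(tasks, n_cpu=4, n_gpu=1):
    # tasks: list of (cpu_time, gpu_speedup); sort so the GPU sees the
    # most accelerable tasks first and CPUs take the rest from the back.
    tasks = sorted(tasks, key=lambda t: t[1], reverse=True)
    devices = [(0.0, "gpu", i) for i in range(n_gpu)] + \
              [(0.0, "cpu", i) for i in range(n_cpu)]
    heapq.heapify(devices)              # ordered by when each device frees up
    front, back = 0, len(tasks) - 1
    while front <= back:
        free_at, kind, idx = heapq.heappop(devices)
        if kind == "gpu":
            cpu_t, speedup = tasks[front]; front += 1
            free_at += cpu_t / speedup  # GPU runs the high-speedup task
        else:
            cpu_t, _ = tasks[back]; back -= 1
            free_at += cpu_t            # CPU runs a low-speedup task
        heapq.heappush(devices, (free_at, kind, idx))
    return max(d[0] for d in devices)   # makespan

if __name__ == "__main__":
    import random
    random.seed(0)
    tasks = [(random.uniform(1, 10), random.uniform(1, 20)) for _ in range(100)]
    print(f"hybrid makespan:   {co_schedule(tasks):8.1f}")
    print(f"cpu-only makespan: {co_schedule(tasks, n_gpu=0):8.1f}")
```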
A study of the parallel algorithm for large-scale DC simulation of nonlinear systems
NASA Astrophysics Data System (ADS)
Cortés Udave, Diego Ernesto; Ogrodzki, Jan; Gutiérrez de Anda, Miguel Angel
Newton-Raphson DC analysis of large-scale nonlinear circuits may be an extremely time-consuming process even if sparse matrix techniques and bypassing of nonlinear model calculations are used. A slight decrease in the time required for this task may be achieved on multi-core, multithreaded computers if the calculation of the mathematical models for the nonlinear elements, as well as the stamp management of the sparse matrix entries, is handled by concurrent processes. This numerical complexity can be further reduced via circuit decomposition and parallel solution of blocks, taking the BBD matrix structure as a departure point. This block-parallel approach may yield considerable gains, though it is strongly dependent on the system topology and, of course, on the processor type. This contribution presents an easily parallelizable decomposition-based algorithm for DC simulation and provides a detailed study of its effectiveness.
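For orientation, here is a minimal sketch of the Newton-Raphson DC loop in question, for a single resistor-diode node. In a large circuit the per-element model evaluations (the current/conductance "stamps") are mutually independent, which is exactly the step one can compute concurrently; component values here are illustrative.

```python
# Sketch of a Newton-Raphson DC solve: a resistor feeding a diode. With
# many elements, each call to a model like diode_model() is independent
# and can run in a separate thread/process before the matrix solve.
import math

VDD, R = 5.0, 1e3                  # supply and series resistor (assumed)
IS, VT = 1e-14, 0.02585            # diode saturation current, thermal voltage

def diode_model(v):
    """Return (current, conductance) -- the element 'stamp'."""
    i = IS * (math.exp(v / VT) - 1.0)
    g = IS / VT * math.exp(v / VT)
    return i, g

v = 0.7                            # initial guess near a diode drop
for it in range(100):
    i_d, g_d = diode_model(v)      # model evaluation (the parallelizable step)
    f = (VDD - v) / R - i_d        # KCL residual at the diode node
    df = -1.0 / R - g_d            # Jacobian entry
    dv = -f / df
    v += dv
    if abs(dv) < 1e-12:
        break
print(f"converged in {it + 1} iterations: V_diode = {v:.6f} V")
```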
Computational Issues in Damping Identification for Large Scale Problems
NASA Technical Reports Server (NTRS)
Pilkey, Deborah L.; Roe, Kevin P.; Inman, Daniel J.
1997-01-01
Two damping identification methods are tested for efficiency in large-scale applications. One is an iterative routine, and the other a least squares method. Numerical simulations have been performed on multiple degree-of-freedom models to test the effectiveness of the algorithms and the usefulness of parallel computation for the problems. High Performance Fortran is used to parallelize the algorithms. Tests were performed using the IBM SP2 at NASA Ames Research Center. The least squares method tested incurs high communication costs, which reduces the benefit of high performance computing. This method's memory requirement grows at a very rapid rate, meaning that larger problems can quickly exceed available computer memory. The iterative method's memory requirement grows at a much slower pace and is able to handle problems with 500+ degrees of freedom on a single processor. This method benefits from parallelization, and significant speedup can be seen for problems of 100+ degrees of freedom.
Scalable Parallel Distance Field Construction for Large-Scale Applications.
Yu, Hongfeng; Xie, Jinrong; Ma, Kwan-Liu; Kolla, Hemanth; Chen, Jacqueline H
2015-10-01
Computing distance fields is fundamental to many scientific and engineering applications. Distance fields can be used to direct analysis and reduce data. In this paper, we present a highly scalable method for computing 3D distance fields on massively parallel distributed-memory machines. A new distributed spatial data structure, named parallel distance tree, is introduced to manage the level sets of data and facilitate surface tracking over time, resulting in significantly reduced computation and communication costs for calculating the distance to the surface of interest from any spatial locations. Our method supports several data types and distance metrics from real-world applications. We demonstrate its efficiency and scalability on state-of-the-art supercomputers using both large-scale volume datasets and surface models. We also demonstrate in-situ distance field computation on dynamic turbulent flame surfaces for a petascale combustion simulation. Our work greatly extends the usability of distance fields for demanding applications.
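A brute-force sketch of the decomposition at the heart of such methods: the grid is split into chunks, each process computes distances from its chunk to the surface of interest, and the parts are reassembled. The paper's parallel distance tree replaces the quadratic brute force below with a distributed level-set structure; the surface, grid size, and names here are illustrative stand-ins.

```python
# Brute-force distance field parallelized by spatial chunks. Each worker
# handles a band of grid rows; in a distributed code these bands would be
# MPI ranks exchanging surface data instead of sharing a global array.
import numpy as np
from multiprocessing import Pool

# "Surface of interest": points on a circle (stand-in for a level set).
theta = np.linspace(0, 2 * np.pi, 256, endpoint=False)
SURFACE = np.stack([0.5 + 0.3 * np.cos(theta),
                    0.5 + 0.3 * np.sin(theta)], axis=1)

def chunk_distances(rows):
    """Distance from each grid point in a row-chunk to the surface."""
    n = 64
    ys, xs = np.meshgrid(rows / (n - 1), np.arange(n) / (n - 1),
                         indexing="ij")
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1)
    d = np.sqrt(((pts[:, None, :] - SURFACE[None, :, :]) ** 2).sum(-1))
    return d.min(axis=1).reshape(len(rows), n)

if __name__ == "__main__":
    n = 64
    row_chunks = np.array_split(np.arange(n), 4)   # one chunk per process
    with Pool(4) as pool:
        parts = pool.map(chunk_distances, row_chunks)
    field = np.vstack(parts)
    print("field shape:", field.shape, " min distance:", field.min())
```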
NASA Technical Reports Server (NTRS)
Eckert, W. T.; Maki, R. L.
1973-01-01
The low-speed characteristics of a large-scale model of the U. S. Navy/Grumman F-14A aircraft were studied in tests conducted in the Ames Research Center 40- by 80-Foot Wind Tunnel. The primary purpose of the program was the determination of lift and stability levels and landing approach attitude of the aircraft in its high-lift configuration. Tests were conducted at wing angles of attack between minus 2 deg and 30 deg with zero yaw. Data were taken at Reynolds numbers ranging from 3.48 million to 9.64 million based on a wing mean aerodynamic chord of 7.36 ft. The model configuration was changed as required to show the effects of glove slat, wing slat leading-edge radius, cold flow ducting, flap deflection, direct lift control (spoilers), horizontal tail, speed brake, landing gear and missiles.
NASA Technical Reports Server (NTRS)
Eckert, W. T.; Maki, R. L.
1973-01-01
The low-speed characteristics of a large-scale model of the F-14A aircraft were studied in tests conducted in the Ames Research Center 40- by 80-Foot Wind Tunnel. The primary purpose of the present tests was the determination of lateral-directional stability levels and control effectiveness of the aircraft in its high-lift configuration. Tests were conducted at wing angles of attack between minus 2 deg and 30 deg and with sideslip angles between minus 12 deg and 12 deg. Data were taken at a Reynolds number of 8.0 million based on a wing mean aerodynamic chord of 2.24 m (7.36 ft). The model configuration was changed as required to show the effects of direct lift control (spoilers) at yaw, yaw angle with speed brake deflected, and various amounts and combinations of roll control.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Settgast, Randolph R.; Fu, Pengcheng; Walsh, Stuart D. C.; ...
2016-09-18
This study describes a fully coupled finite element/finite volume approach for simulating field-scale hydraulically driven fractures in three dimensions, using massively parallel computing platforms. The proposed method is capable of capturing realistic representations of local heterogeneities, layering and natural fracture networks in a reservoir. A detailed description of the numerical implementation is provided, along with numerical studies comparing the model with both analytical solutions and experimental results. The results demonstrate the effectiveness of the proposed method for modeling large-scale problems involving hydraulically driven fractures in three dimensions.
Flap noise measurements for STOL configurations using external upper surface blowing
NASA Technical Reports Server (NTRS)
Dorsch, R. G.; Reshotko, M.; Olsen, W. A.
1972-01-01
Screening tests of upper surface blowing on externally blown flap configurations were conducted. Noise and turning effectiveness data were obtained with small-scale, engine-over-the-wing models. One large model was tested to determine scale effects. Nozzle types included circular, slot, D-shaped, and multilobed. Tests were made with and without flow attachment devices. For STOL applications, the particular multilobed mixer and the D-shaped nozzles tested were found to offer little or no noise advantage over the round convergent nozzle. High-aspect-ratio slot nozzles provided the quietest configurations. In general, upper surface blowing was quieter than lower surface blowing for equivalent EBF models.
FPGA-based protein sequence alignment : A review
NASA Astrophysics Data System (ADS)
Isa, Mohd. Nazrin Md.; Muhsen, Ku Noor Dhaniah Ku; Saiful Nurdin, Dayana; Ahmad, Muhammad Imran; Anuar Zainol Murad, Sohiful; Nizam Mohyar, Shaiful; Harun, Azizi; Hussin, Razaidi
2017-11-01
Sequence alignment has been optimized using several techniques that accelerate the computation of the optimal score by implementing DP-based algorithms in hardware such as FPGA-based platforms. Hardware implementations face performance challenges such as frequent memory accesses and strong data dependences in the computation process. This paper therefore focuses on the processing element (PE) configuration, which involves memory accesses to load the data (substitution matrix, query sequence characters), and on the PE configuration time. Various approaches have been taken in previous work to enhance PE configuration performance, such as serial and parallel configuration chains, in which the configuration data are loaded into the PEs sequentially and simultaneously, respectively. Some researchers have shown that a parallel configuration chain improves both configuration time and area.
Parallel Computational Protein Design.
Zhou, Yichao; Donald, Bruce R; Zeng, Jianyang
2017-01-01
Computational structure-based protein design (CSPD) is an important problem in computational biology, which aims to design or improve a prescribed protein function based on a protein structure template. It provides a practical tool for real-world protein engineering applications. A popular CSPD method that guarantees to find the global minimum energy conformation (GMEC) is to combine both dead-end elimination (DEE) and A* tree search algorithms. However, in this framework, the A* search algorithm can run in exponential time in the worst case, which may become the computational bottleneck of large-scale computational protein design. To address this issue, we extend and add a new module to the OSPREY program previously developed in the Donald lab (Gainza et al., Methods Enzymol 523:87, 2013) to implement a GPU-based massively parallel A* algorithm for improving the protein design pipeline. By exploiting the modern GPU computational framework and optimizing the computation of the heuristic function for A* search, our new program, called gOSPREY, can provide up to four orders of magnitude speedup in large protein design cases with a small memory overhead compared to the traditional A* search algorithm implementation, while still guaranteeing optimality. In addition, gOSPREY can be configured to run in a bounded-memory mode to tackle problems in which the conformation space is too large and the global optimal solution could not be computed previously. Furthermore, the GPU-based A* algorithm implemented in the gOSPREY program can be combined with state-of-the-art rotamer pruning algorithms such as iMinDEE (Gainza et al., PLoS Comput Biol 8:e1002335, 2012) and DEEPer (Hallen et al., Proteins 81:18-39, 2013) to also consider continuous backbone and side-chain flexibility.
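For context, the serial A* skeleton that conformation search specializes looks as follows; in OSPREY the nodes would be partial rotamer assignments and h an admissible energy lower bound, whereas here nodes are grid cells and h is the Manhattan distance. gOSPREY's contribution is evaluating the heuristic for many frontier nodes at once on the GPU; this sketch shows only the sequential logic, with all names and the maze being illustrative.

```python
# Generic A*: returns the optimal cost because h never overestimates.
import heapq

def astar(start, goal, blocked, n=10):
    def h(p):                              # admissible heuristic
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_heap = [(h(start), 0, start)]     # entries: (f = g + h, g, node)
    best_g = {start: 0}
    while open_heap:
        f, g, node = heapq.heappop(open_heap)
        if node == goal:
            return g                       # optimal cost, as A* guarantees
        if g > best_g.get(node, float("inf")):
            continue                       # stale heap entry, skip it
        x, y = node
        for nb in [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]:
            if 0 <= nb[0] < n and 0 <= nb[1] < n and nb not in blocked:
                if g + 1 < best_g.get(nb, float("inf")):
                    best_g[nb] = g + 1
                    heapq.heappush(open_heap, (g + 1 + h(nb), g + 1, nb))
    return None

blocked = {(5, yy) for yy in range(1, 10)}    # a wall with one gap at y = 0
print("optimal path cost:", astar((0, 0), (9, 9), blocked))
```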
NASA Astrophysics Data System (ADS)
Newman, Gregory A.
2014-01-01
Many geoscientific applications exploit electrostatic and electromagnetic fields to interrogate and map subsurface electrical resistivity—an important geophysical attribute for characterizing mineral, energy, and water resources. In complex three-dimensional geologies, where many of these resources remain to be found, resistivity mapping requires large-scale modeling and imaging capabilities, as well as the ability to treat significant data volumes, which can easily overwhelm single-core and modest multicore computing hardware. Treating such problems requires large-scale parallel computational resources, necessary for reducing the time to solution to a time frame acceptable to the exploration process. The recognition that significant parallel computing processes must be brought to bear on these problems gives rise to choices that must be made in parallel computing hardware and software. In this review, some of these choices are presented, along with the resulting trade-offs. We also discuss future trends in high-performance computing and the anticipated impact on electromagnetic (EM) geophysics. Topics discussed in this review article include a survey of parallel computing platforms, from graphics processing units to multicore CPUs with a fast interconnect, along with parallel solvers and associated solver libraries effective for inductive EM modeling and imaging.
Parallel distributed, reciprocal Monte Carlo radiation in coupled, large eddy combustion simulations
NASA Astrophysics Data System (ADS)
Hunsaker, Isaac L.
Radiation is the dominant mode of heat transfer in high-temperature combustion environments. Radiative heat transfer affects the gas and particle phases, including all the associated combustion chemistry. The radiative properties are in turn affected by the turbulent flow field. This bi-directional coupling of radiation-turbulence interactions poses a major challenge in creating parallel-capable, high-fidelity combustion simulations. In this work, a new model was developed in which reciprocal Monte Carlo radiation was coupled with a turbulent, large-eddy simulation combustion model. A technique wherein domain patches are stitched together was implemented to allow for scalable parallelism. The combustion model runs in parallel on a decomposed domain. The radiation model runs in parallel on a recomposed domain. The recomposed domain is stored on each processor after information sharing of the decomposed domain is handled via the message passing interface. Verification and validation testing of the new radiation model were favorable. Strong scaling analyses were performed on the Ember cluster and the Titan cluster for the CPU-radiation model and GPU-radiation model, respectively. The model demonstrated strong scaling to over 1,700 and 16,000 processing cores on Ember and Titan, respectively.
Efficiency of parallel direct optimization
NASA Technical Reports Server (NTRS)
Janies, D. A.; Wheeler, W. C.
2001-01-01
Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running Linux. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. © 2001 The Willi Hennig Society.
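The scaling quantities used in studies like this one reduce to a few lines of code; the timings below are made up for illustration, with Amdahl's law as a reference curve.

```python
# Speedup, parallel efficiency, and an Amdahl's-law bound. Illustrative
# timings only; t1 is the single-processor time, tp the time on P procs.
def speedup(t1, tp):
    return t1 / tp

def efficiency(t1, tp, p):
    return speedup(t1, tp) / p

def amdahl(p, serial_fraction):
    # Maximum speedup on p processors if serial_fraction cannot parallelize.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

t1 = 1000.0                          # assumed single-processor time
for p, tp in [(10, 120.0), (64, 25.0), (256, 9.0)]:
    print(f"P={p:4d}: speedup {speedup(t1, tp):6.1f}, "
          f"efficiency {efficiency(t1, tp, p):5.2f}, "
          f"Amdahl bound (5% serial) {amdahl(p, 0.05):6.1f}")
```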
Scaling Optimization of the SIESTA MHD Code
NASA Astrophysics Data System (ADS)
Seal, Sudip; Hirshman, Steven; Perumalla, Kalyan
2013-10-01
SIESTA is a parallel three-dimensional plasma equilibrium code capable of resolving magnetic islands at high spatial resolutions for toroidal plasmas. Originally designed to exploit small-scale parallelism, SIESTA has now been scaled to execute efficiently over several thousand processors P. This scaling improvement was accomplished with minimal intrusion to the execution flow of the original version. First, the efficiency of the iterative solutions was improved by integrating the parallel tridiagonal block solver code BCYCLIC. Krylov-space generation in GMRES was then accelerated using a customized parallel matrix-vector multiplication algorithm. Novel parallel Hessian generation algorithms were integrated and memory access latencies were dramatically reduced through loop nest optimizations and data layout rearrangement. These optimizations sped up equilibria calculations by factors of 30-50. It is possible to compute solutions with granularity N/P near unity on extremely fine radial meshes (N > 1024 points). Grid separation in SIESTA, which manifests itself primarily in the resonant components of the pressure far from rational surfaces, is strongly suppressed by finer meshes. Large problem sizes of up to 300 K simultaneous non-linear coupled equations have been solved on the NERSC supercomputers. Work supported by U.S. DOE under Contract DE-AC05-00OR22725 with UT-Battelle, LLC.
Parallel algorithm for multiscale atomistic/continuum simulations using LAMMPS
NASA Astrophysics Data System (ADS)
Pavia, F.; Curtin, W. A.
2015-07-01
Deformation and fracture processes in engineering materials often require simultaneous descriptions over a range of length and time scales, with each scale using a different computational technique. Here we present a high-performance parallel 3D computing framework for executing large multiscale studies that couple an atomic domain, modeled using molecular dynamics and a continuum domain, modeled using explicit finite elements. We use the robust Coupled Atomistic/Discrete-Dislocation (CADD) displacement-coupling method, but without the transfer of dislocations between atoms and continuum. The main purpose of the work is to provide a multiscale implementation within an existing large-scale parallel molecular dynamics code (LAMMPS) that enables use of all the tools associated with this popular open-source code, while extending CADD-type coupling to 3D. Validation of the implementation includes the demonstration of (i) stability in finite-temperature dynamics using Langevin dynamics, (ii) elimination of wave reflections due to large dynamic events occurring in the MD region and (iii) the absence of spurious forces acting on dislocations due to the MD/FE coupling, for dislocations further than 10 Å from the coupling boundary. A first non-trivial example application of dislocation glide and bowing around obstacles is shown, for dislocation lengths of ∼50 nm using fewer than 1,000,000 atoms but reproducing results of extremely large atomistic simulations at much lower computational cost.
Neural Parallel Engine: A toolbox for massively parallel neural signal processing.
Tam, Wing-Kin; Yang, Zhi
2018-05-01
Large-scale neural recordings provide detailed information on neuronal activities and can help elicit the underlying neural mechanisms of the brain. However, the computational burden is also formidable when we try to process the huge data stream generated by such recordings. In this study, we report the development of Neural Parallel Engine (NPE), a toolbox for massively parallel neural signal processing on graphical processing units (GPUs). It offers a selection of the most commonly used routines in neural signal processing such as spike detection and spike sorting, including advanced algorithms such as exponential-component-power-component (EC-PC) spike detection and binary pursuit spike sorting. We also propose a new method for detecting peaks in parallel through a parallel compact operation. Our toolbox is able to offer a 5× to 110× speedup compared with its CPU counterparts depending on the algorithms. A user-friendly MATLAB interface is provided to allow easy integration of the toolbox into existing workflows. Previous efforts on GPU neural signal processing only focus on a few rudimentary algorithms, are not well-optimized and often do not provide a user-friendly programming interface to fit into existing workflows. There is a strong need for a comprehensive toolbox for massively parallel neural signal processing. A new toolbox for massively parallel neural signal processing has been created. It can offer significant speedup in processing signals from large-scale recordings up to thousands of channels. Copyright © 2018 Elsevier B.V. All rights reserved.
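The flavor of the toolbox's data-parallel peak detection can be sketched with vectorized comparisons followed by an index compaction, which is roughly what a GPU kernel plus stream compaction does across threads. The threshold, injected waveform, and all names below are made up for illustration, not the toolbox's API.

```python
# Data-parallel peak detection: elementwise comparisons (one "thread" per
# sample) followed by compaction of the surviving indices.
import numpy as np

def detect_peaks(x, thresh):
    core = x[1:-1]
    is_peak = (core > thresh) & (core > x[:-2]) & (core >= x[2:])
    return np.flatnonzero(is_peak) + 1     # the compaction step

rng = np.random.default_rng(0)
signal = rng.normal(0, 1, 100_000)
spikes = rng.choice(100_000 - 10, 50, replace=False)
for s in spikes:
    signal[s:s + 5] += [2, 6, 9, 6, 2]     # injected spike waveform
idx = detect_peaks(signal, thresh=5.0)
print(f"detected {idx.size} peaks (injected 50)")
```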
Aft-End Flow of a Large-Scale Lifting Body During Free-Flight Tests
NASA Technical Reports Server (NTRS)
Banks, Daniel W.; Fisher, David F.
2006-01-01
Free-flight tests of a large-scale lifting-body configuration, the X-38 aircraft, were conducted using tufts to characterize the flow on the aft end, specifically in the inboard region of the vertical fins. Pressure data was collected on the fins and base. Flow direction and movement were correlated with surface pressure and flight condition. The X-38 was conceived to be a rescue vehicle for the International Space Station. The vehicle shape was derived from the U.S. Air Force X-24 lifting body. Free-flight tests of the X-38 configuration were conducted at the NASA Dryden Flight Research Center at Edwards Air Force Base, California from 1997 to 2001.
Development of Supersonic Vehicle for Demonstration of a Precooled Turbojet Engine
NASA Astrophysics Data System (ADS)
Sawai, Shujiro; Fujita, Kazuhisa; Kobayashi, Hiroaki; Sakai, Shin'ichiro; Bando, Nobutaka; Kadooka, Shouhei; Tsuboi, Nobuyuki; Miyaji, Koji; Uchiyama, Taku; Hashimoto, Tatsuaki
JAXA is developing Mach 5 hypersonic turbojet engine technology that can be applied to a future hypersonic transport. The Jet Engine Technology Research Center of JAXA is now conducting experimental studies using a 1/10-scale model engine. In parallel with engine development activities, a new supersonic flight-testing vehicle for the hypersonic turbojet engine has been under development since 2004. In this paper, the system configuration of the flight-testing vehicle is outlined and its development status is reported.
Feasibility study of Thermal Electric Generator Configurations as Renewable Energy Sources
NASA Astrophysics Data System (ADS)
Akmal Johar, Muhammad; Yahaya, Zulkarnain; Faizan Marwah, Omar Mohd; Jamaludin, Wan Akashah Wan; Najib Ribuan, Mohamed
2017-10-01
A thermoelectric generator (TEG) is a solid-state device that is able to convert thermal energy into electrical energy via temperature differences. The technology is based on the Seebeck effect, discovered in 1821; however, to date there is no real application exploiting this capability at mass scale. This research reports a performance analysis of TEG modules in the controlled environment of a lab-scale model. National Instruments equipment and LabVIEW software were chosen and developed to measure the TEG modules in various configurations. Based on the experimental results, an additional passive cooling effort improved ΔT by 7°C. The optimal electrical loading of a single TEG is recorded at 200 Ω. As for circuit connections, the series connection showed superior power output compared to the parallel connection or a single TEG. A series connection of two TEGs produced a power output of 416.82 μW, compared to around 100 μW for the other connection types.
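The series-versus-parallel result follows from basic source modeling: with a fixed load, connecting modules in series doubles the open-circuit Seebeck voltage, which beats parallel halving of the internal resistance whenever the load resistance exceeds the source resistance. A sketch with assumed per-module values (not the paper's measurements):

```python
# Why series won with a fixed load: power into R_LOAD for one module,
# two in series, and two in parallel. V_OC and R_INT are assumptions.
V_OC, R_INT = 0.20, 5.0            # per-module open-circuit volts, ohms

def load_power(v_total, r_internal, r_load):
    i = v_total / (r_internal + r_load)
    return i * i * r_load          # P = I^2 * R_load

R_LOAD = 200.0                     # the abstract's optimal load for one TEG
single  = load_power(V_OC,     R_INT,     R_LOAD)
series2 = load_power(2 * V_OC, 2 * R_INT, R_LOAD)
par2    = load_power(V_OC,     R_INT / 2, R_LOAD)
for name, p in [("single", single), ("2 in series", series2),
                ("2 in parallel", par2)]:
    print(f"{name:14s}: {p * 1e6:8.1f} uW")
```

At a matched load the two connections deliver the same maximum power; the roughly fourfold series advantage reported here is what one expects when the fixed load is much larger than the modules' internal resistance.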
Kim, Steven; Heller, James; Iqbal, Zohora; Kant, Rishi; Kim, Eun Jung; Durack, Jeremy; Saeed, Maythem; Do, Loi; Hetts, Steven; Wilson, Mark; Brakeman, Paul; Fissell, William H.; Roy, Shuvo
2015-01-01
Silicon nanopore membranes (SNM) with compact geometry and uniform pore size distribution have demonstrated a remarkable capacity for hemofiltration. These advantages could potentially be used for hemodialysis. Here we present an initial evaluation of the SNM's mechanical robustness, diffusive clearance, and hemocompatibility in a parallel plate configuration. Mechanical robustness of the SNM was demonstrated by exposing membranes to high flows (200 ml/min) and pressures (1,448 mmHg). Diffusive clearance was performed in an albumin solution and whole blood with blood and dialysate flow rates of 25 ml/min. Hemocompatibility was evaluated using scanning electron microscopy and immunohistochemistry after 4 hours in an extra-corporeal porcine model. The pressure drop across the flow cell was 4.6 mmHg at 200 ml/min. Mechanical testing showed that SNM could withstand up to 775.7 mmHg without fracture. Urea clearance did not show an appreciable decline in blood versus albumin solution. Extra-corporeal studies showed blood was successfully driven via the arterial-venous pressure differential without thrombus formation. Bare silicon showed increased cell adhesion, with 4.1-fold and 1.8-fold increases over polyethylene glycol (PEG)-coated surfaces for tissue plasminogen activator (t-PA) staining and platelet adhesion (CD-41), respectively. These initial results warrant further design and development of a fully scaled SNM-based parallel plate dialyzer for renal replacement therapy. PMID:26692401
Millette, Katie L; Keyghobadi, Nusha
2015-01-01
Despite strong interest in understanding how habitat spatial structure shapes the genetics of populations, the relative importance of habitat amount and configuration for patterns of genetic differentiation remains largely unexplored in empirical systems. In this study, we evaluate the relative influence of, and interactions among, the amount of habitat and aspects of its spatial configuration on genetic differentiation in the pitcher plant midge, Metriocnemus knabi. Larvae of this species are found exclusively within the water-filled leaves of pitcher plants (Sarracenia purpurea) in a system that is naturally patchy at multiple spatial scales (i.e., leaf, plant, cluster, peatland). Using generalized linear mixed models and multimodel inference, we estimated effects of the amount of habitat, patch size, interpatch distance, and patch isolation, measured at different spatial scales, on genetic differentiation (FST) among larval samples from leaves within plants, plants within clusters, and clusters within peatlands. Among leaves and plants, genetic differentiation appears to be driven by female oviposition behaviors and is influenced by habitat isolation at a broad (peatland) scale. Among clusters, gene flow is spatially restricted and aspects of both the amount of habitat and configuration at the focal scale are important, as is their interaction. Our results suggest that both habitat amount and configuration can be important determinants of genetic structure and that their relative influence is scale dependent. PMID:25628865
On the nature of the NAA diffusion attenuated MR signal in the central nervous system.
Kroenke, Christopher D; Ackerman, Joseph J H; Yablonskiy, Dmitriy A
2004-11-01
In the brain, on a macroscopic scale, diffusion of the intraneuronal constituent N-acetyl-L-aspartate (NAA) appears to be isotropic. In contrast, on a microscopic scale, NAA diffusion is likely highly anisotropic, with displacements perpendicular to neuronal fibers being markedly hindered, and parallel displacements less so. In this report we first substantiate that local anisotropy influences NAA diffusion in vivo by observing differing diffusivities parallel and perpendicular to human corpus callosum axonal fibers. We then extend our measurements to large voxels within rat brains. As expected, the macroscopic apparent diffusion coefficient (ADC) of NAA is practically isotropic due to averaging of the numerous and diverse fiber orientations. We demonstrate that the substantially non-monoexponential diffusion-mediated MR signal decay vs. b value can be quantitatively explained by a theoretical model of NAA confined to an ensemble of differently oriented neuronal fibers. On the microscopic scale, NAA diffusion is found to be strongly anisotropic, with displacements occurring almost exclusively parallel to the local fiber axis. This parallel diffusivity, ADC∥, is 0.36 ± 0.01 μm²/ms, and ADC⊥ is essentially zero. From ADC∥ the apparent viscosity of the neuron cytoplasm is estimated to be twice as large as that of a temperature-matched dilute aqueous solution. © 2004 Wiley-Liss, Inc.
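The fiber-ensemble model admits a compact numerical check: treating axons as "sticks" (diffusivity D along the axis, essentially zero across) that are uniformly oriented, the voxel signal is the orientation average of exp(-b D cos²θ), which decays much more slowly than a monoexponential with the same mean ADC. The sketch below uses the paper's ADC∥ for D; the b values are illustrative.

```python
# Orientation-averaged "stick" signal vs. a monoexponential with the same
# mean ADC (= D/3 for a stick). Monte Carlo average checked against the
# closed form S(b) = sqrt(pi)/(2x) * erf(x), x = sqrt(b*D).
import numpy as np
from math import erf, pi, sqrt

D = 0.36e-3                          # mm^2/s  (= 0.36 um^2/ms, ADC_parallel)

def stick_signal(b, n=100_000):
    # For uniform orientations, cos(theta) is uniform on [0, 1].
    cos_t = np.random.default_rng(0).uniform(0.0, 1.0, n)
    return np.exp(-b * D * cos_t**2).mean()

def stick_signal_closed(b):
    x = sqrt(b * D)
    return sqrt(pi) / (2 * x) * erf(x)

for b in [1_000, 5_000, 10_000, 20_000]:     # b values in s/mm^2 (assumed)
    mono = np.exp(-b * D / 3)                # monoexponential with mean ADC
    print(f"b={b:6d}: ensemble {stick_signal(b):.4f} "
          f"(closed form {stick_signal_closed(b):.4f}), "
          f"monoexponential {mono:.4f}")
```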
Hybrid Wing Body Configuration Scaling Study
NASA Technical Reports Server (NTRS)
Nickol, Craig L.
2012-01-01
The Hybrid Wing Body (HWB) configuration is a subsonic transport aircraft concept with the potential to simultaneously reduce fuel burn, noise and emissions compared to conventional concepts. Initial studies focused on very large applications with capacities for up to 800 passengers. More recent studies have focused on the large, twin-aisle class with passenger capacities in the 300-450 range. Efficiently scaling this concept down to the single aisle or smaller size is challenging due to geometric constraints, potentially reducing the desirability of this concept for applications in the 100-200 passenger capacity range or less. In order to quantify this scaling challenge, five advanced conventional (tube-and-wing layout) concepts were developed, along with equivalent (payload/range/technology) HWB concepts, and their fuel burn performance compared. The comparison showed that the HWB concepts have fuel burn advantages over advanced tube-and-wing concepts in the larger payload/range classes (roughly 767-sized and larger). Although noise performance was not quantified in this study, the HWB concept has distinct noise advantages over the conventional tube-and-wing configuration due to the inherent noise shielding features of the HWB. NASA's Environmentally Responsible Aviation (ERA) project will continue to investigate advanced configurations, such as the HWB, due to their potential to simultaneously reduce fuel burn, noise and emissions.
Relaxation in two dimensions and the 'sinh-Poisson' equation
NASA Technical Reports Server (NTRS)
Montgomery, D.; Matthaeus, W. H.; Stribling, W. T.; Martinez, D.; Oughton, S.
1992-01-01
Long-time states of a turbulent, decaying, two-dimensional, Navier-Stokes flow are shown numerically to relax toward maximum-entropy configurations, as defined by the "sinh-Poisson" equation. The large-scale Reynolds number is about 14,000, the spatial resolution is 512², the boundary conditions are spatially periodic, and the evolution takes place over nearly 400 large-scale eddy-turnover times.
Programming Probabilistic Structural Analysis for Parallel Processing Computer
NASA Technical Reports Server (NTRS)
Sues, Robert H.; Chen, Heh-Chyun; Twisdale, Lawrence A.; Chamis, Christos C.; Murthy, Pappu L. N.
1991-01-01
The ultimate goal of this research program is to make Probabilistic Structural Analysis (PSA) computationally efficient and hence practical for the design environment by achieving large scale parallelism. The paper identifies the multiple levels of parallelism in PSA, identifies methodologies for exploiting this parallelism, describes the development of a parallel stochastic finite element code, and presents results of two example applications. It is demonstrated that speeds within five percent of those theoretically possible can be achieved. A special-purpose numerical technique, the stochastic preconditioned conjugate gradient method, is also presented and demonstrated to be extremely efficient for certain classes of PSA problems.
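The stochastic preconditioned conjugate gradient method named above is not spelled out in this abstract; as a rough illustration of the underlying idea only, the sketch below shows a standard preconditioned conjugate gradient solve in NumPy, reused across several random stiffness realizations with a single Jacobi preconditioner built from the mean matrix. The matrices, the preconditioner choice, and the perturbation model are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-8, max_iter=500):
    """Preconditioned conjugate gradient for a symmetric positive-definite A.
    M_inv: callable applying the (approximate) inverse preconditioner."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Hypothetical stochastic use: solve many perturbed systems K_i u = f while
# reusing one preconditioner built from the mean stiffness matrix K0.
rng = np.random.default_rng(0)
n = 50
Q = rng.standard_normal((n, n))
K0 = Q @ Q.T + n * np.eye(n)          # SPD "mean" stiffness matrix
f = rng.standard_normal(n)
M_inv = lambda r: r / np.diag(K0)     # Jacobi preconditioner from K0
for _ in range(3):                    # three random realizations
    Ki = K0 + 0.05 * np.diag(rng.standard_normal(n))
    u = pcg(Ki, f, M_inv)
```

Reusing one preconditioner across realizations is what keeps the per-sample cost low in Monte Carlo-style probabilistic analysis loops, which is plausibly why a stochastic variant of preconditioned CG pays off here.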
Parallel stitching of 2D materials
Ling, Xi; Wu, Lijun; Lin, Yuxuan; ...
2016-01-27
Diverse parallel stitched 2D heterostructures, including metal–semiconductor, semiconductor–semiconductor, and insulator–semiconductor, are synthesized directly through selective “sowing” of aromatic molecules as the seeds in the chemical vapor deposition (CVD) method. Moreover, the methodology enables the large-scale fabrication of lateral heterostructures, which offers tremendous potential for application in integrated circuits.
Profiling and Improving I/O Performance of a Large-Scale Climate Scientific Application
NASA Technical Reports Server (NTRS)
Liu, Zhuo; Wang, Bin; Wang, Teng; Tian, Yuan; Xu, Cong; Wang, Yandong; Yu, Weikuan; Cruz, Carlos A.; Zhou, Shujia; Clune, Tom;
2013-01-01
Exascale computing systems are soon to emerge, which will pose great challenges given the huge gap between computing and I/O performance. Many large-scale scientific applications play an important role in our daily life. The huge amounts of data generated by such applications require highly parallel and efficient I/O management policies. In this paper, we adopt a mission-critical scientific application, GEOS-5, as a case to profile and analyze the communication and I/O issues that are preventing applications from fully utilizing the underlying parallel storage systems. Through detailed architectural and experimental characterization, we observe that current legacy I/O schemes incur significant network communication overheads and are unable to fully parallelize the data access, thus degrading applications' I/O performance and scalability. To address these inefficiencies, we redesign GEOS-5's I/O framework along with a set of parallel I/O techniques to achieve high scalability and performance. Evaluation results on the NASA Discover cluster show that our optimization of GEOS-5 with ADIOS has led to significant performance improvements compared to the original GEOS-5 implementation.
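The redesigned framework itself is ADIOS-based and not reproduced here; the sketch below only illustrates the general principle the paper exploits, fully parallelized data access, using collective MPI-IO with each rank writing its slab of a global array at a disjoint offset. The file name and slab sizes are made up for the example.

```python
# Minimal collective-write sketch with mpi4py (not the paper's ADIOS code):
# every rank writes its own slab of a global array at a disjoint offset,
# so data access is parallelized instead of funneled through rank 0.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

local_n = 1024                                    # elements owned by this rank
local = np.full(local_n, rank, dtype=np.float64)  # this rank's slab of the field

fh = MPI.File.Open(comm, "field.dat",
                   MPI.MODE_CREATE | MPI.MODE_WRONLY)
offset = rank * local_n * local.itemsize          # byte offset of this slab
fh.Write_at_all(offset, local)                    # collective, coordinated write
fh.Close()
```

Run under an MPI launcher, e.g. `mpiexec -n 16 python write_field.py`; the contrast is with a legacy scheme that gathers all slabs to one rank before a single serial write.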
NASA Astrophysics Data System (ADS)
Yang, Liping; Zhang, Lei; He, Jiansen; Tu, Chuanyi; Li, Shengtai; Wang, Xin; Wang, Linghua
2018-03-01
Multi-order structure functions in the solar wind are reported to display a monofractal scaling when sampled parallel to the local magnetic field and a multifractal scaling when measured perpendicularly. To what extent is this scaling anisotropy weakened by an enhancement of the turbulence amplitude relative to the background magnetic field strength? In this study, based on two runs of the magnetohydrodynamic (MHD) turbulence simulation with different relative levels of turbulence amplitude, we investigate and compare the scaling of multi-order magnetic structure functions and magnetic probability distribution functions (PDFs) as well as their dependence on the direction of the local field. The numerical results show that for the case of large-amplitude MHD turbulence, the multi-order structure functions display a multifractal scaling at all angles to the local magnetic field, with PDFs deviating significantly from the Gaussian distribution and a flatness larger than 3 at all angles. In contrast, for the case of small-amplitude MHD turbulence, the multi-order structure functions and PDFs have different features in the quasi-parallel and quasi-perpendicular directions: a monofractal scaling and Gaussian-like distribution in the former, and a conversion of a monofractal scaling and Gaussian-like distribution into a multifractal scaling and non-Gaussian tail distribution in the latter. These results hint that when intermittencies are abundant and intense, the multifractal scaling in the structure functions can appear even in the quasi-parallel direction; otherwise, the monofractal scaling in the structure functions remains even in the quasi-perpendicular direction.
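For readers unfamiliar with the diagnostic, the multi-order structure function is S_q(l) = <|B(x + l) - B(x)|^q>, and the flatness S_4/S_2² exceeds 3 when the increment PDFs develop non-Gaussian tails. A minimal NumPy sketch for a 1D cut follows; the conditioning on the local field direction used in the paper is not shown, and the toy signal is an assumption.

```python
import numpy as np

def structure_functions(b, lags, orders):
    """S_q(l) = <|b(x + l) - b(x)|^q> along a 1D cut of the field."""
    S = np.empty((len(orders), len(lags)))
    for j, lag in enumerate(lags):
        db = np.abs(b[lag:] - b[:-lag])   # increments at separation `lag`
        for i, q in enumerate(orders):
            S[i, j] = np.mean(db ** q)
    return S

rng = np.random.default_rng(1)
b = np.cumsum(rng.standard_normal(2**16))    # toy signal standing in for B
lags = np.unique(np.logspace(0, 3, 20).astype(int))
S = structure_functions(b, lags, orders=[1, 2, 3, 4])
flatness = S[3] / S[1]**2                    # > 3 signals intermittent, non-Gaussian increments
```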
Manifestations of dynamo driven large-scale magnetic field in accretion disks of compact objects
NASA Technical Reports Server (NTRS)
Chagelishvili, G. D.; Chanishvili, R. G.; Lominadze, J. G.; Sokhadze, Z. A.
1991-01-01
A nonlinear theory of the turbulent dynamo was developed which shows that in the accretion disks of compact objects, the generated large-scale magnetic field (when generation takes place) has a practically toroidal configuration. Its energy density can be much higher than the turbulent pulsation energy density, and it becomes comparable with the thermal energy density of the medium. On this basis, the manifestations to which the large-scale magnetic field can lead during accretion onto black holes and gravimagnetic rotators, respectively, are presented.
Hine, N D M; Haynes, P D; Mostofi, A A; Payne, M C
2010-09-21
We present calculations of formation energies of defects in an ionic solid (Al(2)O(3)) extrapolated to the dilute limit, corresponding to a simulation cell of infinite size. The large-scale calculations required for this extrapolation are enabled by developments in the approach to parallel sparse matrix algebra operations, which are central to linear-scaling density-functional theory calculations. The computational cost of manipulating sparse matrices, whose sizes are determined by the large number of basis functions present, is greatly improved with this new approach. We present details of the sparse algebra scheme implemented in the ONETEP code using hierarchical sparsity patterns, and demonstrate its use in calculations on a wide range of systems, involving thousands of atoms on hundreds to thousands of parallel processes.
Wing force and surface pressure data from a hover test of a 0.658-scale V-22 rotor and wing
NASA Technical Reports Server (NTRS)
Felker, Fort F.; Shinoda, Patrick R.; Heffernan, Ruth M.; Sheehy, Hugh F.
1990-01-01
A hover test of a 0.658-scale V-22 rotor and wing was conducted in the 40 x 80 foot wind tunnel at Ames Research Center. The principal objective of the test was to measure the surface pressures and total download on a large scale V-22 wing in hover. The test configuration consisted of a single rotor and semispan wing on independent balance systems. A large image plane was used to represent the aircraft plane of symmetry. Wing flap angles ranging from 45 to 90 degrees were examined. Data were acquired for both directions of the rotor rotation relative to the wing. Steady and unsteady wing surface pressures, total wing forces, and rotor performance data are presented for all of the configurations that were tested.
HVI Ballistic Performance Characterization of Non-Parallel Walls
NASA Technical Reports Server (NTRS)
Bohl, William; Miller, Joshua; Christiansen, Eric
2012-01-01
The Double-Wall, "Whipple" Shield [1] has been the subject of many hypervelocity impact studies and has proven to be an effective shield system against Micro-Meteoroid and Orbital Debris (MMOD) impacts for spacecraft. The US modules of the International Space Station (ISS), with their "bumper shields" offset from their pressure-holding rear walls, provide good examples of effective on-orbit use of the double-wall shield. The concentric-cylinder shield configuration, with its large radius of curvature relative to separation distance, is easily and effectively represented for testing and analysis as a system of two parallel plates. The parallel-plate double-wall configuration has been heavily tested and its shield performance characterized for normal and oblique impacts for the ISS and other programs. The double-wall shield and the principally similar Stuffed Whipple Shield are very common shield types for MMOD protection. However, in many spacecraft designs there are locations where the rear wall cannot be modeled as parallel or concentric with the outer bumper wall. As represented in Figure 1, there is an included angle between the two walls, and with a cylindrical outer wall the effective included angle constantly changes. This complicates assessment of critical spacecraft components located within outer spacecraft walls when using software tools such as NASA's BumperII. In addition, the validity of the risk assessment comes into question when using the standard double-wall shield equations, especially since verification testing of every set of double-wall included angles is impossible.
Combining points and lines in rectifying satellite images
NASA Astrophysics Data System (ADS)
Elaksher, Ahmed F.
2017-09-01
The rapid advance in remote sensing technologies has established the potential to gather accurate and reliable information about the Earth's surface using high resolution satellite images. Remote sensing satellite images of less than one-meter pixel size are currently used in large-scale mapping. Rigorous photogrammetric equations are usually used to describe the relationship between the image coordinates and ground coordinates. These equations require knowledge of the exterior and interior orientation parameters of the image, which might not be available. On the other hand, the parallel projection transformation could be used to represent the mathematical relationship between the image-space and object-space coordinate systems and provides the required accuracy for large-scale mapping using fewer ground control features. This article investigates the differences between point-based and line-based parallel projection transformation models in rectifying satellite images with different resolutions. The point-based parallel projection transformation model and its extended form are presented and the corresponding line-based forms are developed. Results showed that the RMS errors computed using the point- or line-based transformation models are equivalent and satisfy the requirement for large-scale mapping. The differences between the transformation parameters computed using the point- and line-based transformation models are insignificant. The results also showed a high correlation between the differences in ground elevation and the RMS errors.
Ascent control studies of the 049 and ATP parallel burn solid rocket motor shuttle configurations
NASA Technical Reports Server (NTRS)
Ryan, R. S.; Mowery, D. K.; Hammer, M.; Weisler, A. C.
1972-01-01
Control authority is discussed as a major problem of the parallel burn solid shuttle configuration due to the many system impacts that result regardless of the approach taken. The major trade studies and their results, which led to the recommendation of an SRB TVC control authority approach, are presented.
Strut and wall interference on jet-induced ground effects of a STOVL aircraft in hover
NASA Technical Reports Server (NTRS)
Kristy, Michael H.
1995-01-01
A small scale ground effect test rig was used to study the ground plane flow field generated by a STOVL aircraft in hover. The objective of the research was to support NASA-Ames Research Center planning for the Large Scale Powered Model (LSPM) test for the ARPA-sponsored ASTOVL program. Specifically, small scale oil flow visualization studies were conducted to make a relative assessment of the aerodynamic interference of a proposed strut configuration and a wall configuration on the ground plane stagnation line. A simplified flat plate model representative of a generic jet-powered STOVL aircraft was used to simulate the LSPM. Cold air jets were used to simulate both the lift fan and the twin rear engines. Nozzle Pressure Ratios were used that closely represented those used on the LSPM tests. The flow visualization data clearly identified a shift in the stagnation line location for both the strut and the wall configuration. Considering the experimental uncertainty, it was concluded that either the strut configuration or the wall configuration caused only a minor aerodynamic interference.
Morag, Ahiud; Becker, James Y; Jelinek, Raz
2017-07-10
Microsupercapacitors are touted as one of the promising "next frontiers" in energy-storage research and applications. Despite their potential, significant challenges still exist in terms of physical properties and electrochemical performance, particularly attaining high energy density, stability, ease of synthesis, and feasibility of large-scale production. We present new freestanding microporous electrodes comprising a self-assembled scaffold of gold and reduced graphene oxide (rGO) nanowires coated with MnO2. The electrodes exhibited excellent electrochemical characteristics, particularly superior high areal capacitance. Moreover, the freestanding Au/rGO scaffold also served as the current collector, obviating the need for an additional electrode support required in most reported supercapacitors, thus enabling low volume and weight devices with a high overall device specific energy. Stacked symmetrical solid-state supercapacitors were fabricated using the Au/rGO/MnO2 electrodes in parallel configurations, showing the advantage of using freestanding electrodes in the fabrication of low-volume devices.
Vectorial finite elements for solving the radiative transfer equation
NASA Astrophysics Data System (ADS)
Badri, M. A.; Jolivet, P.; Rousseau, B.; Le Corre, S.; Digonnet, H.; Favennec, Y.
2018-06-01
The discrete ordinate method coupled with the finite element method is often used for the spatio-angular discretization of the radiative transfer equation. In this paper we attempt to improve upon such a discretization technique. Instead of using standard finite elements, we reformulate the radiative transfer equation using vectorial finite elements. In comparison to standard finite elements, this reformulation yields faster timings for the linear system assemblies, as well as for the solution phase when using scattering media. The proposed vectorial finite element discretization for solving the radiative transfer equation is cross-validated against a benchmark problem available in the literature. In addition, we have used the method of manufactured solutions to verify the order of accuracy for our discretization technique within different absorbing, scattering, and emitting media. For solving large problems of radiation on parallel computers, the vectorial finite element method is parallelized using domain decomposition. The proposed domain decomposition method scales to a large number of processes, and its performance is unaffected by changes in the optical thickness of the medium. Our parallel solver is used to solve a large-scale radiative transfer problem of Kelvin-cell radiation.
Parallel group independent component analysis for massive fMRI data sets.
Chen, Shaojie; Huang, Lei; Qiu, Huitong; Nebel, Mary Beth; Mostofsky, Stewart H; Pekar, James J; Lindquist, Martin A; Eloyan, Ani; Caffo, Brian S
2017-01-01
Independent component analysis (ICA) is widely used in the field of functional neuroimaging to decompose data into spatio-temporal patterns of co-activation. In particular, ICA has found wide usage in the analysis of resting state fMRI (rs-fMRI) data. Recently, a number of large-scale data sets have become publicly available that consist of rs-fMRI scans from thousands of subjects. As a result, efficient ICA algorithms that scale well to the increased number of subjects are required. To address this problem, we propose a two-stage likelihood-based algorithm for performing group ICA, which we denote Parallel Group Independent Component Analysis (PGICA). By utilizing the sequential nature of the algorithm and parallel computing techniques, we are able to efficiently analyze data sets from large numbers of subjects. We illustrate the efficacy of PGICA, which has been implemented in R and is freely available through the Comprehensive R Archive Network, through simulation studies and application to rs-fMRI data from two large multi-subject data sets, consisting of 301 and 779 subjects respectively.
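PGICA itself is distributed as an R package on CRAN; purely to illustrate the two-stage, embarrassingly parallel structure of group ICA (and not the authors' algorithm), the sketch below reduces each subject in parallel and then runs a group-level ICA on the concatenated result. The subject count, dimensions, and library choices are hypothetical.

```python
# Generic two-stage group ICA sketch (not the authors' PGICA, which is an R
# package): stage 1 reduces each subject in parallel with PCA; stage 2 runs
# ICA once on the concatenated reduced data.
import numpy as np
from joblib import Parallel, delayed
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(2)
subjects = [rng.standard_normal((200, 5000)) for _ in range(8)]  # time x voxels

def reduce_subject(X, k=20):
    # Reduce each subject to k temporal components over the voxel dimension.
    return PCA(n_components=k).fit_transform(X.T).T   # k x voxels

reduced = Parallel(n_jobs=4)(delayed(reduce_subject)(X) for X in subjects)
stacked = np.vstack(reduced)                          # (subjects * k) x voxels
ica = FastICA(n_components=10, random_state=0)
group_maps = ica.fit_transform(stacked.T).T           # 10 group spatial maps
```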
Radial electric field and ion parallel flow in the quasi-symmetric and Mirror configurations of HSX
NASA Astrophysics Data System (ADS)
Kumar, S. T. A.; Dobbins, T. J.; Talmadge, J. N.; Wilcox, R. S.; Anderson, D. T.
2018-05-01
The radial electric field and the ion mean parallel flow are obtained in the Helically Symmetric eXperiment (HSX) stellarator from toroidal flow measurements of C6+ ions at two locations on a flux surface, using the Pfirsch–Schlüter effect. Results from the standard quasi-helically symmetric magnetic configuration are compared with those from the Mirror configuration where the quasi-symmetry is deliberately degraded using auxiliary coils. For similar injected power, the quasi-symmetric configuration is observed to have significantly lower flows, while the experimental observations from the Mirror geometry are in better agreement with neoclassical calculations. Indications are that the radial electric field near the core of the quasi-symmetric configuration may be governed by non-neoclassical processes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chang, Justin; Karra, Satish; Nakshatrala, Kalyana B.
2016-07-26
It is well-known that the standard Galerkin formulation, which is often the formulation of choice under the finite element method for solving self-adjoint diffusion equations, does not meet maximum principles and the non-negative constraint for anisotropic diffusion equations. Recently, optimization-based methodologies that satisfy maximum principles and the non-negative constraint for steady-state and transient diffusion-type equations have been proposed. To date, these methodologies have been tested only on small-scale academic problems. The purpose of this paper is to systematically study the performance of the non-negative methodology in the context of high performance computing (HPC). PETSc and TAO libraries are, respectively, used for the parallel environment and optimization solvers. For large-scale problems, it is important for computational scientists to understand the computational performance of current algorithms available in these scientific libraries. The numerical experiments are conducted on the state-of-the-art HPC systems, and a single-core performance model is used to better characterize the efficiency of the solvers. Furthermore, our studies indicate that the proposed non-negative computational framework for diffusion-type equations exhibits excellent strong scaling for real-world large-scale problems.
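At production scale the non-negative methodology runs through PETSc and TAO; the toy sketch below conveys only the core idea, replacing the plain Galerkin solve K u = f with a bound-constrained least-squares problem so the discrete solution cannot go negative. The 1D stiffness matrix and the SciPy solver are stand-ins for the paper's parallel framework, not its actual formulation.

```python
# Toy version of the optimization-based idea: enforce u >= 0 by solving a
# bound-constrained least-squares problem instead of the plain linear solve.
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(3)
n = 40
K = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D diffusion stiffness
f = rng.standard_normal(n)

u_galerkin = np.linalg.solve(K, f)                     # may violate u >= 0
res = lsq_linear(K, f, bounds=(0.0, np.inf))           # non-negative constraint
u_nonneg = res.x
```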
Scheduling for Parallel Supercomputing: A Historical Perspective of Achievable Utilization
NASA Technical Reports Server (NTRS)
Jones, James Patton; Nitzberg, Bill
1999-01-01
The NAS facility has operated parallel supercomputers for the past 11 years, including the Intel iPSC/860, Intel Paragon, Thinking Machines CM-5, IBM SP-2, and Cray Origin 2000. Across this wide variety of machine architectures, across a span of 10 years, across a large number of different users, and through thousands of minor configuration and policy changes, the utilization of these machines shows three general trends: (1) scheduling using a naive FIFO first-fit policy results in 40-60% utilization, (2) switching to the more sophisticated dynamic backfilling scheduling algorithm improves utilization by about 15 percentage points (yielding about 70% utilization), and (3) reducing the maximum allowable job size further increases utilization. Most surprising is the consistency of these trends. Over the lifetime of the NAS parallel systems, we made hundreds, perhaps thousands, of small changes to hardware, software, and policy, yet, utilization was affected little. In particular these results show that the goal of achieving near 100% utilization while supporting a real parallel supercomputing workload is unrealistic.
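To make the FIFO-versus-backfilling comparison concrete, here is a simplified, EASY-style backfilling step under assumed job records with 'procs' and 'walltime' fields: the blocked head job reserves its earliest possible start time, and later jobs may jump ahead only if they fit now and finish before that reservation. Real schedulers add many policy details omitted here.

```python
# Simplified EASY-backfilling step (a sketch, not any production scheduler).
def schedule_step(now, free, running, queue):
    """running: list of (end_time, procs) for executing jobs; queue: FIFO list
    of dicts with 'procs' and 'walltime'. Returns jobs started at `now`."""
    started = []
    while queue and queue[0]["procs"] <= free:       # plain FIFO first-fit
        job = queue.pop(0)
        free -= job["procs"]
        running.append((now + job["walltime"], job["procs"]))
        started.append(job)
    if not queue:
        return started
    # Head job is blocked: find its earliest possible start ("shadow") time.
    head, avail, shadow = queue[0], free, None
    for end, procs in sorted(running):
        avail += procs
        if avail >= head["procs"]:
            shadow = end
            break
    if shadow is None:                               # head can never fit
        return started
    # Backfill: later jobs may start now only if they fit in the free
    # processors and finish before the head job's reserved start time.
    for job in list(queue[1:]):
        if job["procs"] <= free and now + job["walltime"] <= shadow:
            queue.remove(job)
            free -= job["procs"]
            running.append((now + job["walltime"], job["procs"]))
            started.append(job)
    return started
```

The utilization gain reported above comes from exactly this mechanism: small, short jobs fill processor holes that FIFO first-fit would leave idle.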
Accelerating Full Configuration Interaction Calculations for Nuclear Structure
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Chao; Sternberg, Philip; Maris, Pieter
2008-04-14
One of the emerging computational approaches in nuclear physics is the full configuration interaction (FCI) method for solving the many-body nuclear Hamiltonian in a sufficiently large single-particle basis space to obtain exact answers - either directly or by extrapolation. The lowest eigenvalues and corresponding eigenvectors for very large, sparse and unstructured nuclear Hamiltonian matrices are obtained and used to evaluate additional experimental quantities. These matrices pose a significant challenge to the design and implementation of efficient and scalable algorithms for obtaining solutions on massively parallel computer systems. In this paper, we describe the computational strategies employed in a state-of-the-art FCI code MFDn (Many Fermion Dynamics - nuclear) as well as techniques we recently developed to enhance the computational efficiency of MFDn. We will demonstrate the current capability of MFDn and report the latest performance improvement we have achieved. We will also outline our future research directions.
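MFDn's eigensolver is a highly tuned parallel Lanczos-type code; as a small-scale stand-in for the kernel it parallelizes, the sketch below builds a large sparse symmetric matrix and extracts its lowest eigenpairs with SciPy's Lanczos-based eigsh. The matrix size, density, and spectrum are arbitrary choices for the example.

```python
# Small-scale analogue of the core FCI kernel: lowest eigenpairs of a large,
# sparse, symmetric Hamiltonian via a Lanczos-type iteration.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

rng = np.random.default_rng(4)
n = 20000
H = sp.random(n, n, density=1e-4, random_state=4)       # random sparse coupling
H = (H + H.T) * 0.5 + sp.diags(np.arange(n, dtype=float))  # symmetrize, set diagonal

vals, vecs = eigsh(H, k=5, which="SA")   # 5 smallest algebraic eigenvalues
```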
Nongyrotropic Electrons in Guide Field Reconnection
NASA Technical Reports Server (NTRS)
Wendel, D. E.; Hesse, M.; Bessho, N.; Adrian, M. L.; Kuznetsova, M.
2016-01-01
We apply a scalar measure of nongyrotropy to the electron pressure tensor in a 2D particle-in-cell simulation of guide field reconnection and assess the corresponding electron distributions and the forces that account for the nongyrotropy. The scalar measure reveals that the nongyrotropy lies in bands that straddle the electron diffusion region and the separatrices, in the same regions where there are parallel electric fields. Analysis of electron distributions and fields shows that the nongyrotropy along the inflow and outflow separatrices emerges as a result of multiple populations of electrons influenced differently by large- and small-scale parallel electric fields and by gradients in the electric field. The relevant parallel electric fields include large-scale potential ramps emanating from the x-line and sub-ion inertial scale bipolar electron holes. Gradients in the perpendicular electric field modify electrons differently depending on their phase, thus producing nongyrotropy. Magnetic flux violation occurs along portions of the separatrices that coincide with the parallel electric fields. An inductive electric field in the electron E×B drift frame thus develops, which has the effect of enhancing nongyrotropies already produced by other mechanisms and under certain conditions producing their own nongyrotropy. Particle tracing of electrons from nongyrotropic populations along the inflows and outflows shows that the striated structure of nongyrotropy corresponds to electrons arriving from different source regions. We also show that the relevant parallel electric fields receive important contributions not only from the nongyrotropic portion of the electron pressure tensor but from electron spatial and temporal inertial terms as well.
Photonic content-addressable memory system that uses a parallel-readout optical disk
NASA Astrophysics Data System (ADS)
Krishnamoorthy, Ashok V.; Marchand, Philippe J.; Yayla, Gökçe; Esener, Sadik C.
1995-11-01
We describe a high-performance associative-memory system that can be implemented by means of an optical disk modified for parallel readout and a custom-designed silicon integrated circuit with parallel optical input. The system can achieve associative recall on 128 × 128 bit images and also on variable-size subimages. The system's behavior and performance are evaluated on the basis of experimental results on a motionless-head parallel-readout optical-disk system, logic simulations of the very-large-scale integrated chip, and a software emulation of the overall system.
NASA Technical Reports Server (NTRS)
Bailey, D. H.; Barszcz, E.; Barton, J. T.; Carter, R. L.; Lasinski, T. A.; Browning, D. S.; Dagum, L.; Fatoohi, R. A.; Frederickson, P. O.; Schreiber, R. S.
1991-01-01
A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers in the framework of the NASA Ames Numerical Aerodynamic Simulation (NAS) Program. These consist of five 'parallel kernel' benchmarks and three 'simulated application' benchmarks. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification: all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
pcircle - A Suite of Scalable Parallel File System Tools
DOE Office of Scientific and Technical Information (OSTI.GOV)
WANG, FEIYI
2015-10-01
Most file system software is written for conventional local file systems; it is serial and cannot take advantage of a large-scale parallel file system. The pcircle software builds on the ubiquitous MPI in a cluster computing environment and a "work-stealing" pattern to provide a scalable, high-performance suite of file system tools. In particular, it implements parallel data copy and parallel data checksumming, with advanced features such as asynchronous progress reporting, checkpoint and restart, as well as integrity checking.
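pcircle distributes its work over MPI with work stealing; the sketch below is a much simpler statically partitioned analogue of parallel checksumming, hashing fixed-size chunks of one file across a process pool and combining the per-chunk digests. The chunk size and combination scheme are assumptions for illustration, not pcircle's actual design.

```python
# Simplified parallel checksumming in the spirit of pcircle (static chunk
# partition via a process pool; pcircle itself uses MPI plus work stealing).
import hashlib
import os
from multiprocessing import Pool

CHUNK = 64 * 1024 * 1024  # 64 MiB work units

def chunk_md5(args):
    path, offset = args
    with open(path, "rb") as f:
        f.seek(offset)
        return offset, hashlib.md5(f.read(CHUNK)).hexdigest()

def parallel_checksum(path, workers=8):
    size = os.path.getsize(path)
    work = [(path, off) for off in range(0, size, CHUNK)]
    with Pool(workers) as pool:
        digests = [d for _, d in sorted(pool.map(chunk_md5, work))]
    # Combine ordered per-chunk digests into one file-level signature
    # (an integrity check, not a cryptographic guarantee).
    return hashlib.md5("".join(digests).encode()).hexdigest()
```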
NASA Technical Reports Server (NTRS)
Johnston, Patrick H.; Juarez, Peter D.
2016-01-01
The Pultruded Rod Stitched Efficient Unitized Structure (PRSEUS) is a structural concept developed by the Boeing Company to address the complex structural design aspects associated with a pressurized hybrid wing body (HWB) aircraft configuration. The HWB has long been a focus of NASA's Environmentally Responsible Aviation (ERA) project, following a building block approach to structures development, culminating with the testing of a nearly full-scale multi-bay box (MBB), representing a segment of the pressurized, non-circular fuselage portion of the HWB. PRSEUS is an integral structural concept wherein skins, frames, stringers and tear straps made of a variable number of layers of dry warp-knit carbon-fiber stacks are stitched together, then resin-infused and cured in an out-of-autoclave process. The PRSEUS concept has the potential for reducing the weight and cost and increasing the structural efficiency of transport aircraft structures. A key feature of PRSEUS is the damage-arresting nature of the stitches, which enables the use of fail-safe design principles. During the load testing of the MBB, ultrasonic nondestructive evaluation (NDE) was used to monitor several sites of intentional barely-visible impact damage (BVID) as well as to survey the areas surrounding the failure cracks after final loading to catastrophic failure. The damage-arresting ability of PRSEUS was confirmed by the results of NDE. In parallel with the large-scale structural testing of the MBB, mechanical tests were conducted of the PRSEUS rod-to-overwrap bonds, as measured by pushing the rod axially from a short length of stringer.
Atomistic Picture for the Folding Pathway of a Hybrid-1 Type Human Telomeric DNA G-quadruplex
Bian, Yunqiang; Tan, Cheng; Wang, Jun; Sheng, Yuebiao; Zhang, Jian; Wang, Wei
2014-01-01
In this work we studied the folding process of the hybrid-1 type human telomeric DNA G-quadruplex with solvent and ions explicitly modeled. Enabled by the powerful bias-exchange metadynamics and large-scale conventional molecular dynamics simulations, the free energy landscape of this G-DNA was obtained for the first time and four folding intermediates were identified, including a triplex and a basically formed quadruplex. The simulations also provided atomistic pictures for the structures and cation binding patterns of the intermediates. The results showed that the structure formation and cation binding are cooperative and mutually support each other. The syn/anti reorientation dynamics of the intermediates was also investigated. It was found that the nucleotides usually take correct syn/anti configurations when they form native and stable hydrogen bonds with the others, while fluctuating between the two configurations when they do not. Misfolded intermediates with wrong syn/anti configurations were observed in the early intermediates but not in the later ones. Based on the simulations, we also discussed the roles of the non-native interactions. Besides, the formation process of the parallel conformation in the first two G-repeats and the associated reversal loop was studied. Based on the above results, we proposed a folding pathway for the hybrid-1 type G-quadruplex with atomistic details, which is more complete than those proposed previously. The knowledge gained for this type of G-DNA may provide a general insight for the folding of the other G-quadruplexes. PMID:24722458
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Dali, E-mail: wangdali@mail.ahnu.edu.cn; National Laboratory of Solid State Microstructures and Department of Physics, Nanjing University, Nanjing 210093; Jin, Guojun, E-mail: gjin@nju.edu.cn
2013-12-21
We investigate the effect of a vertical electric field on the electron tunneling and magnetoresistance in an AA-stacked graphene bilayer modulated by double magnetic barriers with parallel or antiparallel configuration. The results show that the electronic transmission properties in the system are sensitive to the magnetic-barrier configuration and the bias voltage between the graphene layers. In particular, it is found that for the antiparallel configuration, within the low energy region, the blocking effect is more obvious compared with the case for the parallel configuration, and there may even exist a transmission spectrum gap which can be arbitrarily tuned by the field-induced interlayer bias voltage. We also demonstrate that the significant discrepancy between the conductance for the parallel and antiparallel configurations would result in a giant tunneling magnetoresistance ratio, and further the maximal magnetoresistance ratio can be strongly modified by the interlayer bias voltage. This leads to the possible realization of high-quality magnetic sensors controlled by a vertical electric field in the AA-stacked graphene bilayer.
Two Non Linear Dynamics Plasma Astrophysics Experiments At LANL
NASA Astrophysics Data System (ADS)
Intrator, T.; Weber, T.; Feng, Y.; Sears, J.; Smith, R. J.; Swan, H.; Hutchinson, T.; Boguski, J.; Gao, K.; Chapdelaine, L.; Dunn, J. P.
2013-12-01
Two laboratory experiments at Los Alamos National Laboratory (LANL) have been built to gain access to a wide range of fundamental plasma physics issues germane to astrophysical, space, and fusion plasmas. The overarching theme is magnetized plasma dynamics, including currents, MHD forces and instabilities, sheared flows and shocks, along with the creation and annihilation of magnetic field. The Relaxation Scaling Experiment (RSX) creates current sheets and flux ropes that exhibit fully 3D dynamics and are observed to kink, bounce, merge and reconnect, shred, and reform in complicated ways. We show recent movies from a large detailed data set that describe the 3D magnetic structure and helicity budget of a driven and dissipative system that spontaneously self-saturates a kink instability. The Magnetized Shock Experiment (MSX) uses a field-reversed configuration (FRC) that is ejected at high speed and then stagnated against a stopping mirror field, which drives a collisionless magnetized shock. A plasmoid accelerator will also access supercritical shocks at much larger Alfven Mach numbers. Unique features include access to parallel, oblique and perpendicular shocks, in regions much larger than the ion gyroradius and inertial length, large magnetic and fluid Reynolds numbers, and volume for turbulence.
Branches of electrostatic turbulence inside solitary plasma structures in the auroral ionosphere
DOE Office of Scientific and Technical Information (OSTI.GOV)
Golovchanskaya, Irina V.; Kozelov, Boris V.; Chernyshov, Alexander A.
2014-08-15
The excitation of electrostatic turbulence inside space-observed solitary structures is a central topic of this exposition. Three representative solitary structures observed in the topside auroral ionosphere as large-amplitude nonlinear signatures in the electric field and magnetic-field-aligned current on the transverse scales of ~10²–10³ m are evaluated by the theories of electrostatic wave generation in inhomogeneous background configurations. A quantitative analysis shows that the structures are, in general, effective in destabilizing the inhomogeneous energy-density-driven (IEDD) waves, as well as of the ion acoustic waves modified by a shear in the parallel drift of ions. It is demonstrated that the dominating branch of the electrostatic turbulence is determined by the interplay of various driving sources inside a particular solitary structure. The sources do not generally act in unison, so that their common effect may be inhibiting for excitation of electrostatic waves of a certain type. In the presence of large magnetic-field-aligned current, which is not correlated to the inhomogeneous electric field inside the structure, the ion-acoustic branch becomes dominating. In other cases, the IEDD instability is more central.
Adaptive multi-resolution 3D Hartree-Fock-Bogoliubov solver for nuclear structure
NASA Astrophysics Data System (ADS)
Pei, J. C.; Fann, G. I.; Harrison, R. J.; Nazarewicz, W.; Shi, Yue; Thornton, S.
2014-08-01
Background: Complex many-body systems, such as triaxial and reflection-asymmetric nuclei, weakly bound halo states, cluster configurations, nuclear fragments produced in heavy-ion fusion reactions, cold Fermi gases, and pasta phases in neutron star crust, are all characterized by large sizes and complex topologies in which many geometrical symmetries characteristic of ground-state configurations are broken. A tool of choice to study such complex forms of matter is an adaptive multi-resolution wavelet analysis. This method has generated much excitement since it provides a common framework linking many diversified methodologies across different fields, including signal processing, data compression, harmonic analysis and operator theory, fractals, and quantum field theory. Purpose: To describe complex superfluid many-fermion systems, we introduce an adaptive pseudospectral method for solving self-consistent equations of nuclear density functional theory in three dimensions, without symmetry restrictions. Methods: The numerical method is based on the multi-resolution and computational harmonic analysis techniques with a multi-wavelet basis. The application of state-of-the-art parallel programming techniques includes sophisticated object-oriented templates which parse the high-level code into distributed parallel tasks with a multi-thread task queue scheduler for each multi-core node. The internode communications are asynchronous. The algorithm is variational and is capable of solving coupled complex-geometric systems of equations adaptively, with functional and boundary constraints, in a finite spatial domain of very large size, limited by existing parallel computer memory. For smooth functions, user-defined finite precision is guaranteed. Results: The new adaptive multi-resolution Hartree-Fock-Bogoliubov (HFB) solver madness-hfb is benchmarked against a two-dimensional coordinate-space solver hfb-ax that is based on the B-spline technique and a three-dimensional solver hfodd that is based on the harmonic-oscillator basis expansion. Several examples are considered, including the self-consistent HFB problem for spin-polarized trapped cold fermions and the Skyrme-Hartree-Fock (+BCS) problem for triaxial deformed nuclei. Conclusions: The new madness-hfb framework has many attractive features when applied to nuclear and atomic problems involving many-particle superfluid systems. Of particular interest are weakly bound nuclear configurations close to particle drip lines, strongly elongated and dinuclear configurations such as those present in fission and heavy-ion fusion, and exotic pasta phases that appear in neutron star crust.
Extreme-Scale Bayesian Inference for Uncertainty Quantification of Complex Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Biros, George
Uncertainty quantification (UQ), that is, quantifying uncertainties in complex mathematical models and their large-scale computational implementations, is widely viewed as one of the outstanding challenges facing the field of CS&E over the coming decade. The EUREKA project set out to address the most difficult class of UQ problems: those for which both the underlying PDE model as well as the uncertain parameters are of extreme scale. In the project we worked on these extreme-scale challenges in the following four areas: 1. Scalable parallel algorithms for sampling and characterizing the posterior distribution that exploit the structure of the underlying PDEs and parameter-to-observable map. These include structure-exploiting versions of the randomized maximum likelihood method, which aims to overcome the intractability of employing conventional MCMC methods for solving extreme-scale Bayesian inversion problems by appealing to and adapting ideas from large-scale PDE-constrained optimization, which have been very successful at exploring high-dimensional spaces. 2. Scalable parallel algorithms for construction of prior and likelihood functions based on learning methods and non-parametric density estimation. Constructing problem-specific priors remains a critical challenge in Bayesian inference, and more so in high dimensions. Another challenge is construction of likelihood functions that capture unmodeled couplings between observations and parameters. We created parallel algorithms for non-parametric density estimation using high-dimensional N-body methods and combined them with supervised learning techniques for the construction of priors and likelihood functions. 3. Bayesian inadequacy models, which augment physics models with stochastic models that represent their imperfections. The success of the Bayesian inference framework depends on the ability to represent the uncertainty due to imperfections of the mathematical model of the phenomena of interest. This is a central challenge in UQ, especially for large-scale models. We developed the mathematical tools to address these challenges in the context of extreme-scale problems. 4. Parallel scalable algorithms for Bayesian optimal experimental design (OED). Bayesian inversion yields quantified uncertainties in the model parameters, which can be propagated forward through the model to yield uncertainty in outputs of interest. This opens the way for designing new experiments to reduce the uncertainties in the model parameters and model predictions. Such experimental design problems have been intractable for large-scale problems using conventional methods; we created OED algorithms that exploit the structure of the PDE model and the parameter-to-output map to overcome these challenges. Parallel algorithms for these four problems were created, analyzed, prototyped, implemented, tuned, and scaled up for leading-edge supercomputers, including UT-Austin's own 10-petaflops Stampede system, ANL's Mira system, and ORNL's Titan system. While our focus was on fundamental mathematical/computational methods and algorithms, we assessed our methods on model problems derived from several DOE mission applications, including multiscale mechanics and ice sheet dynamics.
Optimistic barrier synchronization
NASA Technical Reports Server (NTRS)
Nicol, David M.
1992-01-01
Barrier synchronization is a fundamental operation in parallel computation. In many contexts, at the point a processor enters a barrier it knows that it has already processed all the work required of it prior to synchronization. The alternative case, when a processor cannot enter a barrier with the assurance that it has already performed all the necessary pre-synchronization computation, is treated here. The problem arises when the number of pre-synchronization messages to be received by a processor is unknown, for example, in a parallel discrete simulation or any other computation that is largely driven by an unpredictable exchange of messages. We describe an optimistic O(log² P) barrier algorithm for such problems, study its performance on a large-scale parallel system, and consider extensions to general associative reductions as well as associative parallel prefix computations.
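The optimistic O(log² P) algorithm involves speculation and rollback that a short sketch cannot do justice to; for contrast, here is the classical (non-optimistic) dissemination barrier it generalizes, in which thread i signals thread (i + 2^k) mod P in round k and the barrier completes after ceil(log2 P) rounds. This baseline assumes all pre-synchronization work is finished before entry, which is exactly the assumption the paper relaxes.

```python
# Baseline (non-optimistic) dissemination barrier among P threads, single use:
# in round k, thread i signals thread (i + 2^k) mod P, then waits for the
# signal from thread (i - 2^k) mod P on its own round-k flag.
import math
import threading

P = 8
ROUNDS = math.ceil(math.log2(P))
flags = [[threading.Event() for _ in range(ROUNDS)] for _ in range(P)]

def barrier(i):
    for k in range(ROUNDS):
        flags[(i + (1 << k)) % P][k].set()   # signal round-k partner
        flags[i][k].wait()                   # wait on own round-k flag

def worker(i):
    # ... pre-synchronization work would go here ...
    barrier(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(P)]
for t in threads: t.start()
for t in threads: t.join()
```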
Equalizer: a scalable parallel rendering framework.
Eilemann, Stefan; Makhinya, Maxim; Pajarola, Renato
2009-01-01
Continuing improvements in CPU and GPU performance as well as increasing multi-core processor and cluster-based parallelism demand flexible and scalable parallel rendering solutions that can exploit multipipe hardware-accelerated graphics. In fact, to achieve interactive visualization, scalable rendering systems are essential to cope with the rapid growth of data sets. However, parallel rendering systems are non-trivial to develop and often only application-specific implementations have been proposed. The task of developing a scalable parallel rendering framework is even more difficult if it should be generic to support various types of data and visualization applications, and at the same time work efficiently on a cluster with distributed graphics cards. In this paper we introduce a novel system called Equalizer, a toolkit for scalable parallel rendering based on OpenGL which provides an application programming interface (API) to develop scalable graphics applications for a wide range of systems ranging from large distributed visualization clusters and multi-processor multipipe graphics systems to single-processor single-pipe desktop machines. We describe the system architecture and the basic API, discuss its advantages over previous approaches, and present example configurations and usage scenarios as well as scalability results.
Wu, Shijia; Li, Hui; Zhou, Xuechen; Liang, Peng; Zhang, Xiaoyuan; Jiang, Yong; Huang, Xia
2016-07-01
A novel stacked microbial fuel cell (MFC) which had a total volume of 72 L with granular activated carbon (GAC) packed bed electrodes was constructed and verified to present remarkable power generation and COD removal performance due to its advantageous design of stack and electrode configuration. During the fed-batch operation period, a power density of 50.9 ± 1.7 W/m³ and a COD removal efficiency of 97% were achieved within 48 h. Because of the differences among MFC modules in the stack, reversal current occurred in parallel circuit connection with high external resistances (>100 Ω). This reversal current consequently reduced the electrochemical performance of some MFC modules and led to a lower power density in parallel circuit connection than that in independent circuit connection. While increasing the influent COD concentrations from 200 to 800 mg/L at a hydraulic retention time of 1.25 h in continuous operation mode, the power density of the stacked MFC increased from 25.6 ± 2.5 to 42.1 ± 1.2 W/m³ and the COD removal rates increased from 1.3 to 5.2 kg COD/(m³ d). This study demonstrated that this novel MFC stack configuration coupled with GAC packed bed electrodes could be a feasible strategy to effectively scale up MFC systems.
Conjugate-Gradient Algorithms For Dynamics Of Manipulators
NASA Technical Reports Server (NTRS)
Fijany, Amir; Scheid, Robert E.
1993-01-01
Algorithms for serial and parallel computation of forward dynamics of multiple-link robotic manipulators by conjugate-gradient method developed. Parallel algorithms have potential for speedup of computations on multiple linked, specialized processors implemented in very-large-scale integrated circuits. Such processors used to simulate dynamics, possibly faster than in real time, for purposes of planning and control.
Parallel Computer System for 3D Visualization Stereo on GPU
NASA Astrophysics Data System (ADS)
Al-Oraiqat, Anas M.; Zori, Sergii A.
2018-03-01
This paper proposes the organization of a parallel computer system based on Graphics Processing Units (GPUs) for 3D stereo image synthesis. The development is based on the modified ray tracing method developed by the authors for fast search of tracing-ray intersections with scene objects. The system allows a significant increase in productivity for 3D stereo synthesis of photorealistic quality. The generalized procedure of 3D stereo image synthesis on the Graphics Processing Unit/Graphics Processing Clusters (GPU/GPC) is proposed. The efficiency of the proposed solutions by GPU implementation is compared with single-threaded and multithreaded implementations on the CPU. The achieved average acceleration in multi-thread implementation on the test GPU and CPU is about 7.5 and 1.6 times, respectively. Studying the influence of the size and configuration of the computational Compute Unified Device Architecture (CUDA) network on the computational speed shows the importance of their correct selection. The obtained experimental estimations can be significantly improved by new GPUs with a large number of processing cores and multiprocessors, as well as an optimized configuration of the computing CUDA network.
Workshop report on large-scale matrix diagonalization methods in chemistry theory institute
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bischof, C.H.; Shepard, R.L.; Huss-Lederman, S.
The Large-Scale Matrix Diagonalization Methods in Chemistry theory institute brought together 41 computational chemists and numerical analysts. The goal was to understand the needs of the computational chemistry community in problems that utilize matrix diagonalization techniques. This was accomplished by reviewing the current state of the art and looking toward future directions in matrix diagonalization techniques. This institute occurred about 20 years after a related meeting of similar size. During those 20 years the Davidson method continued to dominate the problem of finding a few extremal eigenvalues for many computational chemistry problems. Work on non-diagonally dominant and non-Hermitian problems as well as parallel computing has also brought new methods to bear. The changes and similarities in problems and methods over the past two decades offered an interesting viewpoint for the success in this area. One important area covered by the talks was overviews of the source and nature of the chemistry problems. The numerical analysts were uniformly grateful for the efforts to convey a better understanding of the problems and issues faced in computational chemistry. An important outcome was an understanding of the wide range of eigenproblems encountered in computational chemistry. The workshop covered problems involving self-consistent-field (SCF), configuration interaction (CI), intramolecular vibrational relaxation (IVR), and scattering problems. In atomic structure calculations using the Hartree-Fock method (SCF), the symmetric matrices can range from order hundreds to thousands. These matrices often include large clusters of eigenvalues which can be as much as 25% of the spectrum. However, if CI methods are also used, the matrix size can be between 10⁴ and 10⁹ where only one or a few extremal eigenvalues and eigenvectors are needed. Working with very large matrices has led to the development of …
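Since the Davidson method anchors much of this discussion, a minimal single-root version is sketched below for the lowest eigenpair of a diagonally dominant symmetric matrix: expand a search subspace with the diagonally preconditioned residual and extract Ritz pairs by projection. Production codes add blocking, restarts, and better preconditioners; the test matrix here is an arbitrary example.

```python
# Minimal single-root Davidson iteration (textbook form: no blocking or
# restarts) for the lowest eigenpair of a diagonally dominant symmetric A.
import numpy as np

def davidson(A, tol=1e-8, max_iter=60):
    n = A.shape[0]
    d = np.diag(A)
    V = np.zeros((n, 0))
    t = np.eye(n, 1).ravel()                      # initial guess vector
    for _ in range(max_iter):
        t -= V @ (V.T @ t)                        # one Gram-Schmidt pass vs. V
        t /= np.linalg.norm(t)
        V = np.hstack([V, t[:, None]])            # expand search subspace
        H = V.T @ A @ V                           # Rayleigh-Ritz projection
        theta, s = np.linalg.eigh(H)
        theta, s = theta[0], s[:, 0]              # lowest Ritz pair
        x = V @ s
        r = A @ x - theta * x                     # residual
        if np.linalg.norm(r) < tol:
            break
        denom = theta - d                         # Davidson diagonal preconditioner
        denom[np.abs(denom) < 1e-8] = 1e-8        # guard against division by ~0
        t = r / denom
    return theta, x

rng = np.random.default_rng(5)
n = 500
A = np.diag(np.arange(1.0, n + 1)) + 1e-3 * rng.standard_normal((n, n))
A = (A + A.T) / 2
lam, vec = davidson(A)
```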
Reducing neural network training time with parallel processing
NASA Technical Reports Server (NTRS)
Rogers, James L., Jr.; Lamarsh, William J., II
1995-01-01
Obtaining optimal solutions for engineering design problems is often expensive because the process typically requires numerous iterations involving analysis and optimization programs. Previous research has shown that a near optimum solution can be obtained in less time by simulating a slow, expensive analysis with a fast, inexpensive neural network. A new approach has been developed to further reduce this time. This approach decomposes a large neural network into many smaller neural networks that can be trained in parallel. Guidelines are developed to avoid some of the pitfalls when training smaller neural networks in parallel. These guidelines allow the engineer: to determine the number of nodes on the hidden layer of the smaller neural networks; to choose the initial training weights; and to select a network configuration that will capture the interactions among the smaller neural networks. This paper presents results describing how these guidelines are developed.
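As a rough illustration of the decomposition idea (not the paper's exact scheme or guidelines), the sketch below trains one small surrogate network per analysis output in parallel with fixed seeds, so the hidden-layer size and initial weights are explicit and reproducible. The data, network sizes, and library choices are assumptions.

```python
# Illustrative decomposition: one small surrogate network per output, trained
# in parallel; a cheap analytic function stands in for the expensive analysis.
import numpy as np
from joblib import Parallel, delayed
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(6)
X = rng.uniform(-1, 1, size=(500, 4))               # design variables
Y = np.column_stack([np.sin(X[:, 0]) + X[:, 1]**2,  # stand-in analysis outputs
                     X[:, 2] * X[:, 3],
                     X.sum(axis=1)])

def train_subnet(j, hidden=16, seed=0):
    net = MLPRegressor(hidden_layer_sizes=(hidden,), random_state=seed,
                       max_iter=2000)
    return net.fit(X, Y[:, j])

subnets = Parallel(n_jobs=3)(delayed(train_subnet)(j) for j in range(Y.shape[1]))
pred = np.column_stack([net.predict(X) for net in subnets])
```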
Negrete, Alejandro; Kotin, Robert M.
2007-01-01
The conventional methods for producing recombinant adeno-associated virus (rAAV) rely on transient transfection of adherent mammalian cells. To gain acceptance and achieve current good manufacturing process (cGMP) compliance, a clinical grade rAAV production process should have the following qualities: simplicity, consistency, cost effectiveness, and scalability. Currently, the only viable method for producing rAAV in large scale, e.g., ≥10¹⁶ particles per production run, utilizes Baculovirus Expression Vectors (BEVs) and insect cell suspension cultures. The previously described rAAV production in 40 L culture using a stirred tank bioreactor requires special conditions for implementation and operation not available in all laboratories. Alternatives to producing rAAV in stirred-tank bioreactors are single-use, disposable bioreactors, e.g. Wave™. The disposable bags are purchased pre-sterilized, thereby eliminating the need for end-user sterilization and also avoiding cleaning steps between production runs, thus facilitating the production process. In this study, rAAV production in stirred tank and Wave™ bioreactors was compared. The working volumes were 10 L and 40 L for the stirred tank bioreactors and 5 L and 20 L for the Wave™ bioreactors. Comparable yields of rAAV, ~2×10¹³ particles per liter of cell culture, were obtained in all volumes and configurations. These results demonstrate that producing rAAV in large scale using BEVs is reproducible, scalable, and independent of the bioreactor configuration. Keywords: adeno-associated vectors; large-scale production; stirred tank bioreactor; wave bioreactor; gene therapy. PMID:17606302
NASA Technical Reports Server (NTRS)
Falarski, M. D.; Koenig, D. G.
1972-01-01
The investigation of the in-ground-effect, longitudinal aerodynamic characteristics of a large scale swept augmentor wing model is presented, using the 40 x 80 ft wind tunnel. The investigation was conducted at three ground heights: h/c = 2.01, 1.61, and 1.34. The induced effect of underwing nacelles was studied with two powered nacelle configurations. One configuration used four JT-15D turbofans while the other used two J-85 turbojet engines. Two conical nozzles on each J-85 were used to deflect the thrust at angles from 0 to 120 deg. Tests were also performed without nacelles to allow comparison with previous ground-effect data.
User's and test case manual for FEMATS
NASA Technical Reports Server (NTRS)
Chatterjee, Arindam; Volakis, John; Nurnberger, Mike; Natzke, John
1995-01-01
The FEMATS program incorporates first-order edge-based finite elements and vector absorbing boundary conditions into the scattered field formulation for computation of the scattering from three-dimensional geometries. The code has been validated extensively for a large class of geometries containing inhomogeneities and satisfying transition conditions. For geometries that are too large for the workstation environment, the FEMATS code has been optimized to run on various supercomputers. Currently, FEMATS has been configured to run on the HP 9000 workstation, vectorized for the Cray Y-MP, and parallelized to run on the Kendall Square Research (KSR) architecture and the Intel Paragon.
Compact, high energy gas laser
Rockwood, Stephen D.; Stapleton, Robert E.; Stratton, Thomas F.
1976-08-03
An electrically pumped gas laser amplifier unit having a disc-like configuration in which light propagation is radially outward from the axis rather than along the axis. The input optical energy is distributed over a much smaller area than the output optical energy, i.e., the amplified beam, while still preserving the simplicity of parallel electrodes for pumping the laser medium. The system may thus be driven by a comparatively low optical energy input, while at the same time, owing to the large output area, large energies may be extracted while maintaining the energy per unit area below the threshold of gas breakdown.
A platform-based foot pressure/shear sensor
NASA Astrophysics Data System (ADS)
Chang, Chun-Te; Liu, Chao Shih; Soetanto, William; Wang, Wei-Chih
2012-04-01
The proposed research is aimed at developing, fabricating and implementing a flexible fiber optic bend loss sensor for the measurement of plantar pressure and shear stress for diabetic patients. The successful development of the sensor will greatly impact the study of diabetic foot ulcers by allowing clinicians to measure a parameter (namely, shear stress) that has been implicated in ulceration, but heretofore has not been routinely quantified on high risk patients. A full-scale foot pressure/shear sensor involving a tactile sensor array of intersecting optical waveguides is presented. The basic configuration of the optical sensor system incorporates a mesh comprising two sets of parallel optical waveguide planes; the planes are configured so the parallel rows of waveguides of the top and bottom planes are perpendicular to each other. The planes are sandwiched together, creating one sensing sheet. Two-dimensional information is determined by measuring the loss of light from each waveguide to map the overall pressure distribution. The shifting of the layers relative to each other allows determination of the shear stress in the plane of the sensor. This paper presents the latest developments and improvements in the sensor design. Fabrication and results from the latest tests are described.
Ground Penetrating Radar Survey at Yoros Fortress, Istanbul
NASA Astrophysics Data System (ADS)
Kucukdemirci, M.; Yalçın, A. B.
2016-12-01
Geophysical methods are effective tools for detecting archaeological remains and materials hidden under the ground. One of the most frequently used methods for archaeological prospection is Ground Penetrating Radar (GPR). This paper describes a small-scale GPR survey, carried out during archaeological excavations, to locate buried archaeological features around the Yoros Fortress, located on the shores of the Bosporus strait in Istanbul. The survey was carried out with a GSSI SIR 3000 system, using a 400 MHz center-frequency bistatic antenna configured with a 16-bit dynamic range and 512 samples per scan. The data were collected along parallel profiles at an interval of 0.50 meters, in a zigzag configuration on the survey grids, and were processed with GPR-Slice V.7 (Ground Penetrating Radar Imaging Software). At shallow depths, some scattered anomalies were detected; these can be related to a small portion of archaeological ruins close to the surface. At deeper levels, the geometry of the anomalies related to possible archaeological ruins looks clearer: two horizontal, parallel anomalies oriented north-south were detected at a depth of 1.45 meters, possibly related to ancient channels.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Le Roux, J. A.
Earlier work based on nonlinear guiding center (NLGC) theory suggested that perpendicular cosmic-ray transport is diffusive when cosmic rays encounter random three-dimensional magnetohydrodynamic turbulence dominated by uniform two-dimensional (2D) turbulence with a minor uniform slab turbulence component. In this approach large-scale perpendicular cosmic-ray transport is due to cosmic rays microscopically diffusing along the meandering magnetic field dominated by 2D turbulence because of gyroresonant interactions with slab turbulence. However, turbulence in the solar wind is intermittent and it has been suggested that intermittent turbulence might be responsible for the observation of 'dropout' events in solar energetic particle fluxes on small scales. In a previous paper le Roux et al. suggested, using NLGC theory as a basis, that if gyro-scale slab turbulence is intermittent, large-scale perpendicular cosmic-ray transport in weak uniform 2D turbulence will be superdiffusive or subdiffusive depending on the statistical characteristics of the intermittent slab turbulence. In this paper we expand and refine our previous work further by investigating how both parallel and perpendicular transport are affected by intermittent slab turbulence for weak as well as strong uniform 2D turbulence. The main new finding is that both parallel and perpendicular transport are the net effect of an interplay between diffusive and nondiffusive (superdiffusive or subdiffusive) transport effects as a consequence of this intermittency.
An Implicit Solver on A Parallel Block-Structured Adaptive Mesh Grid for FLASH
NASA Astrophysics Data System (ADS)
Lee, D.; Gopal, S.; Mohapatra, P.
2012-07-01
We introduce a fully implicit solver for FLASH based on a Jacobian-Free Newton-Krylov (JFNK) approach with an appropriate preconditioner. The main goal of developing this JFNK-type implicit solver is to provide efficient high-order numerical algorithms and methodology for simulating stiff systems of differential equations on large-scale parallel computer architectures. A large number of natural problems in nonlinear physics involve a wide range of spatial and time scales of interest. A system that encompasses such a wide magnitude of scales is described as "stiff." A stiff system can arise in many different fields of physics, including fluid dynamics/aerodynamics, laboratory/space plasma physics, low Mach number flows, reactive flows, radiation hydrodynamics, and geophysical flows. One of the big challenges in solving such a stiff system using current-day computational resources lies in resolving time and length scales varying by several orders of magnitude. We introduce FLASH's preliminary implementation of a time-accurate JFNK-based implicit solver in the framework of FLASH's unsplit hydro solver.
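The defining trick of any JFNK solver is that the Newton linear system is handed to a Krylov method that only ever needs Jacobian-vector products, and those products can be approximated by finite differences of the residual, so the Jacobian is never formed. A minimal sketch of this idea in Python with SciPy follows; the residual function, tolerances, and problem size are toy stand-ins, not FLASH's.

```python
# Minimal Jacobian-Free Newton-Krylov sketch (illustrative; not the FLASH solver).
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def F(u):
    # Stand-in nonlinear residual with weak coupling between unknowns;
    # a real solver would evaluate the discretized PDE residual here.
    return u**3 + 2.0 * u + 0.1 * np.roll(u, 1) - 1.0

def jfnk_solve(u0, tol=1e-10, max_newton=20, eps=1e-7):
    u = u0.copy()
    for _ in range(max_newton):
        r = F(u)
        if np.linalg.norm(r) < tol:
            break
        # Jacobian-free action: J(u) v ~= (F(u + eps*v) - F(u)) / eps
        def jv(v):
            return (F(u + eps * v) - r) / eps
        J = LinearOperator((u.size, u.size), matvec=jv)
        du, info = gmres(J, -r)        # Krylov solve of the Newton system
        u = u + du
    return u

u = jfnk_solve(np.zeros(100))
print(np.linalg.norm(F(u)))            # residual norm after Newton iterations
```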
Cloudweaver: Adaptive and Data-Driven Workload Manager for Generic Clouds
NASA Astrophysics Data System (ADS)
Li, Rui; Chen, Lei; Li, Wen-Syan
Cloud computing denotes the latest trend in application development for parallel computing on massive data volumes. It relies on clouds of servers to handle tasks that used to be managed by an individual server. With cloud computing, software vendors can provide business intelligence and data analytic services for internet-scale data sets. Many open source projects, such as Hadoop, offer various software components that are essential for building a cloud infrastructure. Current Hadoop (like many other frameworks) requires users to configure the cloud infrastructure via programs and APIs, and such configuration is fixed during runtime. In this chapter, we propose a workload manager (WLM), called CloudWeaver, which provides automated configuration of a cloud infrastructure for runtime execution. The workload management is data-driven and can adapt to the dynamic nature of operator throughput during different execution phases. CloudWeaver works for a single job and for a workload consisting of multiple jobs running concurrently, and aims at maximum throughput using a minimum set of processors.
Tests of the Weak Equivalence Principle Below Fifty Microns
NASA Astrophysics Data System (ADS)
Leopardi, Holly; Hoyle, C. D.; Smith, Dave; Cardenas, Crystal; Harter, Andrew Conrad
2014-03-01
Due to the incompatibility of the Standard Model and General Relativity, tests of gravity remain at the forefront of experimental physics research. The Weak Equivalence Principle (WEP), which states that in a uniform gravitational field all objects fall with the same acceleration regardless of composition, total mass, or structure, is fundamentally the result of the equality of inertial mass and gravitational mass. The WEP has been effectively studied since the time of Galileo, and is a central feature of General Relativity; its violation at any length scale would bring into question fundamental aspects of the current model of gravitational physics. A variety of scenarios predict possible mechanisms that could result in a violation of the WEP. The Humboldt State University Gravitational Physics Laboratory is using a torsion pendulum with equal masses of different materials (a ``composition dipole'' configuration) to determine whether the WEP holds below the 50-micron distance scale. The experiment will measure the twist of a torsion pendulum as an attractor mass is oscillated nearby in a parallel-plate configuration, providing a time-varying torque on the pendulum. The size and distance dependence of the torque variation will provide a means to determine deviations from accepted models of gravity on untested distance scales.
Spreadsheet Calculation of Jets in Crossflow: Opposed Rows of Slots Slanted at 45 Degrees
NASA Technical Reports Server (NTRS)
Holderman, James D.; Clisset, James R.; Moder, Jeffrey P.
2011-01-01
The purpose of this study was to extend a baseline empirical model to the case of jets entering the mainstream flow from opposed rows of 45-degree slanted slots. The results in this report were obtained using a spreadsheet modified from the one posted with NASA/TM--2010-216100. The primary conclusion is that the best mixing configuration for opposed rows of 45-degree slanted slots at any downstream distance is a parallel staggered configuration, in which the slots are angled in the same direction on the top and bottom walls and one side is shifted by half the orifice spacing. Although distributions from perpendicular slanted slots are similar to those from parallel staggered configurations at some downstream locations, results for perpendicular slots are highly dependent on downstream distance: they are no better than parallel staggered slots at the locations where the two are similar, and worse at other distances.
Eigensolver for a Sparse, Large Hermitian Matrix
NASA Technical Reports Server (NTRS)
Tisdale, E. Robert; Oyafuso, Fabiano; Klimeck, Gerhard; Brown, R. Chris
2003-01-01
A parallel-processing computer program finds a few eigenvalues in a sparse Hermitian matrix that contains as many as 100 million diagonal elements. This program finds the eigenvalues faster, using less memory, than do other, comparable eigensolver programs. This program implements a Lanczos algorithm in the American National Standards Institute/ International Organization for Standardization (ANSI/ISO) C computing language, using the Message Passing Interface (MPI) standard to complement an eigensolver in PARPACK. [PARPACK (Parallel Arnoldi Package) is an extension, to parallel-processing computer architectures, of ARPACK (Arnoldi Package), which is a collection of Fortran 77 subroutines that solve large-scale eigenvalue problems.] The eigensolver runs on Beowulf clusters of computers at the Jet Propulsion Laboratory (JPL).
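For readers who want to experiment with the same class of problem, SciPy's eigsh wraps the serial ARPACK Lanczos/Arnoldi driver; the sketch below is a small single-process stand-in and does not capture the parallel PARPACK/MPI machinery the program above uses. The matrix is an arbitrary sparse Hermitian example.

```python
# Serial illustration of the ARPACK-style interface (illustrative only).
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 100_000                               # stand-in for the ~1e8-element problem
diag = np.linspace(0.0, 1.0, n)
mat = sp.diags([diag, np.full(n - 1, 0.01), np.full(n - 1, 0.01)],
               [0, -1, 1], format="csr")  # sparse Hermitian (real symmetric)

# Lanczos iteration: a few extreme eigenvalues, never densifying the matrix.
vals = eigsh(mat, k=5, which="SA", return_eigenvectors=False)
print(np.sort(vals))
```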
Hybrid-optimization strategy for the communication of large-scale Kinetic Monte Carlo simulation
NASA Astrophysics Data System (ADS)
Wu, Baodong; Li, Shigang; Zhang, Yunquan; Nie, Ningming
2017-02-01
The parallel Kinetic Monte Carlo (KMC) algorithm based on domain decomposition has been widely used in large-scale physical simulations. However, the communication overhead of the parallel KMC algorithm is critical and severely degrades the overall performance and scalability. In this paper, we present a hybrid optimization strategy to reduce the communication overhead of parallel KMC simulations. We first propose a communication aggregation algorithm to reduce the total number of messages and eliminate communication redundancy. Then, we utilize shared memory to reduce the memory-copy overhead of intra-node communication. Finally, we optimize the communication scheduling using neighborhood collective operations. We demonstrate the scalability and high performance of our hybrid optimization strategy by both theoretical and experimental analysis. Results show that the optimized KMC algorithm exhibits better performance and scalability than the well-known open-source library SPPARKS. On a 32-node Xeon E5-2680 cluster (640 cores in total), the optimized algorithm reduces the communication time by 24.8% compared with SPPARKS.
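The aggregation idea is simple to sketch: rather than sending each boundary event to a neighbor as its own small message, events are buffered per destination rank and shipped as one combined message. A minimal mpi4py illustration follows (run with mpiexec; the ring-neighbor pattern and event payloads are hypothetical, not the paper's code).

```python
# Sketch of message aggregation for parallel KMC boundary events (illustrative).
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Hypothetical per-step boundary events: (destination_rank, payload) pairs.
events = [((rank + 1) % size, ("site_update", rank, i)) for i in range(1000)]

outbox = {}                      # aggregate: one buffer per destination rank
for dest, payload in events:
    outbox.setdefault(dest, []).append(payload)

# One message per neighbor instead of one message per event.
reqs = [comm.isend(buf, dest=dest, tag=0) for dest, buf in outbox.items()]
inbox = [comm.recv(source=MPI.ANY_SOURCE, tag=0) for _ in range(len(outbox))]
for r in reqs:
    r.wait()
# Each received entry now carries ~1000 events in a single message.
```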
The trispectrum in the Effective Field Theory of Large Scale Structure
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bertolini, Daniele; Schutz, Katelin; Solon, Mikhail P.
2016-06-01
We compute the connected four point correlation function (the trispectrum in Fourier space) of cosmological density perturbations at one-loop order in Standard Perturbation Theory (SPT) and the Effective Field Theory of Large Scale Structure (EFT of LSS). This paper is a companion to our earlier work on the non-Gaussian covariance of the matter power spectrum, which corresponds to a particular wavenumber configuration of the trispectrum. In the present calculation, we highlight and clarify some of the subtle aspects of the EFT framework that arise at third order in perturbation theory for general wavenumber configurations of the trispectrum. We consistently incorporate vorticity and non-locality in time into the EFT counterterms and lay out a complete basis of building blocks for the stress tensor. We show predictions for the one-loop SPT trispectrum and the EFT contributions, focusing on configurations which have particular relevance for using LSS to constrain primordial non-Gaussianity.
Long-time dynamics through parallel trajectory splicing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Perez, Danny; Cubuk, Ekin D.; Waterland, Amos
2015-11-24
Simulating the atomistic evolution of materials over long time scales is a longstanding challenge, especially for complex systems where the distribution of barrier heights is very heterogeneous. Such systems are difficult to investigate using conventional long-time scale techniques, and the fact that they tend to remain trapped in small regions of configuration space for extended periods of time strongly limits the physical insights gained from short simulations. We introduce a novel simulation technique, Parallel Trajectory Splicing (ParSplice), that aims at addressing this problem through the timewise parallelization of long trajectories. The computational efficiency of ParSplice stems from a speculation strategy whereby predictions of the future evolution of the system are leveraged to increase the amount of work that can be concurrently performed at any one time, hence improving the scalability of the method. ParSplice is also able to accurately account for, and potentially reuse, a substantial fraction of the computational work invested in the simulation. We validate the method on a simple Ag surface system and demonstrate substantial increases in efficiency compared to previous methods. We then demonstrate the power of ParSplice through the study of topology changes in Ag42Cu13 core–shell nanoparticles.
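The splicing idea can be illustrated with a toy discrete-state model: workers generate short trajectory segments that each begin in a known state, and a stored segment is appended to the master trajectory whenever its start state matches the trajectory's current end state. The single-process sketch below is only a caricature; the real ParSplice dispatches speculative segment requests to many workers in parallel and uses actual MD.

```python
# Toy sketch of trajectory splicing (illustrative, not ParSplice itself).
import random
from collections import defaultdict

def generate_segment(start_state):
    # Stand-in for a short MD run: a random walk between discrete states.
    end_state = random.choice([start_state, start_state + 1, start_state - 1])
    return (start_state, end_state)

segment_store = defaultdict(list)   # segments keyed by their starting state
current_state, trajectory = 0, []

for _ in range(10_000):
    # Speculation: predicted future states would be farmed out to many
    # workers in parallel; here a single worker works on the current state.
    seg = generate_segment(current_state)
    segment_store[seg[0]].append(seg)

    # Splice any usable stored segment onto the end of the trajectory.
    while segment_store[current_state]:
        s = segment_store[current_state].pop()
        trajectory.append(s)
        current_state = s[1]

print(f"spliced trajectory length: {len(trajectory)}")
```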
Concurrent Programming Using Actors: Exploiting Large-Scale Parallelism
1985-10-07
Agha, G., et al.; Massachusetts Institute of Technology, Artificial Intelligence Laboratory, Cambridge, MA.
A Fast Synthetic Aperture Radar Raw Data Simulation Using Cloud Computing.
Li, Zhixin; Su, Dandan; Zhu, Haijiang; Li, Wei; Zhang, Fan; Li, Ruirui
2017-01-08
Synthetic Aperture Radar (SAR) raw data simulation is a fundamental problem in radar system design and imaging algorithm research. The growth of surveying swath and resolution results in a significant increase in data volume and simulation time, making simulation a comprehensively data-intensive and computing-intensive problem. Although several high-performance computing (HPC) methods have demonstrated their potential for accelerating simulation, the input/output (I/O) bottleneck of the huge raw data has not been eased. In this paper, we propose a cloud-computing-based SAR raw data simulation algorithm, which employs the MapReduce model to accelerate the raw data computation and the Hadoop distributed file system (HDFS) for fast I/O access. The MapReduce model is designed for the irregular parallel accumulation of raw data simulation, a pattern that greatly reduces the parallel efficiency of graphics processing unit (GPU) based simulation methods. In addition, three kinds of optimization strategies are put forward concerning the programming model, HDFS configuration, and scheduling. The experimental results show that the cloud computing based algorithm achieves a 4x speedup over the baseline serial approach in an 8-node cloud environment, and that each optimization strategy improves performance by about 20%. This work shows that the proposed cloud algorithm is capable of solving the computing-intensive and data-intensive issues in SAR raw data simulation, and is easily extended to large-scale computing to achieve higher acceleration.
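The decomposition can be sketched in plain Python: a map stage simulates each pulse's echo independently, and a reduce stage performs the irregular accumulation into the raw-data matrix. Everything below (pulse count, echo model, array sizes) is an illustrative stand-in for the paper's Hadoop/HDFS pipeline.

```python
# Shape of the MapReduce decomposition for raw data simulation (illustrative).
import numpy as np
from functools import reduce

N_PULSES, N_RANGE = 64, 512

def map_pulse(pulse_idx):
    # Stand-in for the echo computation of one transmitted pulse.
    rng = np.random.default_rng(pulse_idx)
    echo = np.zeros(N_RANGE, dtype=complex)
    idx = rng.integers(0, N_RANGE, 32)           # irregular target positions
    np.add.at(echo, idx, np.exp(1j * rng.uniform(0, 2 * np.pi, 32)))
    return pulse_idx, echo

def reduce_echoes(raw, keyed_echo):
    pulse_idx, echo = keyed_echo
    raw[pulse_idx, :] += echo                    # irregular accumulation step
    return raw

raw_data = reduce(reduce_echoes,
                  map(map_pulse, range(N_PULSES)),
                  np.zeros((N_PULSES, N_RANGE), dtype=complex))
```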
Spin-valve Josephson junctions for cryogenic memory
NASA Astrophysics Data System (ADS)
Niedzielski, Bethany M.; Bertus, T. J.; Glick, Joseph A.; Loloee, R.; Pratt, W. P.; Birge, Norman O.
2018-01-01
Josephson junctions containing two ferromagnetic layers are being considered for use in cryogenic memory. Our group recently demonstrated that the ground-state phase difference across such a junction with carefully chosen layer thicknesses could be controllably toggled between zero and π by switching the relative magnetization directions of the two layers between the antiparallel and parallel configurations. However, several technological issues must be addressed before those junctions can be used in a large-scale memory. Many of these issues can be more easily studied in single junctions, rather than in the superconducting quantum interference device (SQUID) used for phase-sensitive measurements. In this work, we report a comprehensive study of spin-valve junctions containing a Ni layer with a fixed thickness of 2.0 nm and a NiFe layer of thickness varying between 1.1 and 1.8 nm in steps of 0.1 nm. We extract the field shift of the Fraunhofer patterns and the critical currents of the junctions in the parallel and antiparallel magnetic states, as well as the switching fields of both magnetic layers. We also report a partial study of similar junctions containing a slightly thinner Ni layer of 1.6 nm and the same range of NiFe thicknesses. These results represent the first step toward mapping out a "phase diagram" for phase-controllable spin-valve Josephson junctions as a function of the two magnetic layer thicknesses.
Parallel-vector out-of-core equation solver for computational mechanics
NASA Technical Reports Server (NTRS)
Qin, J.; Agarwal, T. K.; Storaasli, O. O.; Nguyen, D. T.; Baddourah, M. A.
1993-01-01
A parallel/vector out-of-core equation solver is developed for shared-memory computers, such as the Cray Y-MP machine. The input/output (I/O) time is reduced by using the asynchronous BUFFER IN and BUFFER OUT statements, which can be executed simultaneously with the CPU instructions. The parallel and vector capability provided by the supercomputers is also exploited to enhance performance. Numerical applications in large-scale structural analysis are given to demonstrate the efficiency of the present out-of-core solver.
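BUFFER IN and BUFFER OUT are Cray Fortran asynchronous I/O statements; the same overlap of disk traffic and computation can be sketched in Python with a one-thread executor that prefetches the next block while the current one is processed. The file name, block size, and "processing" below are stand-ins, not the solver's actual factorization.

```python
# Analogue of asynchronous BUFFER IN / compute overlap (illustrative).
import numpy as np
from concurrent.futures import ThreadPoolExecutor

BLOCK = 1_000_000
np.arange(3 * BLOCK, dtype=np.float64).tofile("matrix.bin")   # demo input file

def read_block(f):
    return np.fromfile(f, dtype=np.float64, count=BLOCK)

def process(block):
    return block.sum()                 # stand-in for the factorization work

with open("matrix.bin", "rb") as f, ThreadPoolExecutor(1) as io:
    pending = io.submit(read_block, f)     # "BUFFER IN" the first block
    total = 0.0
    while True:
        block = pending.result()           # wait for the in-flight read
        if block.size == 0:
            break
        pending = io.submit(read_block, f) # start the next read immediately
        total += process(block)            # compute overlaps the read
print(total)
```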
NASA Technical Reports Server (NTRS)
Corke, T. C.; Guezennec, Y.; Nagib, H. M.
1981-01-01
The effects of placing a parallel-plate turbulence manipulator in a boundary layer are documented through flow visualization and hot wire measurements. The boundary layer manipulator was designed to manage the large scale structures of turbulence leading to a reduction in surface drag. The differences in the turbulent structure of the boundary layer are summarized to demonstrate differences in various flow properties. The manipulator inhibited the intermittent large scale structure of the turbulent boundary layer for at least 70 boundary layer thicknesses downstream. With the removal of the large scale, the streamwise turbulence intensity levels near the wall were reduced. The downstream distribution of the skin friction was also altered by the introduction of the manipulator.
Polymer Dispersed Liquid Crystal Displays
NASA Astrophysics Data System (ADS)
Doane, J. William
The following sections are included:
* INTRODUCTION AND HISTORICAL DEVELOPMENT
* PDLC MATERIALS PREPARATION
  * Polymerization induced phase separation (PIPS)
  * Thermally induced phase separation (TIPS)
  * Solvent induced phase separation (SIPS)
  * Encapsulation (NCAP)
* RESPONSE VOLTAGE
  * Dielectric and resistive effects
  * Radial configuration
  * Bipolar configuration
  * Other director configurations
* RESPONSE TIME
* DISPLAY CONTRAST
  * Light scattering and index matching
  * Incorporation of dyes
  * Contrast measurements
* PDLC DISPLAY DEVICES AND INNOVATIONS
  * Reflective direct view displays
  * Large-scale, flexible displays
  * Switchable windows
  * Projection displays
  * High definition spatial light modulator
  * Haze-free PDLC shutters: wide angle view displays
* ENVIRONMENTAL STABILITY
* ACKNOWLEDGEMENTS
* REFERENCES
He, Hui; Fan, Guotao; Ye, Jianwei; Zhang, Weizhe
2013-01-01
Research on early warning systems for large-scale network security incidents is of great significance: such systems can improve a network's emergency response capability, mitigate the damage of cyber attacks, and strengthen the system's counterattack ability. A comprehensive early warning system that combines active measurement and anomaly detection is presented in this paper, with the discussion focused on the system's key visualization algorithms and technology. Plane visualization of a large-scale network is realized using a divide-and-conquer approach. First, the topology of the large-scale network is divided into several small-scale networks by the MLkP/CR algorithm. Second, a subgraph plane visualization algorithm is applied to each small-scale network. Finally, the small-scale network topologies are combined into a complete topology by an automatic force-directed layout algorithm. Because the algorithm transforms the plane visualization of a large-scale network topology into a series of small-scale visualization and layout problems, it has high parallelism and is able to handle the display of ultra-large-scale network topologies.
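The three stages map naturally onto standard graph tooling. The sketch below uses networkx purely as an illustration: the MLkP/CR partitioner is not publicly available, so a modularity-based community split stands in for the partitioning stage, and spring layouts stand in for the subgraph layout and force-directed arrangement stages.

```python
# Divide-and-conquer topology layout, sketched with networkx (illustrative).
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.random_internet_as_graph(1000)          # stand-in network topology

# Stage 1: divide the topology into small-scale sub-networks.
parts = list(greedy_modularity_communities(G))

# Stage 2: plane-visualize each sub-network independently (parallelizable).
sub_layouts = [nx.spring_layout(G.subgraph(p)) for p in parts]

# Stage 3: force-directed arrangement of the sub-networks as super-nodes;
# each sub-layout would then be offset by its super-node's center.
quotient = nx.quotient_graph(G, parts)
centers = nx.spring_layout(quotient, scale=10.0)
```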
Robotically Assembled Aerospace Structures: Digital Material Assembly using a Gantry-Type Assembler
NASA Technical Reports Server (NTRS)
Trinh, Greenfield; Copplestone, Grace; O'Connor, Molly; Hu, Steven; Nowak, Sebastian; Cheung, Kenneth; Jenett, Benjamin; Cellucci, Daniel
2017-01-01
This paper evaluates the development of automated assembly techniques for discrete lattice structures using a multi-axis gantry type CNC machine. These lattices are made of discrete components called digital materials. We present the development of a specialized end effector that works in conjunction with the CNC machine to assemble these lattices. With this configuration we are able to place voxels at a rate of 1.5 per minute. The scalability of digital material structures due to the incremental modular assembly is one of its key traits and an important metric of interest. We investigate the build times of a 5x5 beam structure on the scale of 1 meter (325 parts), 10 meters (3,250 parts), and 30 meters (9,750 parts). Utilizing the current configuration with a single end effector, performing serial assembly with a globally fixed feed station at the edge of the build volume, the build time increases according to a scaling law of n^4, where n is the build scale. Build times can be reduced significantly by integrating feed systems into the gantry itself, resulting in a scaling law of n^3. A completely serial assembly process will encounter time limitations as build scale increases. Automated assembly for digital materials can assemble high performance structures from discrete parts, and techniques such as built-in feed systems, parallelization, and optimization of the fastening process will yield much higher throughput.
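The two scaling laws quoted above can be compared directly with a quick illustrative calculation (unitless, normalized to the 1x build):

```python
# Build-time growth under the two scaling laws reported above (illustrative).
for n in (1, 10, 30):
    fixed_feed = n ** 4      # serial assembly, globally fixed feed station
    onboard_feed = n ** 3    # feed system integrated into the gantry
    print(f"scale {n:>2}x: fixed-feed ~{fixed_feed:>7} units, "
          f"integrated-feed ~{onboard_feed:>6} units")
```

At 30x scale the fixed-feed configuration is 30 times more expensive than the integrated-feed one, which is why the paper points to built-in feed systems and parallelization for larger builds.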
Cross-scale transport processes in the three-dimensional Kelvin-Helmholtz instability
NASA Astrophysics Data System (ADS)
Delamere, P. A.; Burkholder, B. L.; Ma, X.; Nykyri, K.
2017-12-01
The Kelvin-Helmholtz (KH) instability is a crucial aspect of the solar wind interaction with the giant magnetospheres. Rapid internal rotation of the magnetodisc produces conditions favorable for the growth of KH vortices along much of the equatorial magnetopause boundary. Pronounced dawn/dusk asymmetries at Jupiter and Saturn indicate a robust interaction with the solar wind. Using three-dimensional hybrid simulations we investigate the transport processes associated with the flow shear-driven KH instability. Of particular importance is small-scale and intermittent reconnection generated by the twisting of the magnetic field into configurations with antiparallel components. In three dimensions, strong guide-field reconnection can occur even for initially parallel magnetic field configurations. Often the twisting motion leads to pairs of reconnection sites that can operate asynchronously, generating intermittent open flux and Maxwell stresses at the magnetopause boundary. We quantify the generation of open flux using field line tracing methods, determine the Reynolds and Maxwell stresses, and evaluate the mass transport as functions of magnetic shear, velocity shear, electron pressure and plasma beta. These results are compared with magnetohydrodynamic simulations (Ma et al., 2017). In addition, we present preliminary results for the role of cross-scale coupling processes, from fluid to ion scales. In particular, we characterize small-scale waves and their role in mixing, diffusing, and heating plasma at the magnetopause boundary.
Synchronization Of Parallel Discrete Event Simulations
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S.
1992-01-01
Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.
Load Balancing Scientific Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pearce, Olga Tkachyshyn
2014-12-01
The largest supercomputers have millions of independent processors, and concurrency levels are rapidly increasing. For ideal efficiency, developers of the simulations that run on these machines must ensure that computational work is evenly balanced among processors. Assigning work evenly is challenging because many large modern parallel codes simulate behavior of physical systems that evolve over time, and their workloads change over time. Furthermore, the cost of imbalanced load increases with scale because most large-scale scientific simulations today use a Single Program Multiple Data (SPMD) parallel programming model, and an increasing number of processors will wait for the slowest one at the synchronization points. To address load imbalance, many large-scale parallel applications use dynamic load balance algorithms to redistribute work evenly. The research objective of this dissertation is to develop methods to decide when and how to load balance the application, and to balance it effectively and affordably. We measure and evaluate the computational load of the application, and develop strategies to decide when and how to correct the imbalance. Depending on the simulation, a fast, local load balance algorithm may be suitable, or a more sophisticated and expensive algorithm may be required. We developed a model for comparison of load balance algorithms for a specific state of the simulation that enables the selection of a balancing algorithm that will minimize overall runtime.
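A common starting point for the "when to balance" decision is the observation above that in an SPMD code the slowest rank gates every synchronization point, so the per-step cost of imbalance is the gap between the maximum and mean rank loads. The sketch below illustrates such a decision rule; it is a toy model, not the dissertation's actual algorithm-selection framework.

```python
# Illustrative "when to rebalance" rule for SPMD codes (a sketch of the idea).
import numpy as np

def should_rebalance(per_rank_load, steps_remaining, rebalance_cost):
    per_rank_load = np.asarray(per_rank_load)
    # Every rank waits for the slowest, so this gap is lost every step.
    imbalance_per_step = per_rank_load.max() - per_rank_load.mean()
    # Rebalance only when the projected waiting time it removes
    # exceeds what the redistribution itself will cost.
    return imbalance_per_step * steps_remaining > rebalance_cost

loads = [1.0, 1.1, 1.0, 2.3]      # seconds of measured work on 4 ranks
print(should_rebalance(loads, steps_remaining=50, rebalance_cost=10.0))
```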
Large-scale virtual screening on public cloud resources with Apache Spark.
Capuccini, Marco; Ahmed, Laeeq; Schaal, Wesley; Laure, Erwin; Spjuth, Ola
2017-01-01
Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on the message passing interface, relying on low-failure-rate hardware and fast network connections. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources, providing transparent scalability and fault tolerance at the software level. Open source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark. We developed a method to run existing docking-based screening software on distributed cloud resources, utilizing the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, by docking a publicly available target receptor against approximately 2.2 M compounds. The performance experiments show a good parallel efficiency (87%) when running in a public cloud environment. Our method enables parallel structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve allows for trying out our method on relatively small libraries first and then scaling to larger libraries. Our implementation is named Spark-VS and it is freely available as open source from GitHub (https://github.com/mcapuccini/spark-vs).
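The overall shape of such a pipeline is compact in PySpark: the library is distributed across the cluster, a docking function is mapped over it, and the top-scoring hits are collected. The sketch below is a schematic stand-in, not Spark-VS itself; the dock function, its dummy score, and the input path are placeholders.

```python
# Schematic MapReduce-style virtual screening in PySpark (illustrative).
from pyspark import SparkContext

def dock(smiles):
    # Placeholder: would invoke external docking software on one compound
    # against the target receptor and return (compound, score).
    return (smiles, float(len(smiles)))   # dummy score

sc = SparkContext(appName="vs-sketch")
library = sc.textFile("library.smi")      # one compound per line (hypothetical)
top_hits = (library
            .map(dock)                    # embarrassingly parallel docking
            .takeOrdered(100, key=lambda kv: kv[1]))
sc.stop()
```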
Large-scale configuration interaction description of the structure of nuclei around 100Sn and 208Pb
NASA Astrophysics Data System (ADS)
Qi, Chong
2016-08-01
In this contribution I would like to discuss briefly the recent developments of the nuclear configuration interaction shell-model approach. As examples, we apply the model to calculate the structure and decay properties of low-lying states in neutron-deficient nuclei around 100Sn and 208Pb that are of great experimental and theoretical interest.
A gyrofluid description of Alfvenic turbulence and its parallel electric field
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bian, N. H.; Kontar, E. P.
2010-06-15
Anisotropic Alfvenic fluctuations with k∥/k⊥ ≪ 1 remain at frequencies much smaller than the ion cyclotron frequency in the presence of a strong background magnetic field. Based on the simplest truncation of the electromagnetic gyrofluid equations in a homogeneous plasma, a model for the energy cascade produced by Alfvenic turbulence is constructed, which smoothly connects the large magnetohydrodynamic scales and the small 'kinetic' scales. Scaling relations are obtained for the electromagnetic fluctuations, as functions of k⊥ and k∥. Moreover, particular attention is paid to the spectral structure of the parallel electric field which is produced by Alfvenic turbulence. The reason is the potential implication of this parallel electric field in turbulent acceleration and transport of particles. For electromagnetic turbulence, this issue was raised some time ago in Hasegawa and Mima [J. Geophys. Res. 83, 1117 (1978)].
Hot-spot investigations of utility scale panel configurations
NASA Technical Reports Server (NTRS)
Arnett, J. C.; Dally, R. B.; Rumburg, J. P.
1984-01-01
The causes of array faults and efforts to mitigate their effects are examined. Research is concentrated on the panel for the 900 kW second phase of the Sacramento Municipal Utility District (SMUD) project. The panel is designed for hot-spot tolerance without compromising efficiency under normal operating conditions. Series/parallel wiring internal to each module improves tolerance in the power quadrant to cell short or open circuits. Analytical methods are developed for predicting worst-case shade patterns and calculating the resultant cell temperature. Experiments conducted on a prototype panel support the analytical calculations.
DOT National Transportation Integrated Search
2013-01-01
The simulator was once a very expensive, large-scale mechanical device for training military pilots or astronauts. Modern computers, linking sophisticated software and large-screen displays, have yielded simulators for the desktop or configured as sm...
Test of the CLAS12 RICH large-scale prototype in the direct proximity focusing configuration
Anefalos Pereira, S.; Baltzell, N.; Barion, L.; ...
2016-02-11
A large-area ring-imaging Cherenkov detector has been designed to provide clean hadron identification capability in the momentum range from 3 GeV/c up to 8 GeV/c for the CLAS12 experiments at the upgraded 12 GeV continuous electron beam accelerator facility of Jefferson Laboratory. The adopted solution foresees a novel hybrid optics design based on an aerogel radiator, composite mirrors, and highly packed, highly segmented photon detectors. Cherenkov light will either be imaged directly (forward tracks) or after two mirror reflections (large-angle tracks). We report here the results of tests of a large-scale prototype of the RICH detector performed with the hadron beam of the CERN T9 experimental hall in the direct detection configuration. The tests demonstrated that the proposed design provides the required pion-to-kaon rejection factor of 1:500 in the whole momentum range.
Parallel block schemes for large scale least squares computations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Golub, G.H.; Plemmons, R.J.; Sameh, A.
1986-04-01
Large-scale least squares computations arise in a variety of scientific and engineering problems, including geodetic adjustments and surveys, medical image analysis, molecular structures, partial differential equations, and substructuring methods in structural engineering. In each of these problems, matrices often arise which possess a block structure reflecting the local connectivity of the underlying physical problem. For example, super-large nonlinear least squares computations arise in geodesy, where the coordinates of positions are calculated by iteratively solving overdetermined systems of nonlinear equations by the Gauss-Newton method. The US National Geodetic Survey will complete this year (1986) the readjustment of the North American Datum, a problem which involves over 540 thousand unknowns and over 6.5 million observations (equations). The observation matrix for these least squares computations has a block angular form with 161 diagonal blocks, each containing 3 to 4 thousand unknowns. In this paper parallel schemes are suggested for the orthogonal factorization of matrices in block angular form and for the associated backsubstitution phase of the least squares computations. In addition, a parallel scheme for the calculation of certain elements of the covariance matrix for such problems is described. It is shown that these algorithms are ideally suited for multiprocessors with three levels of parallelism such as the Cedar system at the University of Illinois. 20 refs., 7 figs.
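The essence of the parallel scheme for block-angular systems is that each diagonal block can be reduced independently by an orthogonal (QR) factorization, leaving a small coupled problem in the shared unknowns, followed by independent back-substitutions. A dense NumPy sketch of that structure follows; block counts, sizes, and data are illustrative, and a production code would of course factor each block on its own processor.

```python
# Block-angular least squares via independent per-block QR (illustrative).
import numpy as np

k, m, n_loc, n_shared = 8, 50, 10, 4
rng = np.random.default_rng(0)
blocks = [(rng.standard_normal((m, n_loc)),     # A_i: local unknowns
           rng.standard_normal((m, n_shared)),  # B_i: coupling columns
           rng.standard_normal(m))              # b_i: observations
          for _ in range(k)]

reduced_rows, reduced_rhs, factors = [], [], []
for A, B, b in blocks:                  # independent -> one block per processor
    Q, R = np.linalg.qr(A, mode="complete")
    QtB, Qtb = Q.T @ B, Q.T @ b
    factors.append((R[:n_loc], QtB[:n_loc], Qtb[:n_loc]))
    reduced_rows.append(QtB[n_loc:])    # rows orthogonal to the local block
    reduced_rhs.append(Qtb[n_loc:])

# Small dense problem in the shared unknowns, then block back-substitution.
y, *_ = np.linalg.lstsq(np.vstack(reduced_rows),
                        np.concatenate(reduced_rhs), rcond=None)
x_local = [np.linalg.solve(R, tb - tB @ y) for R, tB, tb in factors]
```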
Computer Science Techniques Applied to Parallel Atomistic Simulation
NASA Astrophysics Data System (ADS)
Nakano, Aiichiro
1998-03-01
Recent developments in parallel processing technology and multiresolution numerical algorithms have established large-scale molecular dynamics (MD) simulations as a new research mode for studying materials phenomena such as fracture. However, this requires large system sizes and long simulated times. We have developed: i) Space-time multiresolution schemes; ii) fuzzy-clustering approach to hierarchical dynamics; iii) wavelet-based adaptive curvilinear-coordinate load balancing; iv) multilevel preconditioned conjugate gradient method; and v) spacefilling-curve-based data compression for parallel I/O. Using these techniques, million-atom parallel MD simulations are performed for the oxidation dynamics of nanocrystalline Al. The simulations take into account the effect of dynamic charge transfer between Al and O using the electronegativity equalization scheme. The resulting long-range Coulomb interaction is calculated efficiently with the fast multipole method. Results for temperature and charge distributions, residual stresses, bond lengths and bond angles, and diffusivities of Al and O will be presented. The oxidation of nanocrystalline Al is elucidated through immersive visualization in virtual environments. A unique dual-degree education program at Louisiana State University will also be discussed in which students can obtain a Ph.D. in Physics & Astronomy and a M.S. from the Department of Computer Science in five years. This program fosters interdisciplinary research activities for interfacing High Performance Computing and Communications with large-scale atomistic simulations of advanced materials. This work was supported by NSF (CAREER Program), ARO, PRF, and Louisiana LEQSF.
Slepoy, A; Peters, M D; Thompson, A P
2007-11-30
Molecular dynamics and other molecular simulation methods rely on a potential energy function, based only on the relative coordinates of the atomic nuclei. Such a function, called a force field, approximately represents the electronic structure interactions of a condensed matter system. Developing such approximate functions and fitting their parameters remains an arduous, time-consuming process, relying on expert physical intuition. To address this problem, a functional programming methodology was developed that may enable automated discovery of entirely new force-field functional forms, while simultaneously fitting parameter values. The method uses a combination of genetic programming, Metropolis Monte Carlo importance sampling and parallel tempering, to efficiently search a large space of candidate functional forms and parameters. The methodology was tested using a nontrivial problem with a well-defined globally optimal solution: a small set of atomic configurations was generated and the energy of each configuration was calculated using the Lennard-Jones pair potential. Starting with a population of random functions, our fully automated, massively parallel implementation of the method reproducibly discovered the original Lennard-Jones pair potential by searching for several hours on 100 processors, sampling only a minuscule portion of the total search space. This result indicates that, with further improvement, the method may be suitable for unsupervised development of more accurate force fields with completely new functional forms. Copyright (c) 2007 Wiley Periodicals, Inc.
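The parameter-fitting half of this search is easy to illustrate: a Metropolis Monte Carlo walk over the Lennard-Jones parameters, scoring candidates against reference energies. The toy sketch below covers only that half; the paper additionally evolves the functional form itself with genetic programming and uses parallel tempering across many processors.

```python
# Toy Metropolis Monte Carlo fit of Lennard-Jones parameters (illustrative).
import numpy as np

rng = np.random.default_rng(1)
r = np.linspace(0.9, 2.5, 40)                 # pair distances (reduced units)

def lj(r, eps, sig):
    return 4 * eps * ((sig / r) ** 12 - (sig / r) ** 6)

ref = lj(r, 1.0, 1.0)                         # "true" reference energies

params, cost = np.array([0.5, 1.5]), np.inf   # start far from the truth
T = 0.01                                      # sampling temperature
for _ in range(20_000):
    trial = params + rng.normal(0, 0.02, 2)
    trial_cost = np.mean((lj(r, *trial) - ref) ** 2)
    # Metropolis acceptance: always downhill, occasionally uphill.
    if trial_cost < cost or rng.random() < np.exp((cost - trial_cost) / T):
        params, cost = trial, trial_cost
print(params)                                 # converges near (1.0, 1.0)
```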
EON: software for long time simulations of atomic scale systems
NASA Astrophysics Data System (ADS)
Chill, Samuel T.; Welborn, Matthew; Terrell, Rye; Zhang, Liang; Berthet, Jean-Claude; Pedersen, Andreas; Jónsson, Hannes; Henkelman, Graeme
2014-07-01
The EON software is designed for simulations of the state-to-state evolution of atomic scale systems over timescales greatly exceeding that of direct classical dynamics. States are defined as collections of atomic configurations from which a minimization of the potential energy gives the same inherent structure. The time evolution is assumed to be governed by rare events, where transitions between states are uncorrelated and infrequent compared with the timescale of atomic vibrations. Several methods for calculating the state-to-state evolution have been implemented in EON, including parallel replica dynamics, hyperdynamics and adaptive kinetic Monte Carlo. Global optimization methods, including simulated annealing, basin hopping and minima hopping are also implemented. The software has a client/server architecture where the computationally intensive evaluations of the interatomic interactions are calculated on the client-side and the state-to-state evolution is managed by the server. The client supports optimization for different computer architectures to maximize computational efficiency. The server is written in Python so that developers have access to the high-level functionality without delving into the computationally intensive components. Communication between the server and clients is abstracted so that calculations can be deployed on a single machine, clusters using a queuing system, large parallel computers using a message passing interface, or within a distributed computing environment. A generic interface to the evaluation of the interatomic interactions is defined so that empirical potentials, such as in LAMMPS, and density functional theory as implemented in VASP and GPAW can be used interchangeably. Examples are given to demonstrate the range of systems that can be modeled, including surface diffusion and island ripening of adsorbed atoms on metal surfaces, molecular diffusion on the surface of ice and global structural optimization of nanoparticles.
NASA Astrophysics Data System (ADS)
Devi, Sushila; Brogi, B. B.; Ahluwalia, P. K.; Chand, S.
2018-06-01
Electronic transport through an asymmetric parallel coupled quantum dot system hybridized between normal leads has been investigated theoretically in the Coulomb blockade regime using the Non-Equilibrium Green Function formalism. A decoupling scheme recently proposed by Rabani and co-workers has been adopted to close the chain of higher-order Green's functions appearing in the equations of motion. For the resonant tunneling case, calculations of current and differential conductance are presented as the coupled quantum dot system is tuned from series to symmetric parallel configuration. It is found that during this transition, the current and differential conductance of the system increase. Furthermore, clear signatures of negative differential conductance and negative current appear in the series case, both of which disappear when the topology of the system is tuned to the asymmetric parallel configuration.
Radial electric field and ion parallel flow in the quasi-symmetric and Mirror configurations of HSX
Kumar, S. T. A.; Dobbins, T. J.; Talmadge, J. N.; ...
2018-03-07
In this paper, the radial electric field and the ion mean parallel flow are obtained in the helically symmetric experiment stellarator from toroidal flow measurements of C+6 ions at two locations on a flux surface, using the Pfirsch–Schlüter effect. Results from the standard quasi-helically symmetric magnetic configuration are compared with those from the Mirror configuration where the quasi-symmetry is deliberately degraded using auxiliary coils. For similar injected power, the quasi-symmetric configuration is observed to have significantly lower flows while the experimental observations from the Mirror geometry are in better agreement with neoclassical calculations. Finally, indications are that the radial electric field near the core of the quasi-symmetric configuration may be governed by non-neoclassical processes.
NASA Astrophysics Data System (ADS)
Yang, Sheng-Chun; Lu, Zhong-Yuan; Qian, Hu-Jun; Wang, Yong-Lei; Han, Jie-Ping
2017-11-01
In this work, we upgraded the electrostatic interaction method of CU-ENUF (Yang et al., 2016), which first applied CUNFFT (nonequispaced fast Fourier transforms based on CUDA) to the reciprocal-space electrostatic computation so that the electrostatic interaction is computed entirely on the GPU. The upgraded edition of CU-ENUF runs in a hybrid parallel fashion: the computation is first parallelized across multiple computer nodes, and then further on the GPU installed in each node. With this parallel strategy, the size of the simulation system is never restricted by the throughput of a single CPU or GPU. The most critical technical problem, how to parallelize a CUNFFT within this strategy, is solved through careful analysis of the underlying principles and algorithmic refinements. Furthermore, the upgraded method is capable of computing electrostatic interactions for both atomistic molecular dynamics (MD) and dissipative particle dynamics (DPD). Finally, benchmarks conducted for validation and performance indicate that the upgraded method not only delivers good precision with suitable parameter settings, but also provides an efficient way to compute electrostatic interactions for huge simulation systems.
Program Files doi: http://dx.doi.org/10.17632/zncf24fhpv.1
Licensing provisions: GNU General Public License 3 (GPL)
Programming language: C, C++, and CUDA C
Supplementary material: The program is designed for effective electrostatic interactions of large-scale simulation systems, and runs on computers equipped with NVIDIA GPUs. It has been tested on (a) a single computer node with an Intel(R) Core(TM) i7-3770 @ 3.40 GHz CPU and a GTX 980 Ti GPU, and (b) MPI-parallel computer nodes with the same configuration.
Nature of problem: For molecular dynamics simulation, the electrostatic interaction is the most time-consuming computation because of its long-range character and slow convergence in simulation space, and it takes up most of the total simulation time. Although the GPU-based parallel method CU-ENUF (Yang et al., 2016) achieved a qualitative leap over previous methods in the computation of electrostatic interactions, its capacity is limited by the throughput of a single GPU for super-scale simulation systems. An effective method is therefore needed to handle the electrostatic interactions of super-scale simulation systems efficiently.
Solution method: We constructed a hybrid parallel architecture in which CPUs and GPUs are combined to accelerate the electrostatic computation effectively. First, the simulation system is divided into many subtasks via a domain-decomposition method. MPI (Message Passing Interface) is then used to implement CPU-level parallelism, with each computer node handling a particular subtask, and each subtask is in turn executed efficiently in parallel on that node's GPU. The most critical technical problem in this hybrid parallel method, parallelizing the CUNFFT (nonequispaced fast Fourier transform based on CUDA), is solved through careful analysis of the underlying principles and algorithmic refinements.
Restrictions: HP-ENUF is mainly oriented to super-scale system simulations, in which its performance superiority is shown most clearly.
However, for a small simulation system containing fewer than 10^6 particles, the multiple-node mode has no apparent efficiency advantage over the single-node mode, and may even be less efficient, owing to network delay among the computer nodes.
References:
(1) S.-C. Yang, H.-J. Qian, Z.-Y. Lu, Appl. Comput. Harmon. Anal. 2016, http://dx.doi.org/10.1016/j.acha.2016.04.009.
(2) S.-C. Yang, Y.-L. Wang, G.-S. Jiao, H.-J. Qian, Z.-Y. Lu, J. Comput. Chem. 37 (2016) 378.
(3) S.-C. Yang, Y.-L. Zhu, H.-J. Qian, Z.-Y. Lu, Appl. Chem. Res. Chin. Univ., 2017, http://dx.doi.org/10.1007/s40242-016-6354-5.
(4) Y.-L. Zhu, H. Liu, Z.-W. Li, H.-J. Qian, G. Milano, Z.-Y. Lu, J. Comput. Chem. 34 (2013) 2197.
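The hybrid strategy described in this record, MPI across nodes with GPU offload within each node, can be sketched with mpi4py and CuPy. Both libraries are assumptions of this sketch: the actual code is C/C++/CUDA, and it uses a nonequispaced FFT (CUNFFT) rather than the plain FFT shown here.

```python
# Shape of the hybrid CPU/GPU strategy (illustrative; assumes mpi4py + CuPy).
import numpy as np
import cupy as cp
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Domain decomposition: each rank owns one slab of the charge-density mesh.
mesh = np.random.rand(128 // size, 128, 128)

# Offload this rank's reciprocal-space work onto its local GPU.
rho_k = cp.fft.rfftn(cp.asarray(mesh))
local_energy = float(cp.sum(cp.abs(rho_k) ** 2))

# Reduce the per-node partial results back over MPI.
total = comm.allreduce(local_energy, op=MPI.SUM)
```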
GLAD: a system for developing and deploying large-scale bioinformatics grid.
Teo, Yong-Meng; Wang, Xianbing; Ng, Yew-Kwong
2005-03-01
Grid computing is used to solve large-scale bioinformatics problems with gigabyte-scale databases by distributing the computation across multiple platforms. In developing bioinformatics grid applications, it has so far been extremely tedious to design and implement the component algorithms and parallelization techniques for different classes of problems, and to access remotely located sequence database files of varying formats across the grid. In this study, we propose a grid programming toolkit, GLAD (Grid Life sciences Applications Developer), which facilitates the development and deployment of bioinformatics applications on a grid. GLAD has been developed using ALiCE (Adaptive scaLable Internet-based Computing Engine), a Java-based grid middleware which exploits task-based parallelism. Two benchmark bioinformatics applications, distributed sequence comparison and distributed progressive multiple sequence alignment, have been developed using GLAD.
Comparing multi-module connections in membrane chromatography scale-up.
Yu, Zhou; Karkaria, Tishtar; Espina, Marianela; Hunjun, Manjeet; Surendran, Abera; Luu, Tina; Telychko, Julia; Yang, Yan-Ping
2015-07-20
Membrane chromatography is increasingly used for protein purification in the biopharmaceutical industry. Membrane adsorbers are often pre-assembled by manufacturers as ready-to-use modules. In large-scale protein manufacturing settings, the use of multiple membrane modules for a single batch is often required due to the large quantity of feed material. The question as to how multiple modules can be connected to achieve optimum separation and productivity has been previously approached using model proteins and mass transport theories. In this study, we compare the performance of multiple membrane modules in series and in parallel in the production of a protein antigen. Series connection was shown to provide superior separation compared to parallel connection in the context of competitive adsorption. Copyright © 2015 Elsevier B.V. All rights reserved.
NASA Technical Reports Server (NTRS)
Maxson, C. W.; Vaiana, G. S.
1977-01-01
Using high-quality solar soft X-ray images, the 'quiet' features of the inner corona have been separated into two sharply different components: the strongly reduced emission areas, or coronal holes (CH), and the extended regions of looplike emission features, or large-scale structures (LSS). Particular central meridian passage observations of the prominent CH 1 on August 21, 1973, are selected for quantitative study. Histogram photographic density distributions for full-disk images at other central meridian passages of CH 1 are also presented, and the techniques for converting low photographic density data to deposited energy are discussed, with particular emphasis on the problems associated with the CH data.
NASA Astrophysics Data System (ADS)
Georgiev, K.; Zlatev, Z.
2010-11-01
The Danish Eulerian Model (DEM) is an Eulerian model for studying the transport of air pollutants on large scales. Originally, the model was developed at the National Environmental Research Institute of Denmark. The model's computational domain covers Europe and neighbouring parts of the Atlantic Ocean, Asia, and Africa. If the DEM is applied on fine grids, its discretization leads to a huge computational problem, which implies that such a model must be run only on high-performance computer architectures. The implementation and tuning of such a complex large-scale model on each different computer is a non-trivial task. Here, we present comparison results from running the model on different kinds of vector computers (CRAY C92A, Fujitsu, etc.), parallel computers with distributed memory (IBM SP, CRAY T3E, Beowulf clusters, Macintosh G4 clusters, etc.), parallel computers with shared memory (SGI Origin, SUN, etc.), and parallel computers with two levels of parallelism (IBM SMP, IBM BlueGene/P, clusters of multiprocessor nodes, etc.). The main idea in the parallel version of DEM is a domain-partitioning approach. We discuss the effective use of the cache and hierarchical memories of modern computers, as well as the performance, speed-ups, and efficiency achieved. The parallel code of DEM, created using the MPI standard library, appears to be highly portable and shows good efficiency and scalability on different kinds of vector and parallel computers. Some important applications of the computer model output are briefly presented.
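The domain-partitioning pattern behind such parallel transport models is a strip decomposition with halo exchange. A minimal mpi4py sketch of that communication structure follows; the grid sizes and the update rule are illustrative stand-ins, not DEM's transport and chemistry.

```python
# Strip domain decomposition with halo exchange (illustrative pattern).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
local = np.zeros((100 // size + 2, 100))      # strip plus two halo rows

for step in range(10):
    up, down = (rank - 1) % size, (rank + 1) % size
    # Exchange halo rows with the neighbouring subdomains.
    comm.Sendrecv(local[1],  dest=up,   recvbuf=local[-1], source=down)
    comm.Sendrecv(local[-2], dest=down, recvbuf=local[0],  source=up)
    # Local advection-like update on the interior rows (stand-in physics).
    local[1:-1] += 0.1 * (np.roll(local, 1, axis=1) - local)[1:-1]
```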
The Impact of Continental Configuration on Global Response to Large Igneous Province Eruptions
NASA Astrophysics Data System (ADS)
Stellmann, J.; West, A. J.; Ridgwell, A.; Becker, T. W.
2017-12-01
The impact of Large Igneous Province eruptions as recorded in the geologic record varies widely; some eruptions cause global warming, large-scale ocean acidification and anoxia, and mass extinctions, while others cause some or none of these phenomena. Several potential factors may determine the global response to a Large Igneous Province eruption; here we consider continental configuration. The arrangement of continents controls the extent of shallow seas, ocean circulation, and planetary albedo, all factors which impact global climate and its response to sudden changes in greenhouse gas concentrations. To assess the potential impact of continental configuration, a suite of simulated eruptions was carried out using the cGENIE Earth system model in two end-member continental configurations: the end-Permian supercontinent and the modern. The simulated eruptions are comparable to an individual pulse of a Large Igneous Province eruption, with total CO2 emissions of 1,000 or 10,000 GtC erupted over 1,000 or 10,000 years, spanning eruption rates of 0.1-10 GtC/yr. The global response is characterized by measuring the magnitude and duration of changes in the atmospheric concentration of CO2, the saturation state of calcite, and ocean oxygen levels. Preliminary model results show that the end-Permian continental configuration and conditions (radiative balance, ocean chemistry) lead to smaller-magnitude, shorter-duration changes in atmospheric pCO2 and the ocean saturation state of calcite following the simulated eruption than the modern configuration.
Schinka, J A
1995-02-01
Individual scale characteristics and the inventory structure of the Personality Assessment Inventory (PAI; Morey, 1991) were examined by conducting internal consistency and factor analyses of item and scale score data from a large group (N = 301) of alcohol-dependent patients. Alpha coefficients, mean inter-item correlations, and corrected item-total scale correlations for the sample paralleled values reported by Morey for a large clinical sample. Minor differences in the scale factor structure of the inventory from Morey's clinical sample were found. Overall, the findings support the use of the PAI in the assessment of personality and psychopathology of alcohol-dependent patients.
Predicted Sensitivity for Tests of Short-range Gravity with a Novel Parallel-plate Torsion Pendulum
NASA Astrophysics Data System (ADS)
Richards, Matthew; Baxley, Brandon; Hoyle, C. D.; Leopardi, Holly; Shook, David
2011-11-01
The parallel-plate torsion pendulum apparatus at Humboldt State University is designed to test the Weak Equivalence Principle (WEP) and the gravitational inverse-square law (ISL) of General Relativity at unprecedented levels in the sub-millimeter regime. Some versions of String Theory predict additional dimensions that might affect the ISL at sub-millimeter levels. Some models also predict the existence of unobserved subatomic particles which, if they exist, could cause a violation of the WEP at short distances. Short-range tests of gravity and the WEP are also instrumental in investigating proposed mechanisms that attempt to explain the accelerated expansion of the universe, generally attributed to Dark Energy. The weakness of the gravitational force makes measurement very difficult at small scales: testing such a weak force requires highly isolated experimental systems and precise measurement and control instrumentation. Moreover, a dedicated test of the WEP has not been performed below the millimeter scale. This talk will discuss the improved sensitivity that we expect to achieve in short-range gravity tests with respect to previous efforts that employ different experimental configurations.
Experimental Analysis of File Transfer Rates over Wide-Area Dedicated Connections
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rao, Nageswara S.; Liu, Qiang; Sen, Satyabrata
2016-12-01
File transfers over dedicated connections, supported by large parallel file systems, have become increasingly important in high-performance computing and big data workflows. It remains a challenge to achieve peak rates for such transfers due to the complexities of file I/O, host, and network transport subsystems, and equally importantly, their interactions. We present extensive measurements of disk-to-disk file transfers using Lustre and XFS file systems mounted on multi-core servers over a suite of 10 Gbps emulated connections with 0-366 ms round trip times. Our results indicate that large buffer sizes and many parallel flows do not always guarantee high transfer rates. Furthermore, large variations in the measured rates necessitate repeated measurements to ensure confidence in inferences based on them. We propose a new method to efficiently identify the optimal joint file I/O and network transport parameters using a small number of measurements. We show that for XFS and Lustre with direct I/O, this method identifies configurations achieving 97% of the peak transfer rate while probing only 12% of the parameter space.
Yang, C L; Wei, H Y; Adler, A; Soleimani, M
2013-06-01
Electrical impedance tomography (EIT) is a fast and cost-effective technique that provides a tomographic conductivity image of a subject from boundary current-voltage data. This paper proposes a time- and memory-efficient method for solving a large scale 3D EIT inverse problem using a parallel conjugate gradient (CG) algorithm. A 3D EIT system with a large number of measurements produces a large Jacobian matrix, which can cause difficulties in both computer storage and the inversion process. One of the challenges in 3D EIT is to decrease the reconstruction time and memory usage while retaining image quality. Firstly, a sparse matrix reduction technique is proposed that uses thresholding to set very small values of the Jacobian matrix to zero. By converting the Jacobian matrix into a sparse format, the zero elements are eliminated, which reduces the memory requirement. Secondly, a block-wise CG method for parallel reconstruction has been developed. The proposed method has been tested using simulated data as well as experimental test samples. A sparse Jacobian with block-wise CG enables the large scale EIT problem to be solved efficiently. Image quality measures are presented to quantify the effect of sparse matrix reduction on the reconstruction results.
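As an illustration of the two techniques named in this abstract, the following minimal Python sketch thresholds a dense Jacobian into sparse form and then solves the regularized normal equations with conjugate gradients. The matrix sizes, threshold, and regularization weight are illustrative assumptions, not values from the paper, and serial NumPy/SciPy stands in for the authors' parallel block-wise implementation.

```python
# Sketch: sparse matrix reduction + CG solve for an EIT-style inverse problem.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(0)
# Toy Jacobian (208 measurements x 5000 voxels) with many tiny entries.
J = rng.standard_normal((208, 5000)) * np.exp(-12 * rng.random((208, 5000)))
b = rng.standard_normal(208)           # stand-in for measured voltage data

# Step 1: zero out entries below a threshold, store in compressed sparse row
# format so the zero elements cost no memory.
tau = 1e-3 * np.abs(J).max()
J_sparse = csr_matrix(np.where(np.abs(J) > tau, J, 0.0))

# Step 2: solve the regularized normal equations (J^T J + lam*I) x = J^T b
# with conjugate gradients; only sparse matrix-vector products are needed.
lam = 1e-2
def matvec(x):
    return J_sparse.T @ (J_sparse @ x) + lam * x

A = LinearOperator((J.shape[1], J.shape[1]), matvec=matvec)
x, info = cg(A, J_sparse.T @ b, maxiter=200)
print("CG converged" if info == 0 else f"CG info={info}")
```

In the paper's block-wise variant, the CG matrix-vector products would be partitioned across processors; the algebra above is unchanged.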
Scaling of Magnetic Reconnection in Relativistic Collisionless Pair Plasmas
NASA Technical Reports Server (NTRS)
Liu, Yi-Hsin; Guo, Fan; Daughton, William; Li, Hui; Hesse, Michael
2015-01-01
Using fully kinetic simulations, we study the scaling of the inflow speed of collisionless magnetic reconnection in electron-positron plasmas from the non-relativistic to ultra-relativistic limit. In the anti-parallel configuration, the inflow speed increases with the upstream magnetization parameter sigma and approaches the speed of light when sigma is greater than O(100), leading to an enhanced reconnection rate. In all regimes, the divergence of the pressure tensor is the dominant term responsible for breaking the frozen-in condition at the x-line. The observed scaling agrees well with a simple model that accounts for the Lorentz contraction of the plasma passing through the diffusion region. The results demonstrate that the aspect ratio of the diffusion region, modified by the compression factor of proper density, remains approximately 0.1 in both the non-relativistic and relativistic limits.
A Green's function method for local and non-local parallel transport in general magnetic fields
NASA Astrophysics Data System (ADS)
Del-Castillo-Negrete, Diego; Chacón, Luis
2009-11-01
The study of transport in magnetized plasmas is a problem of fundamental interest in controlled fusion and astrophysics research. Three issues make this problem particularly challenging: (i) the extreme anisotropy between the parallel (i.e., along the magnetic field), χ∥, and the perpendicular, χ⊥, conductivities (χ∥/χ⊥ may exceed 10^10 in fusion plasmas); (ii) magnetic field line chaos, which in general complicates (and may preclude) the construction of magnetic field line coordinates; and (iii) nonlocal parallel transport in the limit of small collisionality. Motivated by these issues, we present a Lagrangian Green's function method to solve the local and non-local parallel transport equation applicable to integrable and chaotic magnetic fields. The numerical implementation employs a volume-preserving field-line integrator [Finn and Chacón, Phys. Plasmas, 12 (2005)] for an accurate representation of the magnetic field lines regardless of the level of stochasticity. The general formalism and its algorithmic properties are discussed along with illustrative analytical and numerical examples. Problems of particular interest include: the departures from the Rechester-Rosenbluth diffusive scaling in the weak magnetic chaos regime, the interplay between non-locality and chaos, and the robustness of transport barriers in reverse shear configurations.
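The abstract does not reproduce the transport equation itself; for orientation, a standard local form of anisotropic heat transport in a magnetized plasma (general background, not quoted from the paper) is:

```latex
\frac{\partial T}{\partial t}
  = \nabla \cdot \left[ \chi_\parallel \, \hat{b}\,(\hat{b}\cdot\nabla T)
    + \chi_\perp \left( \nabla T - \hat{b}\,(\hat{b}\cdot\nabla T) \right) \right],
\qquad \hat{b} = \mathbf{B}/|\mathbf{B}|
```

The parallel piece of this operator acts along magnetic field lines, which is why the method's accuracy hinges on a volume-preserving field-line integrator.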
Terminal Area Simulation System User's Guide - Version 10.0
NASA Technical Reports Server (NTRS)
Switzer, George F.; Proctor, Fred H.
2014-01-01
The Terminal Area Simulation System (TASS) is a three-dimensional, time-dependent, large eddy simulation model that has been developed for studies of wake vortex and weather hazards to aviation, along with other atmospheric turbulence and cloud-scale weather phenomenology. This document describes the source code for TASS version 10.0 and provides users with the documentation needed to run the model. The source code is programmed in Fortran and is formulated to take advantage of vector processing and efficient multi-processor scaling for execution on massively-parallel supercomputer clusters. The code contains different initialization modules allowing the study of aircraft wake vortex interaction with the atmosphere and ground, atmospheric turbulence, atmospheric boundary layers, precipitating convective clouds, hail storms, gust fronts, microburst windshear, supercell and mesoscale convective systems, tornadic storms, and ring vortices. The model is able to operate in either two or three dimensions, with equations numerically formulated on a Cartesian grid. The primary output from TASS is the time-dependent domain fields generated by the prognostic equations and diagnosed variables. This document will enable a user to understand the general logic of TASS and will show how to configure and initialize the model domain. Also described are the formats of the input and output files, as well as the parameters that control the input and output.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rizzi, Silvio; Hereld, Mark; Insley, Joseph
In this work we perform in-situ visualization of molecular dynamics simulations, which can help scientists visualize simulation output on-the-fly, without incurring storage overheads. We present a case study coupling LAMMPS, the large-scale molecular dynamics simulation code, with vl3, our parallel framework for large-scale visualization and analysis. Our motivation is to identify effective approaches for covisualization and exploration of large-scale atomistic simulations at interactive frame rates. We propose a system of coupled libraries and describe its architecture, with an implementation that runs on GPU-based clusters. We present the results of strong and weak scalability experiments, as well as future research avenues based on our results.
The Southwest Configuration for the Next Generation Very Large Array
NASA Astrophysics Data System (ADS)
Irwin Kellermann, Kenneth; Carilli, Chris; Condon, James; Cotton, William; Murphy, Eric Joseph; Nyland, Kristina
2018-01-01
We discuss the planned array configuration for the Next Generation Very Large Array (ngVLA). The configuration, termed the "Southwest Array," consists of 214 antennas each 18 m in diameter, distributed over the Southwest United States and Northern Mexico. The antenna locations have been set applying rough real-world constraints, such as road, fiber, and power access. The antenna locations will be fixed, with roughly 50% of the antennas in a "core" of 2 km diameter, located at the site of the JVLA. Another 30% of the antennas will be distributed over the Plains of San Augustin to a diameter of 30 km, possibly along, or near, the current JVLA arms. The remaining 20% of the antennas will be distributed in a rough two-arm spiral pattern to the South and East, out to a maximum distance of 500 km, into Texas, Arizona, and Chihuahua. Years of experience with the VLA up to 50 GHz, plus intensive antenna testing up to 250 GHz for the ALMA prototype antennas, verify the VLA site as having very good observing conditions (opacity, phase stability), up to 115 GHz (ngVLA Memo No. 1). Using a suite of tools implemented in CASA, we have made extensive imaging simulations with this configuration. We find that good imaging performance can be obtained through appropriate weighting of the visibilities, for resolutions ranging from that of the core of the array (1" at 30 GHz), out to the longest baselines (10 mas at 30 GHz), with a loss of roughly a factor of two in sensitivity relative to natural weighting (ngVLA Memo No. 16). The off-set core, located on the northern edge of the long baseline configuration, provides excellent sensitivity even on the longest baselines. We are considering, in addition, a compact configuration of 16 close-packed 6 m antennas to obtain uv-coverage down to baselines ~ 10 m for imaging large scale structure, as well as a configuration including 9 stations distributed to continental scales.
Wrapping up BLAST and other applications for use on Unix clusters.
Hokamp, Karsten; Shields, Denis C; Wolfe, Kenneth H; Caffrey, Daniel R
2003-02-12
We have developed two programs that speed up common bioinformatic applications by spreading them across a UNIX cluster: (1) BLAST.pm, a new module for the 'MOLLUSC' package; and (2) WRAPID, a simple tool for parallelizing large numbers of small instances of programs such as BLAST, FASTA and CLUSTALW. The packages were developed in Perl on a 20-node Linux cluster and are provided together with a configuration script and documentation. They can be freely downloaded from http://wolfe.gen.tcd.ie/wrapper.
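A minimal Python sketch of the WRAPID idea follows: farming many small, independent program instances out to a pool of workers. WRAPID itself is Perl, and the input file names and BLAST command line below are illustrative assumptions, not the package's actual interface.

```python
# Sketch: parallelizing many small instances of an external program.
import subprocess
from concurrent.futures import ProcessPoolExecutor

QUERIES = [f"query_{i}.fasta" for i in range(100)]  # hypothetical input files

def run_blast(query: str) -> str:
    # One small BLAST instance per input file (command line is a stand-in).
    out = query.replace(".fasta", ".blast")
    subprocess.run(["blastall", "-p", "blastp", "-d", "nr",
                    "-i", query, "-o", out], check=True)
    return out

if __name__ == "__main__":
    # On a real cluster, the local pool would be replaced by per-node dispatch.
    with ProcessPoolExecutor(max_workers=20) as pool:
        results = list(pool.map(run_blast, QUERIES))
```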
NASA Astrophysics Data System (ADS)
Liao, Mingle; Wu, Baojian; Hou, Jianhong; Qiu, Kun
2018-03-01
Large scale optical switches are essential components in optical communication networks. We aim to build a large scale optical switch matrix by interconnecting silicon-based optical switch chips in a 3-stage CLOS structure, where EDFAs are needed to compensate for the insertion loss of the chips. The optical signal-to-noise ratio (OSNR) performance of the resulting large scale optical switch matrix is investigated for TE-mode light, and the experimental results are in agreement with the theoretical analysis. We build a 64 × 64 switch matrix from 16 × 16 optical switch chips; the OSNR and receiver sensitivity can be improved by 0.6 dB and 0.2 dB, respectively, by optimizing the gain configuration of the EDFAs.
Ibrahim, Khaled Z.; Madduri, Kamesh; Williams, Samuel; ...
2013-07-18
The Gyrokinetic Toroidal Code (GTC) uses the particle-in-cell method to efficiently simulate plasma microturbulence. This paper presents novel analysis and optimization techniques to enhance the performance of GTC on large-scale machines. We introduce cell access analysis to better manage locality vs. synchronization tradeoffs on CPU- and GPU-based architectures. Our optimized hybrid parallel implementation of GTC uses MPI, OpenMP, and NVIDIA CUDA; it achieves up to a 2× speedup over the reference Fortran version on multiple parallel systems and scales efficiently to tens of thousands of cores.
Connecting spatial and temporal scales of tropical precipitation in observations and the MetUM-GA6
NASA Astrophysics Data System (ADS)
Martin, Gill M.; Klingaman, Nicholas P.; Moise, Aurel F.
2017-01-01
This study analyses tropical rainfall variability (on a range of temporal and spatial scales) in a set of parallel Met Office Unified Model (MetUM) simulations at a range of horizontal resolutions, which are compared with two satellite-derived rainfall datasets. We focus on the shorter scales, i.e. from the native grid and time step of the model through sub-daily to seasonal, since previous studies have paid relatively little attention to sub-daily rainfall variability and how this feeds through to longer scales. We find that the behaviour of the deep convection parametrization in this model on the native grid and time step is largely independent of the grid-box size and time step length over which it operates. There is also little difference in the rainfall variability on larger/longer spatial/temporal scales. Tropical convection in the model on the native grid/time step is spatially and temporally intermittent, producing very large rainfall amounts interspersed with grid boxes/time steps of little or no rain. In contrast, switching off the deep convection parametrization, albeit at an unrealistic resolution for resolving tropical convection, results in very persistent (for limited periods), but very sporadic, rainfall. In both cases, spatial and temporal averaging smoothes out this intermittency. On the ˜ 100 km scale, for oceanic regions, the spectra of 3-hourly and daily mean rainfall in the configurations with parametrized convection agree fairly well with those from satellite-derived rainfall estimates, while at ˜ 10-day timescales the averages are overestimated, indicating a lack of intra-seasonal variability. Over tropical land the results are more varied, but the model often underestimates the daily mean rainfall (partly as a result of a poor diurnal cycle) but still lacks variability on intra-seasonal timescales. Ultimately, such work will shed light on how uncertainties in modelling small-/short-scale processes relate to uncertainty in climate change projections of rainfall distribution and variability, with a view to reducing such uncertainty through improved modelling of small-/short-scale processes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gong, Zhenhuan; Boyuka, David; Zou, X
The size and scope of cutting-edge scientific simulations are growing much faster than the I/O and storage capabilities of their run-time environments. The growing gap is exacerbated by exploratory, data-intensive analytics, such as querying simulation data with multivariate, spatio-temporal constraints, which induces heterogeneous access patterns that stress the performance of the underlying storage system. Previous work addresses data layout and indexing techniques to improve query performance for a single access pattern, which is not sufficient for complex analytics jobs. We present PARLO, a parallel run-time layout optimization framework, to achieve multi-level data layout optimization for scientific applications at run-time before data is written to storage. The layout schemes optimize for heterogeneous access patterns with user-specified priorities. PARLO is integrated with ADIOS, a high-performance parallel I/O middleware for large-scale HPC applications, to achieve user-transparent, light-weight layout optimization for scientific datasets. It offers simple XML-based configuration for users to achieve flexible layout optimization without the need to modify or recompile application codes. Experiments show that PARLO improves performance by 2 to 26 times for queries with heterogeneous access patterns compared to state-of-the-art scientific database management systems. Compared to traditional post-processing approaches, its underlying run-time layout optimization achieves a 56% savings in processing time and a reduction in storage overhead of up to 50%. PARLO also exhibits a low run-time resource requirement, while also limiting the performance impact on running applications to a reasonable level.
Numerical Study of Sound Emission by 2D Regular and Chaotic Vortex Configurations
NASA Astrophysics Data System (ADS)
Knio, Omar M.; Collorec, Luc; Juvé, Daniel
1995-02-01
The far-field noise generated by a system of three Gaussian vortices lying over a flat boundary is numerically investigated using a two-dimensional vortex element method. The method is based on the discretization of the vorticity field into a finite number of smoothed vortex elements with spherical overlapping cores. The elements are convected in a Lagrangian reference frame along particle trajectories using the local velocity vector, given in terms of a desingularized Biot-Savart law. The initial structure of the vortex system is triangular; a one-dimensional family of initial configurations is constructed by keeping one side of the triangle fixed and vertical, and varying the abscissa of the centroid of the remaining vortex. The inviscid dynamics of this vortex configuration are first investigated using non-deformable vortices. Depending on the aspect ratio of the initial system, regular or chaotic motion occurs. Due to wall-related symmetries, the far-field sound always exhibits a time-independent quadrupolar directivity with maxima parallel and perpendicular to the wall. When regular motion prevails, the noise spectrum is dominated by discrete frequencies which correspond to the fundamental system frequency and its superharmonics. For chaotic motion, a broadband spectrum is obtained; computed sound levels are substantially higher than in non-chaotic systems. A more sophisticated analysis is then performed which accounts for vortex core dynamics. Results show that the vortex cores are susceptible to an inviscid instability which leads to violent vorticity reorganization within the core. This phenomenon has little effect on the large-scale features of the motion of the system or on low-frequency sound emission. However, it leads to the generation of a high-frequency noise band in the acoustic pressure spectrum. The latter is observed in both regular and chaotic system simulations.
Omnidirectional antenna having constant phase
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sena, Matthew
Various technologies presented herein relate to constructing and/or operating an antenna having an omnidirectional electrical field of constant phase. The antenna comprises an upper plate made up of multiple conductive rings, a lower ground-plane plate, a plurality of grounding posts, a conical feed, and a radio frequency (RF) feed connector. The upper plate has a multi-ring configuration comprising a large outer ring and several smaller rings of equal size located within the outer ring. The large outer ring and the four smaller rings have the same cross-section. The grounding posts ground the upper plate to the lower plate while maintaining a required spacing/parallelism therebetween.
NASA Astrophysics Data System (ADS)
Christ, John A.; Goltz, Mark N.
2004-01-01
Pump-and-treat systems that are installed to contain contaminated groundwater migration typically involve placement of extraction wells perpendicular to the regional groundwater flow direction at the down gradient edge of a contaminant plume. These wells capture contaminated water for above ground treatment and disposal, thereby preventing further migration of contaminated water down gradient. In this work, examining two-, three-, and four-well systems, we compare well configurations that are parallel and perpendicular to the regional groundwater flow direction. We show that orienting extraction wells co-linearly, parallel to regional flow, results in (1) a larger area of aquifer influenced by the wells at a given total well flow rate, (2) a center and ultimate capture zone width equal to the perpendicular configuration, and (3) more flexibility with regard to minimizing drawdown. Although not suited for some scenarios, we found orienting extraction wells parallel to regional flow along a plume centerline, when compared to a perpendicular configuration, reduces drawdown by up to 7% and minimizes the fraction of uncontaminated water captured.
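For context, single-well capture-zone relations of the Javandel and Tsang type underlie comparisons like this one; the "center" and "ultimate" capture widths mentioned above come from this family of formulas. The sketch below evaluates them for illustrative values of pumping rate Q, aquifer thickness b, and regional Darcy flux U; none of the numbers are from the paper.

```python
# Sketch: classical capture-zone relations for one well in uniform regional flow.
import math

Q = 500.0    # extraction rate, m^3/day (assumed)
b = 10.0     # aquifer thickness, m (assumed)
U = 0.05     # regional Darcy flux, m/day (assumed)

stagnation = Q / (2 * math.pi * b * U)   # downgradient stagnation point, m
width_at_well = Q / (2 * b * U)          # capture width through the well, m
ultimate_width = Q / (b * U)             # asymptotic (ultimate) capture width, m
print(stagnation, width_at_well, ultimate_width)
```

Multi-well configurations such as those in the paper superpose several such wells, which is what makes the parallel versus perpendicular orientation comparison nontrivial.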
Solid State Mini-RPV Color Imaging System
1975-09-12
Excerpt (OCR-damaged): the design and construction phase has been completed; considerations are now in progress for conducting field tests of the equipment against "real world ...". The remainder of the excerpt is table-of-contents residue listing: Simplified Parallel Injection Configuration; CID Parallel Injection Configuration; Element Rate Timing; Horizontal Input and Phase Line Timing; Line Reset/Injection Timing; Line Rate Timing (Start of Readout); Driver A4 Block Diagram.
Analysis techniques for diagnosing runaway ion distributions in the reversed field pinch
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, J., E-mail: jkim536@wisc.edu; Anderson, J. K.; Capecchi, W.
2016-11-15
An advanced neutral particle analyzer (ANPA) on the Madison Symmetric Torus measures deuterium ions over the energy range 8-45 keV with an energy resolution of 2-4 keV and a time resolution of 10 μs. Three different experimental configurations measure distinct portions of the naturally occurring fast ion distributions: fast ions moving parallel, anti-parallel, or perpendicular to the plasma current. On a radial-facing port, fast ions moving perpendicular to the current have the necessary pitch to be measured by the ANPA. With the diagnostic positioned on a tangent line through the plasma core, a chord integration over fast ion density, background neutral density, and local appropriate pitch defines the measured sample. The plasma current can be reversed to measure anti-parallel fast ions in the same configuration. Comparisons of energy distributions for the three configurations show an anisotropic fast ion distribution favoring high-pitch ions.
Measurement configuration optimization for dynamic metrology using Stokes polarimetry
NASA Astrophysics Data System (ADS)
Liu, Jiamin; Zhang, Chuanwei; Zhong, Zhicheng; Gu, Honggang; Chen, Xiuguo; Jiang, Hao; Liu, Shiyuan
2018-05-01
Because dynamic loading experiments such as shock compression tests are usually characterized by short duration, unrepeatability, and high costs, measurements with high temporal resolution and precise accuracy are required. A Stokes polarimeter with six parallel channels, offering temporal resolution down to the ten-nanosecond scale, has been developed in this work to capture such instantaneous changes in optical properties. Since the measurement accuracy depends heavily on the configuration of the probing beam incident angle and the polarizer azimuth angle, it is important to select an optimal combination from the numerous options. In this paper, a systematic error-propagation-based measurement configuration optimization method for the Stokes polarimeter is proposed. The maximal Frobenius norm of the combinatorial matrix of the configuration error propagating matrix and the intrinsic error propagating matrix is introduced to assess the measurement accuracy. The optimal configuration for thickness measurement of a SiO2 thin film deposited on a Si substrate has been achieved by minimizing this merit function. Simulation and experimental results show good agreement between the optimal measurement configuration achieved experimentally using the polarimeter and the theoretical prediction. In particular, the experimental results show that the relative error in the thickness measurement can be reduced from 6% to 1% by using the optimal polarizer azimuth angle when the incident angle is 45°. Furthermore, the optimal configuration for the dynamic metrology of a nickel foil under quasi-dynamic loading is investigated using the proposed optimization method.
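The configuration search itself can be sketched as a grid scan that minimizes a Frobenius-norm merit function over candidate angles. The `toy_error_matrices` model below is a hypothetical stand-in for the paper's instrument model; only the overall structure (scan, merit, argmin) reflects the abstract.

```python
# Sketch: pick the (incident angle, polarizer azimuth) pair minimizing a
# Frobenius-norm merit built from two error-propagating matrices.
import numpy as np

def merit(theta_i, theta_p, error_matrices):
    C, I = error_matrices(theta_i, theta_p)
    # Frobenius norm of the combined error-propagating matrices.
    return np.linalg.norm(np.hstack([C, I]), ord="fro")

def best_configuration(error_matrices):
    angles_i = np.deg2rad(np.arange(30, 76, 1))   # candidate incidence angles
    angles_p = np.deg2rad(np.arange(0, 180, 1))   # candidate polarizer azimuths
    grid = [(ti, tp) for ti in angles_i for tp in angles_p]
    return min(grid, key=lambda a: merit(a[0], a[1], error_matrices))

def toy_error_matrices(theta_i, theta_p):
    # Purely illustrative sensitivities; not the paper's physics.
    C = np.eye(4) * (1.0 + (theta_i - np.pi / 4) ** 2)
    I = np.eye(4) * (1.0 + 0.5 * np.sin(2 * theta_p) ** 2)
    return C, I

ti, tp = best_configuration(toy_error_matrices)
print(np.rad2deg(ti), np.rad2deg(tp))
```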
MapReduce Based Parallel Neural Networks in Enabling Large Scale Machine Learning
Liu, Yang; Yang, Jie; Huang, Yuan; Xu, Lixiong; Li, Siguang; Qi, Man
2015-01-01
Artificial neural networks (ANNs) have been widely used in pattern recognition and classification applications. However, ANNs are notably slow in computation especially when the size of data is large. Nowadays, big data has received a momentum from both industry and academia. To fulfill the potentials of ANNs for big data applications, the computation process must be speeded up. For this purpose, this paper parallelizes neural networks based on MapReduce, which has become a major computing model to facilitate data intensive applications. Three data intensive scenarios are considered in the parallelization process in terms of the volume of classification data, the size of the training data, and the number of neurons in the neural network. The performance of the parallelized neural networks is evaluated in an experimental MapReduce computer cluster from the aspects of accuracy in classification and efficiency in computation. PMID:26681933
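The data-parallel pattern the paper builds on can be shown in a few lines: map tasks compute gradients on shards of the training data, and the reduce step averages them into one update. The tiny logistic model and data shapes below are illustrative assumptions; in the paper's Hadoop setting the map and reduce functions run on separate cluster nodes, but the structure of the computation is the same.

```python
# Sketch: MapReduce-style data-parallel training of a tiny model.
import numpy as np

def map_gradient(shard, w):
    X, y = shard                       # features and 0/1 labels for one shard
    p = 1.0 / (1.0 + np.exp(-X @ w))   # logistic forward pass
    return X.T @ (p - y) / len(y)      # per-shard ("mapper") gradient

def reduce_gradients(grads):
    return np.mean(grads, axis=0)      # combine partial results ("reducer")

rng = np.random.default_rng(1)
shards = [(rng.standard_normal((500, 10)), rng.integers(0, 2, 500))
          for _ in range(8)]           # 8 mappers' worth of data
w = np.zeros(10)
for _ in range(100):                   # gradient-descent epochs
    w -= 0.5 * reduce_gradients([map_gradient(s, w) for s in shards])
```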
NASA Technical Reports Server (NTRS)
Bain, D. B.; Smith, C. E.; Holdeman, J. D.
1992-01-01
A CFD study was performed to analyze the mixing potential of opposed rows of staggered jets injected into confined crossflow in a rectangular duct. Three jet configurations were numerically tested: (1) straight (0 deg) slots; (2) perpendicular slanted (45 deg) slots angled in opposite directions on top and bottom walls; and (3) parallel slanted (45 deg) slots angled in the same direction on top and bottom walls. All three configurations were tested at slot spacing-to-duct height ratios (S/H) of 0.5, 0.75, and 1.0; a jet-to-mainstream momentum flux ratio (J) of 100; and a jet-to-mainstream mass flow ratio of 0.383. Each configuration had its best mixing performance at S/H of 0.75. Asymmetric flow patterns were expected and predicted for all slanted slot configurations. The parallel slanted slot configuration was the best overall configuration at x/H of 1.0 for S/H of 0.75.
Program Correctness, Verification and Testing for Exascale (Corvette)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sen, Koushik; Iancu, Costin; Demmel, James W
The goal of this project is to provide tools to assess the correctness of parallel programs written using hybrid parallelism. There is a dire lack of both theoretical and engineering know-how in the area of finding bugs in hybrid or large scale parallel programs, which our research aims to change. In the project we have demonstrated novel approaches in several areas: 1. Low overhead automated and precise detection of concurrency bugs at scale. 2. Using low overhead bug detection tools to guide speculative program transformations for performance. 3. Techniques to reduce the concurrency required to reproduce a bug using partial program restart/replay. 4. Techniques to provide reproducible execution of floating point programs. 5. Techniques for tuning the floating point precision used in codes.
NASA Astrophysics Data System (ADS)
Shi, X.
2015-12-01
As NSF has noted, "Theory and experimentation have for centuries been regarded as two fundamental pillars of science. It is now widely recognized that computational and data-enabled science forms a critical third pillar." Geocomputation is the third pillar of GIScience and the geosciences. With the exponential growth of geodata, scalable and high-performance computing for big data analytics has become an urgent challenge, because many research activities are constrained by software that simply cannot complete the computation. Heterogeneous geodata integration and analytics further magnify the complexity and the operational time frame. Many large-scale geospatial problems may not be processable at all if the computer system does not have sufficient memory or computational power. Emerging computer architectures, such as Intel's Many Integrated Core (MIC) architecture and the Graphics Processing Unit (GPU), and advanced computing technologies provide promising solutions that employ massive parallelism and hardware resources to achieve scalability and high performance for data-intensive computing over large spatiotemporal and social media data. Exploring novel algorithms and deploying solutions in massively parallel computing environments to achieve scalable data processing and analytics over large-scale, complex, and heterogeneous geodata with consistent quality and high performance has been the central theme of our research team in the Department of Geosciences at the University of Arkansas (UARK). New multi-core architectures combined with application accelerators hold the promise of achieving scalability and high performance by exploiting task- and data-level parallelism that conventional computing systems do not support. Such a parallel or distributed computing environment is particularly suitable for large-scale geocomputation over big data, as demonstrated by our prior work, while the potential of such advanced infrastructure remains unexplored in this domain. This presentation summarizes our prior and ongoing initiatives to exemplify how we exploit multicore CPUs, GPUs, and MICs, and clusters of CPUs, GPUs, and MICs, to accelerate geocomputation in different applications.
Real-Time Reed-Solomon Decoder
NASA Technical Reports Server (NTRS)
Maki, Gary K.; Cameron, Kelly B.; Owsley, Patrick A.
1994-01-01
A generic Reed-Solomon decoder fast enough to correct errors in real time in practical applications, designed to be implemented in fewer and smaller very-large-scale integration (VLSI) circuit chips. The decoder is configured to operate in a pipelined manner. One outstanding aspect of the design is that the Euclid multiplier and divider modules contain Galois-field multipliers configured as combinational-logic cells, which operate at greater speeds than older multipliers. The cellular configuration is highly regular and requires little interconnection area, making it ideal for implementation in extraordinarily dense VLSI circuitry. A single-chip flight-electronics version of this technology has been implemented and is available.
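A software sketch of the Galois-field multiplication that such combinational cells evaluate is below. GF(2^8) with the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D) is a common Reed-Solomon choice, though the abstract does not state which field this decoder uses; the hardware cell computes the same shift-and-XOR recurrence with all steps unrolled in parallel logic.

```python
# Sketch: multiplication in GF(2^8) modulo the field polynomial 0x11D.
def gf256_mul(a: int, b: int) -> int:
    result = 0
    while b:
        if b & 1:
            result ^= a        # "addition" in GF(2^m) is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11D         # reduce modulo the field polynomial
        b >>= 1
    return result

# x^7 * x = x^8, which reduces to x^4 + x^3 + x^2 + 1 = 0x1D in this field.
assert gf256_mul(0x80, 0x02) == 0x1D
```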
Kantsyrev, V L; Chuvatin, A S; Rudakov, L I; Velikovich, A L; Shrestha, I K; Esaulov, A A; Safronova, A S; Shlyaptseva, V V; Osborne, G C; Astanovitsky, A L; Weller, M E; Stafford, A; Schultz, K A; Cooper, M C; Cuneo, M E; Jones, B; Vesey, R A
2014-12-01
A compact Z-pinch x-ray hohlraum design with parallel-driven x-ray sources is experimentally demonstrated in a configuration with a central target and tailored shine shields at a 1.7-MA Zebra generator. Driving in parallel two magnetically decoupled compact double-planar-wire Z pinches has demonstrated the generation of synchronized x-ray bursts that correlated well in time with x-ray emission from a central reemission target. Good agreement between simulated and measured hohlraum radiation temperature of the central target is shown. The advantages of compact hohlraum design applications for multi-MA facilities are discussed.
Systems and methods for rapid processing and storage of data
Stalzer, Mark A.
2017-01-24
Systems and methods of building massively parallel computing systems using low power computing complexes in accordance with embodiments of the invention are disclosed. A massively parallel computing system in accordance with one embodiment of the invention includes at least one Solid State Blade configured to communicate via a high performance network fabric. In addition, each Solid State Blade includes a processor configured to communicate with a plurality of low power computing complexes interconnected by a router, and each low power computing complex includes at least one general processing core, an accelerator, an I/O interface, and cache memory and is configured to communicate with non-volatile solid state memory.
A scalable parallel black oil simulator on distributed memory parallel computers
NASA Astrophysics Data System (ADS)
Wang, Kun; Liu, Hui; Chen, Zhangxin
2015-11-01
This paper presents our work on developing a parallel black oil simulator for distributed memory computers based on our in-house parallel platform. The parallel simulator is designed to overcome the performance issues of common simulators that are implemented for personal computers and workstations. The finite difference method is applied to discretize the black oil model. In addition, some advanced techniques are employed to strengthen the robustness and parallel scalability of the simulator, including an inexact Newton method, matrix decoupling methods, and algebraic multigrid methods. A new multi-stage preconditioner is proposed to accelerate the solution of linear systems from the Newton methods. Numerical experiments show that our simulator is scalable and efficient, and is capable of simulating extremely large-scale black oil problems with tens of millions of grid blocks using thousands of MPI processes on parallel computers.
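The inexact Newton idea named in this abstract can be sketched briefly: the Jacobian system is solved only to a loose, residual-dependent tolerance (the forcing term), which is exactly where the preconditioned parallel linear solver would plug in. The toy two-equation residual below stands in for the discretized black oil equations, and SciPy 1.12+ is assumed for the `rtol` keyword of `gmres`.

```python
# Sketch: inexact Newton with a residual-dependent forcing term.
import numpy as np
from scipy.sparse.linalg import gmres

def residual(x):
    # Toy stand-in for the discretized residual F(x) = 0; root is (1, 2).
    return np.array([x[0]**2 + x[1] - 3.0, x[0] + x[1]**2 - 5.0])

def jacobian(f, x, eps=1e-7):
    # Finite-difference Jacobian; a real simulator assembles this analytically.
    f0 = f(x)
    J = np.zeros((len(f0), len(x)))
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        J[:, i] = (f(x + e) - f0) / eps
    return J

x = np.array([1.0, 1.0])
for k in range(20):
    F = residual(x)
    if np.linalg.norm(F) < 1e-10:
        break
    eta_k = min(0.5, np.linalg.norm(F))   # forcing term: solve loosely far out
    dx, _ = gmres(jacobian(residual, x), -F, rtol=eta_k)
    x += dx
print(x, residual(x))
```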
CFD Analysis and Design Optimization Using Parallel Computers
NASA Technical Reports Server (NTRS)
Martinelli, Luigi; Alonso, Juan Jose; Jameson, Antony; Reuther, James
1997-01-01
A versatile and efficient multi-block method is presented for the simulation of both steady and unsteady flow, as well as aerodynamic design optimization of complete aircraft configurations. The compressible Euler and Reynolds Averaged Navier-Stokes (RANS) equations are discretized using a high resolution scheme on body-fitted structured meshes. An efficient multigrid implicit scheme is implemented for time-accurate flow calculations. Optimum aerodynamic shape design is achieved at very low cost using an adjoint formulation. The method is implemented on parallel computing systems using the MPI message passing interface standard to ensure portability. The results demonstrate that, by combining highly efficient algorithms with parallel computing, it is possible to perform detailed steady and unsteady analysis as well as automatic design for complex configurations using the present generation of parallel computers.
Parallel Adaptive High-Order CFD Simulations Characterizing SOFIA Cavity Acoustics
NASA Technical Reports Server (NTRS)
Barad, Michael F.; Brehm, Christoph; Kiris, Cetin C.; Biswas, Rupak
2016-01-01
This paper presents large-scale MPI-parallel computational fluid dynamics simulations for the Stratospheric Observatory for Infrared Astronomy (SOFIA). SOFIA is an airborne, 2.5-meter infrared telescope mounted in an open cavity in the aft fuselage of a Boeing 747SP. These simulations focus on how the unsteady flow field inside and over the cavity interferes with the optical path and mounting structure of the telescope. A temporally fourth-order accurate Runge-Kutta and spatially fifth-order accurate WENO-5Z scheme was used to perform implicit large eddy simulations. An immersed boundary method provides automated gridding for complex geometries and natural coupling to a block-structured Cartesian adaptive mesh refinement framework. Strong scaling studies using NASA's Pleiades supercomputer with up to 32k CPU cores and 4 billion computational cells show excellent scaling. Dynamic load balancing based on execution time on individual AMR blocks addresses the irregular numerical cost associated with blocks containing boundaries. Limits to scaling beyond 32k cores are identified, and targeted code optimizations are discussed.
Parallel Adaptive High-Order CFD Simulations Characterizing SOFIA Cavity Acoustics
NASA Technical Reports Server (NTRS)
Barad, Michael F.; Brehm, Christoph; Kiris, Cetin C.; Biswas, Rupak
2015-01-01
This paper presents large-scale MPI-parallel computational fluid dynamics simulations for the Stratospheric Observatory for Infrared Astronomy (SOFIA). SOFIA is an airborne, 2.5-meter infrared telescope mounted in an open cavity in the aft fuselage of a Boeing 747SP. These simulations focus on how the unsteady flow field inside and over the cavity interferes with the optical path and mounting structure of the telescope. A temporally fourth-order accurate Runge-Kutta and a spatially fifth-order accurate WENO-5Z scheme were used to perform implicit large eddy simulations. An immersed boundary method provides automated gridding for complex geometries and natural coupling to a block-structured Cartesian adaptive mesh refinement framework. Strong scaling studies using NASA's Pleiades supercomputer with up to 32k CPU cores and 4 billion computational cells show excellent scaling. Dynamic load balancing based on execution time on individual AMR blocks addresses the irregular numerical cost associated with blocks containing boundaries. Limits to scaling beyond 32k cores are identified, and targeted code optimizations are discussed.
Eddy diffusivity of quasi-neutrally-buoyant inertial particles
NASA Astrophysics Data System (ADS)
Martins Afonso, Marco; Muratore-Ginanneschi, Paolo; Gama, Sílvio M. A.; Mazzino, Andrea
2018-04-01
We investigate the large-scale transport properties of quasi-neutrally-buoyant inertial particles carried by incompressible zero-mean periodic or steady ergodic flows. We show how to compute large-scale indicators such as the inertial-particle terminal velocity and eddy diffusivity from first principles in a perturbative expansion around the limit of added-mass factor close to unity. Physically, this limit corresponds to the case where the mass density of the particles is constant and close in value to the mass density of the fluid, which is also constant. Our approach differs from the usual over-damped expansion inasmuch as we do not assume a separation of time scales between thermalization and small-scale convection effects. For a general flow in the class of incompressible zero-mean periodic velocity fields, we derive closed-form cell equations for the auxiliary quantities determining the terminal velocity and effective diffusivity. In the special case of parallel flows these equations admit explicit analytic solution. We use parallel flows to show that our approach sheds light onto the behavior of terminal velocity and effective diffusivity for Stokes numbers of the order of unity.
Characterizing parallel file-access patterns on a large-scale multiprocessor
NASA Technical Reports Server (NTRS)
Purakayastha, A.; Ellis, Carla; Kotz, David; Nieuwejaar, Nils; Best, Michael L.
1995-01-01
High-performance parallel file systems are needed to satisfy tremendous I/O requirements of parallel scientific applications. The design of such high-performance parallel file systems depends on a comprehensive understanding of the expected workload, but so far there have been very few usage studies of multiprocessor file systems. This paper is part of the CHARISMA project, which intends to fill this void by measuring real file-system workloads on various production parallel machines. In particular, we present results from the CM-5 at the National Center for Supercomputing Applications. Our results are unique because we collect information about nearly every individual I/O request from the mix of jobs running on the machine. Analysis of the traces leads to various recommendations for parallel file-system design.
The build up of the correlation between halo spin and the large-scale structure
NASA Astrophysics Data System (ADS)
Wang, Peng; Kang, Xi
2018-01-01
Both simulations and observations have confirmed that the spin of haloes/galaxies is correlated with the large-scale structure (LSS), with a mass dependence such that the spin of low-mass haloes/galaxies tends to be parallel with the LSS, while that of massive haloes/galaxies tends to be perpendicular to it. It is still unclear how this mass dependence is built up over time. We use N-body simulations to trace the evolution of the halo spin-LSS correlation and find that at early times the spin of all halo progenitors is parallel with the LSS. As time goes on, mass collapse around massive haloes becomes more isotropic; in particular, recent mass accretion along the slowest collapsing direction is significant and brings the halo spin to be perpendicular to the LSS. Adopting the fractional anisotropy (FA) parameter to describe the degree of anisotropy of the large-scale environment, we find that the spin-LSS correlation is a strong function of the environment, such that a higher FA (more anisotropic environment) leads to an aligned signal, while a lower anisotropy leads to a misaligned signal. In general, our results show that the spin-LSS correlation is a combined consequence of mass flow and halo growth within the cosmic web. Our predicted environmental dependence between spin and large-scale structure can be further tested using galaxy surveys.
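The fractional anisotropy measure mentioned above has a standard form in terms of the three eigenvalues of the local tensor describing the environment (the tidal or shear tensor); the eigenvalues in this sketch are illustrative, and whether the paper applies exactly this normalization is an assumption on my part.

```python
# Sketch: fractional anisotropy from three eigenvalues of a local tensor.
import numpy as np

def fractional_anisotropy(l1, l2, l3):
    num = (l1 - l2)**2 + (l2 - l3)**2 + (l3 - l1)**2
    den = 2.0 * (l1**2 + l2**2 + l3**2)
    return np.sqrt(num / den)   # 0 = isotropic, 1 = maximally anisotropic

print(fractional_anisotropy(1.0, 0.9, 0.8))    # nearly isotropic environment
print(fractional_anisotropy(1.0, 0.1, 0.05))   # filament-like, high FA
```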
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chrisochoides, N.; Sukup, F.
In this paper we present a parallel implementation of the Bowyer-Watson (BW) algorithm using the task-parallel programming model. The BW algorithm constitutes an ideal mesh refinement strategy for implementing a large class of unstructured mesh generation techniques on both sequential and parallel computers, since it avoids the need for global mesh refinement. Its implementation on distributed memory multicomputers using the traditional data-parallel model has proven very inefficient due to the excessive synchronization needed among processors. In this paper we demonstrate that with the task-parallel model we can tolerate the synchronization costs inherent to data-parallel methods by exploiting concurrency at the processor level. Our preliminary performance data indicate that the task-parallel approach: (i) is almost four times faster than the existing data-parallel methods, (ii) scales linearly, and (iii) introduces minimal overhead compared to the "best" sequential implementation of the BW algorithm.
Analysis of 100Mb/s Ethernet for the Whitney Commodity Computing Testbed
NASA Technical Reports Server (NTRS)
Fineberg, Samuel A.; Pedretti, Kevin T.; Kutler, Paul (Technical Monitor)
1997-01-01
We evaluate the performance of a Fast Ethernet network configured with a single large switch, a single hub, and a 4x4 2D torus topology in a testbed cluster of "commodity" Pentium Pro PCs. We also evaluated a mixed network composed of Ethernet hubs and switches. An MPI collective communication benchmark and the NAS Parallel Benchmarks version 2.2 (NPB2) show that the torus network performs best for all sizes that we were able to test (up to 16 nodes). For larger networks the Ethernet switch outperforms the hub, though its performance is far below peak. The hub/switch combination tests indicate that the NAS Parallel Benchmarks are relatively insensitive to hub densities of less than 7 nodes per hub.
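A modern mpi4py sketch of the kind of MPI collective-communication benchmark used here follows; the original 1997 work predates mpi4py, and the message size and repetition count are illustrative assumptions.

```python
# Sketch: timing an MPI Allreduce across all ranks.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
buf = np.ones(1 << 18)        # ~2 MB of doubles per rank
out = np.empty_like(buf)

comm.Barrier()                # synchronize before timing
t0 = MPI.Wtime()
REPS = 50
for _ in range(REPS):
    comm.Allreduce(buf, out, op=MPI.SUM)
elapsed = (MPI.Wtime() - t0) / REPS
if comm.rank == 0:
    print(f"allreduce of {buf.nbytes} bytes took {elapsed * 1e3:.2f} ms")
```

Run, for example, with `mpiexec -n 16 python bench.py`; varying the buffer size maps out the latency- versus bandwidth-dominated regimes that distinguish hub, switch, and torus topologies.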
Upper-surface-blowing flow-turning performance
NASA Technical Reports Server (NTRS)
Sleeman, W. C., Jr.; Phelps, A. E., III
1976-01-01
Jet exhaust flow-turning characteristics were determined for systematic variations in upper-surface blowing exhaust nozzles and trailing-edge flap configuration variables from experimental wind-off (static) flow studies. For conditions with parallel flow exhausting from the nozzle, jet height (as indicated by nozzle exit height) and flap radius were found to be the most important parameters relating to flow turning. Nonparallel flow from the nozzle, as obtained from an internal roof angle and/or side spread angle, had a large favorable effect on flow turning. Comparisons made between static turning results and wind tunnel aerodynamic studies of identical configurations indicated that static flow-turning results can be indicative of wind-on powered lift performance for both good and poor nozzle-flap combinations but, for marginal designs, can lead to overly optimistic assessment of powered lift potential.
Building up the spin - orbit alignment of interacting galaxy pairs
NASA Astrophysics Data System (ADS)
Moon, Jun-Sung; Yoon, Suk-Jin
2018-01-01
Galaxies are not just randomly distributed throughout space. Instead, they are in alignment over a wide range of scales, from the cosmic web down to pairs of galaxies. Motivated by recent findings that the spin and orbital angular momentum vectors of galaxy pairs tend to be parallel, we here investigate the spin - orbit orientation in close pairs using the Illustris cosmological simulation. We find that since z ~ 1, the parallel alignment has become progressively stronger with time through repetitive encounters. Pair interactions are preferentially prograde at z = 0 (over 5 sigma significance). The prograde fraction at z = 0 is larger for the pairs influenced more heavily by each other during their evolution. We find no correlation between the spin - orbit orientation and the surrounding large-scale structure. Our results favor the scenario in which the alignment in close pairs is caused by later tidal interactions, rather than by primordial torquing from the large-scale structures.
Kennedy, Jacob J.; Abbatiello, Susan E.; Kim, Kyunggon; Yan, Ping; Whiteaker, Jeffrey R.; Lin, Chenwei; Kim, Jun Seok; Zhang, Yuzheng; Wang, Xianlong; Ivey, Richard G.; Zhao, Lei; Min, Hophil; Lee, Youngju; Yu, Myeong-Hee; Yang, Eun Gyeong; Lee, Cheolju; Wang, Pei; Rodriguez, Henry; Kim, Youngsoo; Carr, Steven A.; Paulovich, Amanda G.
2014-01-01
The successful application of MRM in biological specimens raises the exciting possibility that assays can be configured to measure all human proteins, resulting in an assay resource that would promote advances in biomedical research. We report the results of a pilot study designed to test the feasibility of a large-scale, international effort in MRM assay generation. We have configured, validated across three laboratories, and made publicly available as a resource to the community 645 novel MRM assays representing 319 proteins expressed in human breast cancer. Assays were multiplexed in groups of >150 peptides and deployed to quantify endogenous analyte in a panel of breast cancer-related cell lines. Median assay precision was 5.4%, with high inter-laboratory correlation (R2 >0.96). Peptide measurements in breast cancer cell lines were able to discriminate amongst molecular subtypes and identify genome-driven changes in the cancer proteome. These results establish the feasibility of a scaled, international effort. PMID:24317253
Parallel Application Performance on Two Generations of Intel Xeon HPC Platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chang, Christopher H.; Long, Hai; Sides, Scott
2015-10-15
Two next-generation node configurations hosting the Haswell microarchitecture were tested with a suite of microbenchmarks and application examples, and compared with a current Ivy Bridge production node on NREL's Peregrine high-performance computing cluster. A primary conclusion from this study is that the additional cores are of little value to individual task performance: limitations to application parallelism, or resource contention among concurrently running but independent tasks, limit effective utilization of the added cores. Hyperthreading generally impacts throughput negatively, but can improve performance in the absence of detailed attention to runtime workflow configuration. The observations offer some guidance for procurement of future HPC systems at NREL. First, raw core count must be balanced with available resources, particularly memory bandwidth; balance-of-system will determine value more than processor capability alone. Second, hyperthreading continues to be largely irrelevant to the workloads that are commonly seen, and were tested here, at NREL. Finally, perhaps the most impactful enhancement to productivity might come from enabling multiple concurrent jobs per node. Given the right type and size of workload, more may be achieved by doing many slow things at once than fast things in order.
PEGASUS 5: An Automated Pre-Processor for Overset-Grid CFD
NASA Technical Reports Server (NTRS)
Suhs, Norman E.; Rogers, Stuart E.; Dietz, William E.; Kwak, Dochan (Technical Monitor)
2002-01-01
An all new, automated version of the PEGASUS software has been developed and tested. PEGASUS provides the hole-cutting and connectivity information between overlapping grids, and is used as the final part of the grid generation process for overset-grid computational fluid dynamics approaches. The new PEGASUS code (Version 5) has many new features: automated hole cutting; a projection scheme for fixing gaps in overset surfaces; more efficient interpolation search methods using an alternating digital tree; hole-size optimization based on adding additional layers of fringe points; and an automatic restart capability. The new code has also been parallelized using the Message Passing Interface standard. The parallelization performance provides efficient speed-up of the execution time by an order of magnitude, and up to a factor of 30 for very large problems. The results of three example cases are presented: a three-element high-lift airfoil, a generic business jet configuration, and a complete Boeing 777-200 aircraft in a high-lift landing configuration. Comparisons of the computed flow fields for the airfoil and 777 test cases between the old and new versions of the PEGASUS codes show excellent agreement with each other and with experimental results.
Amplitudes and Anisotropies at Kinetic Scales in Reflection-Driven Turbulence
NASA Astrophysics Data System (ADS)
Chandran, B. D. G.; Perez, J. C.
2016-12-01
The dissipation processes in solar-wind turbulence depend critically on the amplitudes and anisotropies of the fluctuations at kinetic scales. For example, the efficiencies of nonlinear dissipation mechanisms such as stochastic heating are a strongly increasing function of the kinetic-scale fluctuation amplitudes. In addition, "slab-like" fluctuations that vary most rapidly parallel to the background magnetic field dissipate very differently than "quasi-2D" fluctuations that vary most rapidly perpendicular to the magnetic field. Both the amplitudes and anisotropies of the kinetic-scale fluctuations are heavily influenced by the cascade mechanisms and spectral scalings in the inertial range of the turbulence. More precisely, the properties and dynamics of the turbulence within the inertial range (at "fluid length scales") to a large extent determine the amplitudes and anisotropies of the fluctuations at the proton kinetic scales, which bound the inertial range from below. In this presentation I will describe recent work by Jean Perez and myself on direct numerical simulations of non-compressive turbulence at "fluid length scales" between the Sun and a heliocentric distance of 65 solar radii. These simulations account for the non-WKB reflection of outward-propagating Alfven-wave-like fluctuations. This partial reflection produces Sunward-propagating fluctuations, which interact with the outward-propagating fluctuations to produce turbulence and a cascade of energy from large scales to small scales. I will discuss the relative strength of the parallel and perpendicular energy cascades in our simulations, and the implications of our results for the spatial anisotropies of non-compressive fluctuations at the proton kinetic scales near the Sun. I will also present results on the parallel and perpendicular power spectra of both outward-propagating and inward-propagating Alfven-wave-like fluctuations at different heliocentric distances. I will discuss the implications of these inertial-range spectra for the relative importance of cyclotron heating, stochastic heating, and Landau damping.
NASA Astrophysics Data System (ADS)
Liu, Jiping; Kang, Xiaochen; Dong, Chun; Xu, Shenghua
2017-12-01
Surface area estimation is a widely used tool for resource evaluation in the physical world. When processing large scale spatial data, input/output (I/O) can easily become the bottleneck in parallelizing the algorithm, due to limited physical memory resources and the very slow disk transfer rate. In this paper, we propose a stream tiling approach to surface area estimation that first decomposes a spatial data set into tiles with topological expansions. With these tiles, the one-to-one mapping between the input and the computing process is broken. We then realize a streaming framework for scheduling the I/O processes and computing units. Each computing unit encapsulates an identical copy of the estimation algorithm, and multiple asynchronous computing units can work individually in parallel. Finally, experiments demonstrate that our stream tiling estimation efficiently alleviates the heavy pressure from I/O-bound work, and the optimized implementation greatly outperforms directly parallelized versions on shared memory systems with multi-core processors.
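The decoupling of I/O from computation can be sketched on a single machine with a bounded queue: one reader thread streams tiles in, while several workers consume them asynchronously. The tile contents and per-tile estimator below are illustrative stand-ins for the paper's raster decomposition, not its actual interfaces.

```python
# Sketch: a stream-tiling pipeline with one I/O thread and several workers.
import queue
import threading

tile_queue = queue.Queue(maxsize=8)   # bounded queue caps memory held by pending tiles

def reader(n_tiles):
    """I/O thread: streams tiles into the queue one at a time."""
    for i in range(n_tiles):
        tile_queue.put(f"tile_{i}")   # stands in for reading one raster tile
    tile_queue.put(None)              # sentinel: end of input

def estimate_area(tile):
    """Per-tile surface-area estimator (stand-in for the real algorithm)."""
    return 1.0

def worker(results):
    while True:
        tile = tile_queue.get()
        if tile is None:
            tile_queue.put(None)      # re-post sentinel so sibling workers stop
            break
        results.append(estimate_area(tile))

results = []
threading.Thread(target=reader, args=(100,), daemon=True).start()
workers = [threading.Thread(target=worker, args=(results,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print("total area:", sum(results))
```

The bounded queue is the key design choice: it throttles the reader when workers fall behind, so memory stays flat no matter how large the input raster is.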
Large-eddy simulations of compressible convection on massively parallel computers. [stellar physics
NASA Technical Reports Server (NTRS)
Xie, Xin; Toomre, Juri
1993-01-01
We report preliminary implementation of the large-eddy simulation (LES) technique in 2D simulations of compressible convection carried out on the CM-2 massively parallel computer. The convective flow fields in our simulations possess structures similar to those found in a number of direct simulations, with roll-like flows coherent across the entire depth of the layer that spans several density scale heights. Our detailed assessment of the effects of various subgrid scale (SGS) terms reveals that they may affect the gross character of convection. Yet, somewhat surprisingly, we find that our LES solutions, and another in which the SGS terms are turned off, only show modest differences. The resulting 2D flows realized here are rather laminar in character, and achieving substantial turbulence may require stronger forcing and less dissipation.
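The abstract refers to subgrid-scale (SGS) terms without specifying the closure; as general background (not the paper's stated model), the classical Smagorinsky eddy viscosity is the canonical example of such a term:

```latex
\nu_t = (C_s \Delta)^2 \,\lvert \bar{S} \rvert,
\qquad
\lvert \bar{S} \rvert = \sqrt{2\,\bar{S}_{ij}\bar{S}_{ij}},
\qquad
\bar{S}_{ij} = \tfrac{1}{2}\!\left( \frac{\partial \bar{u}_i}{\partial x_j}
             + \frac{\partial \bar{u}_j}{\partial x_i} \right)
```

Here the overbar denotes the resolved (filtered) field, Δ the filter width, and C_s a model constant; terms of this type are what the LES solutions retain and the comparison run switches off.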
The structure and evolution of coronal holes
NASA Technical Reports Server (NTRS)
Timothy, A. F.; Krieger, A. S.; Vaiana, G. S.
1975-01-01
Soft X-ray observations of coronal holes are analyzed to determine the structure, temporal evolution, and rotational properties of these features, as well as possible mechanisms which may account for their almost rigid rotational characteristics. It is shown that coronal holes are open features with a divergent magnetic-field configuration resulting from a particular large-scale magnetic-field topology. They are apparently formed when the successive emergence and dispersion of active-region fields produce a swath of unipolar field bounded by fields of opposite polarity, and they die when large-scale field patterns emerge which significantly distort the original field configuration. Two types of holes are described (compact and elongated), and three possible rotation mechanisms are considered: a rigidly rotating subphotospheric phenomenon, a linking of high and low latitudes by closed field lines, and an interaction between moving coronal material and open field lines.
Large-scale parallel genome assembler over cloud computing environment.
Das, Arghya Kusum; Koppa, Praveen Kumar; Goswami, Sayan; Platania, Richard; Park, Seung-Jong
2017-06-01
The size of high throughput DNA sequencing data has already reached the terabyte scale. To manage this huge volume of data, many downstream sequencing applications started using locality-based computing over different cloud infrastructures to take advantage of elastic (pay as you go) resources at a lower cost. However, the locality-based programming model (e.g. MapReduce) is relatively new. Consequently, developing scalable data-intensive bioinformatics applications using this model and understanding the hardware environment that these applications require for good performance, both require further research. In this paper, we present a de Bruijn graph oriented Parallel Giraph-based Genome Assembler (GiGA), as well as the hardware platform required for its optimal performance. GiGA uses the power of Hadoop (MapReduce) and Giraph (large-scale graph analysis) to achieve high scalability over hundreds of compute nodes by collocating the computation and data. GiGA achieves significantly higher scalability with competitive assembly quality compared to contemporary parallel assemblers (e.g. ABySS and Contrail) over traditional HPC cluster. Moreover, we show that the performance of GiGA is significantly improved by using an SSD-based private cloud infrastructure over traditional HPC cluster. We observe that the performance of GiGA on 256 cores of this SSD-based cloud infrastructure closely matches that of 512 cores of traditional HPC cluster.
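The de Bruijn graph construction at the heart of such assemblers fits in a few lines: each read contributes overlapping k-mers, and edges connect (k-1)-mer prefixes to suffixes. In GiGA these edges would be distributed across Giraph workers via MapReduce; the single-dict version below is an illustration of the data structure only, not the tool's code.

```python
# Sketch: building a de Bruijn graph from sequencing reads.
from collections import defaultdict

def de_bruijn(reads, k):
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])  # prefix -> suffix edge
    return graph

reads = ["ACGTAC", "CGTACG"]   # toy reads
for node, succs in de_bruijn(reads, 4).items():
    print(node, "->", succs)
```

Assembly then amounts to walking paths through this graph, which is why a vertex-centric graph framework like Giraph is a natural fit.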
Wan, Shixiang; Zou, Quan
2017-01-01
Multiple sequence alignment (MSA) plays a key role in biological sequence analysis, especially in phylogenetic tree construction. The extreme growth of next-generation sequencing data has created a shortage of efficient alignment approaches for ultra-large biological sequence sets of different sequence types. Distributed and parallel computing is a crucial technique for accelerating ultra-large (e.g. files larger than 1 GB) sequence analyses. Based on HAlign and the Spark distributed computing system, we implement a highly cost-efficient and time-efficient tool, HAlign-II, to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. Experiments on large-scale DNA and protein data sets (files larger than 1 GB) showed that HAlign-II saves both time and space and outperforms current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences, shows extremely high memory efficiency, and scales well with increases in computing resources. HAlign-II provides a user-friendly web server based on our distributed computing infrastructure, and its open-source code and datasets are available at http://lab.malab.cn/soft/halign.
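HAlign-style tools align every sequence against a centre sequence, which Spark makes embarrassingly parallel. A minimal PySpark sketch of that pattern, assuming a local Spark installation and using a toy similarity score in place of the real alignment kernel:

```python
from pyspark import SparkContext

def similarity(center, seq):
    # toy stand-in for a pairwise alignment score against the centre sequence
    return sum(a == b for a, b in zip(center, seq))

sc = SparkContext(appName="msa-sketch")
center = "ACGTACGTAC"
seqs = ["ACGTACGTAA", "ACGGACGTAC", "ACGTACCTAC"]
scores = sc.parallelize(seqs).map(lambda s: (s, similarity(center, s))).collect()
print(scores)
sc.stop()
```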
Aerodynamic Effects of Simulated Ice Accretion on a Generic Transport Model
NASA Technical Reports Server (NTRS)
Broeren, Andy P.; Lee, Sam; Shah, Gautam H.; Murphy, Patrick C.
2012-01-01
An experimental research effort was begun to develop a database of airplane aerodynamic characteristics with simulated ice accretion over a large range of incidence and sideslip angles. Wind-tunnel testing was performed at the NASA Langley 12-ft Low-Speed Wind Tunnel using a 3.5-percent-scale model of the NASA Langley Generic Transport Model (GTM). Aerodynamic data were acquired from a six-component force and moment balance in static-model sweeps from alpha = -5 deg to 85 deg and beta = -45 deg to 45 deg at a Reynolds number of 0.24 x 10^6 and a Mach number of 0.06. The 3.5-percent-scale GTM was tested in both the clean configuration and with full-span artificial ice shapes attached to the leading edges of the wing and the horizontal and vertical tails. Aerodynamic results for the clean airplane configuration compared favorably with similar experiments carried out on a 5.5-percent-scale GTM. The addition of the large, glaze-horn-type ice shapes did result in an increase in airplane drag coefficient but had little effect on the lift and pitching moment. The lateral-directional characteristics showed mixed results, with a small effect of the ice shapes observed in some cases. The flow visualization images revealed the presence and evolution of a spanwise-running vortex on the wing that was the dominant feature of the flowfield for both clean and iced configurations. The lack of ice-induced performance and flowfield effects observed in this effort was likely due to Reynolds number effects for the clean configuration. Estimates of full-scale baseline performance were included in this analysis to illustrate the potential icing effects.
Two non linear dynamics plasma astrophysics experiments at LANL
NASA Astrophysics Data System (ADS)
Intrator, T. P.; Weber, T. E.; Feng, Y.; Sears, J. A.; Swan, H.; Hutchinson, T.; Boguski, J.; Gao, K.; Chapdelaine, L.; Dunn, J.
2013-10-01
Two laboratory experiments at Los Alamos National Laboratory (LANL) have been built to gain access to a wide range of fundamental plasma physics issues germane to astrophysical, space, and fusion plasmas. The overarching theme is magnetized plasma dynamics, including currents, MHD forces and instabilities, sheared flows and shocks, and the creation and annihilation of magnetic field. The Reconnection Scaling Experiment (RSX) creates current sheets and flux ropes that exhibit fully 3D dynamics: they can kink, bounce, merge and reconnect, shred, and reform in complicated ways. The most recent movies from a large, detailed data set describe the 3D magnetic structure and helicity budget of a driven and dissipative system that spontaneously self-saturates a kink instability. The Magnetized Shock Experiment (MSX) uses a field-reversed configuration (FRC) that is ejected at high speed and then stagnated against a stopping mirror field, which drives a collisionless magnetized shock. A plasmoid accelerator will also access supercritical shocks at much larger Alfven Mach numbers. Unique features include access to parallel, oblique, and perpendicular shocks in regions much larger than the ion gyroradius and inertial length, large magnetic and fluid Reynolds numbers, and volume for turbulence. Center for Magnetic Self Organization, NASA Geospace NNHIOA044I-Basic, Department of Energy DE-AC52-06NA25369.
NASA Technical Reports Server (NTRS)
Le, G.; Wang, Y.; Slavin, J. A.; Strangeway, R. L.
2009-01-01
Space Technology 5 (ST5) is a constellation mission consisting of three microsatellites. It provides the first multipoint magnetic field measurements in low Earth orbit, which enable us to separate spatial and temporal variations. In this paper, we present a study of the temporal variability of field-aligned currents using the ST5 data. We examine the field-aligned current observations during and after a geomagnetic storm and compare the magnetic field profiles at the three spacecraft. The multipoint data demonstrate that mesoscale current structures, commonly embedded within large-scale current sheets, are very dynamic, with highly variable current density and/or polarity on time scales of approximately 10 min. On the other hand, the data also show that the time scales over which the currents remain relatively stable are approximately 1 min for mesoscale currents and approximately 10 min for large-scale currents. These temporal features are very likely associated with dynamic variations of their charge carriers (mainly electrons) as they respond to variations of the parallel electric field in the auroral acceleration region. The characteristic time scales for the temporal variability of mesoscale field-aligned currents are found to be consistent with those of the auroral parallel electric field.
Scalable Visual Analytics of Massive Textual Datasets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krishnan, Manoj Kumar; Bohn, Shawn J.; Cowley, Wendy E.
2007-04-01
This paper describes the first scalable implementation of a text processing engine used in visual analytics tools. These tools aid information analysts in interacting with and understanding large textual information content through visual interfaces. By developing a parallel implementation of the text processing engine, we enable visual analytics tools to exploit cluster architectures and handle massive datasets. The paper describes key elements of our parallelization approach and demonstrates virtually linear scaling when processing multi-gigabyte data sets such as PubMed. This approach enables interactive analysis of large datasets beyond the capabilities of existing state-of-the-art visual analytics tools.
Low-speed longitudinal and lateral-directional aerodynamic characteristics of the X-31 configuration
NASA Technical Reports Server (NTRS)
Banks, Daniel W.; Gatlin, Gregory M.; Paulson, John W., Jr.
1992-01-01
An experimental investigation of a 19-percent-scale model of the X-31 configuration was completed in the Langley 14 x 22 Foot Subsonic Tunnel. This study was performed to determine the static low-speed aerodynamic characteristics of the basic configuration over a large range of angle of attack and sideslip and to study the effects of strakes, leading-edge extensions (wing-body strakes), nose booms, speed-brake deployment, and inlet configurations. The ultimate purpose was to optimize the configuration for high-angle-of-attack and maneuvering-flight conditions. The model was tested at angles of attack from -5 to 67 deg and at sideslip angles from -16 to 16 deg for speeds up to 190 knots (dynamic pressure of 120 psf).
Detonation wave detection probe including parallel electrodes on a flexible backing strip
Uher, Kenneth J.
1995-01-01
A device for sensing the occurrence of destructive events and events involving mechanical shock in a non-intrusive manner. A pair of electrodes is disposed in a parallel configuration on a backing strip of flexible film. Electrical circuitry is used to sense the time at which an event causes electrical continuity between the electrodes or, with a sensor configuration where the electrodes are shorted together, to sense the time at which electrical continuity is lost.
Solving Navier-Stokes equations on a massively parallel processor; The 1 GFLOP performance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saati, A.; Biringen, S.; Farhat, C.
This paper reports on experience in solving large-scale fluid dynamics problems on the Connection Machine model CM-2. The authors have implemented a parallel version of the MacCormack scheme for the solution of the Navier-Stokes equations. By using triad floating point operations and reducing the number of interprocessor communications, they have achieved a sustained performance rate of 1.42 GFLOPS.
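For reference, the MacCormack scheme used above is a two-step predictor-corrector method. A minimal serial sketch, applied to 1D linear advection rather than the full Navier-Stokes system (grid size, Courant number, and the initial Gaussian pulse are arbitrary toy values):

```python
import numpy as np

nx, c = 200, 0.5                               # grid points, Courant number a*dt/dx
u = np.exp(-0.01 * (np.arange(nx) - 50.0) ** 2)  # initial Gaussian pulse, periodic domain

for _ in range(100):
    up = u - c * (np.roll(u, -1) - u)                # predictor: forward difference
    u = 0.5 * (u + up - c * (up - np.roll(up, 1)))   # corrector: backward difference

print(u.max())  # pulse advects with only mild numerical dissipation
```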
NASA Astrophysics Data System (ADS)
Zhou, Pu; Wang, Xiaolin; Li, Xiao; Chen, Zilum; Xu, Xiaojun; Liu, Zejin
2009-10-01
Coherent summation of fibre laser beams, which can be scaled to a relatively large number of elements, is simulated by using the stochastic parallel gradient descent (SPGD) algorithm. The applicability of this algorithm for coherent summation is analysed, and its optimisation parameters and bandwidth limitations are studied.
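The SPGD loop itself is compact: perturb all control channels simultaneously with random signs, measure the metric twice, and step along the estimated gradient. A minimal sketch for piston-phase locking of N beams; the gain, perturbation amplitude, and toy on-axis-intensity metric are assumptions for illustration, not the paper's parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8                                    # number of beam channels to phase-lock
phases = rng.uniform(0, 2 * np.pi, N)    # unknown piston errors
u = np.zeros(N)                          # controller corrections

def metric(u):
    # normalized on-axis intensity of N combined unit-amplitude beams
    return abs(np.exp(1j * (phases + u)).sum()) ** 2 / N**2

gain, delta = 8.0, 0.1
for _ in range(2000):
    d = delta * rng.choice([-1.0, 1.0], N)   # simultaneous random perturbations
    dJ = metric(u + d) - metric(u - d)       # two-sided metric difference
    u += gain * dJ * d                       # SPGD update, parallel in all channels

print(f"combining efficiency: {metric(u):.3f}")  # approaches 1 when locked
```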
Skin Friction Reduction Through Large-Scale Forcing
NASA Astrophysics Data System (ADS)
Bhatt, Shibani; Artham, Sravan; Gnanamanickam, Ebenezer
2017-11-01
Flow structures in a turbulent boundary layer larger than an integral length scale (δ), referred to as large-scales, interact with the finer scales in a non-linear manner. By targeting these large-scales and exploiting this non-linear interaction, wall shear stress (WSS) reduction of over 10% has been achieved. The plane wall jet (PWJ), a boundary layer which has highly energetic large-scales that become turbulent independent of the near-wall finer scales, is the chosen model flow field. Its unique configuration allows for the independent control of the large-scales through acoustic forcing. Perturbation wavelengths from about 1 δ to 14 δ were considered, with a reduction in WSS for all wavelengths considered. This reduction, over a large subset of the wavelengths, scales with both inner and outer variables, indicating a mixed scaling to the underlying physics, while also showing dependence on the PWJ global properties. A triple decomposition of the velocity fields shows an increase in coherence due to forcing, with a clear organization of the small-scale turbulence with respect to the introduced large-scale. The maximum reduction in WSS occurs when the introduced large-scale acts in a manner so as to reduce the turbulent activity in the very near wall region. This material is based upon work supported by the Air Force Office of Scientific Research under Award Number FA9550-16-1-0194 monitored by Dr. Douglas Smith.
Nonlinear Diamagnetic Stabilization of Double Tearing Modes in Cylindrical MHD Simulations
NASA Astrophysics Data System (ADS)
Abbott, Stephen; Germaschewski, Kai
2014-10-01
Double tearing modes (DTMs) may occur in reversed-shear tokamak configurations if two nearby rational surfaces couple and begin reconnecting. During the DTM's nonlinear evolution it can enter an "explosive" growth phase leading to complete reconnection, making it a possible driver for off-axis sawtooth crashes. Motivated by similarities between this behavior and that of the m = 1 kink-tearing mode in conventional tokamaks, we investigate diamagnetic drifts as a possible DTM stabilization mechanism. We extend our previous linear studies of an m = 2, n = 1 DTM in cylindrical geometry to the fully nonlinear regime using the MHD code MRC-3D. A pressure gradient similar to observed ITB profiles is used, together with Hall physics, to introduce ω* effects. We find the diamagnetic drifts can have a stabilizing effect on the nonlinear DTM through a combination of large-scale differential rotation and mechanisms local to the reconnection layer. MRC-3D is an extended MHD code based on the libMRC computational framework. It supports nonuniform grids in curvilinear coordinates with parallel implicit and explicit time integration.
C3: A Command-line Catalogue Cross-matching tool for modern astrophysical survey data
NASA Astrophysics Data System (ADS)
Riccio, Giuseppe; Brescia, Massimo; Cavuoti, Stefano; Mercurio, Amata; di Giorgio, Anna Maria; Molinari, Sergio
2017-06-01
In the current data-driven science era, data analysis techniques must evolve quickly to cope with data whose dimensions have increased to the Petabyte scale. In particular, since modern astrophysics is based on multi-wavelength data organized into large catalogues, it is crucial that astronomical catalogue cross-matching methods, whose cost depends strongly on catalogue size, ensure efficiency, reliability and scalability. Furthermore, multi-band data are archived and reduced in different ways, so the resulting catalogues may differ from each other in format, resolution, data structure, etc., thus requiring the highest generality of cross-matching features. We present C3 (Command-line Catalogue Cross-match), a multi-platform application designed to efficiently cross-match massive catalogues from modern surveys. Conceived as a stand-alone command-line process or a module within a generic data reduction/analysis pipeline, it provides maximum flexibility in terms of portability, configuration, coordinate systems and cross-matching types, while ensuring high performance by using a multi-core parallel processing paradigm and a sky partitioning algorithm.
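A toy version of the core cross-matching operation, using a k-d tree in tangent-plane coordinates; C3 itself handles full spherical geometry and partitions the sky across parallel workers, which this sketch omits (the synthetic catalogues and 1-arcsec match radius are invented for illustration):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
cat_a = rng.uniform(0, 1, (1000, 2))                  # (ra, dec) in degrees, small field
cat_b = cat_a[:800] + rng.normal(0, 1e-4, (800, 2))   # noisy counterparts of 800 sources

tree = cKDTree(cat_b)
dist, idx = tree.query(cat_a, distance_upper_bound=1.0 / 3600)  # 1 arcsec radius
matched = np.isfinite(dist)                            # unmatched entries come back as inf
print(f"{matched.sum()} of {len(cat_a)} sources matched")
```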
Pesce, Lorenzo L.; Lee, Hyong C.; Hereld, Mark; ...
2013-01-01
Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determined the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers.
Continental-scale patterns of canopy tree composition and function across Amazonia.
ter Steege, Hans; Pitman, Nigel C A; Phillips, Oliver L; Chave, Jerome; Sabatier, Daniel; Duque, Alvaro; Molino, Jean-François; Prévost, Marie-Françoise; Spichiger, Rodolphe; Castellanos, Hernán; von Hildebrand, Patricio; Vásquez, Rodolfo
2006-09-28
The world's greatest terrestrial stores of biodiversity and carbon are found in the forests of northern South America, where large-scale biogeographic patterns and processes have recently begun to be described. Seven of the nine countries with territory in the Amazon basin and the Guiana shield have carried out large-scale forest inventories, but such massive data sets have been little exploited by tropical plant ecologists. Although forest inventories often lack the species-level identifications favoured by tropical plant ecologists, their consistency of measurement and vast spatial coverage make them ideally suited for numerical analyses at large scales, and a valuable resource to describe the still poorly understood spatial variation of biomass, diversity, community composition and forest functioning across the South American tropics. Here we show, by using the seven forest inventories complemented with trait and inventory data collected elsewhere, two dominant gradients in tree composition and function across the Amazon, one paralleling a major gradient in soil fertility and the other paralleling a gradient in dry season length. The data set also indicates that the dominance of Fabaceae in the Guiana shield is not necessarily the result of root adaptations to poor soils (nodulation or ectomycorrhizal associations) but perhaps also the result of their remarkably high seed mass there as a potential adaptation to low rates of disturbance.
Efficient Parallel Algorithms for Landscape Evolution Modelling
NASA Astrophysics Data System (ADS)
Moresi, L. N.; Mather, B.; Beucher, R.
2017-12-01
Landscape erosion and the deposition of sediments by river systems are strongly controlled by topography, rainfall patterns, and the susceptibility of the basement to the action of running water. It is well understood that each of these processes depends on the others, for example: topography results from active tectonic processes; deformation, metamorphosis and exhumation alter the competence of the basement; rainfall patterns depend on topography; uplift and subsidence in response to tectonic stress can be amplified by erosion and sediment deposition. We typically gain understanding of such coupled systems through forward models which capture the essential interactions of the various components and attempt to parameterise those parts of the individual systems that are unresolvable at the scale of the interaction. Here we address the problem of predicting erosion and deposition rates at a continental scale with a resolution of tens to hundreds of metres in a dynamic, Lagrangian framework. This is a typical requirement for a code that interfaces with a mantle / lithosphere dynamics model and demands an efficient, unstructured, parallel implementation. We address this through a very general algorithm that treats all parts of the landscape evolution equations in sparse-matrix form, including those for stream-flow accumulation, dam-filling and catchment determination. This gives us considerable flexibility in developing unstructured, parallel code and in creating a modular package that can be configured by users to work at different temporal and spatial scales; it also has potential advantages in treating the non-linear parts of the problem in a general manner.
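To make the sparse-matrix formulation concrete, here is a minimal sketch of stream-flow accumulation posed as a sparse linear solve on a tiny synthetic drainage network; the network topology and unit rainfall are invented for illustration:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

# nodes 0..4 on a toy network; recv[i] is the node immediately downstream
# of node i (node 4 is the outlet, draining to itself)
recv = np.array([1, 2, 4, 4, 4])
n = len(recv)
rows = [i for i in range(n) if recv[i] != i]
D = sp.csr_matrix((np.ones(len(rows)), (rows, recv[rows])), shape=(n, n))

# accumulation satisfies a = rain + D^T a, i.e. (I - D^T) a = rain
rain = np.ones(n)
acc = spsolve((sp.identity(n) - D.T).tocsc(), rain)
print(acc)   # outlet collects everything upstream: [1. 2. 3. 1. 5.]
```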
Polarization Radiation with Turbulent Magnetic Fields from X-Ray Binaries
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Jian-Fu; Xiang, Fu-Yuan; Lu, Ju-Fu, E-mail: jfzhang@xtu.edu.cn, E-mail: fyxiang@xtu.edu.cn, E-mail: lujf@xmu.edu.cn
2017-02-10
We study the properties of polarized radiation in turbulent magnetic fields from X-ray binary jets. These turbulent magnetic fields are composed of large- and small-scale configurations, which result in polarized jitter radiation when the characteristic length of turbulence is less than the non-relativistic Larmor radius. On the contrary, polarized synchrotron emission occurs in a large-scale turbulent environment. We calculate the spectral energy distributions and the degree of polarization for a general microquasar. Numerical results show that turbulent magnetic field configurations can indeed provide a high degree of polarization, which does not mean that a uniform, large-scale magnetic field structure exists. The model is applied to investigate the properties of polarized radiation of the black-hole X-ray binary Cygnus X-1. Under the constraint of multiband observations of this source, our studies demonstrate that the model can explain the high polarization degree at the MeV tail and predict highly polarized properties in the high-energy γ-ray region, and that the dominant small-scale turbulent magnetic field plays an important role in explaining the highly polarized observation at hard X-ray/soft γ-ray bands. This model can be tested by polarization observations of upcoming polarimeters at high-energy γ-ray bands.
NASA Technical Reports Server (NTRS)
Sutliff, Daniel L.; Brown, Clifford A.; Walker, Bruce E.
2012-01-01
An Ultrasonic Configurable Fan Artificial Noise Source (UCFANS) was designed, built, and tested in support of the Langley Research Center's 14- by 22-Foot wind tunnel test of the Hybrid Wing Body (HWB) full three-dimensional 5.8-percent-scale model. The UCFANS is a 5.8-percent rapid-prototype scale model of a high-bypass turbofan engine that can generate the tonal signature of candidate engines using artificial sources (no flow). The purpose of the test was to provide an estimate of the acoustic shielding benefits possible from mounting the engine on the upper surface of an HWB aircraft and to provide a database for shielding code validation. A range of frequencies and a parametric study of modes were generated from exhaust and inlet nacelle configurations. Radiated acoustic data were acquired from a traversing linear array of 13 microphones, spanning 36 in. Two planes perpendicular to the axis of the nacelle (in its 0 deg orientation) and three planes parallel to it were acquired from the array sweep. In each plane the linear array traversed five sweeps, for a total span of 160 in. The resolution of the sweep is variable, so that points closer to the model are taken at a higher resolution. Contour plots of Sound Pressure Levels and integrated Power Levels are presented in this paper, as well as the in-duct modal structure.
Exploratory flow visualization investigation of mast-mounted sights in presence of a rotor
NASA Technical Reports Server (NTRS)
Ghee, Terence A.; Kelley, Henry L.
1995-01-01
A flow visualization investigation with a laser light sheet system was conducted on a 27-percent-scale AH-64 attack helicopter model fitted with two mast-mounted sights in the Langley 14- by 22-Foot Subsonic Tunnel. The investigation was conducted to identify aerodynamic phenomena that may have contributed to adverse vibration encountered during full-scale flight of the AH-64D Apache/Longbow helicopter with an asymmetric mast-mounted sight. Symmetric and asymmetric mast-mounted sights oriented at several skew angles were tested at simulated forward and rearward flight speeds of 30 and 45 knots. A laser light sheet system was used to visualize the flow in planes parallel to and perpendicular to the free-stream flow. Analysis of these flow visualization data identified frequencies of flow patterns in the wake shed from the sight, the streamline angle at the sight, and the location where the shed wake crossed the rotor plane. Differences in wake structure were observed between the sight configurations and various skew angles. Analysis of lateral light sheet plane data implied significant vortex structure in the wake of the asymmetric mast-mounted sight in the configuration that produced maximum in-flight vibration. The data showed no significant vortex structure in the wake of the asymmetric and symmetric configurations that produced no increase in in-flight adverse vibration.
NASA Astrophysics Data System (ADS)
Cowley, Adam; Maynes, Daniel; Crockett, Julie; Iverson, Brian
2017-11-01
This work experimentally investigates the effects of heating on laminar flow in high-aspect-ratio superhydrophobic (SH) microchannels. When water that is saturated with dissolved air is used, the unwetted cavities of the SH surfaces act as nucleation sites and air effervesces out of solution onto the surfaces. The microchannels consist of a rib/cavity-structured SH surface, which is heated, and a glass surface that is utilized for flow visualization. Two channel heights of nominally 183 and 366 μm are considered. The friction factor-Reynolds product (fRe) is obtained via pressure drop and volumetric flow rate measurements, and the temperature profile along the channel is obtained via thermocouples embedded in an aluminum block below the SH surface. Five surface types/configurations are investigated: smooth hydrophilic, smooth hydrophobic, SH with ribs perpendicular to the flow, SH with ribs parallel to the flow, and SH with both ribs parallel to the flow and sparse ribs perpendicular to the flow. Depending on the surface type/configuration, large bubbles can form and adversely affect fRe and lead to higher temperatures along the channel. Once bubbles grow large enough, they are expelled from the channel. The channel size greatly affects the residence time of the bubbles and consequently fRe and the channel temperature. This research was supported by the National Science Foundation (NSF) (Grant No. CBET-1235881) and the Utah NASA Space Grant Consortium (NASA Grant NNX15A124H).
GPU-accelerated Tersoff potentials for massively parallel Molecular Dynamics simulations
NASA Astrophysics Data System (ADS)
Nguyen, Trung Dac
2017-03-01
The Tersoff potential is one of the empirical many-body potentials that has been widely used in simulation studies at atomic scales. Unlike pair-wise potentials, the Tersoff potential involves three-body terms, which require much more arithmetic operations and data dependency. In this contribution, we have implemented the GPU-accelerated version of several variants of the Tersoff potential for LAMMPS, an open-source massively parallel Molecular Dynamics code. Compared to the existing MPI implementation in LAMMPS, the GPU implementation exhibits a better scalability and offers a speedup of 2.2X when run on 1000 compute nodes on the Titan supercomputer. On a single node, the speedup ranges from 2.0 to 8.0 times, depending on the number of atoms per GPU and hardware configurations. The most notable features of our GPU-accelerated version include its design for MPI/accelerator heterogeneous parallelism, its compatibility with other functionalities in LAMMPS, its ability to give deterministic results and to support both NVIDIA CUDA- and OpenCL-enabled accelerators. Our implementation is now part of the GPU package in LAMMPS and accessible for public use.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Griebel, M. (griebel@ins.uni-bonn.de); Rüttgers, A. (ruettgers@ins.uni-bonn.de)
The multiscale FENE model is applied to a 3D square-square contraction flow problem. For this purpose, the stochastic Brownian configuration field (BCF) method has been coupled with our fully parallelized three-dimensional Navier-Stokes solver NaSt3DGPF. The robustness of the BCF method enables the numerical simulation of high Deborah number flows for which most macroscopic methods suffer from stability issues. The results of our simulations are compared with experimental measurements from the literature and show very good agreement. In particular, flow phenomena such as strong vortex enhancement, streamline divergence and a flow inversion for highly elastic flows are reproduced. Due to their computational complexity, our simulations require massively parallel computations. Using a domain decomposition approach with MPI, the implementation achieves excellent scale-up results for up to 128 processors.
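The BCF method evolves ensembles of polymer configuration fields by a stochastic differential equation. A heavily simplified, zero-dimensional analogue: Brownian dynamics of FENE dumbbells under homogeneous shear. In the actual method the fields live on the flow grid, share noise realizations across space, and feed a polymer stress back into the Navier-Stokes solver; all parameters below are arbitrary toy values:

```python
import numpy as np

b, lam, dt, M = 50.0, 1.0, 1e-3, 20000   # extensibility, relaxation time, step, ensemble size
shear = 2.0                               # imposed velocity gradient du_x/dy
rng = np.random.default_rng(2)
Q = rng.normal(size=(M, 3))               # dumbbell connector vectors

def fene_force(Q):
    Q2 = np.minimum((Q ** 2).sum(axis=1, keepdims=True), 0.9 * b)  # crude cap keeps explicit Euler stable
    return Q / (1.0 - Q2 / b)

for _ in range(5000):
    drift = np.zeros_like(Q)
    drift[:, 0] = shear * Q[:, 1]         # kappa . Q for simple shear
    Q += (drift - fene_force(Q) / (2 * lam)) * dt \
         + np.sqrt(dt / lam) * rng.normal(size=(M, 3))

tau_xy = (Q[:, 0] * fene_force(Q)[:, 1]).mean()   # Kramers polymer stress, arbitrary units
print(f"polymer shear stress: {tau_xy:.3f}")
```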
Overview and recent progress of the Magnetized Shock Experiment (MSX)
NASA Astrophysics Data System (ADS)
Weber, T. E.; Intrator, T. P.; Smith, R. J.; Hutchinson, T. M.; Boguski, J. C.; Sears, J. A.; Swan, H. O.; Gao, K. W.; Chapdelaine, L. J.; Winske, D.; Dunn, J. P.
2013-10-01
The Magnetized Shock Experiment (MSX) has been constructed to study the physics of super-Alfvénic, supercritical, magnetized shocks. Exhibiting transitional length and time scales much smaller than can be produced through collisional processes, these shocks are observed to create non-thermal distributions, amplify magnetic fields, and accelerate particles to relativistic velocities. Shocks are produced through the acceleration and subsequent stagnation of field-reversed configuration (FRC) plasmoids against a high-flux magnetic mirror with a conducting boundary or a plasma target with embedded field. Adjustable shock velocity, density, and magnetic geometry (B parallel, perpendicular, or oblique to k) provide unique access to a wide range of dimensionless parameters relevant to astrophysical shocks. Information regarding the experimental configuration, diagnostics suite, recent simulations, experimental results, and physics goals will be presented. This work is supported by DOE OFES and NNSA under LANS contract DE-AC52-06NA25369. Approved for Public Release: LA-UR-13-24859.
NASA Astrophysics Data System (ADS)
Zhong, Hua; Zhang, Song; Hu, Jian; Sun, Minhong
2017-12-01
This paper deals with the imaging problem for one-stationary bistatic synthetic aperture radar (BiSAR) with a high-squint, large-baseline configuration. In this bistatic configuration, accurate focusing of BiSAR data is a difficult issue due to the relatively large range cell migration (RCM), severe range-azimuth coupling, and inherent azimuth-geometric variance. To circumvent these issues, an enhanced azimuth nonlinear chirp scaling (NLCS) algorithm based on an ellipse model is investigated in this paper. In the range processing, a method combining a deramp operation and the keystone transform (KT) is adopted to remove linear RCM completely and mitigate range-azimuth cross-coupling. In the azimuth focusing, an ellipse model is established to analyze and depict the characteristics of the azimuth-variant Doppler phase. Based on the new model, an enhanced azimuth NLCS algorithm is derived to focus one-stationary BiSAR data. Simulation results presented at the end of this paper validate the effectiveness of the proposed algorithm.
Serial vs. parallel models of attention in visual search: accounting for benchmark RT-distributions.
Moran, Rani; Zehetleitner, Michael; Liesefeld, Heinrich René; Müller, Hermann J; Usher, Marius
2016-10-01
Visual search is central to the investigation of selective visual attention. Classical theories propose that items are identified by serially deploying focal attention to their locations. While this accounts for set-size effects over a continuum of task difficulties, it has been suggested that parallel models can account for such effects equally well. We compared the serial Competitive Guided Search model with a parallel model in their ability to account for RT distributions and error rates from a large visual search data set featuring three classical search tasks: 1) a spatial configuration search (2 vs. 5); 2) a feature-conjunction search; and 3) a unique feature search (Wolfe, Palmer & Horowitz, Vision Research, 50(14), 1304-1311, 2010). In the parallel model, each item is represented by a diffusion to two boundaries (target-present/absent); the search corresponds to a parallel race between these diffusors. The parallel model was highly flexible in that it allowed both for a parametric range of capacity limitations and for set-size adjustments of identification boundaries. Furthermore, a quit unit allowed for a continuum of search-quitting policies when the target is not found, with "single-item inspection" and exhaustive searches comprising its extremes. The serial model was found to be superior to the parallel model, even before penalizing the parallel model for its increased complexity. We discuss the implications of the results and the need for future studies to resolve the debate.
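A toy simulation of the parallel model's central mechanism, a race between independent two-boundary diffusors, one per display item; the drift rates, boundaries, and termination rule below are illustrative assumptions rather than the fitted model:

```python
import numpy as np

rng = np.random.default_rng(3)

def race_trial(n_items, drift_target=1.0, drift_dist=-0.5, bound=1.0, dt=1e-3, max_t=5.0):
    """One search trial: every item diffuses in parallel between two boundaries."""
    drift = np.full(n_items, drift_dist)
    drift[0] = drift_target                   # item 0 is the target
    x = np.zeros(n_items)
    active = np.ones(n_items, dtype=bool)
    t = 0.0
    while active.any() and t < max_t:
        x[active] += drift[active] * dt + np.sqrt(dt) * rng.normal(size=active.sum())
        if (x[active] >= bound).any():        # any "target present" detection ends the trial
            return t + dt, True
        active[active] &= x[active] > -bound  # "absent" decisions retire items from the race
        t += dt
    return t, False                           # search quit without a detection

rts = [race_trial(5) for _ in range(500)]
hits = [t for t, present in rts if present]
print(f"hit rate {len(hits) / 500:.2f}, mean RT {np.mean(hits):.3f} s")
```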
NASA Technical Reports Server (NTRS)
Sutliff, Daniel L.; Brown, Clifford A.; Walker, Bruce E.
2014-01-01
An Ultrasonic Configurable Fan Artificial Noise Source (UCFANS) was designed, built, and tested in support of the NASA Langley Research Center's 14- by 22-ft wind tunnel test of the Hybrid Wing Body (HWB) full 3-D 5.8-percent-scale model. The UCFANS is a 5.8-percent rapid-prototype scale model of a high-bypass turbofan engine that can generate the tonal signature of proposed engines using artificial sources (no flow). The purpose of the test was to provide an estimate of the acoustic shielding benefits possible from mounting the engine on the upper surface of an HWB aircraft, using the projected signature of the engine currently proposed for the HWB. The modal structures at the rating points were generated from inlet and exhaust nacelle configurations; a flat plate model was used as the shielding surface, and vertical control surfaces with correct planform shapes were also tested to determine their additional impact on shielding. Radiated acoustic data were acquired from a traversing linear array of 13 microphones, spanning 36 in. Two planes perpendicular, and two planes parallel, to the axis of the nacelle were acquired from the array sweep. In each plane the linear array traversed four sweeps, for a total span of 168 in. The resolution of the sweep is variable, so that points closer to the model are taken at a higher resolution. Contour plots of Sound Pressure Levels and integrated Power Levels from nacelle-alone and shielded configurations are presented in this paper, as well as the in-duct mode power levels.
A Parallel Adaboost-Backpropagation Neural Network for Massive Image Dataset Classification
NASA Astrophysics Data System (ADS)
Cao, Jianfang; Chen, Lichao; Wang, Min; Shi, Hao; Tian, Yun
2016-12-01
Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm. Finally, we establish an automated classification model by building a Hadoop cluster. We use the Pascal VOC2007 and Caltech256 datasets to train and test the classification model. The results are superior to those obtained using traditional Adaboost-BP neural network or parallel BP neural network approaches. Our approach increased the average classification accuracy rate by approximately 14.5% and 26.0% compared to the traditional Adaboost-BP neural network and parallel BP neural network, respectively. Furthermore, the proposed approach requires less computation time and scales very well as evaluated by speedup, sizeup and scaleup. The proposed approach may provide a foundation for automated large-scale image classification and demonstrates practical value.
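The Adaboost assembly step can be sketched compactly. The toy below selects from a fixed pool of pretrained weak learners and reweights samples each round, whereas the paper trains BP neural networks as the weak learners inside Map tasks; the synthetic predictions are for illustration only:

```python
import numpy as np

def adaboost(weak_preds, y, rounds):
    """weak_preds: (n_learners, n_samples) array of +/-1 predictions for labels y."""
    n = y.size
    w = np.ones(n) / n                         # per-sample weights
    alphas, chosen = [], []
    for _ in range(rounds):
        errs = [(w * (p != y)).sum() for p in weak_preds]
        k = int(np.argmin(errs))               # pick the currently best weak learner
        err = max(errs[k], 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)  # its vote weight
        w *= np.exp(-alpha * y * weak_preds[k])
        w /= w.sum()                           # re-emphasize misclassified samples
        alphas.append(alpha)
        chosen.append(k)
    return lambda P: np.sign(sum(a * P[k] for a, k in zip(alphas, chosen)))

rng = np.random.default_rng(4)
y = rng.choice([-1, 1], 100)
preds = np.where(rng.random((15, 100)) < 0.65, y, -y)  # 15 weak learners, ~65% accurate
clf = adaboost(preds, y, rounds=10)
print((clf(preds) == y).mean())                # ensemble beats any single weak learner
```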
Parallel Tensor Compression for Large-Scale Scientific Data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kolda, Tamara G.; Ballard, Grey; Austin, Woody Nathan
As parallel computing trends towards the exascale, scientific data produced by high-fidelity simulations are growing increasingly massive. For instance, a simulation on a three-dimensional spatial grid with 512 points per dimension that tracks 64 variables per grid point for 128 time steps yields 8 TB of data. By viewing the data as a dense five-way tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving compression ratios of up to 10000 on real-world data sets with negligible loss in accuracy. So that we can operate on such massive data, we present the first-ever distributed-memory parallel implementation for the Tucker decomposition, whose key computations correspond to parallel linear algebra operations, albeit with nonstandard data layouts. Our approach specifies a data distribution for tensors that avoids any tensor data redistribution, either locally or in parallel. We provide accompanying analysis of the computation and communication costs of the algorithms. To demonstrate the compression and accuracy of the method, we apply our approach to real-world data sets from combustion science simulations. We also provide detailed performance results, including parallel performance in both weak and strong scaling experiments.
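A single-node sketch of the underlying decomposition, computed here as a truncated higher-order SVD (HOSVD) with NumPy; the distributed implementation performs the same mode-wise linear algebra over partitioned tensor blocks (the random tensor and target ranks below are arbitrary):

```python
import numpy as np

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    """Truncated HOSVD: factor matrices from mode-wise SVDs, then the core tensor."""
    U = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        U.append(u[:, :r])
    core = T
    for mode, u in enumerate(U):
        # contract factor transpose along each mode in turn
        core = np.moveaxis(np.tensordot(u.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core, U

rng = np.random.default_rng(5)
T = rng.normal(size=(20, 20, 20))
core, U = hosvd(T, (5, 5, 5))
ratio = T.size / (core.size + sum(u.size for u in U))
print(f"compression ratio: {ratio:.1f}")
```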
Parallel scalability of Hartree-Fock calculations
NASA Astrophysics Data System (ADS)
Chow, Edmond; Liu, Xing; Smelyanskiy, Mikhail; Hammond, Jeff R.
2015-03-01
Quantum chemistry is increasingly performed using large cluster computers consisting of multiple interconnected nodes. For a fixed molecular problem, the efficiency of a calculation usually decreases as more nodes are used, due to the cost of communication between the nodes. This paper empirically investigates the parallel scalability of Hartree-Fock calculations. The construction of the Fock matrix and the density matrix calculation are analyzed separately. For the former, we use a parallelization of Fock matrix construction based on a static partitioning of work followed by a work stealing phase. For the latter, we use density matrix purification from the linear scaling methods literature, but without using sparsity. When using large numbers of nodes for moderately sized problems, density matrix computations are network-bandwidth bound, making purification methods potentially faster than eigendecomposition methods.
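A minimal sketch of the purification idea, using grand-canonical McWeeny iterations in NumPy; it assumes an orthonormal basis and, for the demo only, picks the chemical potential from an eigendecomposition, which a production code would avoid:

```python
import numpy as np

def mcweeny_density(H, mu, iters=50):
    """Grand-canonical McWeeny purification: P converges to theta(mu - H)."""
    e = np.linalg.eigvalsh(H)
    alpha = 0.5 / max(e[-1] - mu, mu - e[0])          # map spectrum of P0 into [0, 1]
    P = 0.5 * np.eye(H.shape[0]) - alpha * (H - mu * np.eye(H.shape[0]))
    for _ in range(iters):
        P = 3 * P @ P - 2 * P @ P @ P                 # only matrix multiplies: parallel-friendly
    return P

rng = np.random.default_rng(6)
A = rng.normal(size=(50, 50))
H = 0.5 * (A + A.T)                                   # toy symmetric "Fock" matrix
n_occ = 10
e = np.linalg.eigvalsh(H)
mu = 0.5 * (e[n_occ - 1] + e[n_occ])                  # demo only: real codes search for mu
P = mcweeny_density(H, mu)
print(np.trace(P))                                    # ~ n_occ occupied orbitals
```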
NASA Astrophysics Data System (ADS)
Iwasawa, Masaki; Tanikawa, Ataru; Hosono, Natsuki; Nitadori, Keigo; Muranushi, Takayuki; Makino, Junichiro
2016-08-01
We present the basic idea, implementation, measured performance, and performance model of FDPS (Framework for Developing Particle Simulators). FDPS is an application-development framework which helps researchers to develop simulation programs using particle methods for large-scale distributed-memory parallel supercomputers. A particle-based simulation program for distributed-memory parallel computers needs to perform domain decomposition, exchange of particles which are not in the domain of each computing node, and gathering of the particle information in other nodes which is necessary for the interaction calculation. Also, even if distributed-memory parallel computers are not used, in order to reduce the amount of computation, algorithms such as the Barnes-Hut tree algorithm or the Fast Multipole Method should be used in the case of long-range interactions. For short-range interactions, some methods to limit the calculation to neighbor particles are required. FDPS provides all of these functions, which are necessary for efficient parallel execution of particle-based simulations, as "templates," which are independent of the actual data structure of particles and the functional form of the particle-particle interaction. By using FDPS, researchers can write their programs with the amount of work necessary to write a simple, sequential and unoptimized program of O(N^2) calculation cost, and yet the program, once compiled with FDPS, will run efficiently on large-scale parallel supercomputers. A simple gravitational N-body program can be written in around 120 lines. We report the actual performance of these programs and the performance model. The weak scaling performance is very good, and almost linear speed-up was obtained for up to the full system of the K computer. The minimum calculation time per timestep is in the range of 30 ms (N = 10^7) to 300 ms (N = 10^9). These are currently limited by the time for the calculation of the domain decomposition and the communication necessary for the interaction calculation. We discuss how we can overcome these bottlenecks.
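For contrast with what FDPS automates, here is the naive O(N^2) direct-summation gravity loop a user would otherwise write, with a leapfrog integrator; the softening length, masses, and step count are arbitrary toy values:

```python
import numpy as np

def gravity_accel(pos, mass, eps=1e-2):
    d = pos[None, :, :] - pos[:, None, :]            # pairwise displacements
    r2 = (d ** 2).sum(-1) + eps ** 2                 # softened squared distances
    np.fill_diagonal(r2, np.inf)                     # no self-interaction
    return (mass[None, :, None] * d / r2[..., None] ** 1.5).sum(axis=1)

rng = np.random.default_rng(7)
N = 256
pos, vel = rng.normal(size=(N, 3)), np.zeros((N, 3))
mass = np.full(N, 1.0 / N)
dt = 1e-2
for _ in range(10):                                  # leapfrog (kick-drift-kick) steps
    vel += 0.5 * dt * gravity_accel(pos, mass)
    pos += dt * vel
    vel += 0.5 * dt * gravity_accel(pos, mass)
print(pos.std())
```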
Interactive Parallel Data Analysis within Data-Centric Cluster Facilities using the IPython Notebook
NASA Astrophysics Data System (ADS)
Pascoe, S.; Lansdowne, J.; Iwi, A.; Stephens, A.; Kershaw, P.
2012-12-01
The data deluge is making traditional analysis workflows for many researchers obsolete. Support for parallelism within popular tools such as matlab, IDL and NCO is not well developed and rarely used. However, parallelism is necessary for processing modern data volumes on a timescale conducive to curiosity-driven analysis. Furthermore, for peta-scale datasets such as the CMIP5 archive, it is no longer practical to bring an entire dataset to a researcher's workstation for analysis, or even to their institutional cluster. Therefore, there is an increasing need to develop new analysis platforms which both enable processing at the point of data storage and provide parallelism. Such an environment should, where possible, maintain the convenience and familiarity of our current analysis environments to encourage curiosity-driven research. We describe how we are combining the interactive Python shell (IPython) with our JASMIN data-cluster infrastructure. IPython has been specifically designed to bridge the gap between HPC-style parallel workflows and the opportunistic curiosity-driven analysis usually carried out using domain-specific languages and scriptable tools. IPython offers a web-based interactive environment, the IPython notebook, and a cluster engine for parallelism, all underpinned by the well-respected Python/Scipy scientific programming stack. JASMIN is designed to support the data analysis requirements of the UK and European climate and earth system modeling community. JASMIN, with its sister facility CEMS focused on the earth observation community, has 4.5 PB of fast parallel disk storage alongside over 370 computing cores providing local computation. Through the IPython interface to JASMIN, users can make efficient use of JASMIN's multi-core virtual machines to perform interactive analysis on all cores simultaneously or can configure IPython clusters across multiple VMs. Larger-scale clusters can be provisioned through JASMIN's batch scheduling system. Outputs can be summarised and visualised using the full power of Python's many scientific tools, including Scipy, Matplotlib, Pandas and CDAT. This rich user experience is delivered through the user's web browser, maintaining the interactive feel of a workstation-based environment with the parallel power of a remote data-centric processing facility.
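A minimal sketch of the interactive-parallel pattern described above, using the ipyparallel package (the modern name of IPython's cluster engine) and assuming a cluster has already been started, e.g. with `ipcluster start -n 8`; the per-chunk computation is a stand-in for reading and reducing one data file:

```python
import ipyparallel as ipp

rc = ipp.Client()                     # connect to the running cluster
view = rc.load_balanced_view()

def chunk_mean(seed):
    import numpy as np                # imports execute on the remote engines
    rng = np.random.default_rng(seed)
    return rng.normal(size=100_000).mean()   # stand-in for reducing one data chunk

means = view.map_sync(chunk_mean, range(64))  # 64 "chunks" spread over the engines
print(sum(means) / len(means))
```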
Xyce parallel electronic simulator users guide, version 6.1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Keiter, Eric R; Mei, Ting; Russo, Thomas V.
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase: a message-passing parallel implementation, which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Xyce parallel electronic simulator users' guide, Version 6.0.1.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Keiter, Eric R; Mei, Ting; Russo, Thomas V.
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase: a message-passing parallel implementation, which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Xyce parallel electronic simulator users guide, version 6.0.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Keiter, Eric R; Mei, Ting; Russo, Thomas V.
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase: a message-passing parallel implementation, which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Reverse engineering and analysis of large genome-scale gene networks
Aluru, Maneesha; Zola, Jaroslaw; Nettleton, Dan; Aluru, Srinivas
2013-01-01
Reverse engineering the whole-genome networks of complex multicellular organisms continues to remain a challenge. While simpler models easily scale to large numbers of genes and gene expression datasets, more accurate models are compute intensive, limiting their scale of applicability. To enable fast and accurate reconstruction of large networks, we developed Tool for Inferring Network of Genes (TINGe), a parallel mutual information (MI)-based program. The novel features of our approach include: (i) a B-spline-based formulation for linear-time computation of MI, (ii) a novel algorithm for direct permutation testing and (iii) development of parallel algorithms to reduce run-time and facilitate construction of large networks. We assess the quality of our method by comparison with ARACNe (Algorithm for the Reconstruction of Accurate Cellular Networks) and GeneNet and demonstrate its unique capability by reverse engineering the whole-genome network of Arabidopsis thaliana from 3137 Affymetrix ATH1 GeneChips in just 9 min on a 1024-core cluster. We further report on the development of a new software tool, Gene Network Analyzer (GeNA), for extracting context-specific subnetworks from a given set of seed genes. Using TINGe and GeNA, we performed analysis of 241 Arabidopsis AraCyc 8.0 pathways, and the results are made available through the web. PMID:23042249
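For orientation, a simplified histogram-based MI estimator for a gene pair; TINGe's B-spline formulation instead smooths bin membership to obtain linear-time, lower-variance estimates, and adds permutation testing, none of which is shown here:

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Plug-in MI estimate from a joint histogram of two expression profiles."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0                                   # avoid log(0) terms
    return (pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])).sum()

rng = np.random.default_rng(8)
g1 = rng.normal(size=3137)                         # one gene across 3137 chips
g2 = 0.8 * g1 + 0.6 * rng.normal(size=3137)        # a co-regulated partner
print(mutual_information(g1, g2),                  # clearly positive
      mutual_information(g1, rng.normal(size=3137)))  # near zero (small bias)
```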
Performance Evaluation in Network-Based Parallel Computing
NASA Technical Reports Server (NTRS)
Dezhgosha, Kamyar
1996-01-01
Network-based parallel computing is emerging as a cost-effective alternative for solving many problems which require the use of supercomputers or massively parallel computers. The primary objective of this project has been to conduct experimental research on performance evaluation for clustered parallel computing. First, a testbed was established by augmenting our existing network of Sun SPARCs with PVM (Parallel Virtual Machine), a software system for linking clusters of machines. Second, a set of three basic applications was selected, consisting of a parallel search, a parallel sort, and a parallel matrix multiplication. These application programs were implemented in the C programming language under PVM. Third, we conducted performance evaluation under various configurations and problem sizes. Alternative parallel computing models and workload allocations for application programs were explored. The performance metric was limited to elapsed time or response time, which in the context of parallel computing can be expressed in terms of speedup. The results reveal that the overhead of communication latency between processes is in many cases the factor restricting performance. That is, coarse-grain parallelism, which requires less frequent communication between processes, will result in higher performance in network-based computing. Finally, we are in the final stages of installing an Asynchronous Transfer Mode (ATM) switch and four ATM interfaces (each 155 Mbps), which will allow us to extend our study to newer applications, performance metrics, and configurations.
Argonne Simulation Framework for Intelligent Transportation Systems
DOT National Transportation Integrated Search
1996-01-01
A simulation framework has been developed which defines a high-level architecture for a large-scale, comprehensive, scalable simulation of an Intelligent Transportation System (ITS). The simulator is designed to run on parallel computers and distribu...
Kim, Jung Eun; Phuntsho, Sherub; Ali, Syed Muztuza; Choi, Joon Young; Shon, Ho Kyong
2018-01-01
This study evaluates various options for full-scale modular configuration of the forward osmosis (FO) process for osmotic dilution of seawater using wastewater, for simultaneous desalination and water reuse through an FO-reverse osmosis (RO) hybrid system. An empirical relationship obtained from operation of a single FO membrane element was used to simulate the operational performance of different FO module configurations. The main limiting criterion for module operation is to maintain the feed pressure higher than the draw pressure throughout the housing module for safe operation without affecting membrane integrity. Experimental studies under the conditions tested in this study show that a single membrane housing cannot accommodate more than four elements, as the draw pressure otherwise exceeds the feed pressure. This indicates that a single-stage housing with eight elements is not likely to be practical for safe FO operation. Hence, six different FO modular configurations were proposed and simulated. A two-stage FO configuration with multiple housings (in parallel) in the second stage, using the same or larger spacer thickness, reduces draw-pressure build-up because the draw flow rates are halved in the second stage, thereby allowing more than four elements in the second-stage housing. The loss of feed pressure (pressure drop) and osmotic driving force in the second stage is compensated by operating under the pressure-assisted osmosis (PAO) mode, which helps enhance permeate flux and maintains positive pressure differences between the feed and draw chambers. The PAO energy penalty is compensated by enhanced permeate throughput, reduced membrane area, and a smaller plant footprint. The contribution of FO/PAO to total energy consumption was not significant compared to the downstream RO desalination (90%), indicating that the proposed two-stage FO modular configuration is one way of making full-scale FO operation practical for the FO-RO hybrid system.
Astrophysical N-body Simulations Using Hierarchical Tree Data Structures
NASA Astrophysics Data System (ADS)
Warren, M. S.; Salmon, J. K.
The authors report on recent large astrophysical N-body simulations executed on the Intel Touchstone Delta system. They review the astrophysical motivation and the numerical techniques and discuss steps taken to parallelize these simulations. The methods scale as O(N log N) for large values of N, and also scale linearly with the number of processors. The performance, sustained for a duration of 67 h, was between 5.1 and 5.4 Gflop/s on a 512-processor system.
Optimization study for the experimental configuration of CMB-S4
NASA Astrophysics Data System (ADS)
Barron, Darcy; Chinone, Yuji; Kusaka, Akito; Borril, Julian; Errard, Josquin; Feeney, Stephen; Ferraro, Simone; Keskitalo, Reijo; Lee, Adrian T.; Roe, Natalie A.; Sherwin, Blake D.; Suzuki, Aritoki
2018-02-01
The CMB Stage 4 (CMB-S4) experiment is a next-generation, ground-based experiment that will measure the cosmic microwave background (CMB) polarization to unprecedented accuracy, probing the signature of inflation, the nature of cosmic neutrinos, relativistic thermal relics in the early universe, and the evolution of the universe. CMB-S4 will consist of O(500,000) photon-noise-limited detectors that cover a wide range of angular scales in order to probe the cosmological signatures from both the early and late universe. It will measure a wide range of microwave frequencies to cleanly separate the CMB signals from galactic and extra-galactic foregrounds. To advance the progress towards designing the instrument for CMB-S4, we have established a framework to optimize the instrumental configuration to maximize its scientific output. The framework combines cost and instrumental models with a cosmology forecasting tool, and evaluates the scientific sensitivity as a function of various instrumental parameters. The cost model also allows us to perform the analysis under a fixed-cost constraint, optimizing for the scientific output of the experiment given finite resources. In this paper, we report our first results from this framework, using simplified instrumental and cost models. We have primarily studied two classes of instrumental configurations: arrays of large-aperture telescopes with diameters ranging from 2–10 m, and hybrid arrays that combine small-aperture telescopes (0.5-m diameter) with large-aperture telescopes. We explore performance as a function of telescope aperture size, distribution of the detectors into different microwave frequencies, survey strategy and survey area, low-frequency noise performance, and the balance between small and large aperture telescopes for hybrid configurations. Both types of configurations must cover both large (~ degree) and small (~ arcmin) angular scales, and the performance depends on assumptions for performance vs. angular scale. The configurations with large-aperture telescopes have a shallow optimum around 4–6 m in aperture diameter, assuming that large telescopes can achieve good performance for low-frequency noise. We explore some of the uncertainties of the instrumental model and cost parameters, and we find that the optimum has a weak dependence on these parameters. The hybrid configuration shows an even broader optimum, spanning a range of 4–10 m in aperture for the large telescopes. We also present two strawperson configurations as an outcome of this optimization study, and we discuss some ideas for improving our simple cost and instrumental models used here. There are several areas of this analysis that deserve further improvement. In our forecasting framework, we adopt a simple two-component foreground model with spatially varying power-law spectral indices. We estimate de-lensing performance statistically and ignore non-idealities such as anisotropic mode coverage, boundary effects, and possible foreground residuals. Instrumental systematics, which are not accounted for in our analyses, may also influence the conceptual design. Further study of the instrumental and cost models will be one of the main areas of study by the entire CMB-S4 community. We hope that our framework will be useful for estimating the influence of these improvements in the future, and we will incorporate them in order to further improve the optimization.
Formation of collisionless shocks in magnetized plasma interaction with kinetic-scale obstacles
Cruz, F.; Alves, E. P.; Bamford, R. A.; ...
2017-02-06
We investigate the formation of collisionless magnetized shocks triggered by the interaction between magnetized plasma flows and miniature-sized (on the order of plasma kinetic scales) magnetic obstacles, resorting to massively parallel, full particle-in-cell simulations that include the electron kinetics. The critical obstacle size to generate a compressed plasma region ahead of these objects is determined by independently varying the magnitude of the dipolar magnetic moment and the plasma magnetization. Here we find that the effective size of the obstacle depends on the relative orientation between the dipolar and plasma internal magnetic fields, and we show that this may be critical to form a shock in small-scale structures. We also study the microphysics of the magnetopause in different magnetic field configurations in 2D and compare the results with full 3D simulations. Finally, we evaluate the parameter range where such miniature magnetized shocks can be explored in laboratory experiments.
Evaluation of fault-tolerant parallel-processor architectures over long space missions
NASA Technical Reports Server (NTRS)
Johnson, Sally C.
1989-01-01
The impact of a five-year space mission environment on fault-tolerant parallel processor architectures is examined. The target application is a Strategic Defense Initiative (SDI) satellite requiring 256 parallel processors to provide the computation throughput. The reliability requirements are that the system still be operational after five years with 0.99 probability and that the probability of system failure during one-half hour of full operation be less than 10^-7. The fault tolerance features an architecture must possess to meet these reliability requirements are presented, many potential architectures are briefly evaluated, and one candidate architecture, the Charles Stark Draper Laboratory's Fault-Tolerant Parallel Processor (FTPP), is evaluated in detail. A methodology for designing a preliminary system configuration to meet the reliability and performance requirements of the mission is then presented and demonstrated by designing an FTPP configuration.
Impact of the coupling effect and the configuration on a compact rectenna array
NASA Astrophysics Data System (ADS)
Rivière, J.; Douyere, A.; Luk, J. D. Lan Sun
2014-10-01
This paper proposes an experimental study of the coupling effect in a rectenna array. The rectifying antenna consists of a compact and efficient rectifying circuit in a series topology, coupled with a small metamaterial-inspired antenna. The rectenna array's behaviour is measured in the X plane, with series and parallel DC-combining configurations of two and three rectennas spaced from 3 cm to 10 cm. This study shows that the maximum efficiency is reached for the series configuration, with a resistive load of 10 kΩ. The optimal distance is not significant for either the series or the parallel configuration. Then, a comparison between a rectenna array with non-optimal mutual coupling and a more traditional patch rectenna is performed. Finally, a practical application is tested to demonstrate the effectiveness of such a small rectenna array.
Parallel Computation of the Regional Ocean Modeling System (ROMS)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, P; Song, Y T; Chao, Y
2005-04-05
The Regional Ocean Modeling System (ROMS) is a regional ocean general circulation modeling system solving the free-surface, hydrostatic, primitive equations over varying topography. It is free software distributed world-wide for studying both complex coastal ocean problems and the basin-to-global scale ocean circulation. The original ROMS code could only be run on shared-memory systems. With the increasing need to simulate larger model domains with finer resolutions and on a variety of computer platforms, there is a need in the ocean-modeling community to have a ROMS code that can be run on any parallel computer ranging from 10 to hundreds of processors. Recently, we have explored parallelization for ROMS using the MPI programming model. In this paper, an efficient parallelization strategy for such a large-scale scientific software package, based on an existing shared-memory computing model, is presented. In addition, scientific applications and data-performance issues on a couple of SGI systems, including Columbia, the world's third-fastest supercomputer, are discussed.
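The core of an MPI parallelization of a structured-grid ocean code is domain decomposition with ghost-cell (halo) exchange. The sketch below shows the pattern in one dimension using mpi4py; it is illustrative only and is not taken from the ROMS source, and the file name in the comment is hypothetical.

```python
# Sketch of the 1-D domain decomposition and halo exchange that MPI
# ocean codes typically use (illustrative only; not actual ROMS code).
# Run with e.g.: mpiexec -n 4 python halo_demo.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

nlocal = 16                               # interior points per rank
u = np.full(nlocal + 2, float(rank))      # one ghost cell at each end
left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# Exchange ghost cells with neighbours before each stencil update.
comm.Sendrecv(u[1:2], dest=left, recvbuf=u[-1:], source=right)
comm.Sendrecv(u[-2:-1], dest=right, recvbuf=u[0:1], source=left)

# Simple diffusion-like stencil acting only on interior points.
u[1:-1] = 0.5 * u[1:-1] + 0.25 * (u[:-2] + u[2:])
print(rank, u[:3], u[-3:])
```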
Pair-barcode high-throughput sequencing for large-scale multiplexed sample analysis.
Tu, Jing; Ge, Qinyu; Wang, Shengqin; Wang, Lei; Sun, Beili; Yang, Qi; Bai, Yunfei; Lu, Zuhong
2012-01-25
Multiplexing has become the major limitation of next-generation sequencing (NGS) in application to low-complexity samples. Physical space segregation allows limited multiplexing, while the existing barcode approach only permits simultaneous analysis of up to several dozen samples. Here we introduce pair-barcode sequencing (PBS), an economic and flexible barcoding technique that permits parallel analysis of large-scale multiplexed samples. In two pilot runs using a SOLiD sequencer (Applied Biosystems Inc.), 32 independent pair-barcoded miRNA libraries were simultaneously discovered by the combination of 4 unique forward barcodes and 8 unique reverse barcodes. Over 174,000,000 reads were generated and about 64% of them were assigned to both of the barcodes. After mapping all reads to pre-miRNAs in miRBase, different miRNA expression patterns were captured from the two clinical groups. The strong correlation using different barcode pairs and the high consistency of miRNA expression in two independent runs demonstrate that the PBS approach is valid. By employing the PBS approach in NGS, large-scale multiplexed pooled samples can be practically analyzed in parallel, so that high-throughput sequencing economically meets the requirements of samples with low sequencing-throughput demand.
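The combinatorial bookkeeping behind pair-barcoding is compact enough to sketch: 4 forward and 8 reverse barcodes index 4 x 8 = 32 libraries, and a read counts as assigned only when both barcodes are recognized. The barcode sequences below are invented for illustration.

```python
# Sketch of the pair-barcode bookkeeping described above: 4 forward x 8
# reverse barcodes address 32 libraries, and a read is assigned only
# when both barcodes are recognized. Barcode sequences are hypothetical.
from itertools import product

forward = ["ACGT", "TGCA", "GATC", "CTAG"]                   # 4 forward
reverse = ["AAGG", "CCTT", "GGAA", "TTCC",
           "AGAG", "TCTC", "GAGA", "CTCT"]                   # 8 reverse
library = {pair: i for i, pair in enumerate(product(forward, reverse))}
assert len(library) == 32

def assign(read_fwd, read_rev):
    # Returns a library index, or None if either barcode is unrecognized,
    # mirroring the ~64% both-barcode assignment rate reported above.
    return library.get((read_fwd, read_rev))

print(assign("TGCA", "GAGA"))   # -> some library in 0..31
print(assign("TGCA", "NNNN"))   # -> None (unassigned read)
```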
Zhou, Juntuo; Liu, Huiying; Liu, Yang; Liu, Jia; Zhao, Xuyang; Yin, Yuxin
2016-04-19
Recent advances in mass spectrometers, which have yielded higher resolution and faster scanning speeds, have expanded their application in metabolomics of diverse diseases. Using a quadrupole-Orbitrap LC-MS system, we developed an efficient large-scale quantitative method targeting 237 metabolites involved in various metabolic pathways using scheduled, parallel reaction monitoring (PRM). We assessed the dynamic range, linearity, reproducibility, and system suitability of the PRM assay by measuring concentration curves, biological samples, and clinical serum samples. The quantification performance of the PRM and MS1-based assays on the Q-Exactive was compared, as was that of the MRM assay on the QTRAP 6500. The PRM assay monitoring 237 polar metabolites showed greater reproducibility and quantitative accuracy than MS1-based quantification, and also greater flexibility in post-acquisition assay refinement than the MRM assay on the QTRAP 6500. We present a workflow for convenient PRM data processing using Skyline software, which is free of charge. In this study we have established a reliable PRM methodology on a quadrupole-Orbitrap platform for large-scale targeted metabolomics, which provides a new choice for basic and clinical metabolomics studies.
NASA Astrophysics Data System (ADS)
Mizyuk, Artem; Senderov, Maxim; Korotaev, Gennady
2016-04-01
A large number of numerical ocean models have been implemented for the Black Sea basin during the last two decades. They reproduce a rather similar structure of the synoptic variability of the circulation. Since the 2000s, numerical studies of the mesoscale structure have been carried out using high-performance computing (HPC). With the growing capacity of computing resources it is now possible to reconstruct the Black Sea currents with a spatial resolution of several hundred metres. However, how realistic can these results be? In the proposed study an attempt is made to understand which spatial scales are reproduced by an ocean model of the Black Sea. Simulations are made using the parallel version of NEMO (Nucleus for European Modelling of the Ocean). Two regional configurations, with spatial resolutions of 5 km and 2.5 km, are described. Comparison of the SST from the two simulations shows a rather qualitative difference in the spatial structures. Results of the high-resolution simulation are also compared with satellite observations and observation-based products from Copernicus using spatial correlation and spectral analysis. The spatial scales of the correlation functions for simulated and observed SST are rather close, and differ markedly from those of the satellite SST reanalysis. The evolution of the spectral density for modelled SST and the reanalysis shows agreement in the time periods of small-scale intensification. Applying spectral analysis to satellite measurements is complicated by gaps in the data. The research leading to these results has received funding from the Russian Science Foundation (project № 15-17-20020).
A Multiscale Parallel Computing Architecture for Automated Segmentation of the Brain Connectome
Knobe, Kathleen; Newton, Ryan R.; Schlimbach, Frank; Blower, Melanie; Reid, R. Clay
2015-01-01
Several groups in neurobiology have embarked on deciphering the brain circuitry using large-scale imaging of a mouse brain and manual tracing of the connections between neurons. Creating a graph of the brain circuitry, also called a connectome, could have a huge impact on the understanding of neurodegenerative diseases such as Alzheimer's disease. Although considerably smaller than a human brain, a mouse brain already exhibits one billion connections, and manually tracing the connectome of a mouse brain can be achieved only partially. This paper proposes to scale up the tracing by using automated image segmentation and a parallel computing approach designed for domain experts. We explain the design decisions behind our parallel approach and we present our results for the segmentation of the vasculature and the cell nuclei, which have been obtained without any manual intervention. PMID:21926011
Federal Register 2010, 2011, 2012, 2013, 2014
2010-10-05
... tray configurations. The experiments conducted range from micro-scale, in which very small (5 mg... burned under a large oxygen-depletion calorimeter. Other experiments include cone calorimetry, smoke and... tray of cables underneath a bank of radiant panels. The results of the small-scale experiments are to...
How does an asymmetric magnetic field change the vertical structure of a hot accretion flow?
NASA Astrophysics Data System (ADS)
Samadi, M.; Abbassi, S.; Lovelace, R. V. E.
2017-09-01
This paper explores the effects of large-scale magnetic fields in hot accretion flows for asymmetric configurations with respect to the equatorial plane. The solutions that we have found show that the large-scale asymmetric magnetic field can significantly affect the dynamics of the flow and also cause notable outflows in the outer parts. Previously, we treated a viscous resistive accreting disc in the presence of an odd symmetric B-field about the equatorial plane. Now, we extend our earlier work by taking into account another configuration of large-scale magnetic field that is no longer symmetric. We provide asymmetric field structures with small deviations from even and odd symmetric B-field. Our results show that the disc's dynamics and appearance become different above and below the equatorial plane. The set of solutions also predicts that even a small deviation in a symmetric field causes the disc to compress on one side and expand on the other. In some cases, our solution represents a very strong outflow from just one side of the disc. Therefore, the solution may potentially explain the origin of one-sided jets in radio galaxies.
Aeroelastic Stability Investigations for Large-scale Vertical Axis Wind Turbines
NASA Astrophysics Data System (ADS)
Owens, B. C.; Griffith, D. T.
2014-06-01
The availability of offshore wind resources in coastal regions, along with a high concentration of load centers in these areas, makes offshore wind energy an attractive opportunity for clean renewable electricity production. High infrastructure costs such as the offshore support structure and operation and maintenance costs for offshore wind technology, however, are significant obstacles that need to be overcome to make offshore wind a more cost-effective option. A vertical-axis wind turbine (VAWT) rotor configuration offers a potential transformative technology solution that significantly lowers cost of energy for offshore wind due to its inherent advantages for the offshore market. However, several potential challenges exist for VAWTs and this paper addresses one of them with an initial investigation of dynamic aeroelastic stability for large-scale, multi-megawatt VAWTs. The aeroelastic formulation and solution method from the BLade Aeroelastic STability Tool (BLAST) for HAWT blades was employed to extend the analysis capability of a newly developed structural dynamics design tool for VAWTs. This investigation considers the effect of configuration geometry, material system choice, and number of blades on the aeroelastic stability of a VAWT, and provides an initial scoping for potential aeroelastic instabilities in large-scale VAWT designs.
TU-AB-BRC-12: Optimized Parallel Monte Carlo Dose Calculations for Secondary MU Checks
DOE Office of Scientific and Technical Information (OSTI.GOV)
French, S; Nazareth, D; Bellor, M
Purpose: Secondary MU checks are an important tool used during a physics review of a treatment plan. Commercial software packages offer varying degrees of theoretical dose calculation accuracy, depending on the modality involved. Dose calculations of VMAT plans are especially prone to error due to the large approximations involved. Monte Carlo (MC) methods are not commonly used due to their long run times. We investigated two methods to increase the computational efficiency of MC dose simulations with the BEAMnrc code. Distributed computing resources, along with optimized code compilation, will allow for accurate and efficient VMAT dose calculations. Methods: The BEAMnrc package was installed on a high-performance computing cluster accessible to our clinic. MATLAB and PYTHON scripts were developed to convert a clinical VMAT DICOM plan into BEAMnrc input files. The BEAMnrc installation was optimized by running the VMAT simulations through profiling tools which indicated the behavior of the constituent routines in the code, e.g. the bremsstrahlung splitting routine and the specified random number generator. This information aided in determining the most efficient parallel compilation configuration for the specific CPUs available on our cluster, resulting in the fastest VMAT simulation times. Our method was evaluated with calculations involving 10^8-10^9 particle histories, which are sufficient to verify patient dose using VMAT. Results: Parallelization allowed the calculation of patient dose on the order of 10-15 hours with 100 parallel jobs. Due to the compiler optimization process, further speed increases of 23% were achieved when compared with the open-source compiler BEAMnrc packages. Conclusion: Analysis of the BEAMnrc code allowed us to optimize the compiler configuration for VMAT dose calculations. In future work, the optimized MC code, in conjunction with the parallel processing capabilities of BEAMnrc, will be applied to provide accurate and efficient secondary MU checks.
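Independent particle histories make MC dose calculation embarrassingly parallel: split the requested histories across jobs with distinct random streams and sum the partial tallies. The sketch below illustrates only that scheduling idea with a toy tally, not BEAMnrc or its physics.

```python
# Sketch of the embarrassingly parallel history-splitting used for MC
# dose runs like the one above: N total histories divided over k jobs,
# each with an independent RNG stream, and the partial tallies summed
# at the end. A toy 1-D "dose" tally stands in for particle transport.
import numpy as np
from multiprocessing import Pool

def run_job(args):
    job_id, histories = args
    rng = np.random.default_rng(1000 + job_id)   # distinct stream per job
    depths = rng.exponential(scale=5.0, size=histories)
    tally, _ = np.histogram(depths, bins=50, range=(0, 25))
    return tally

if __name__ == "__main__":
    total, jobs = 10**6, 8                       # 10^8-10^9 in practice
    with Pool(jobs) as pool:
        parts = pool.map(run_job, [(j, total // jobs) for j in range(jobs)])
    dose = np.sum(parts, axis=0)                 # combine partial tallies
    print(dose[:5], dose.sum())
```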
Multitasking for flows about multiple body configurations using the chimera grid scheme
NASA Technical Reports Server (NTRS)
Dougherty, F. C.; Morgan, R. L.
1987-01-01
The multitasking of a finite-difference scheme using multiple overset meshes is described. In this chimera, or multiple overset mesh, approach, a multiple-body configuration is mapped using a major grid about the main component of the configuration, with minor overset meshes used to map each additional component. This type of code is well suited to multitasking. Both steady and unsteady two-dimensional computations are run on parallel processors on a Cray X-MP/48, usually with one mesh per processor. Flow field results are compared with single-processor results to demonstrate the feasibility of running multiple-mesh codes on parallel processors and to show the increase in efficiency.
A Fast Synthetic Aperture Radar Raw Data Simulation Using Cloud Computing
Li, Zhixin; Su, Dandan; Zhu, Haijiang; Li, Wei; Zhang, Fan; Li, Ruirui
2017-01-01
Synthetic Aperture Radar (SAR) raw data simulation is a fundamental problem in radar system design and imaging algorithm research. The growth of surveying swath and resolution results in a significant increase in data volume and simulation period, making this a comprehensively data-intensive and computing-intensive issue. Although several high-performance computing (HPC) methods have demonstrated their potential for accelerating simulation, the input/output (I/O) bottleneck of huge raw data has not been eased. In this paper, we propose a cloud-computing-based SAR raw data simulation algorithm, which employs the MapReduce model to accelerate the raw data computing and the Hadoop distributed file system (HDFS) for fast I/O access. The MapReduce model is designed for the irregular parallel accumulation of raw data simulation, an access pattern that greatly reduces the parallel efficiency of graphics processing unit (GPU) based simulation methods. In addition, three kinds of optimization strategies are put forward from the aspects of programming model, HDFS configuration and scheduling. The experimental results show that the cloud-computing-based algorithm achieves a 4x speedup over the baseline serial approach in an 8-node cloud environment, and each optimization strategy improves performance by about 20%. This work proves that the proposed cloud algorithm is capable of solving the computing-intensive and data-intensive issues in SAR raw data simulation, and is easily extended to large-scale computing to achieve higher acceleration. PMID:28075343
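The map/reduce decomposition described above can be sketched in a few lines: map tasks emit (cell, contribution) pairs for each simulated echo, and the reduce step performs the irregular accumulation by summing per cell. This pure-Python stand-in mimics only the data flow of the Hadoop pipeline; all numbers are illustrative.

```python
# Sketch of the map/reduce accumulation pattern described above: map
# emits (cell, contribution) pairs for simulated echoes, and reduce
# sums contributions per raw-data cell.
from collections import defaultdict

targets = [(3, 1.0), (7, 0.5), (3, 0.25)]        # (cell, amplitude)

def map_phase(targets):
    for cell, amp in targets:
        for offset in (-1, 0, 1):                # echo spreads over cells
            yield cell + offset, amp / (1 + abs(offset))

def reduce_phase(pairs):
    acc = defaultdict(float)                     # irregular accumulation
    for cell, contribution in pairs:
        acc[cell] += contribution
    return dict(sorted(acc.items()))

print(reduce_phase(map_phase(targets)))
```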
Implementation of Parallel Computing Technology to Vortex Flow
NASA Technical Reports Server (NTRS)
Dacles-Mariani, Jennifer
1999-01-01
Mainframe supercomputers such as the Cray C90 were invaluable in obtaining large-scale computations using several million grid points to resolve salient features of a tip vortex flow over a lifting wing. However, real flight configurations require tracking not only the flow over several lifting wings but also its growth and decay in the near- and intermediate-wake regions, not to mention the interaction of these vortices with each other. Resolving and tracking the evolution and interaction of the vortices shed from complex bodies is computationally intensive, and parallel computing technology is an attractive option for solving these flows. In planetary science, vortical flows are also important in studying how planets and protoplanets form when cosmic dust and gases become gravitationally unstable. The current paradigm for the formation of planetary systems maintains that the planets accreted from the nebula of gas and dust left over from the formation of the Sun. Traditional theory also indicates that such a preplanetary nebula took the form of a flattened disk. The coagulation of dust led to the settling of aggregates toward the midplane of the disk, where they grew further into asteroid-like planetesimals. Some of the issues still remaining in this process are the onset of gravitational instability, the role of turbulence in the damping of particles, and radial effects. In this study the focus is on the role of turbulence and on radial effects.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carrascosa, M.; García-Cabañes, A.; Jubera, M.
The application of evanescent photovoltaic (PV) fields, generated by visible illumination of Fe:LiNbO3 substrates, for parallel massive trapping and manipulation of micro- and nano-objects is critically reviewed. The technique has often been referred to as photovoltaic or photorefractive tweezers. The main advantage of the new method is that the involved electrophoretic and/or dielectrophoretic forces do not require any electrodes, and large-scale manipulation of nano-objects can be easily achieved using the patterning capabilities of light. The paper describes the experimental techniques for particle trapping and the main reported experimental results obtained with a variety of micro- and nano-particles (dielectric and conductive) and different illumination configurations (single beam, holographic geometry, and spatial light modulator projection). The report also pays attention to the physical basis of the method, namely, the coupling of the evanescent photorefractive fields to the dielectric response of the nano-particles. The role of a number of physical parameters such as the contrast and spatial periodicities of the illumination pattern or the particle deposition method is discussed. Moreover, the main properties of the obtained particle patterns in relation to potential applications are summarized, and first demonstrations reviewed. Finally, the PV method is discussed in comparison to other patterning strategies, such as those based on the pyroelectric response and the electric fields associated with domain poling of ferroelectric materials.
Detonation wave detection probe including parallel electrodes on a flexible backing strip
Uher, K.J.
1995-12-19
A device is disclosed for sensing the occurrence of destructive events and events involving mechanical shock in a non-intrusive manner. A pair of electrodes is disposed in a parallel configuration on a backing strip of flexible film. Electrical circuitry is used to sense the time at which an event causes electrical continuity between the electrodes or, with a sensor configuration where the electrodes are shorted together, to sense the time at which electrical continuity is lost.
Using Agent Base Models to Optimize Large Scale Network for Large System Inventories
NASA Technical Reports Server (NTRS)
Shameldin, Ramez Ahmed; Bowling, Shannon R.
2010-01-01
The aim of this paper is to use Agent-Based Models (ABM) to optimize large-scale network handling capabilities for large system inventories and to implement strategies for the purpose of reducing capital expenses. The models used in this paper rely on computational algorithms and procedures implemented in MATLAB to simulate agent-based models, executed on clusters that provide the high-performance computing needed to run the programs in parallel. In both cases, a model is defined as a compilation of a set of structures and processes assumed to underlie the behavior of a network system.
Climate Modeling with a Million CPUs
NASA Astrophysics Data System (ADS)
Tobis, M.; Jackson, C. S.
2010-12-01
Meteorological, oceanographic, and climatological applications have been at the forefront of scientific computing since its inception. The trend toward ever larger and more capable computing installations is unabated. However, much of the increase in capacity is accompanied by an increase in parallelism and a concomitant increase in complexity. An increase of at least four additional orders of magnitude in the computational power of scientific platforms is anticipated. It is unclear how individual climate simulations can continue to make effective use of the largest platforms. Conversion of existing community codes to higher resolution, or to more complex phenomenology, or both, presents daunting design and validation challenges. Our alternative approach is to use the expected resources to run very large ensembles of simulations of modest size, rather than to await the emergence of very large simulations. We are already doing this in exploring the parameter space of existing models using the Multiple Very Fast Simulated Annealing algorithm, which was developed for seismic imaging. Our experiments have the dual intentions of tuning the model and identifying ranges of parameter uncertainty. Our approach is less strongly constrained by the dimensionality of the parameter space than are competing methods. Nevertheless, scaling up remains costly. Much could be achieved by increasing the dimensionality of the search and adding complexity to the search algorithms. Such ensemble approaches scale naturally to very large platforms. Extensions of the approach are anticipated. For example, structurally different models can be tuned to comparable effectiveness. This can provide an objective test for which there is no realistic precedent with smaller computations. We find ourselves inventing new code to manage our ensembles. Component computations involve tens to hundreds of CPUs and tens to hundreds of hours. The results of these moderately large parallel jobs influence the scheduling of subsequent jobs, and complex algorithms may be easily contemplated for this. The operating system concept of a "thread" re-emerges at a very coarse level, where each thread manages atomic computations of thousands of CPU-hours. That is, rather than multiple threads operating on a processor, at this level, multiple processors operate within a single thread. In collaboration with the Texas Advanced Computing Center, we are developing a software library at the system level, which should facilitate the development of computations involving complex strategies which invoke large numbers of moderately large multi-processor jobs. While this may have applications in other sciences, our key intent is to better characterize the coupled behavior of a very large set of climate model configurations.
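A minimal sketch of the ensemble-based search strategy described above: a simulated-annealing loop in which each candidate parameter set is scored by a stand-in "model run", with ensemble members evaluated in parallel. This is a generic annealing skeleton with invented parameters, not the MVFSA implementation.

```python
# Sketch of an ensemble-of-runs parameter search in the spirit of the
# description above; model_misfit stands in for a multi-hour climate
# run scored against observations.
import numpy as np
from multiprocessing import Pool

def model_misfit(params):
    return float(np.sum((params - np.array([0.3, 1.7])) ** 2))

def anneal(n_iter=50, members=8, temp0=1.0, seed=0):
    rng = np.random.default_rng(seed)
    best, best_cost = rng.uniform(-3, 3, size=2), np.inf
    with Pool(members) as pool:
        for k in range(n_iter):
            temp = temp0 * 0.9 ** k              # cooling schedule
            candidates = [best + rng.normal(scale=temp, size=2)
                          for _ in range(members)]
            costs = pool.map(model_misfit, candidates)   # ensemble step
            i = int(np.argmin(costs))
            if costs[i] < best_cost:             # greedy accept for brevity
                best, best_cost = candidates[i], costs[i]
    return best, best_cost

if __name__ == "__main__":
    print(anneal())   # -> parameters near (0.3, 1.7)
```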
Genetic Parallel Programming: design and implementation.
Cheang, Sin Man; Leung, Kwong Sak; Lee, Kin Hong
2006-01-01
This paper presents a novel Genetic Parallel Programming (GPP) paradigm for evolving parallel programs running on a Multi-Arithmetic-Logic-Unit (Multi-ALU) Processor (MAP). The MAP is a Multiple Instruction-streams, Multiple Data-streams (MIMD), general-purpose register machine that can be implemented on modern Very Large-Scale Integrated Circuits (VLSIs) in order to evaluate genetic programs at high speed. For human programmers, writing parallel programs is more difficult than writing sequential programs. However, experimental results show that GPP evolves parallel programs with less computational effort than that of their sequential counterparts. It creates a new approach to evolving a feasible problem solution in parallel program form and then serializes it into a sequential program if required. The effectiveness and efficiency of GPP are investigated using a suite of 14 well-studied benchmark problems. Experimental results show that GPP speeds up evolution substantially.
Parallel-In-Time For Moving Meshes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Falgout, R. D.; Manteuffel, T. A.; Southworth, B.
2016-02-04
With steadily growing computational resources available, scientists must develop effective ways to utilize the increased resources. High performance, highly parallel software has become a standard. However, until recent years parallelism has focused primarily on the spatial domain. When solving a space-time partial differential equation (PDE), this leads to a sequential bottleneck in the temporal dimension, particularly when taking a large number of time steps. The XBraid parallel-in-time library was developed as a practical way to add temporal parallelism to existing sequential codes with only minor modifications. In this work, a rezoning-type moving mesh is applied to a diffusion problem and formulated in a parallel-in-time framework. Tests and scaling studies are run using XBraid and demonstrate excellent results for the simple model problem considered herein.
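To illustrate temporal parallelism of the kind XBraid provides, the sketch below implements the classic parareal update for a scalar ODE; parareal is a two-level relative of XBraid's multigrid-in-time scheme, and nothing here uses the XBraid API.

```python
# Sketch of parallel-in-time iteration for y' = -y, in the spirit of
# the temporal parallelism above. This is a classic parareal update:
#   U[n+1] <- G(U[n], new) + F(U[n], old) - G(U[n], old)
import numpy as np

lam, T, N = -1.0, 2.0, 10                # dy/dt = lam*y on [0, T]
dT = T / N

def G(u, dt=dT):                          # coarse: one backward Euler step
    return u / (1.0 - lam * dt)

def F(u, dt=dT, m=100):                   # fine: m small backward Euler steps
    for _ in range(m):
        u = u / (1.0 - lam * dt / m)
    return u

U = [1.0] * (N + 1)                       # initial guess on the coarse grid
for _ in range(5):                        # parareal iterations
    Fold = [F(U[n]) for n in range(N)]    # fine steps: parallel in time
    Unew = [1.0]
    for n in range(N):                    # cheap sequential coarse sweep
        Unew.append(G(Unew[n]) + Fold[n] - G(U[n]))
    U = Unew

print(U[-1], np.exp(lam * T))             # converges toward exp(-2)
```

The fine propagations in each iteration are independent across time slabs, which is exactly the work a parallel-in-time library distributes over processors.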
Modelling the large-scale redshift-space 3-point correlation function of galaxies
NASA Astrophysics Data System (ADS)
Slepian, Zachary; Eisenstein, Daniel J.
2017-08-01
We present a configuration-space model of the large-scale galaxy 3-point correlation function (3PCF) based on leading-order perturbation theory and including redshift-space distortions (RSD). This model should be useful in extracting distance-scale information from the 3PCF via the baryon acoustic oscillation method. We include the first redshift-space treatment of biasing by the baryon-dark matter relative velocity. Overall, on large scales the effect of RSD is primarily a renormalization of the 3PCF that is roughly independent of both physical scale and triangle opening angle; for our adopted Ωm and bias values, the rescaling is a factor of ~1.8. We also present an efficient scheme for computing 3PCF predictions from our model, important for allowing fast exploration of the space of cosmological parameters in future analyses.
Wind Tunnel Testing of a 1/20th-Scale Large Civil Tilt-Rotor Model in Airplane and Helicopter Modes
NASA Technical Reports Server (NTRS)
Theodore, Colin R.; Willink, Gina C.; Russell, Carl R.; Amy, Alexander R.; Pete, Ashley E.
2014-01-01
In April 2012 and October 2013, NASA and the U.S. Army jointly conducted a wind tunnel test program examining two notional large tilt rotor designs: NASA's Large Civil Tilt Rotor and the Army's High Efficiency Tilt Rotor. The approximately 6%-scale airframe models (unpowered) were tested without rotors in the U.S. Army 7- by 10-foot wind tunnel at NASA Ames Research Center. Measurements of all six forces and moments acting on the airframe were taken using the wind tunnel scale system. In addition to force and moment measurements, flow visualization using tufts, infrared thermography and oil flow were used to identify flow trajectories, boundary layer transition and areas of flow separation. The purpose of this test was to collect data for the validation of computational fluid dynamics tools, for the development of flight dynamics simulation models, and to validate performance predictions made during conceptual design. This paper focuses on the results for the Large Civil Tilt Rotor model in an airplane mode configuration up to 200 knots of wind tunnel speed. Results are presented with the full airframe model with various wing tip and nacelle configurations, and for a wing-only case also with various wing tip and nacelle configurations. Key results show that the addition of a wing extension outboard of the nacelles produces a significant increase in the lift-to-drag ratio, and interestingly decreases the drag compared to the case where the wing extension is not present. The drag decrease is likely due to complex aerodynamic interactions between the nacelle and wing extension that results in a significant drag benefit.
Strategies for Large Scale Implementation of a Multiscale, Multiprocess Integrated Hydrologic Model
NASA Astrophysics Data System (ADS)
Kumar, M.; Duffy, C.
2006-05-01
Distributed models simulate hydrologic state variables in space and time while taking into account the heterogeneities in terrain, surface, subsurface properties and meteorological forcings. The computational cost and complexity associated with these models increase with their tendency to accurately simulate the large number of interacting physical processes at fine spatio-temporal resolution in a large basin. A hydrologic model run on a coarse spatial discretization of the watershed with a limited number of physical processes imposes a smaller computational load, but this negatively affects the accuracy of model results and restricts physical realization of the problem. So it is imperative to have an integrated modeling strategy (a) which can be universally applied at various scales in order to study the tradeoffs between computational complexity (determined by spatio-temporal resolution), accuracy and predictive uncertainty in relation to various approximations of physical processes, (b) which can be applied at adaptively different spatial scales in the same domain by taking into account the local heterogeneity of topography and hydrogeologic variables, and (c) which is flexible enough to incorporate different numbers and approximations of process equations depending on model purpose and computational constraints. An efficient implementation of this strategy becomes all the more important for the Great Salt Lake river basin, which is relatively large (~89,000 sq. km) and complex in terms of hydrologic and geomorphic conditions. The types and the time scales of hydrologic processes which are dominant in different parts of the basin also differ. Part of the snowmelt runoff generated in the Uinta Mountains infiltrates and contributes as base flow to the Great Salt Lake over a time scale of decades to centuries. The adaptive strategy helps capture the steep topographic and climatic gradient along the Wasatch Front. Here we present the aforesaid modeling strategy along with an associated hydrologic modeling framework which facilitates a seamless, computationally efficient and accurate integration of the process model with the data model. The flexibility of this framework leads to implementation of multiscale, multiresolution, adaptive refinement/de-refinement and nested modeling simulations with the least computational burden. However, performing these simulations and the related calibration of these models over a large basin at higher spatio-temporal resolutions is computationally intensive and requires increasing computing power. With the advent of parallel processing architectures, high computing performance can be achieved by parallelization of existing serial integrated-hydrologic-model code. This translates to running the same model simulation on a network of a large number of processors, thereby reducing the time needed to obtain a solution. The paper also discusses the implementation of the integrated model on parallel processors: the mapping of the problem onto a multi-processor environment, methods to incorporate coupling between hydrologic processes using interprocessor communication models, the model data structure, and parallel numerical algorithms to obtain high performance.
NASA Technical Reports Server (NTRS)
Cassell, Alan M.
2013-01-01
The testing of 3- and 6-meter diameter Hypersonic Inflatable Aerodynamic Decelerator (HIAD) test articles was completed in the National Full-Scale Aerodynamics Complex 40 ft x 80 ft Wind Tunnel test section. Both models were stacked tori, constructed as 60 degree half-angle sphere cones. The 3-meter HIAD was tested in two configurations. The first 3-meter configuration utilized an instrumented flexible aerodynamic skin covering the inflatable aeroshell surface, while the second configuration employed a flight-like flexible thermal protection system. The 6-meter HIAD was tested in two structural configurations (with and without an aft-mounted stiffening torus near the shoulder), both utilizing an instrumented aerodynamic skin.
A Discretization Algorithm for Meteorological Data and its Parallelization Based on Hadoop
NASA Astrophysics Data System (ADS)
Liu, Chao; Jin, Wen; Yu, Yuting; Qiu, Taorong; Bai, Xiaoming; Zou, Shuilong
2017-10-01
Meteorological observation data are voluminous, have many attributes whose values are continuous, and exhibit correlations between elements that applications of meteorological data need to exploit. This paper is devoted to the problem of how to better discretize large meteorological datasets in order to mine the knowledge hidden in them more effectively, and to improving discretization algorithms for large-scale data so that the discretized data provide a sound basis for subsequent knowledge extraction. A discretization algorithm based on information entropy and the inconsistency of meteorological attributes is proposed, and the algorithm is parallelized on the Hadoop platform. Finally, comparison tests validate the effectiveness of the proposed algorithm for discretization of large meteorological data.
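A hedged sketch of the kind of serial kernel such an approach parallelizes: an entropy-driven search for a cut point that discretizes one continuous attribute against a class label. The data and the exact criterion below are illustrative; the paper's algorithm additionally uses attribute inconsistency and runs under Hadoop.

```python
# Sketch of an entropy-driven cut-point search for discretizing one
# continuous meteorological attribute against a class label. The data
# below are made up for illustration.
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def best_cut(values, labels):
    # Try midpoints between sorted values; keep the cut that minimizes
    # the size-weighted entropy of the two induced partitions.
    order = np.argsort(values)
    v, y = values[order], labels[order]
    best = (np.inf, None)
    for i in range(1, len(v)):
        if v[i] == v[i - 1]:
            continue
        cut = (v[i] + v[i - 1]) / 2
        w = i / len(v)
        score = w * entropy(y[:i]) + (1 - w) * entropy(y[i:])
        best = min(best, (score, cut))
    return best

temps = np.array([2.1, 3.4, 8.0, 9.5, 10.2, 15.8, 17.0, 21.3])
rain = np.array([0, 0, 0, 1, 1, 1, 1, 1])     # class label per reading
print(best_cut(temps, rain))                   # -> cut near 8.75
```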
Parallel Computing Strategies for Irregular Algorithms
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)
2002-01-01
Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
Virtual Network Configuration Management System for Data Center Operations and Management
NASA Astrophysics Data System (ADS)
Okita, Hideki; Yoshizawa, Masahiro; Uehara, Keitaro; Mizuno, Kazuhiko; Tarui, Toshiaki; Naono, Ken
Virtualization technologies are widely deployed in data centers to improve system utilization. However, they increase the workload for operators, who have to manage the structure of virtual networks in data centers. A virtual-network management system which automates the integration of the configurations of the virtual networks is provided. The proposed system collects the configurations from server virtualization platforms and VLAN-supported switches, and integrates these configurations according to a newly developed XML-based management information model for virtual-network configurations. Preliminary evaluations show that the proposed system helps operators by reducing, by about 40 percent, the time needed to acquire the configurations from devices and to correct inconsistencies in the operators' configuration management database. Further, they also show that the proposed system has excellent scalability: the system takes less than 20 minutes to acquire the virtual-network configurations from a large-scale network that includes 300 virtual machines. These results imply that the proposed system is effective for improving the configuration management process for virtual networks in data centers.
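A toy sketch of the integration step: configurations gathered from a virtualization platform and from switches are merged into one XML document. The element names below stand in for the paper's management information model, which is not reproduced here.

```python
# Sketch of merging virtual-network configurations from two sources
# into one XML document; element names and data are hypothetical.
import xml.etree.ElementTree as ET

hypervisor = {"vm01": "vlan100", "vm02": "vlan200"}      # from platform
switch = {"vlan100": "ge-0/0/1", "vlan200": "ge-0/0/2"}  # from switches

root = ET.Element("virtual-network")
for vm, vlan in hypervisor.items():
    seg = ET.SubElement(root, "segment", vlan=vlan)
    ET.SubElement(seg, "vm", name=vm)
    ET.SubElement(seg, "port", name=switch.get(vlan, "unknown"))

ET.indent(root)                    # pretty-print (Python 3.9+)
print(ET.tostring(root, encoding="unicode"))
```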
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoginath, Srikanth B; Perumalla, Kalyan S
2013-01-01
Virtual machine (VM) technologies, especially those offered via Cloud platforms, present new dimensions with respect to performance and cost in executing parallel discrete event simulation (PDES) applications. Due to the introduction of overall cost as a metric, the choice of the highest-end computing configuration is no longer the most economical one. Moreover, runtime dynamics unique to VM platforms introduce new performance characteristics, and the variety of possible VM configurations gives rise to a range of choices for hosting a PDES run. Here, an empirical study of these issues is undertaken to guide an understanding of the dynamics, trends and trade-offs in executing PDES on VM/Cloud platforms. Performance results and cost measures are obtained from actual execution of a range of scenarios in two PDES benchmark applications on the Amazon Cloud offerings and on a high-end VM host machine. The data reveals interesting insights into the new VM-PDES dynamics that come into play and also leads to counter-intuitive guidelines with respect to choosing the best and second-best configurations when overall cost of execution is considered. In particular, it is found that choosing the highest-end VM configuration guarantees neither the best runtime nor the least cost. Interestingly, choosing a (suitably scaled) low-end VM configuration provides the least overall cost without adversely affecting the total runtime.
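The counter-intuitive result follows from simple arithmetic: total cost is hourly price times wall-clock time, so a suitably scaled low-end configuration can win on cost while staying close on runtime. A sketch with made-up prices and runtimes:

```python
# Sketch of the cost/runtime trade-off reported above: total cost is
# hourly price times wall-clock hours, so the fastest VM class is not
# automatically the cheapest. Prices and runtimes are invented.
configs = {                      # name: ($/hour, runtime in hours)
    "high-end":   (4.00, 2.0),
    "mid-range":  (1.20, 3.5),
    "low-end-x4": (0.80, 2.2),   # several scaled-down instances
}
for name, (price, hours) in configs.items():
    print(f"{name:10s} runtime {hours:4.1f} h   cost ${price * hours:5.2f}")
best = min(configs, key=lambda n: configs[n][0] * configs[n][1])
print("cheapest:", best)         # here the scaled low-end option wins
```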
Controllable spin polarization and spin filtering in a zigzag silicene nanoribbon
DOE Office of Scientific and Technical Information (OSTI.GOV)
Farokhnezhad, Mohsen, E-mail: Mohsen-farokhnezhad@physics.iust.ac.ir; Esmaeilzadeh, Mahdi, E-mail: mahdi@iust.ac.ir; Pournaghavi, Nezhat
2015-05-07
Using non-equilibrium Green's functions, we study the spin-dependent electron transport properties of a zigzag silicene nanoribbon. To produce and control spin polarization, it is assumed that two ferromagnetic strips are deposited on both edges of the silicene nanoribbon and an electric field is applied perpendicular to the nanoribbon plane. The spin polarization is studied for both parallel and anti-parallel configurations of the exchange magnetic fields induced by the ferromagnetic strips. We find that complete spin polarization can take place in the presence of a perpendicular electric field for the anti-parallel configuration, and the nanoribbon can work as a perfect spin filter. The spin direction of transmitted electrons can be easily changed from up to down and vice versa by reversing the electric field direction. For the parallel configuration, perfect spin filtering can occur even in the absence of an electric field. In this case, the spin direction can be changed by changing the electron energy. Finally, we investigate the effects of nonmagnetic Anderson disorder on the spin-dependent conductance and find that the perfect spin filtering properties of the nanoribbon are destroyed by strong disorder, but the nanoribbon retains these properties in the presence of weak disorder.
NASA Astrophysics Data System (ADS)
Higashino, Satoru; Kobayashi, Shoei; Yamagami, Tamotsu
2007-06-01
High data transfer rates have been demanded of data storage devices along with increasing storage capacity. In order to increase the transfer rate, high-speed data processing techniques in read-channel devices are required. Generally, parallel architectures are utilized for high-speed digital processing. We have developed a new architecture for Interpolated Timing Recovery (ITR) to achieve a high data transfer rate and wide capture range in read-channel devices for information storage channels. It facilitates parallel implementation on large-scale-integration (LSI) devices.
Portable parallel portfolio optimization in the Aurora Financial Management System
NASA Astrophysics Data System (ADS)
Laure, Erwin; Moritsch, Hans
2001-07-01
Financial planning problems are formulated as large-scale, stochastic, multiperiod, tree-structured optimization problems. An efficient technique for solving this kind of problem is the nested Benders decomposition method. In this paper we present a parallel, portable, asynchronous implementation of this technique. To achieve our portability goals we elected the programming language Java for our implementation and used a high-level Java-based framework, called OpusJava, for expressing the parallelism potential as well as synchronization constraints. Our implementation is embedded within a modular decision support tool for portfolio and asset liability management, the Aurora Financial Management System.
NASA Astrophysics Data System (ADS)
Li, Wei; Alves, Tiago M.; Wu, Shiguo; Rebesco, Michele; Zhao, Fang; Mi, Lijun; Ma, Benjun
2016-10-01
A giant submarine creep zone exceeding 800 km2 on the continental slope offshore the Dongsha Islands, South China Sea, is investigated using bathymetric and 3D seismic data tied to borehole information. The submarine creep zone is identified as a wide area of seafloor undulations with ridges and troughs. The troughs form NW- and WNW-trending elongated depressions separating distinct seafloor ridges, which are parallel or sub-parallel to the continental slope. The troughs are 0.8-4.7 km long and 0.4-2.1 km wide. The ridges have wavelengths of 1-4 km and vertical relief of 10-30 m. Slope strata are characterised by the presence of vertically stacked ridges and troughs at different stratigraphic depths, which remain relatively stationary in their position. The interpreted ridges and troughs are associated with large-scale submarine creep, and the troughs can be divided into three types based on their different internal characters and formation processes. The large-scale listric faults trending downslope below MTD 1 and horizon T0 may be the potential glide planes for the submarine creep movement. High sedimentation rates, local fault activity and the frequent earthquakes recorded on the margin are considered the main factors controlling the formation of this giant submarine creep zone. Our results are important to the understanding of sediment instability on continental slopes as: a) the interpreted submarine creep is young, or even active at present, and b) areas of creeping may evolve into large-scale slope instabilities, as recorded by similar large-scale events in the past.
Twisted versus braided magnetic flux ropes in coronal geometry. II. Comparative behaviour
NASA Astrophysics Data System (ADS)
Prior, C.; Yeates, A. R.
2016-06-01
Aims: Sigmoidal structures in the solar corona are commonly associated with magnetic flux ropes whose magnetic field lines are twisted about a mutual axis. Their dynamical evolution is well studied, with sufficient twisting leading to large-scale rotation (writhing) and vertical expansion, possibly leading to ejection. Here, we investigate the behaviour of flux ropes whose field lines have more complex entangled/braided configurations. Our hypothesis is that this internal structure will inhibit the large-scale morphological changes. Additionally, we investigate the influence of the background field within which the rope is embedded. Methods: A technique for generating tubular magnetic fields with arbitrary axial geometry and internal structure, introduced in part I of this study, provides the initial conditions for resistive-MHD simulations. The tubular fields are embedded in a linear force-free background, and we consider various internal structures for the tubular field, including both twisted and braided topologies. These embedded flux ropes are then evolved using a 3D MHD code. Results: Firstly, in a background where twisted flux ropes evolve through the expected non-linear writhing and vertical expansion, we find that flux ropes with sufficiently braided/entangled interiors show no such large-scale changes. Secondly, embedding a twisted flux rope in a background field with a sigmoidal inversion line leads to eventual reversal of the large-scale rotation. Thirdly, in some cases a braided flux rope splits due to reconnection into two twisted flux ropes of opposing chirality - a phenomenon previously observed in cylindrical configurations. Conclusions: Sufficiently complex entanglement of the magnetic field lines within a flux rope can suppress large-scale morphological changes of its axis, with magnetic energy reduced instead through reconnection and expansion. The structure of the background magnetic field can significantly affect the changing morphology of a flux rope.
NASA Astrophysics Data System (ADS)
Puzyrev, Vladimir; Torres-Verdín, Carlos; Calo, Victor
2018-05-01
The interpretation of resistivity measurements acquired in high-angle and horizontal wells is a critical technical problem in formation evaluation. We develop an efficient parallel 3-D inversion method to estimate the spatial distribution of electrical resistivity in the neighbourhood of a well from deep directional electromagnetic induction measurements. The methodology places no restriction on the spatial distribution of the electrical resistivity around arbitrary well trajectories. The fast forward modelling of triaxial induction measurements performed with multiple transmitter-receiver configurations employs a parallel direct solver. The inversion uses a pre-conditioned gradient-based method whose accuracy is improved using the Wolfe conditions to estimate optimal step lengths at each iteration. The large transmitter-receiver offsets, used in the latest generation of commercial directional resistivity tools, improve the depth of investigation to over 30 m from the wellbore. Several challenging synthetic examples confirm the feasibility of the full 3-D inversion-based interpretations for these distances, hence enabling the integration of resistivity measurements with seismic amplitude data to improve the forecast of the petrophysical and fluid properties. Employing parallel direct solvers for the triaxial induction problems allows for large reductions in computational effort, thereby opening the possibility to invert multiposition 3-D data in practical CPU times.
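A hedged sketch of the optimization pattern named above, preconditioned gradient descent with Wolfe-condition step lengths, applied to a toy quadratic misfit in place of the 3-D EM inverse problem; this is not the authors' code.

```python
# Sketch of preconditioned gradient descent with Wolfe-condition step
# lengths; a small ill-conditioned quadratic stands in for the misfit.
import numpy as np
from scipy.optimize import line_search

A = np.diag([1.0, 10.0, 100.0])          # ill-conditioned toy misfit
b = np.array([1.0, 2.0, 3.0])
f = lambda m: 0.5 * m @ A @ m - b @ m
grad = lambda m: A @ m - b
P = np.diag(1.0 / np.diag(A))            # simple diagonal preconditioner

m = np.zeros(3)
for it in range(50):
    g = grad(m)
    if np.linalg.norm(g) < 1e-8:
        break
    p = -P @ g                            # preconditioned descent direction
    alpha = line_search(f, grad, m, p)[0] # step obeying Wolfe conditions
    m = m + (alpha if alpha else 1e-3) * p
print(it, m, np.linalg.solve(A, b))       # recovered vs exact model
```

With the exact diagonal preconditioner the search direction points at the minimizer, so the Wolfe line search accepts a near-unit step and convergence is fast; that acceleration is the motivation for preconditioning in large inverse problems.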
Hazards Due to Overdischarge in Lithium-ion Cylindrical Cells in Multi-cell Configurations
NASA Technical Reports Server (NTRS)
Jeevarajan, Judith; Strangways, Brad; Nelson, Tim
2010-01-01
Lithium-ion cells in the cylindrical, commercial-off-the-shelf 18650 design format were used to study the hazards associated with overdischarge. Cells in series or parallel configurations were subjected to different conditions of overdischarge. The cells in parallel configurations were all overdischarged to 2.0 V for 75 cycles, with one cell removed at 25 cycles to study the health of the cell. The cells in series were placed in an unbalanced configuration by discharging one cell in each series string before the start of the test; this discharge consisted of removing a pre-determined capacity, ranging from 50 to 150 mAh, from the cell. The cells were then discharged down to a predetermined end-of-discharge voltage cutoff, which allowed the cell with lower capacity to go into an overdischarge mode. The cell modules that survived the 75 cycles were subjected to one overvoltage test to 4.4 V/cell.
Intrinsic plasma rotation and Reynolds stress at the plasma edge in the HSX stellarator
Wilcox, Robert S.; Talmadge, J. N.; Anderson, David T.; ...
2016-02-05
Using multi-tipped Langmuir probes in the edge of the HSX stellarator, the radial electric field and parallel flows are found to deviate from the values calculated by the neoclassical transport code PENTA for the optimized quasi-helically symmetric (QHS) configuration. To understand whether Reynolds stress might explain the discrepancy, fluctuating floating potential measurements are made at two locations in the torus corresponding to the low-field and high-field sides of the device. The measurements at the two locations show clear evidence of a gradient in the Reynolds stress. However, the resulting flow due to the gradient in the stress is found to be large and in opposite directions for the two locations. This makes an estimation of the flux surface average using a small number of measurement locations impractical from an experimental perspective. These results neither confirm nor rule out whether Reynolds stress plays an important role for the QHS configuration. Measurements made in configurations with the quasi-symmetry degraded show even larger flows and greater deviations from the neoclassically calculated velocity profiles than the QHS configuration, while the fluctuation magnitudes are reduced. Lastly, for these configurations in particular, the Reynolds stress is most likely not responsible for the additional momentum.