efficient computer code: Topics by Science.gov

Sample records for efficient computer code

Green's function methods in heavy ion shielding

NASA Technical Reports Server (NTRS)

Wilson, John W.; Costen, Robert C.; Shinn, Judy L.; Badavi, Francis F.

1993-01-01

An analytic solution to the heavy ion transport in terms of Green's function is used to generate a highly efficient computer code for space applications. The efficiency of the computer code is accomplished by a nonperturbative technique extending Green's function over the solution domain. The computer code can also be applied to accelerator boundary conditions to allow code validation in laboratory experiments.
Numerical algorithm comparison for the accurate and efficient computation of high-incidence vortical flow

NASA Technical Reports Server (NTRS)

Chaderjian, Neal M.

1991-01-01

Computations from two Navier-Stokes codes, NSS and F3D, are presented for a tangent-ogive-cylinder body at high angle of attack. Features of this steady flow include a pair of primary vortices on the leeward side of the body as well as secondary vortices. The topological and physical plausibility of this vortical structure is discussed. The accuracy of these codes are assessed by comparison of the numerical solutions with experimental data. The effects of turbulence model, numerical dissipation, and grid refinement are presented. The overall efficiency of these codes are also assessed by examining their convergence rates, computational time per time step, and maximum allowable time step for time-accurate computations. Overall, the numerical results from both codes compared equally well with experimental data, however, the NSS code was found to be significantly more efficient than the F3D code.
Computation of Reacting Flows in Combustion Processes

NASA Technical Reports Server (NTRS)

Keith, Theo G., Jr.; Chen, Kuo-Huey

1997-01-01

The main objective of this research was to develop an efficient three-dimensional computer code for chemically reacting flows. The main computer code developed is ALLSPD-3D. The ALLSPD-3D computer program is developed for the calculation of three-dimensional, chemically reacting flows with sprays. The ALL-SPD code employs a coupled, strongly implicit solution procedure for turbulent spray combustion flows. A stochastic droplet model and an efficient method for treatment of the spray source terms in the gas-phase equations are used to calculate the evaporating liquid sprays. The chemistry treatment in the code is general enough that an arbitrary number of reaction and species can be defined by the users. Also, it is written in generalized curvilinear coordinates with both multi-block and flexible internal blockage capabilities to handle complex geometries. In addition, for general industrial combustion applications, the code provides both dilution and transpiration cooling capabilities. The ALLSPD algorithm, which employs the preconditioning and eigenvalue rescaling techniques, is capable of providing efficient solution for flows with a wide range of Mach numbers. Although written for three-dimensional flows in general, the code can be used for two-dimensional and axisymmetric flow computations as well. The code is written in such a way that it can be run in various computer platforms (supercomputers, workstations and parallel processors) and the GUI (Graphical User Interface) should provide a user-friendly tool in setting up and running the code.
Computational strategies for three-dimensional flow simulations on distributed computer systems. Ph.D. Thesis Semiannual Status Report, 15 Aug. 1993 - 15 Feb. 1994

NASA Technical Reports Server (NTRS)

Weed, Richard Allen; Sankar, L. N.

1994-01-01

An increasing amount of research activity in computational fluid dynamics has been devoted to the development of efficient algorithms for parallel computing systems. The increasing performance to price ratio of engineering workstations has led to research to development procedures for implementing a parallel computing system composed of distributed workstations. This thesis proposal outlines an ongoing research program to develop efficient strategies for performing three-dimensional flow analysis on distributed computing systems. The PVM parallel programming interface was used to modify an existing three-dimensional flow solver, the TEAM code developed by Lockheed for the Air Force, to function as a parallel flow solver on clusters of workstations. Steady flow solutions were generated for three different wing and body geometries to validate the code and evaluate code performance. The proposed research will extend the parallel code development to determine the most efficient strategies for unsteady flow simulations.
Efficient convolutional sparse coding

DOEpatents

Wohlberg, Brendt

2017-06-20

Computationally efficient algorithms may be applied for fast dictionary learning solving the convolutional sparse coding problem in the Fourier domain. More specifically, efficient convolutional sparse coding may be derived within an alternating direction method of multipliers (ADMM) framework that utilizes fast Fourier transforms (FFT) to solve the main linear system in the frequency domain. Such algorithms may enable a significant reduction in computational cost over conventional approaches by implementing a linear solver for the most critical and computationally expensive component of the conventional iterative algorithm. The theoretical computational cost of the algorithm may be reduced from O(M.sup.3N) to O(MN log N), where N is the dimensionality of the data and M is the number of elements in the dictionary. This significant improvement in efficiency may greatly increase the range of problems that can practically be addressed via convolutional sparse representations.
Binary video codec for data reduction in wireless visual sensor networks

NASA Astrophysics Data System (ADS)

Khursheed, Khursheed; Ahmad, Naeem; Imran, Muhammad; O'Nils, Mattias

2013-02-01

Wireless Visual Sensor Networks (WVSN) is formed by deploying many Visual Sensor Nodes (VSNs) in the field. Typical applications of WVSN include environmental monitoring, health care, industrial process monitoring, stadium/airports monitoring for security reasons and many more. The energy budget in the outdoor applications of WVSN is limited to the batteries and the frequent replacement of batteries is usually not desirable. So the processing as well as the communication energy consumption of the VSN needs to be optimized in such a way that the network remains functional for longer duration. The images captured by VSN contain huge amount of data and require efficient computational resources for processing the images and wide communication bandwidth for the transmission of the results. Image processing algorithms must be designed and developed in such a way that they are computationally less complex and must provide high compression rate. For some applications of WVSN, the captured images can be segmented into bi-level images and hence bi-level image coding methods will efficiently reduce the information amount in these segmented images. But the compression rate of the bi-level image coding methods is limited by the underlined compression algorithm. Hence there is a need for designing other intelligent and efficient algorithms which are computationally less complex and provide better compression rate than that of bi-level image coding methods. Change coding is one such algorithm which is computationally less complex (require only exclusive OR operations) and provide better compression efficiency compared to image coding but it is effective for applications having slight changes between adjacent frames of the video. The detection and coding of the Region of Interest (ROIs) in the change frame efficiently reduce the information amount in the change frame. But, if the number of objects in the change frames is higher than a certain level then the compression efficiency of both the change coding and ROI coding becomes worse than that of image coding. This paper explores the compression efficiency of the Binary Video Codec (BVC) for the data reduction in WVSN. We proposed to implement all the three compression techniques i.e. image coding, change coding and ROI coding at the VSN and then select the smallest bit stream among the results of the three compression techniques. In this way the compression performance of the BVC will never become worse than that of image coding. We concluded that the compression efficiency of BVC is always better than that of change coding and is always better than or equal that of ROI coding and image coding.
Computer Code Aids Design Of Wings

NASA Technical Reports Server (NTRS)

Carlson, Harry W.; Darden, Christine M.

1993-01-01

AERO2S computer code developed to aid design engineers in selection and evaluation of aerodynamically efficient wing/canard and wing/horizontal-tail configurations that includes simple hinged-flap systems. Code rapidly estimates longitudinal aerodynamic characteristics of conceptual airplane lifting-surface arrangements. Developed in FORTRAN V on CDC 6000 computer system, and ported to MS-DOS environment.
Complexity control algorithm based on adaptive mode selection for interframe coding in high efficiency video coding

NASA Astrophysics Data System (ADS)

Chen, Gang; Yang, Bing; Zhang, Xiaoyun; Gao, Zhiyong

2017-07-01

The latest high efficiency video coding (HEVC) standard significantly increases the encoding complexity for improving its coding efficiency. Due to the limited computational capability of handheld devices, complexity constrained video coding has drawn great attention in recent years. A complexity control algorithm based on adaptive mode selection is proposed for interframe coding in HEVC. Considering the direct proportionality between encoding time and computational complexity, the computational complexity is measured in terms of encoding time. First, complexity is mapped to a target in terms of prediction modes. Then, an adaptive mode selection algorithm is proposed for the mode decision process. Specifically, the optimal mode combination scheme that is chosen through offline statistics is developed at low complexity. If the complexity budget has not been used up, an adaptive mode sorting method is employed to further improve coding efficiency. The experimental results show that the proposed algorithm achieves a very large complexity control range (as low as 10%) for the HEVC encoder while maintaining good rate-distortion performance. For the lowdelayP condition, compared with the direct resource allocation method and the state-of-the-art method, an average gain of 0.63 and 0.17 dB in BDPSNR is observed for 18 sequences when the target complexity is around 40%.
Relative efficiency and accuracy of two Navier-Stokes codes for simulating attached transonic flow over wings

NASA Technical Reports Server (NTRS)

Bonhaus, Daryl L.; Wornom, Stephen F.

1991-01-01

Two codes which solve the 3-D Thin Layer Navier-Stokes (TLNS) equations are used to compute the steady state flow for two test cases representing typical finite wings at transonic conditions. Several grids of C-O topology and varying point densities are used to determine the effects of grid refinement. After a description of each code and test case, standards for determining code efficiency and accuracy are defined and applied to determine the relative performance of the two codes in predicting turbulent transonic wing flows. Comparisons of computed surface pressure distributions with experimental data are made.
Efficient computation of kinship and identity coefficients on large pedigrees.

PubMed

Cheng, En; Elliott, Brendan; Ozsoyoglu, Z Meral

2009-06-01

With the rapidly expanding field of medical genetics and genetic counseling, genealogy information is becoming increasingly abundant. An important computation on pedigree data is the calculation of identity coefficients, which provide a complete description of the degree of relatedness of a pair of individuals. The areas of application of identity coefficients are numerous and diverse, from genetic counseling to disease tracking, and thus, the computation of identity coefficients merits special attention. However, the computation of identity coefficients is not done directly, but rather as the final step after computing a set of generalized kinship coefficients. In this paper, we first propose a novel Path-Counting Formula for calculating generalized kinship coefficients, which is motivated by Wright's path-counting method for computing inbreeding coefficient. We then present an efficient and scalable scheme for calculating generalized kinship coefficients on large pedigrees using NodeCodes, a special encoding scheme for expediting the evaluation of queries on pedigree graph structures. Furthermore, we propose an improved scheme using Family NodeCodes for the computation of generalized kinship coefficients, which is motivated by the significant improvement of using Family NodeCodes for inbreeding coefficient over the use of NodeCodes. We also perform experiments for evaluating the efficiency of our method, and compare it with the performance of the traditional recursive algorithm for three individuals. Experimental results demonstrate that the resulting scheme is more scalable and efficient than the traditional recursive methods for computing generalized kinship coefficients.
Computer-assisted coding and clinical documentation: first things first.

PubMed

Tully, Melinda; Carmichael, Angela

2012-10-01

Computer-assisted coding tools have the potential to drive improvements in seven areas: Transparency of coding. Productivity (generally by 20 to 25 percent for inpatient claims). Accuracy (by improving specificity of documentation). Cost containment (by reducing overtime expenses, audit fees, and denials). Compliance. Efficiency. Consistency.
Turbine Internal and Film Cooling Modeling For 3D Navier-Stokes Codes

NASA Technical Reports Server (NTRS)

DeWitt, Kenneth; Garg Vijay; Ameri, Ali

2005-01-01

The aim of this research project is to make use of NASA Glenn on-site computational facilities in order to develop, validate and apply aerodynamic, heat transfer, and turbine cooling models for use in advanced 3D Navier-Stokes Computational Fluid Dynamics (CFD) codes such as the Glenn-" code. Specific areas of effort include: Application of the Glenn-HT code to specific configurations made available under Turbine Based Combined Cycle (TBCC), and Ultra Efficient Engine Technology (UEET) projects. Validating the use of a multi-block code for the time accurate computation of the detailed flow and heat transfer of cooled turbine airfoils. The goal of the current research is to improve the predictive ability of the Glenn-HT code. This will enable one to design more efficient turbine components for both aviation and power generation. The models will be tested against specific configurations provided by NASA Glenn.
Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing

NASA Astrophysics Data System (ADS)

Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide

2015-09-01

The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
Monte Carlo simulation of Ising models by multispin coding on a vector computer

NASA Astrophysics Data System (ADS)

Wansleben, Stephan; Zabolitzky, John G.; Kalle, Claus

1984-11-01

Rebbi's efficient multispin coding algorithm for Ising models is combined with the use of the vector computer CDC Cyber 205. A speed of 21.2 million updates per second is reached. This is comparable to that obtained by special- purpose computers.
Star adaptation for two-algorithms used on serial computers

NASA Technical Reports Server (NTRS)

Howser, L. M.; Lambiotte, J. J., Jr.

1974-01-01

Two representative algorithms used on a serial computer and presently executed on the Control Data Corporation 6000 computer were adapted to execute efficiently on the Control Data STAR-100 computer. Gaussian elimination for the solution of simultaneous linear equations and the Gauss-Legendre quadrature formula for the approximation of an integral are the two algorithms discussed. A description is given of how the programs were adapted for STAR and why these adaptations were necessary to obtain an efficient STAR program. Some points to consider when adapting an algorithm for STAR are discussed. Program listings of the 6000 version coded in 6000 FORTRAN, the adapted STAR version coded in 6000 FORTRAN, and the STAR version coded in STAR FORTRAN are presented in the appendices.
Classification of breast tissue in mammograms using efficient coding.

PubMed

Costa, Daniel D; Campos, Lúcio F; Barros, Allan K

2011-06-24

Female breast cancer is the major cause of death by cancer in western countries. Efforts in Computer Vision have been made in order to improve the diagnostic accuracy by radiologists. Some methods of lesion diagnosis in mammogram images were developed based in the technique of principal component analysis which has been used in efficient coding of signals and 2D Gabor wavelets used for computer vision applications and modeling biological vision. In this work, we present a methodology that uses efficient coding along with linear discriminant analysis to distinguish between mass and non-mass from 5090 region of interest from mammograms. The results show that the best rates of success reached with Gabor wavelets and principal component analysis were 85.28% and 87.28%, respectively. In comparison, the model of efficient coding presented here reached up to 90.07%. Altogether, the results presented demonstrate that independent component analysis performed successfully the efficient coding in order to discriminate mass from non-mass tissues. In addition, we have observed that LDA with ICA bases showed high predictive performance for some datasets and thus provide significant support for a more detailed clinical investigation.
Evaluation of the efficiency and fault density of software generated by code generators

NASA Technical Reports Server (NTRS)

Schreur, Barbara

1993-01-01

Flight computers and flight software are used for GN&C (guidance, navigation, and control), engine controllers, and avionics during missions. The software development requires the generation of a considerable amount of code. The engineers who generate the code make mistakes and the generation of a large body of code with high reliability requires considerable time. Computer-aided software engineering (CASE) tools are available which generates code automatically with inputs through graphical interfaces. These tools are referred to as code generators. In theory, code generators could write highly reliable code quickly and inexpensively. The various code generators offer different levels of reliability checking. Some check only the finished product while some allow checking of individual modules and combined sets of modules as well. Considering NASA's requirement for reliability, an in house manually generated code is needed. Furthermore, automatically generated code is reputed to be as efficient as the best manually generated code when executed. In house verification is warranted.
Development of numerical methods for overset grids with applications for the integrated Space Shuttle vehicle

NASA Technical Reports Server (NTRS)

Chan, William M.

1995-01-01

Algorithms and computer code developments were performed for the overset grid approach to solving computational fluid dynamics problems. The techniques developed are applicable to compressible Navier-Stokes flow for any general complex configurations. The computer codes developed were tested on different complex configurations with the Space Shuttle launch vehicle configuration as the primary test bed. General, efficient and user-friendly codes were produced for grid generation, flow solution and force and moment computation.
Efficient Proximity Computation Techniques Using ZIP Code Data for Smart Cities †

PubMed Central

Murdani, Muhammad Harist; Hong, Bonghee

2018-01-01

In this paper, we are interested in computing ZIP code proximity from two perspectives, proximity between two ZIP codes (Ad-Hoc) and neighborhood proximity (Top-K). Such a computation can be used for ZIP code-based target marketing as one of the smart city applications. A naïve approach to this computation is the usage of the distance between ZIP codes. We redefine a distance metric combining the centroid distance with the intersecting road network between ZIP codes by using a weighted sum method. Furthermore, we prove that the results of our combined approach conform to the characteristics of distance measurement. We have proposed a general and heuristic approach for computing Ad-Hoc proximity, while for computing Top-K proximity, we have proposed a general approach only. Our experimental results indicate that our approaches are verifiable and effective in reducing the execution time and search space. PMID:29587366
Efficient Proximity Computation Techniques Using ZIP Code Data for Smart Cities †.

PubMed

Murdani, Muhammad Harist; Kwon, Joonho; Choi, Yoon-Ho; Hong, Bonghee

2018-03-24

In this paper, we are interested in computing ZIP code proximity from two perspectives, proximity between two ZIP codes ( Ad-Hoc ) and neighborhood proximity ( Top-K ). Such a computation can be used for ZIP code-based target marketing as one of the smart city applications. A naïve approach to this computation is the usage of the distance between ZIP codes. We redefine a distance metric combining the centroid distance with the intersecting road network between ZIP codes by using a weighted sum method. Furthermore, we prove that the results of our combined approach conform to the characteristics of distance measurement. We have proposed a general and heuristic approach for computing Ad-Hoc proximity, while for computing Top-K proximity, we have proposed a general approach only. Our experimental results indicate that our approaches are verifiable and effective in reducing the execution time and search space.

Space shuttle main engine numerical modeling code modifications and analysis

NASA Technical Reports Server (NTRS)

Ziebarth, John P.

1988-01-01

The user of computational fluid dynamics (CFD) codes must be concerned with the accuracy and efficiency of the codes if they are to be used for timely design and analysis of complicated three-dimensional fluid flow configurations. A brief discussion of how accuracy and efficiency effect the CFD solution process is given. A more detailed discussion of how efficiency can be enhanced by using a few Cray Research Inc. utilities to address vectorization is presented and these utilities are applied to a three-dimensional Navier-Stokes CFD code (INS3D).
Performance measures for transform data coding.

NASA Technical Reports Server (NTRS)

Pearl, J.; Andrews, H. C.; Pratt, W. K.

1972-01-01

This paper develops performance criteria for evaluating transform data coding schemes under computational constraints. Computational constraints that conform with the proposed basis-restricted model give rise to suboptimal coding efficiency characterized by a rate-distortion relation R(D) similar in form to the theoretical rate-distortion function. Numerical examples of this performance measure are presented for Fourier, Walsh, Haar, and Karhunen-Loeve transforms.
Preliminary Results from the Application of Automated Adjoint Code Generation to CFL3D

NASA Technical Reports Server (NTRS)

Carle, Alan; Fagan, Mike; Green, Lawrence L.

1998-01-01

This report describes preliminary results obtained using an automated adjoint code generator for Fortran to augment a widely-used computational fluid dynamics flow solver to compute derivatives. These preliminary results with this augmented code suggest that, even in its infancy, the automated adjoint code generator can accurately and efficiently deliver derivatives for use in transonic Euler-based aerodynamic shape optimization problems with hundreds to thousands of independent design variables.
[Correlation of codon biases and potential secondary structures with mRNA translation efficiency in unicellular organisms].

PubMed

Vladimirov, N V; Likhoshvaĭ, V A; Matushkin, Iu G

2007-01-01

Gene expression is known to correlate with degree of codon bias in many unicellular organisms. However, such correlation is absent in some organisms. Recently we demonstrated that inverted complementary repeats within coding DNA sequence must be considered for proper estimation of translation efficiency, since they may form secondary structures that obstruct ribosome movement. We have developed a program for estimation of potential coding DNA sequence expression in defined unicellular organism using its genome sequence. The program computes elongation efficiency index. Computation is based on estimation of coding DNA sequence elongation efficiency, taking into account three key factors: codon bias, average number of inverted complementary repeats, and free energy of potential stem-loop structures formed by the repeats. The influence of these factors on translation is numerically estimated. An optimal proportion of these factors is computed for each organism individually. Quantitative translational characteristics of 384 unicellular organisms (351 bacteria, 28 archaea, 5 eukaryota) have been computed using their annotated genomes from NCBI GenBank. Five potential evolutionary strategies of translational optimization have been determined among studied organisms. A considerable difference of preferred translational strategies between Bacteria and Archaea has been revealed. Significant correlations between elongation efficiency index and gene expression levels have been shown for two organisms (S. cerevisiae and H. pylori) using available microarray data. The proposed method allows to estimate numerically the coding DNA sequence translation efficiency and to optimize nucleotide composition of heterologous genes in unicellular organisms. http://www.mgs.bionet.nsc.ru/mgs/programs/eei-calculator/.
New double-byte error-correcting codes for memory systems

NASA Technical Reports Server (NTRS)

Feng, Gui-Liang; Wu, Xinen; Rao, T. R. N.

1996-01-01

Error-correcting or error-detecting codes have been used in the computer industry to increase reliability, reduce service costs, and maintain data integrity. The single-byte error-correcting and double-byte error-detecting (SbEC-DbED) codes have been successfully used in computer memory subsystems. There are many methods to construct double-byte error-correcting (DBEC) codes. In the present paper we construct a class of double-byte error-correcting codes, which are more efficient than those known to be optimum, and a decoding procedure for our codes is also considered.
A Flexible and Non-instrusive Approach for Computing Complex Structural Coverage Metrics

NASA Technical Reports Server (NTRS)

Whalen, Michael W.; Person, Suzette J.; Rungta, Neha; Staats, Matt; Grijincu, Daniela

2015-01-01

Software analysis tools and techniques often leverage structural code coverage information to reason about the dynamic behavior of software. Existing techniques instrument the code with the required structural obligations and then monitor the execution of the compiled code to report coverage. Instrumentation based approaches often incur considerable runtime overhead for complex structural coverage metrics such as Modified Condition/Decision (MC/DC). Code instrumentation, in general, has to be approached with great care to ensure it does not modify the behavior of the original code. Furthermore, instrumented code cannot be used in conjunction with other analyses that reason about the structure and semantics of the code under test. In this work, we introduce a non-intrusive preprocessing approach for computing structural coverage information. It uses a static partial evaluation of the decisions in the source code and a source-to-bytecode mapping to generate the information necessary to efficiently track structural coverage metrics during execution. Our technique is flexible; the results of the preprocessing can be used by a variety of coverage-driven software analysis tasks, including automated analyses that are not possible for instrumented code. Experimental results in the context of symbolic execution show the efficiency and flexibility of our nonintrusive approach for computing code coverage information
Cross-indexing of binary SIFT codes for large-scale image search.

PubMed

Liu, Zhen; Li, Houqiang; Zhang, Liyan; Zhou, Wengang; Tian, Qi

2014-05-01

In recent years, there has been growing interest in mapping visual features into compact binary codes for applications on large-scale image collections. Encoding high-dimensional data as compact binary codes reduces the memory cost for storage. Besides, it benefits the computational efficiency since the computation of similarity can be efficiently measured by Hamming distance. In this paper, we propose a novel flexible scale invariant feature transform (SIFT) binarization (FSB) algorithm for large-scale image search. The FSB algorithm explores the magnitude patterns of SIFT descriptor. It is unsupervised and the generated binary codes are demonstrated to be dispreserving. Besides, we propose a new searching strategy to find target features based on the cross-indexing in the binary SIFT space and original SIFT space. We evaluate our approach on two publicly released data sets. The experiments on large-scale partial duplicate image retrieval system demonstrate the effectiveness and efficiency of the proposed algorithm.
A concatenated coding scheme for error control

NASA Technical Reports Server (NTRS)

Lin, S.

1985-01-01

A concatenated coding scheme for error contol in data communications was analyzed. The inner code is used for both error correction and detection, however the outer code is used only for error detection. A retransmission is requested if either the inner code decoder fails to make a successful decoding or the outer code decoder detects the presence of errors after the inner code decoding. Probability of undetected error of the proposed scheme is derived. An efficient method for computing this probability is presented. Throughout efficiency of the proposed error control scheme incorporated with a selective repeat ARQ retransmission strategy is analyzed.
Parallel Computation of Unsteady Flows on a Network of Workstations

NASA Technical Reports Server (NTRS)

1997-01-01

Parallel computation of unsteady flows requires significant computational resources. The utilization of a network of workstations seems an efficient solution to the problem where large problems can be treated at a reasonable cost. This approach requires the solution of several problems: 1) the partitioning and distribution of the problem over a network of workstation, 2) efficient communication tools, 3) managing the system efficiently for a given problem. Of course, there is the question of the efficiency of any given numerical algorithm to such a computing system. NPARC code was chosen as a sample for the application. For the explicit version of the NPARC code both two- and three-dimensional problems were studied. Again both steady and unsteady problems were investigated. The issues studied as a part of the research program were: 1) how to distribute the data between the workstations, 2) how to compute and how to communicate at each node efficiently, 3) how to balance the load distribution. In the following, a summary of these activities is presented. Details of the work have been presented and published as referenced.
Nonuniform code concatenation for universal fault-tolerant quantum computing

NASA Astrophysics Data System (ADS)

Nikahd, Eesa; Sedighi, Mehdi; Saheb Zamani, Morteza

2017-09-01

Using transversal gates is a straightforward and efficient technique for fault-tolerant quantum computing. Since transversal gates alone cannot be computationally universal, they must be combined with other approaches such as magic state distillation, code switching, or code concatenation to achieve universality. In this paper we propose an alternative approach for universal fault-tolerant quantum computing, mainly based on the code concatenation approach proposed in [T. Jochym-O'Connor and R. Laflamme, Phys. Rev. Lett. 112, 010505 (2014), 10.1103/PhysRevLett.112.010505], but in a nonuniform fashion. The proposed approach is described based on nonuniform concatenation of the 7-qubit Steane code with the 15-qubit Reed-Muller code, as well as the 5-qubit code with the 15-qubit Reed-Muller code, which lead to two 49-qubit and 47-qubit codes, respectively. These codes can correct any arbitrary single physical error with the ability to perform a universal set of fault-tolerant gates, without using magic state distillation.
OpenFOAM: Open source CFD in research and industry

NASA Astrophysics Data System (ADS)

Jasak, Hrvoje

2009-12-01

The current focus of development in industrial Computational Fluid Dynamics (CFD) is integration of CFD into Computer-Aided product development, geometrical optimisation, robust design and similar. On the other hand, in CFD research aims to extend the boundaries ofpractical engineering use in "non-traditional " areas. Requirements of computational flexibility and code integration are contradictory: a change of coding paradigm, with object orientation, library components, equation mimicking is proposed as a way forward. This paper describes OpenFOAM, a C++ object oriented library for Computational Continuum Mechanics (CCM) developed by the author. Efficient and flexible implementation of complex physical models is achieved by mimicking the form ofpartial differential equation in software, with code functionality provided in library form. Open Source deployment and development model allows the user to achieve desired versatility in physical modeling without the sacrifice of complex geometry support and execution efficiency.
Parallel scalability and efficiency of vortex particle method for aeroelasticity analysis of bluff bodies

NASA Astrophysics Data System (ADS)

Tolba, Khaled Ibrahim; Morgenthal, Guido

2018-01-01

This paper presents an analysis of the scalability and efficiency of a simulation framework based on the vortex particle method. The code is applied for the numerical aerodynamic analysis of line-like structures. The numerical code runs on multicore CPU and GPU architectures using OpenCL framework. The focus of this paper is the analysis of the parallel efficiency and scalability of the method being applied to an engineering test case, specifically the aeroelastic response of a long-span bridge girder at the construction stage. The target is to assess the optimal configuration and the required computer architecture, such that it becomes feasible to efficiently utilise the method within the computational resources available for a regular engineering office. The simulations and the scalability analysis are performed on a regular gaming type computer.
Application of a personal computer for the uncoupled vibration analysis of wind turbine blade and counterweight assemblies

NASA Technical Reports Server (NTRS)

White, P. R.; Little, R. R.

1985-01-01

A research effort was undertaken to develop personal computer based software for vibrational analysis. The software was developed to analytically determine the natural frequencies and mode shapes for the uncoupled lateral vibrations of the blade and counterweight assemblies used in a single bladed wind turbine. The uncoupled vibration analysis was performed in both the flapwise and chordwise directions for static rotor conditions. The effects of rotation on the uncoupled flapwise vibration of the blade and counterweight assemblies were evaluated for various rotor speeds up to 90 rpm. The theory, used in the vibration analysis codes, is based on a lumped mass formulation for the blade and counterweight assemblies. The codes are general so that other designs can be readily analyzed. The input for the codes is generally interactive to facilitate usage. The output of the codes is both tabular and graphical. Listings of the codes are provided. Predicted natural frequencies of the first several modes show reasonable agreement with experimental results. The analysis codes were originally developed on a DEC PDP 11/34 minicomputer and then downloaded and modified to run on an ITT XTRA personal computer. Studies conducted to evaluate the efficiency of running the programs on a personal computer as compared with the minicomputer indicated that, with the proper combination of hardware and software options, the efficiency of using a personal computer exceeds that of a minicomputer.
Speeding Up Ecological and Evolutionary Computations in R; Essentials of High Performance Computing for Biologists

PubMed Central

Visser, Marco D.; McMahon, Sean M.; Merow, Cory; Dixon, Philip M.; Record, Sydne; Jongejans, Eelke

2015-01-01

Computation has become a critical component of research in biology. A risk has emerged that computational and programming challenges may limit research scope, depth, and quality. We review various solutions to common computational efficiency problems in ecological and evolutionary research. Our review pulls together material that is currently scattered across many sources and emphasizes those techniques that are especially effective for typical ecological and environmental problems. We demonstrate how straightforward it can be to write efficient code and implement techniques such as profiling or parallel computing. We supply a newly developed R package (aprof) that helps to identify computational bottlenecks in R code and determine whether optimization can be effective. Our review is complemented by a practical set of examples and detailed Supporting Information material (S1–S3 Texts) that demonstrate large improvements in computational speed (ranging from 10.5 times to 14,000 times faster). By improving computational efficiency, biologists can feasibly solve more complex tasks, ask more ambitious questions, and include more sophisticated analyses in their research. PMID:25811842
Comparison of two- and three-dimensional flow computations with laser anemometer measurements in a transonic compressor rotor

NASA Technical Reports Server (NTRS)

Chima, R. V.; Strazisar, A. J.

1982-01-01

Two and three dimensional inviscid solutions for the flow in a transonic axial compressor rotor at design speed are compared with probe and laser anemometers measurements at near-stall and maximum-flow operating points. Experimental details of the laser anemometer system and computational details of the two dimensional axisymmetric code and three dimensional Euler code are described. Comparisons are made between relative Mach number and flow angle contours, shock location, and shock strength. A procedure for using an efficient axisymmetric code to generate downstream pressure input for computationally expensive Euler codes is discussed. A film supplement shows the calculations of the two operating points with the time-marching Euler code.
Getting Started in Classroom Computing.

ERIC Educational Resources Information Center

Ahl, David H.

Written for secondary students, this booklet provides an introduction to several computer-related concepts through a set of six classroom games, most of which can be played with little more than a sheet of paper and a pencil. The games are: 1) SECRET CODES--introduction to binary coding, punched cards, and paper tape; 2) GUESS--efficient methods…
The efficiency of geophysical adjoint codes generated by automatic differentiation tools

NASA Astrophysics Data System (ADS)

Vlasenko, A. V.; Köhl, A.; Stammer, D.

2016-02-01

The accuracy of numerical models that describe complex physical or chemical processes depends on the choice of model parameters. Estimating an optimal set of parameters by optimization algorithms requires knowledge of the sensitivity of the process of interest to model parameters. Typically the sensitivity computation involves differentiation of the model, which can be performed by applying algorithmic differentiation (AD) tools to the underlying numerical code. However, existing AD tools differ substantially in design, legibility and computational efficiency. In this study we show that, for geophysical data assimilation problems of varying complexity, the performance of adjoint codes generated by the existing AD tools (i) Open_AD, (ii) Tapenade, (iii) NAGWare and (iv) Transformation of Algorithms in Fortran (TAF) can be vastly different. Based on simple test problems, we evaluate the efficiency of each AD tool with respect to computational speed, accuracy of the adjoint, the efficiency of memory usage, and the capability of each AD tool to handle modern FORTRAN 90-95 elements such as structures and pointers, which are new elements that either combine groups of variables or provide aliases to memory addresses, respectively. We show that, while operator overloading tools are the only ones suitable for modern codes written in object-oriented programming languages, their computational efficiency lags behind source transformation by orders of magnitude, rendering the application of these modern tools to practical assimilation problems prohibitive. In contrast, the application of source transformation tools appears to be the most efficient choice, allowing handling even large geophysical data assimilation problems. However, they can only be applied to numerical models written in earlier generations of programming languages. Our study indicates that applying existing AD tools to realistic geophysical problems faces limitations that urgently need to be solved to allow the continuous use of AD tools for solving geophysical problems on modern computer architectures.
STEMsalabim: A high-performance computing cluster friendly code for scanning transmission electron microscopy image simulations of thin specimens.

PubMed

Oelerich, Jan Oliver; Duschek, Lennart; Belz, Jürgen; Beyer, Andreas; Baranovskii, Sergei D; Volz, Kerstin

2017-06-01

We present a new multislice code for the computer simulation of scanning transmission electron microscope (STEM) images based on the frozen lattice approximation. Unlike existing software packages, the code is optimized to perform well on highly parallelized computing clusters, combining distributed and shared memory architectures. This enables efficient calculation of large lateral scanning areas of the specimen within the frozen lattice approximation and fine-grained sweeps of parameter space. Copyright © 2017 Elsevier B.V. All rights reserved.
Comparison of numerical techniques for integration of stiff ordinary differential equations arising in combustion chemistry

NASA Technical Reports Server (NTRS)

Radhakrishnan, K.

1984-01-01

The efficiency and accuracy of several algorithms recently developed for the efficient numerical integration of stiff ordinary differential equations are compared. The methods examined include two general-purpose codes, EPISODE and LSODE, and three codes (CHEMEQ, CREK1D, and GCKP84) developed specifically to integrate chemical kinetic rate equations. The codes are applied to two test problems drawn from combustion kinetics. The comparisons show that LSODE is the fastest code currently available for the integration of combustion kinetic rate equations. An important finding is that an interactive solution of the algebraic energy conservation equation to compute the temperature does not result in significant errors. In addition, this method is more efficient than evaluating the temperature by integrating its time derivative. Significant reductions in computational work are realized by updating the rate constants (k = at(supra N) N exp(-E/RT) only when the temperature change exceeds an amount delta T that is problem dependent. An approximate expression for the automatic evaluation of delta T is derived and is shown to result in increased efficiency.
Real-time computer treatment of THz passive device images with the high image quality

NASA Astrophysics Data System (ADS)

Trofimov, Vyacheslav A.; Trofimov, Vladislav V.

2012-06-01

We demonstrate real-time computer code improving significantly the quality of images captured by the passive THz imaging system. The code is not only designed for a THz passive device: it can be applied to any kind of such devices and active THz imaging systems as well. We applied our code for computer processing of images captured by four passive THz imaging devices manufactured by different companies. It should be stressed that computer processing of images produced by different companies requires using the different spatial filters usually. The performance of current version of the computer code is greater than one image per second for a THz image having more than 5000 pixels and 24 bit number representation. Processing of THz single image produces about 20 images simultaneously corresponding to various spatial filters. The computer code allows increasing the number of pixels for processed images without noticeable reduction of image quality. The performance of the computer code can be increased many times using parallel algorithms for processing the image. We develop original spatial filters which allow one to see objects with sizes less than 2 cm. The imagery is produced by passive THz imaging devices which captured the images of objects hidden under opaque clothes. For images with high noise we develop an approach which results in suppression of the noise after using the computer processing and we obtain the good quality image. With the aim of illustrating the efficiency of the developed approach we demonstrate the detection of the liquid explosive, ordinary explosive, knife, pistol, metal plate, CD, ceramics, chocolate and other objects hidden under opaque clothes. The results demonstrate the high efficiency of our approach for the detection of hidden objects and they are a very promising solution for the security problem.

Reactivity effects in VVER-1000 of the third unit of the kalinin nuclear power plant at physical start-up. Computations in ShIPR intellectual code system with library of two-group cross sections generated by UNK code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zizin, M. N.; Zimin, V. G.; Zizina, S. N., E-mail: zizin@adis.vver.kiae.ru

2010-12-15

The ShIPR intellectual code system for mathematical simulation of nuclear reactors includes a set of computing modules implementing the preparation of macro cross sections on the basis of the two-group library of neutron-physics cross sections obtained for the SKETCH-N nodal code. This library is created by using the UNK code for 3D diffusion computation of first VVER-1000 fuel loadings. Computation of neutron fields in the ShIPR system is performed using the DP3 code in the two-group diffusion approximation in 3D triangular geometry. The efficiency of all groups of control rods for the first fuel loading of the third unit ofmore » the Kalinin Nuclear Power Plant is computed. The temperature, barometric, and density effects of reactivity as well as the reactivity coefficient due to the concentration of boric acid in the reactor were computed additionally. Results of computations are compared with the experiment.« less
Reactivity effects in VVER-1000 of the third unit of the kalinin nuclear power plant at physical start-up. Computations in ShIPR intellectual code system with library of two-group cross sections generated by UNK code

NASA Astrophysics Data System (ADS)

Zizin, M. N.; Zimin, V. G.; Zizina, S. N.; Kryakvin, L. V.; Pitilimov, V. A.; Tereshonok, V. A.

2010-12-01

The ShIPR intellectual code system for mathematical simulation of nuclear reactors includes a set of computing modules implementing the preparation of macro cross sections on the basis of the two-group library of neutron-physics cross sections obtained for the SKETCH-N nodal code. This library is created by using the UNK code for 3D diffusion computation of first VVER-1000 fuel loadings. Computation of neutron fields in the ShIPR system is performed using the DP3 code in the two-group diffusion approximation in 3D triangular geometry. The efficiency of all groups of control rods for the first fuel loading of the third unit of the Kalinin Nuclear Power Plant is computed. The temperature, barometric, and density effects of reactivity as well as the reactivity coefficient due to the concentration of boric acid in the reactor were computed additionally. Results of computations are compared with the experiment.
Finite difference time domain electromagnetic scattering from frequency-dependent lossy materials

NASA Technical Reports Server (NTRS)

Luebbers, Raymond J.; Beggs, John H.

1991-01-01

Four different FDTD computer codes and companion Radar Cross Section (RCS) conversion codes on magnetic media are submitted. A single three dimensional dispersive FDTD code for both dispersive dielectric and magnetic materials was developed, along with a user's manual. The extension of FDTD to more complicated materials was made. The code is efficient and is capable of modeling interesting radar targets using a modest computer workstation platform. RCS results for two different plate geometries are reported. The FDTD method was also extended to computing far zone time domain results in two dimensions. Also the capability to model nonlinear materials was incorporated into FDTD and validated.
Fast Sparse Coding for Range Data Denoising with Sparse Ridges Constraint.

PubMed

Gao, Zhi; Lao, Mingjie; Sang, Yongsheng; Wen, Fei; Ramesh, Bharath; Zhai, Ruifang

2018-05-06

Light detection and ranging (LiDAR) sensors have been widely deployed on intelligent systems such as unmanned ground vehicles (UGVs) and unmanned aerial vehicles (UAVs) to perform localization, obstacle detection, and navigation tasks. Thus, research into range data processing with competitive performance in terms of both accuracy and efficiency has attracted increasing attention. Sparse coding has revolutionized signal processing and led to state-of-the-art performance in a variety of applications. However, dictionary learning, which plays the central role in sparse coding techniques, is computationally demanding, resulting in its limited applicability in real-time systems. In this study, we propose sparse coding algorithms with a fixed pre-learned ridge dictionary to realize range data denoising via leveraging the regularity of laser range measurements in man-made environments. Experiments on both synthesized data and real data demonstrate that our method obtains accuracy comparable to that of sophisticated sparse coding methods, but with much higher computational efficiency.
Pattern-based integer sample motion search strategies in the context of HEVC

NASA Astrophysics Data System (ADS)

Maier, Georg; Bross, Benjamin; Grois, Dan; Marpe, Detlev; Schwarz, Heiko; Veltkamp, Remco C.; Wiegand, Thomas

2015-09-01

The H.265/MPEG-H High Efficiency Video Coding (HEVC) standard provides a significant increase in coding efficiency compared to its predecessor, the H.264/MPEG-4 Advanced Video Coding (AVC) standard, which however comes at the cost of a high computational burden for a compliant encoder. Motion estimation (ME), which is a part of the inter-picture prediction process, typically consumes a high amount of computational resources, while significantly increasing the coding efficiency. In spite of the fact that both H.265/MPEG-H HEVC and H.264/MPEG-4 AVC standards allow processing motion information on a fractional sample level, the motion search algorithms based on the integer sample level remain to be an integral part of ME. In this paper, a flexible integer sample ME framework is proposed, thereby allowing to trade off significant reduction of ME computation time versus coding efficiency penalty in terms of bit rate overhead. As a result, through extensive experimentation, an integer sample ME algorithm that provides a good trade-off is derived, incorporating a combination and optimization of known predictive, pattern-based and early termination techniques. The proposed ME framework is implemented on a basis of the HEVC Test Model (HM) reference software, further being compared to the state-of-the-art fast search algorithm, which is a native part of HM. It is observed that for high resolution sequences, the integer sample ME process can be speed-up by factors varying from 3.2 to 7.6, resulting in the bit-rate overhead of 1.5% and 0.6% for Random Access (RA) and Low Delay P (LDP) configurations, respectively. In addition, the similar speed-up is observed for sequences with mainly Computer-Generated Imagery (CGI) content while trading off the bit rate overhead of up to 5.2%.
Optimization of Particle-in-Cell Codes on RISC Processors

NASA Technical Reports Server (NTRS)

Decyk, Viktor K.; Karmesin, Steve Roy; Boer, Aeint de; Liewer, Paulette C.

1996-01-01

General strategies are developed to optimize particle-cell-codes written in Fortran for RISC processors which are commonly used on massively parallel computers. These strategies include data reorganization to improve cache utilization and code reorganization to improve efficiency of arithmetic pipelines.
Approximate maximum likelihood decoding of block codes

NASA Technical Reports Server (NTRS)

Greenberger, H. J.

1979-01-01

Approximate maximum likelihood decoding algorithms, based upon selecting a small set of candidate code words with the aid of the estimated probability of error of each received symbol, can give performance close to optimum with a reasonable amount of computation. By combining the best features of various algorithms and taking care to perform each step as efficiently as possible, a decoding scheme was developed which can decode codes which have better performance than those presently in use and yet not require an unreasonable amount of computation. The discussion of the details and tradeoffs of presently known efficient optimum and near optimum decoding algorithms leads, naturally, to the one which embodies the best features of all of them.
Turbofan Duct Propagation Model

NASA Technical Reports Server (NTRS)

Lan, Justin H.; Posey, Joe W. (Technical Monitor)

2001-01-01

The CDUCT code utilizes a parabolic approximation to the convected Helmholtz equation in order to efficiently model acoustic propagation in acoustically treated, complex shaped ducts. The parabolic approximation solves one-way wave propagation with a marching method which neglects backwards reflected waves. The derivation of the parabolic approximation is presented. Several code validation cases are given. An acoustic lining design process for an example aft fan duct is discussed. It is noted that the method can efficiently model realistic three-dimension effects, acoustic lining, and flow within the computational capabilities of a typical computer workstation.
Computation of Reacting Flows in Combustion Processes

NASA Technical Reports Server (NTRS)

Keith, Theo G., Jr.; Chen, K.-H.

2001-01-01

The objective of this research is to develop an efficient numerical algorithm with unstructured grids for the computation of three-dimensional chemical reacting flows that are known to occur in combustion components of propulsion systems. During the grant period (1996 to 1999), two companion codes have been developed and various numerical and physical models were implemented into the two codes.
Automatic finite element generators

NASA Technical Reports Server (NTRS)

Wang, P. S.

1984-01-01

The design and implementation of a software system for generating finite elements and related computations are described. Exact symbolic computational techniques are employed to derive strain-displacement matrices and element stiffness matrices. Methods for dealing with the excessive growth of symbolic expressions are discussed. Automatic FORTRAN code generation is described with emphasis on improving the efficiency of the resultant code.
Computational Predictions of the Performance Wright 'Bent End' Propellers

NASA Technical Reports Server (NTRS)

Wang, Xiang-Yu; Ash, Robert L.; Bobbitt, Percy J.; Prior, Edwin (Technical Monitor)

2002-01-01

Computational analysis of two 1911 Wright brothers 'Bent End' wooden propeller reproductions have been performed and compared with experimental test results from the Langley Full Scale Wind Tunnel. The purpose of the analysis was to check the consistency of the experimental results and to validate the reliability of the tests. This report is one part of the project on the propeller performance research of the Wright 'Bent End' propellers, intend to document the Wright brothers' pioneering propeller design contributions. Two computer codes were used in the computational predictions. The FLO-MG Navier-Stokes code is a CFD (Computational Fluid Dynamics) code based on the Navier-Stokes Equations. It is mainly used to compute the lift coefficient and the drag coefficient at specified angles of attack at different radii. Those calculated data are the intermediate results of the computation and a part of the necessary input for the Propeller Design Analysis Code (based on Adkins and Libeck method), which is a propeller design code used to compute the propeller thrust coefficient, the propeller power coefficient and the propeller propulsive efficiency.
Improving robustness and computational efficiency using modern C++

DOE Office of Scientific and Technical Information (OSTI.GOV)

Paterno, M.; Kowalkowski, J.; Green, C.

2014-01-01

For nearly two decades, the C++ programming language has been the dominant programming language for experimental HEP. The publication of ISO/IEC 14882:2011, the current version of the international standard for the C++ programming language, makes available a variety of language and library facilities for improving the robustness, expressiveness, and computational efficiency of C++ code. However, much of the C++ written by the experimental HEP community does not take advantage of the features of the language to obtain these benefits, either due to lack of familiarity with these features or concern that these features must somehow be computationally inefficient. In thismore » paper, we address some of the features of modern C+-+, and show how they can be used to make programs that are both robust and computationally efficient. We compare and contrast simple yet realistic examples of some common implementation patterns in C, currently-typical C++, and modern C++, and show (when necessary, down to the level of generated assembly language code) the quality of the executable code produced by recent C++ compilers, with the aim of allowing the HEP community to make informed decisions on the costs and benefits of the use of modern C++.« less
Nonperturbative methods in HZE ion transport

NASA Technical Reports Server (NTRS)

Wilson, John W.; Badavi, Francis F.; Costen, Robert C.; Shinn, Judy L.

1993-01-01

A nonperturbative analytic solution of the high charge and energy (HZE) Green's function is used to implement a computer code for laboratory ion beam transport. The code is established to operate on the Langley Research Center nuclear fragmentation model used in engineering applications. Computational procedures are established to generate linear energy transfer (LET) distributions for a specified ion beam and target for comparison with experimental measurements. The code is highly efficient and compares well with the perturbation approximations.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Grote, D. P.

Forthon generates links between Fortran and Python. Python is a high level, object oriented, interactive and scripting language that allows a flexible and versatile interface to computational tools. The Forthon package generates the necessary wrapping code which allows access to the Fortran database and to the Fortran subroutines and functions. This provides a development package where the computationally intensive parts of a code can be written in efficient Fortran, and the high level controlling code can be written in the much more versatile Python language.
Users manual for updated computer code for axial-flow compressor conceptual design

NASA Technical Reports Server (NTRS)

Glassman, Arthur J.

1992-01-01

An existing computer code that determines the flow path for an axial-flow compressor either for a given number of stages or for a given overall pressure ratio was modified for use in air-breathing engine conceptual design studies. This code uses a rapid approximate design methodology that is based on isentropic simple radial equilibrium. Calculations are performed at constant-span-fraction locations from tip to hub. Energy addition per stage is controlled by specifying the maximum allowable values for several aerodynamic design parameters. New modeling was introduced to the code to overcome perceived limitations. Specific changes included variable rather than constant tip radius, flow path inclination added to the continuity equation, input of mass flow rate directly rather than indirectly as inlet axial velocity, solution for the exact value of overall pressure ratio rather than for any value that met or exceeded it, and internal computation of efficiency rather than the use of input values. The modified code was shown to be capable of computing efficiencies that are compatible with those of five multistage compressors and one fan that were tested experimentally. This report serves as a users manual for the revised code, Compressor Spanline Analysis (CSPAN). The modeling modifications, including two internal loss correlations, are presented. Program input and output are described. A sample case for a multistage compressor is included.
Parallelisation study of a three-dimensional environmental flow model

NASA Astrophysics Data System (ADS)

O'Donncha, Fearghal; Ragnoli, Emanuele; Suits, Frank

2014-03-01

There are many simulation codes in the geosciences that are serial and cannot take advantage of the parallel computational resources commonly available today. One model important for our work in coastal ocean current modelling is EFDC, a Fortran 77 code configured for optimal deployment on vector computers. In order to take advantage of our cache-based, blade computing system we restructured EFDC from serial to parallel, thereby allowing us to run existing models more quickly, and to simulate larger and more detailed models that were previously impractical. Since the source code for EFDC is extensive and involves detailed computation, it is important to do such a port in a manner that limits changes to the files, while achieving the desired speedup. We describe a parallelisation strategy involving surgical changes to the source files to minimise error-prone alteration of the underlying computations, while allowing load-balanced domain decomposition for efficient execution on a commodity cluster. The use of conjugate gradient posed particular challenges due to implicit non-local communication posing a hindrance to standard domain partitioning schemes; a number of techniques are discussed to address this in a feasible, computationally efficient manner. The parallel implementation demonstrates good scalability in combination with a novel domain partitioning scheme that specifically handles mixed water/land regions commonly found in coastal simulations. The approach presented here represents a practical methodology to rejuvenate legacy code on a commodity blade cluster with reasonable effort; our solution has direct application to other similar codes in the geosciences.
Computer code for preliminary sizing analysis of axial-flow turbines

NASA Technical Reports Server (NTRS)

Glassman, Arthur J.

1992-01-01

This mean diameter flow analysis uses a stage average velocity diagram as the basis for the computational efficiency. Input design requirements include power or pressure ratio, flow rate, temperature, pressure, and rotative speed. Turbine designs are generated for any specified number of stages and for any of three types of velocity diagrams (symmetrical, zero exit swirl, or impulse) or for any specified stage swirl split. Exit turning vanes can be included in the design. The program output includes inlet and exit annulus dimensions, exit temperature and pressure, total and static efficiencies, flow angles, and last stage absolute and relative Mach numbers. An analysis is presented along with a description of the computer program input and output with sample cases. The analysis and code presented herein are modifications of those described in NASA-TN-D-6702. These modifications improve modeling rigor and extend code applicability.
Computer program for prediction of the deposition of material released from fixed and rotary wing aircraft

NASA Technical Reports Server (NTRS)

Teske, M. E.

1984-01-01

This is a user manual for the computer code ""AGDISP'' (AGricultural DISPersal) which has been developed to predict the deposition of material released from fixed and rotary wing aircraft in a single-pass, computationally efficient manner. The formulation of the code is novel in that the mean particle trajectory and the variance about the mean resulting from turbulent fluid fluctuations are simultaneously predicted. The code presently includes the capability of assessing the influence of neutral atmospheric conditions, inviscid wake vortices, particle evaporation, plant canopy and terrain on the deposition pattern.
Advances in Computational Capabilities for Hypersonic Flows

NASA Technical Reports Server (NTRS)

Kumar, Ajay; Gnoffo, Peter A.; Moss, James N.; Drummond, J. Philip

1997-01-01

The paper reviews the growth and advances in computational capabilities for hypersonic applications over the period from the mid-1980's to the present day. The current status of the code development issues such as surface and field grid generation, algorithms, physical and chemical modeling, and validation is provided. A brief description of some of the major codes being used at NASA Langley Research Center for hypersonic continuum and rarefied flows is provided, along with their capabilities and deficiencies. A number of application examples are presented, and future areas of research to enhance accuracy, reliability, efficiency, and robustness of computational codes are discussed.
A generic efficient adaptive grid scheme for rocket propulsion modeling

NASA Technical Reports Server (NTRS)

Mo, J. D.; Chow, Alan S.

1993-01-01

The objective of this research is to develop an efficient, time-accurate numerical algorithm to discretize the Navier-Stokes equations for the predictions of internal one-, two-dimensional and axisymmetric flows. A generic, efficient, elliptic adaptive grid generator is implicitly coupled with the Lower-Upper factorization scheme in the development of ALUNS computer code. The calculations of one-dimensional shock tube wave propagation and two-dimensional shock wave capture, wave-wave interactions, shock wave-boundary interactions show that the developed scheme is stable, accurate and extremely robust. The adaptive grid generator produced a very favorable grid network by a grid speed technique. This generic adaptive grid generator is also applied in the PARC and FDNS codes and the computational results for solid rocket nozzle flowfield and crystal growth modeling by those codes will be presented in the conference, too. This research work is being supported by NASA/MSFC.

Development of the 3DHZETRN code for space radiation protection

NASA Astrophysics Data System (ADS)

Wilson, John; Badavi, Francis; Slaba, Tony; Reddell, Brandon; Bahadori, Amir; Singleterry, Robert

Space radiation protection requires computationally efficient shield assessment methods that have been verified and validated. The HZETRN code is the engineering design code used for low Earth orbit dosimetric analysis and astronaut record keeping with end-to-end validation to twenty percent in Space Shuttle and International Space Station operations. HZETRN treated diffusive leakage only at the distal surface limiting its application to systems with a large radius of curvature. A revision of HZETRN that included forward and backward diffusion allowed neutron leakage to be evaluated at both the near and distal surfaces. That revision provided a deterministic code of high computational efficiency that was in substantial agreement with Monte Carlo (MC) codes in flat plates (at least to the degree that MC codes agree among themselves). In the present paper, the 3DHZETRN formalism capable of evaluation in general geometry is described. Benchmarking will help quantify uncertainty with MC codes (Geant4, FLUKA, MCNP6, and PHITS) in simple shapes such as spheres within spherical shells and boxes. Connection of the 3DHZETRN to general geometry will be discussed.
Efficient biprediction decision scheme for fast high efficiency video coding encoding

NASA Astrophysics Data System (ADS)

Park, Sang-hyo; Lee, Seung-ho; Jang, Euee S.; Jun, Dongsan; Kang, Jung-Won

2016-11-01

An efficient biprediction decision scheme of high efficiency video coding (HEVC) is proposed for fast-encoding applications. For low-delay video applications, bidirectional prediction can be used to increase compression performance efficiently with previous reference frames. However, at the same time, the computational complexity of the HEVC encoder is significantly increased due to the additional biprediction search. Although a some research has attempted to reduce this complexity, whether the prediction is strongly related to both motion complexity and prediction modes in a coding unit has not yet been investigated. A method that avoids most compression-inefficient search points is proposed so that the computational complexity of the motion estimation process can be dramatically decreased. To determine if biprediction is critical, the proposed method exploits the stochastic correlation of the context of prediction units (PUs): the direction of a PU and the accuracy of a motion vector. Through experimental results, the proposed method showed that the time complexity of biprediction can be reduced to 30% on average, outperforming existing methods in view of encoding time, number of function calls, and memory access.
Project : semi-autonomous parking for enhanced safety and efficiency.

DOT National Transportation Integrated Search

2016-04-01

Index coding, a coding formulation traditionally analyzed in the theoretical computer science and : information theory communities, has received considerable attention in recent years due to its value in : wireless communications and networking probl...
Lattice surgery on the Raussendorf lattice

NASA Astrophysics Data System (ADS)

Herr, Daniel; Paler, Alexandru; Devitt, Simon J.; Nori, Franco

2018-07-01

Lattice surgery is a method to perform quantum computation fault-tolerantly by using operations on boundary qubits between different patches of the planar code. This technique allows for universal planar code computation without eliminating the intrinsic two-dimensional nearest-neighbor properties of the surface code that eases physical hardware implementations. Lattice surgery approaches to algorithmic compilation and optimization have been demonstrated to be more resource efficient for resource-intensive components of a fault-tolerant algorithm, and consequently may be preferable over braid-based logic. Lattice surgery can be extended to the Raussendorf lattice, providing a measurement-based approach to the surface code. In this paper we describe how lattice surgery can be performed on the Raussendorf lattice and therefore give a viable alternative to computation using braiding in measurement-based implementations of topological codes.
Numerical Design of Megawatt Gyrotron with 120 GHz Frequency and 50% Efficiency for Plasma Fusion Application

NASA Astrophysics Data System (ADS)

Kumar, Nitin; Singh, Udaybir; Kumar, Anil; Bhattacharya, Ranajoy; Singh, T. P.; Sinha, A. K.

2013-02-01

The design of 120 GHz, 1 MW gyrotron for plasma fusion application is presented in this paper. The mode selection is carried out considering the aim of minimum mode competition, minimum cavity wall heating, etc. On the basis of the selected operating mode, the interaction cavity design and beam-wave interaction computation are carried out by using the PIC code. The design of triode type Magnetron Injection Gun (MIG) is also presented. Trajectory code EGUN, synthesis code MIGSYN and data analysis code MIGANS are used in the MIG designing. Further, the design of MIG is also validated by using the another trajectory code TRAK. The design results of beam dumping system (collector) and RF window are also presented. Depressed collector is designed to enhance the overall tube efficiency. The design study confirms >1 MW output power with tube efficiency around 50% (with collector efficiency).
Addressing the challenges of standalone multi-core simulations in molecular dynamics

NASA Astrophysics Data System (ADS)

Ocaya, R. O.; Terblans, J. J.

2017-07-01

Computational modelling in material science involves mathematical abstractions of force fields between particles with the aim to postulate, develop and understand materials by simulation. The aggregated pairwise interactions of the material's particles lead to a deduction of its macroscopic behaviours. For practically meaningful macroscopic scales, a large amount of data are generated, leading to vast execution times. Simulation times of hours, days or weeks for moderately sized problems are not uncommon. The reduction of simulation times, improved result accuracy and the associated software and hardware engineering challenges are the main motivations for many of the ongoing researches in the computational sciences. This contribution is concerned mainly with simulations that can be done on a "standalone" computer based on Message Passing Interfaces (MPI), parallel code running on hardware platforms with wide specifications, such as single/multi- processor, multi-core machines with minimal reconfiguration for upward scaling of computational power. The widely available, documented and standardized MPI library provides this functionality through the MPI_Comm_size (), MPI_Comm_rank () and MPI_Reduce () functions. A survey of the literature shows that relatively little is written with respect to the efficient extraction of the inherent computational power in a cluster. In this work, we discuss the main avenues available to tap into this extra power without compromising computational accuracy. We also present methods to overcome the high inertia encountered in single-node-based computational molecular dynamics. We begin by surveying the current state of the art and discuss what it takes to achieve parallelism, efficiency and enhanced computational accuracy through program threads and message passing interfaces. Several code illustrations are given. The pros and cons of writing raw code as opposed to using heuristic, third-party code are also discussed. The growing trend towards graphical processor units and virtual computing clouds for high-performance computing is also discussed. Finally, we present the comparative results of vacancy formation energy calculations using our own parallelized standalone code called Verlet-Stormer velocity (VSV) operating on 30,000 copper atoms. The code is based on the Sutton-Chen implementation of the Finnis-Sinclair pairwise embedded atom potential. A link to the code is also given.
Approximate Green's function methods for HZE transport in multilayered materials

NASA Technical Reports Server (NTRS)

Wilson, John W.; Badavi, Francis F.; Shinn, Judy L.; Costen, Robert C.

1993-01-01

A nonperturbative analytic solution of the high charge and energy (HZE) Green's function is used to implement a computer code for laboratory ion beam transport in multilayered materials. The code is established to operate on the Langley nuclear fragmentation model used in engineering applications. Computational procedures are established to generate linear energy transfer (LET) distributions for a specified ion beam and target for comparison with experimental measurements. The code was found to be highly efficient and compared well with the perturbation approximation.
CERES: An ab initio code dedicated to the calculation of the electronic structure and magnetic properties of lanthanide complexes.

PubMed

Calvello, Simone; Piccardo, Matteo; Rao, Shashank Vittal; Soncini, Alessandro

2018-03-05

We have developed and implemented a new ab initio code, Ceres (Computational Emulator of Rare Earth Systems), completely written in C++11, which is dedicated to the efficient calculation of the electronic structure and magnetic properties of the crystal field states arising from the splitting of the ground state spin-orbit multiplet in lanthanide complexes. The new code gains efficiency via an optimized implementation of a direct configurational averaged Hartree-Fock (CAHF) algorithm for the determination of 4f quasi-atomic active orbitals common to all multi-electron spin manifolds contributing to the ground spin-orbit multiplet of the lanthanide ion. The new CAHF implementation is based on quasi-Newton convergence acceleration techniques coupled to an efficient library for the direct evaluation of molecular integrals, and problem-specific density matrix guess strategies. After describing the main features of the new code, we compare its efficiency with the current state-of-the-art ab initio strategy to determine crystal field levels and properties, and show that our methodology, as implemented in Ceres, represents a more time-efficient computational strategy for the evaluation of the magnetic properties of lanthanide complexes, also allowing a full representation of non-perturbative spin-orbit coupling effects. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
Time-dependent jet flow and noise computations

NASA Technical Reports Server (NTRS)

Berman, C. H.; Ramos, J. I.; Karniadakis, G. E.; Orszag, S. A.

1990-01-01

Methods for computing jet turbulence noise based on the time-dependent solution of Lighthill's (1952) differential equation are demonstrated. A key element in this approach is a flow code for solving the time-dependent Navier-Stokes equations at relatively high Reynolds numbers. Jet flow results at Re = 10,000 are presented here. This code combines a computationally efficient spectral element technique and a new self-consistent turbulence subgrid model to supply values for Lighthill's turbulence noise source tensor.
Transonic Navier-Stokes wing solutions using a zonal approach. Part 2: High angle-of-attack simulation

NASA Technical Reports Server (NTRS)

Chaderjian, N. M.

1986-01-01

A computer code is under development whereby the thin-layer Reynolds-averaged Navier-Stokes equations are to be applied to realistic fighter-aircraft configurations. This transonic Navier-Stokes code (TNS) utilizes a zonal approach in order to treat complex geometries and satisfy in-core computer memory constraints. The zonal approach has been applied to isolated wing geometries in order to facilitate code development. Part 1 of this paper addresses the TNS finite-difference algorithm, zonal methodology, and code validation with experimental data. Part 2 of this paper addresses some numerical issues such as code robustness, efficiency, and accuracy at high angles of attack. Special free-stream-preserving metrics proved an effective way to treat H-mesh singularities over a large range of severe flow conditions, including strong leading-edge flow gradients, massive shock-induced separation, and stall. Furthermore, lift and drag coefficients have been computed for a wing up through CLmax. Numerical oil flow patterns and particle trajectories are presented both for subcritical and transonic flow. These flow simulations are rich with complex separated flow physics and demonstrate the efficiency and robustness of the zonal approach.
LSENS, a general chemical kinetics and sensitivity analysis code for gas-phase reactions: User's guide

NASA Technical Reports Server (NTRS)

Radhakrishnan, Krishnan; Bittker, David A.

1993-01-01

A general chemical kinetics and sensitivity analysis code for complex, homogeneous, gas-phase reactions is described. The main features of the code, LSENS, are its flexibility, efficiency and convenience in treating many different chemical reaction models. The models include static system, steady, one-dimensional, inviscid flow, shock initiated reaction, and a perfectly stirred reactor. In addition, equilibrium computations can be performed for several assigned states. An implicit numerical integration method, which works efficiently for the extremes of very fast and very slow reaction, is used for solving the 'stiff' differential equation systems that arise in chemical kinetics. For static reactions, sensitivity coefficients of all dependent variables and their temporal derivatives with respect to the initial values of dependent variables and/or the rate coefficient parameters can be computed. This paper presents descriptions of the code and its usage, and includes several illustrative example problems.
Parallelization of ARC3D with Computer-Aided Tools

NASA Technical Reports Server (NTRS)

Jin, Haoqiang; Hribar, Michelle; Yan, Jerry; Saini, Subhash (Technical Monitor)

1998-01-01

A series of efforts have been devoted to investigating methods of porting and parallelizing applications quickly and efficiently for new architectures, such as the SCSI Origin 2000 and Cray T3E. This report presents the parallelization of a CFD application, ARC3D, using the computer-aided tools, Cesspools. Steps of parallelizing this code and requirements of achieving better performance are discussed. The generated parallel version has achieved reasonably well performance, for example, having a speedup of 30 for 36 Cray T3E processors. However, this performance could not be obtained without modification of the original serial code. It is suggested that in many cases improving serial code and performing necessary code transformations are important parts for the automated parallelization process although user intervention in many of these parts are still necessary. Nevertheless, development and improvement of useful software tools, such as Cesspools, can help trim down many tedious parallelization details and improve the processing efficiency.
BRYNTRN: A baryon transport model

NASA Technical Reports Server (NTRS)

Wilson, John W.; Townsend, Lawrence W.; Nealy, John E.; Chun, Sang Y.; Hong, B. S.; Buck, Warren W.; Lamkin, S. L.; Ganapol, Barry D.; Khan, Ferdous; Cucinotta, Francis A.

1989-01-01

The development of an interaction data base and a numerical solution to the transport of baryons through an arbitrary shield material based on a straight ahead approximation of the Boltzmann equation are described. The code is most accurate for continuous energy boundary values, but gives reasonable results for discrete spectra at the boundary using even a relatively coarse energy grid (30 points) and large spatial increments (1 cm in H2O). The resulting computer code is self-contained, efficient and ready to use. The code requires only a very small fraction of the computer resources required for Monte Carlo codes.
Applying graphics user interface ot group technology classification and coding at the Boeing aerospace company

NASA Astrophysics Data System (ADS)

Ness, P. H.; Jacobson, H.

1984-10-01

The thrust of 'group technology' is toward the exploitation of similarities in component design and manufacturing process plans to achieve assembly line flow cost efficiencies for small batch production. The systematic method devised for the identification of similarities in component geometry and processing steps is a coding and classification scheme implemented by interactive CAD/CAM systems. This coding and classification scheme has led to significant increases in computer processing power, allowing rapid searches and retrievals on the basis of a 30-digit code together with user-friendly computer graphics.
Fast Sparse Coding for Range Data Denoising with Sparse Ridges Constraint

PubMed Central

Lao, Mingjie; Sang, Yongsheng; Wen, Fei; Zhai, Ruifang

2018-01-01

Light detection and ranging (LiDAR) sensors have been widely deployed on intelligent systems such as unmanned ground vehicles (UGVs) and unmanned aerial vehicles (UAVs) to perform localization, obstacle detection, and navigation tasks. Thus, research into range data processing with competitive performance in terms of both accuracy and efficiency has attracted increasing attention. Sparse coding has revolutionized signal processing and led to state-of-the-art performance in a variety of applications. However, dictionary learning, which plays the central role in sparse coding techniques, is computationally demanding, resulting in its limited applicability in real-time systems. In this study, we propose sparse coding algorithms with a fixed pre-learned ridge dictionary to realize range data denoising via leveraging the regularity of laser range measurements in man-made environments. Experiments on both synthesized data and real data demonstrate that our method obtains accuracy comparable to that of sophisticated sparse coding methods, but with much higher computational efficiency. PMID:29734793
Computing NLTE Opacities -- Node Level Parallel Calculation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Holladay, Daniel

Presentation. The goal: to produce a robust library capable of computing reasonably accurate opacities inline with the assumption of LTE relaxed (non-LTE). Near term: demonstrate acceleration of non-LTE opacity computation. Far term (if funded): connect to application codes with in-line capability and compute opacities. Study science problems. Use efficient algorithms that expose many levels of parallelism and utilize good memory access patterns for use on advanced architectures. Portability to multiple types of hardware including multicore processors, manycore processors such as KNL, GPUs, etc. Easily coupled to radiation hydrodynamics and thermal radiative transfer codes.
Navier-Stokes calculations for DFVLR F5-wing in wind tunnel using Runge-Kutta time-stepping scheme

NASA Technical Reports Server (NTRS)

Vatsa, V. N.; Wedan, B. W.

1988-01-01

A three-dimensional Navier-Stokes code using an explicit multistage Runge-Kutta type of time-stepping scheme is used for solving the transonic flow past a finite wing mounted inside a wind tunnel. Flow past the same wing in free air was also computed to assess the effect of wind-tunnel walls on such flows. Numerical efficiency is enhanced through vectorization of the computer code. A Cyber 205 computer with 32 million words of internal memory was used for these computations.
Additional development of the XTRAN3S computer program

NASA Technical Reports Server (NTRS)

Borland, C. J.

1989-01-01

Additional developments and enhancements to the XTRAN3S computer program, a code for calculation of steady and unsteady aerodynamics, and associated aeroelastic solutions, for 3-D wings in the transonic flow regime are described. Algorithm improvements for the XTRAN3S program were provided including an implicit finite difference scheme to enhance the allowable time step and vectorization for improved computational efficiency. The code was modified to treat configurations with a fuselage, multiple stores/nacelles/pylons, and winglets. Computer program changes (updates) for error corrections and updates for version control are provided.
Analysis of electrophoresis performance

NASA Technical Reports Server (NTRS)

Roberts, Glyn O.

1988-01-01

A flexible efficient computer code is being developed to simulate electrophoretic separation phenomena, in either a cylindrical or a rectangular geometry. The code will computer the evolution in time of the concentrations of an arbitrary number of chemical species, and of the temperature, pH distribution, conductivity, electric field, and fluid motion. Use of nonuniform meshes and fast accurate implicit time-stepping will yield accurate answers at economical cost.
Particle Hydrodynamics with Material Strength for Multi-Layer Orbital Debris Shield Design

NASA Technical Reports Server (NTRS)

Fahrenthold, Eric P.

1999-01-01

Three dimensional simulation of oblique hypervelocity impact on orbital debris shielding places extreme demands on computer resources. Research to date has shown that particle models provide the most accurate and efficient means for computer simulation of shield design problems. In order to employ a particle based modeling approach to the wall plate impact portion of the shield design problem, it is essential that particle codes be augmented to represent strength effects. This report describes augmentation of a Lagrangian particle hydrodynamics code developed by the principal investigator, to include strength effects, allowing for the entire shield impact problem to be represented using a single computer code.

Implementation of Premixed Equilibrium Chemistry Capability in OVERFLOW

NASA Technical Reports Server (NTRS)

Olsen, M. E.; Liu, Y.; Vinokur, M.; Olsen, T.

2003-01-01

An implementation of premixed equilibrium chemistry has been completed for the OVERFLOW code, a chimera capable, complex geometry flow code widely used to predict transonic flowfields. The implementation builds on the computational efficiency and geometric generality of the solver.
Implementation of Premixed Equilibrium Chemistry Capability in OVERFLOW

NASA Technical Reports Server (NTRS)

Olsen, Mike E.; Liu, Yen; Vinokur, M.; Olsen, Tom

2004-01-01

An implementation of premixed equilibrium chemistry has been completed for the OVERFLOW code, a chimera capable, complex geometry flow code widely used to predict transonic flowfields. The implementation builds on the computational efficiency and geometric generality of the solver.
Recent applications of the transonic wing analysis computer code, TWING

NASA Technical Reports Server (NTRS)

Subramanian, N. R.; Holst, T. L.; Thomas, S. D.

1982-01-01

An evaluation of the transonic-wing-analysis computer code TWING is given. TWING utilizes a fully implicit approximate factorization iteration scheme to solve the full potential equation in conservative form. A numerical elliptic-solver grid-generation scheme is used to generate the required finite-difference mesh. Several wing configurations were analyzed, and the limits of applicability of this code was evaluated. Comparisons of computed results were made with available experimental data. Results indicate that the code is robust, accurate (when significant viscous effects are not present), and efficient. TWING generally produces solutions an order of magnitude faster than other conservative full potential codes using successive-line overrelaxation. The present method is applicable to a wide range of isolated wing configurations including high-aspect-ratio transport wings and low-aspect-ratio, high-sweep, fighter configurations.
Response surface method in geotechnical/structural analysis, phase 1

NASA Astrophysics Data System (ADS)

Wong, F. S.

1981-02-01

In the response surface approach, an approximating function is fit to a long running computer code based on a limited number of code calculations. The approximating function, called the response surface, is then used to replace the code in subsequent repetitive computations required in a statistical analysis. The procedure of the response surface development and feasibility of the method are shown using a sample problem in slop stability which is based on data from centrifuge experiments of model soil slopes and involves five random soil parameters. It is shown that a response surface can be constructed based on as few as four code calculations and that the response surface is computationally extremely efficient compared to the code calculation. Potential applications of this research include probabilistic analysis of dynamic, complex, nonlinear soil/structure systems such as slope stability, liquefaction, and nuclear reactor safety.
Enhancing Scalability and Efficiency of the TOUGH2_MP for LinuxClusters

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Keni; Wu, Yu-Shu

2006-04-17

TOUGH2{_}MP, the parallel version TOUGH2 code, has been enhanced by implementing more efficient communication schemes. This enhancement is achieved through reducing the amount of small-size messages and the volume of large messages. The message exchange speed is further improved by using non-blocking communications for both linear and nonlinear iterations. In addition, we have modified the AZTEC parallel linear-equation solver to nonblocking communication. Through the improvement of code structuring and bug fixing, the new version code is now more stable, while demonstrating similar or even better nonlinear iteration converging speed than the original TOUGH2 code. As a result, the new versionmore » of TOUGH2{_}MP is improved significantly in its efficiency. In this paper, the scalability and efficiency of the parallel code are demonstrated by solving two large-scale problems. The testing results indicate that speedup of the code may depend on both problem size and complexity. In general, the code has excellent scalability in memory requirement as well as computing time.« less
Methods for Computationally Efficient Structured CFD Simulations of Complex Turbomachinery Flows

NASA Technical Reports Server (NTRS)

Herrick, Gregory P.; Chen, Jen-Ping

2012-01-01

This research presents more efficient computational methods by which to perform multi-block structured Computational Fluid Dynamics (CFD) simulations of turbomachinery, thus facilitating higher-fidelity solutions of complicated geometries and their associated flows. This computational framework offers flexibility in allocating resources to balance process count and wall-clock computation time, while facilitating research interests of simulating axial compressor stall inception with more complete gridding of the flow passages and rotor tip clearance regions than is typically practiced with structured codes. The paradigm presented herein facilitates CFD simulation of previously impractical geometries and flows. These methods are validated and demonstrate improved computational efficiency when applied to complicated geometries and flows.
A Multiple Sphere T-Matrix Fortran Code for Use on Parallel Computer Clusters

NASA Technical Reports Server (NTRS)

Mackowski, D. W.; Mishchenko, M. I.

2011-01-01

A general-purpose Fortran-90 code for calculation of the electromagnetic scattering and absorption properties of multiple sphere clusters is described. The code can calculate the efficiency factors and scattering matrix elements of the cluster for either fixed or random orientation with respect to the incident beam and for plane wave or localized- approximation Gaussian incident fields. In addition, the code can calculate maps of the electric field both interior and exterior to the spheres.The code is written with message passing interface instructions to enable the use on distributed memory compute clusters, and for such platforms the code can make feasible the calculation of absorption, scattering, and general EM characteristics of systems containing several thousand spheres.
An efficient method for computing unsteady transonic aerodynamics of swept wings with control surfaces

NASA Technical Reports Server (NTRS)

Liu, D. D.; Kao, Y. F.; Fung, K. Y.

1989-01-01

A transonic equivalent strip (TES) method was further developed for unsteady flow computations of arbitrary wing planforms. The TES method consists of two consecutive correction steps to a given nonlinear code such as LTRAN2; namely, the chordwise mean flow correction and the spanwise phase correction. The computation procedure requires direct pressure input from other computed or measured data. Otherwise, it does not require airfoil shape or grid generation for given planforms. To validate the computed results, four swept wings of various aspect ratios, including those with control surfaces, are selected as computational examples. Overall trends in unsteady pressures are established with those obtained by XTRAN3S codes, Isogai's full potential code and measured data by NLR and RAE. In comparison with these methods, the TES has achieved considerable saving in computer time and reasonable accuracy which suggests immediate industrial applications.
The TeraShake Computational Platform for Large-Scale Earthquake Simulations

NASA Astrophysics Data System (ADS)

Cui, Yifeng; Olsen, Kim; Chourasia, Amit; Moore, Reagan; Maechling, Philip; Jordan, Thomas

Geoscientific and computer science researchers with the Southern California Earthquake Center (SCEC) are conducting a large-scale, physics-based, computationally demanding earthquake system science research program with the goal of developing predictive models of earthquake processes. The computational demands of this program continue to increase rapidly as these researchers seek to perform physics-based numerical simulations of earthquake processes for larger meet the needs of this research program, a multiple-institution team coordinated by SCEC has integrated several scientific codes into a numerical modeling-based research tool we call the TeraShake computational platform (TSCP). A central component in the TSCP is a highly scalable earthquake wave propagation simulation program called the TeraShake anelastic wave propagation (TS-AWP) code. In this chapter, we describe how we extended an existing, stand-alone, wellvalidated, finite-difference, anelastic wave propagation modeling code into the highly scalable and widely used TS-AWP and then integrated this code into the TeraShake computational platform that provides end-to-end (initialization to analysis) research capabilities. We also describe the techniques used to enhance the TS-AWP parallel performance on TeraGrid supercomputers, as well as the TeraShake simulations phases including input preparation, run time, data archive management, and visualization. As a result of our efforts to improve its parallel efficiency, the TS-AWP has now shown highly efficient strong scaling on over 40K processors on IBM’s BlueGene/L Watson computer. In addition, the TSCP has developed into a computational system that is useful to many members of the SCEC community for performing large-scale earthquake simulations.
An efficient code for the simulation of nonhydrostatic stratified flow over obstacles

NASA Technical Reports Server (NTRS)

Pihos, G. G.; Wurtele, M. G.

1981-01-01

The physical model and computational procedure of the code is described in detail. The code is validated in tests against a variety of known analytical solutions from the literature and is also compared against actual mountain wave observations. The code will receive as initial input either mathematically idealized or discrete observational data. The form of the obstacle or mountain is arbitrary.
A concatenated coding scheme for error control

NASA Technical Reports Server (NTRS)

Kasami, T.; Fujiwara, T.; Lin, S.

1986-01-01

In this paper, a concatenated coding scheme for error control in data communications is presented and analyzed. In this scheme, the inner code is used for both error correction and detection; however, the outer code is used only for error detection. A retransmission is requested if either the inner code decoder fails to make a successful decoding or the outer code decoder detects the presence of errors after the inner code decoding. Probability of undetected error (or decoding error) of the proposed scheme is derived. An efficient method for computing this probability is presented. Throughput efficiency of the proposed error control scheme incorporated with a selective-repeat ARQ retransmission strategy is also analyzed. Three specific examples are presented. One of the examples is proposed for error control in the NASA Telecommand System.
THC-MP: High performance numerical simulation of reactive transport and multiphase flow in porous media

NASA Astrophysics Data System (ADS)

Wei, Xiaohui; Li, Weishan; Tian, Hailong; Li, Hongliang; Xu, Haixiao; Xu, Tianfu

2015-07-01

The numerical simulation of multiphase flow and reactive transport in the porous media on complex subsurface problem is a computationally intensive application. To meet the increasingly computational requirements, this paper presents a parallel computing method and architecture. Derived from TOUGHREACT that is a well-established code for simulating subsurface multi-phase flow and reactive transport problems, we developed a high performance computing THC-MP based on massive parallel computer, which extends greatly on the computational capability for the original code. The domain decomposition method was applied to the coupled numerical computing procedure in the THC-MP. We designed the distributed data structure, implemented the data initialization and exchange between the computing nodes and the core solving module using the hybrid parallel iterative and direct solver. Numerical accuracy of the THC-MP was verified through a CO2 injection-induced reactive transport problem by comparing the results obtained from the parallel computing and sequential computing (original code). Execution efficiency and code scalability were examined through field scale carbon sequestration applications on the multicore cluster. The results demonstrate successfully the enhanced performance using the THC-MP on parallel computing facilities.
Trellises and Trellis-Based Decoding Algorithms for Linear Block Codes. Part 3

NASA Technical Reports Server (NTRS)

Lin, Shu

1998-01-01

Decoding algorithms based on the trellis representation of a code (block or convolutional) drastically reduce decoding complexity. The best known and most commonly used trellis-based decoding algorithm is the Viterbi algorithm. It is a maximum likelihood decoding algorithm. Convolutional codes with the Viterbi decoding have been widely used for error control in digital communications over the last two decades. This chapter is concerned with the application of the Viterbi decoding algorithm to linear block codes. First, the Viterbi algorithm is presented. Then, optimum sectionalization of a trellis to minimize the computational complexity of a Viterbi decoder is discussed and an algorithm is presented. Some design issues for IC (integrated circuit) implementation of a Viterbi decoder are considered and discussed. Finally, a new decoding algorithm based on the principle of compare-select-add is presented. This new algorithm can be applied to both block and convolutional codes and is more efficient than the conventional Viterbi algorithm based on the add-compare-select principle. This algorithm is particularly efficient for rate 1/n antipodal convolutional codes and their high-rate punctured codes. It reduces computational complexity by one-third compared with the Viterbi algorithm.
Scheduling Operations for Massive Heterogeneous Clusters

NASA Technical Reports Server (NTRS)

Humphrey, John; Spagnoli, Kyle

2013-01-01

High-performance computing (HPC) programming has become increasingly difficult with the advent of hybrid supercomputers consisting of multicore CPUs and accelerator boards such as the GPU. Manual tuning of software to achieve high performance on this type of machine has been performed by programmers. This is needlessly difficult and prone to being invalidated by new hardware, new software, or changes in the underlying code. A system was developed for task-based representation of programs, which when coupled with a scheduler and runtime system, allows for many benefits, including higher performance and utilization of computational resources, easier programming and porting, and adaptations of code during runtime. The system consists of a method of representing computer algorithms as a series of data-dependent tasks. The series forms a graph, which can be scheduled for execution on many nodes of a supercomputer efficiently by a computer algorithm. The schedule is executed by a dispatch component, which is tailored to understand all of the hardware types that may be available within the system. The scheduler is informed by a cluster mapping tool, which generates a topology of available resources and their strengths and communication costs. Software is decoupled from its hardware, which aids in porting to future architectures. A computer algorithm schedules all operations, which for systems of high complexity (i.e., most NASA codes), cannot be performed optimally by a human. The system aids in reducing repetitive code, such as communication code, and aids in the reduction of redundant code across projects. It adds new features to code automatically, such as recovering from a lost node or the ability to modify the code while running. In this project, the innovators at the time of this reporting intend to develop two distinct technologies that build upon each other and both of which serve as building blocks for more efficient HPC usage. First is the scheduling and dynamic execution framework, and the second is scalable linear algebra libraries that are built directly on the former.
Visual saliency-based fast intracoding algorithm for high efficiency video coding

NASA Astrophysics Data System (ADS)

Zhou, Xin; Shi, Guangming; Zhou, Wei; Duan, Zhemin

2017-01-01

Intraprediction has been significantly improved in high efficiency video coding over H.264/AVC with quad-tree-based coding unit (CU) structure from size 64×64 to 8×8 and more prediction modes. However, these techniques cause a dramatic increase in computational complexity. An intracoding algorithm is proposed that consists of perceptual fast CU size decision algorithm and fast intraprediction mode decision algorithm. First, based on the visual saliency detection, an adaptive and fast CU size decision method is proposed to alleviate intraencoding complexity. Furthermore, a fast intraprediction mode decision algorithm with step halving rough mode decision method and early modes pruning algorithm is presented to selectively check the potential modes and effectively reduce the complexity of computation. Experimental results show that our proposed fast method reduces the computational complexity of the current HM to about 57% in encoding time with only 0.37% increases in BD rate. Meanwhile, the proposed fast algorithm has reasonable peak signal-to-noise ratio losses and nearly the same subjective perceptual quality.
Efficient self-consistent viscous-inviscid solutions for unsteady transonic flow

NASA Technical Reports Server (NTRS)

Howlett, J. T.

1985-01-01

An improved method is presented for coupling a boundary layer code with an unsteady inviscid transonic computer code in a quasi-steady fashion. At each fixed time step, the boundary layer and inviscid equations are successively solved until the process converges. An explicit coupling of the equations is described which greatly accelerates the convergence process. Computer times for converged viscous-inviscid solutions are about 1.8 times the comparable inviscid values. Comparison of the results obtained with experimental data on three airfoils are presented. These comparisons demonstrate that the explicitly coupled viscous-inviscid solutions can provide efficient predictions of pressure distributions and lift for unsteady two-dimensional transonic flows.
Efficient self-consistent viscous-inviscid solutions for unsteady transonic flow

NASA Technical Reports Server (NTRS)

Howlett, J. T.

1985-01-01

An improved method is presented for coupling a boundary layer code with an unsteady inviscid transonic computer code in a quasi-steady fashion. At each fixed time step, the boundary layer and inviscid equations are successively solved until the process converges. An explicit coupling of the equations is described which greatly accelerates the convergence process. Computer times for converged viscous-inviscid solutions are about 1.8 times the comparable inviscid values. Comparison of the results obtained with experimental data on three airfoils are presented. These comparisons demonstrate that the explicitly coupled viscous-inviscid solutions can provide efficient predictions of pressure distributions and lift for unsteady two-dimensional transonic flow.
Accelerating execution of the integrated TIGER series Monte Carlo radiation transport codes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Smith, L.M.; Hochstedler, R.D.

1997-02-01

Execution of the integrated TIGER series (ITS) of coupled electron/photon Monte Carlo radiation transport codes has been accelerated by modifying the FORTRAN source code for more efficient computation. Each member code of ITS was benchmarked and profiled with a specific test case that directed the acceleration effort toward the most computationally intensive subroutines. Techniques for accelerating these subroutines included replacing linear search algorithms with binary versions, replacing the pseudo-random number generator, reducing program memory allocation, and proofing the input files for geometrical redundancies. All techniques produced identical or statistically similar results to the original code. Final benchmark timing of themore » accelerated code resulted in speed-up factors of 2.00 for TIGER (the one-dimensional slab geometry code), 1.74 for CYLTRAN (the two-dimensional cylindrical geometry code), and 1.90 for ACCEPT (the arbitrary three-dimensional geometry code).« less
Implementation of Finite Rate Chemistry Capability in OVERFLOW

NASA Technical Reports Server (NTRS)

Olsen, M. E.; Venkateswaran, S.; Prabhu, D. K.

2004-01-01

An implementation of both finite rate and equilibrium chemistry have been completed for the OVERFLOW code, a chimera capable, complex geometry flow code widely used to predict transonic flow fields. The implementation builds on the computational efficiency and geometric generality of the solver.
The feasibility of QR-code prescription in Taiwan.

PubMed

Lin, C-H; Tsai, F-Y; Tsai, W-L; Wen, H-W; Hu, M-L

2012-12-01

An ideal Health Care Service is a service system that focuses on patients. Patients in Taiwan have the freedom to fill their prescriptions at any pharmacies contracted with National Health Insurance. Each of these pharmacies uses its own computer system. So far, there are at least ten different systems on the market in Taiwan. To transmit the prescription information from the hospital to the pharmacy accurately and efficiently presents a great issue. This study consisted of two-dimensional applications using a QR-code to capture Patient's identification and prescription information from the hospitals as well as using a webcam to read the QR-code and transfer all data to the pharmacy computer system. Two hospitals and 85 community pharmacies participated in the study. During the trial, all participant pharmacies appraised highly of the accurate transmission of the prescription information. The contents in QR-code prescriptions from Taipei area were picked up efficiently and accurately in pharmacies at Taichung area (middle Taiwan) without software system limit and area limitation. The QR-code device received a patent (No. M376844, March 2010) from Intellectual Property Office Ministry of Economic Affair, China. Our trial has proven that QR-code prescription can provide community pharmacists an efficient, accurate and inexpensive device to digitalize the prescription contents. Consequently, pharmacists can offer better quality of pharmacy service to patients. © 2012 Blackwell Publishing Ltd.

Parallelization of interpolation, solar radiation and water flow simulation modules in GRASS GIS using OpenMP

NASA Astrophysics Data System (ADS)

Hofierka, Jaroslav; Lacko, Michal; Zubal, Stanislav

2017-10-01

In this paper, we describe the parallelization of three complex and computationally intensive modules of GRASS GIS using the OpenMP application programming interface for multi-core computers. These include the v.surf.rst module for spatial interpolation, the r.sun module for solar radiation modeling and the r.sim.water module for water flow simulation. We briefly describe the functionality of the modules and parallelization approaches used in the modules. Our approach includes the analysis of the module's functionality, identification of source code segments suitable for parallelization and proper application of OpenMP parallelization code to create efficient threads processing the subtasks. We document the efficiency of the solutions using the airborne laser scanning data representing land surface in the test area and derived high-resolution digital terrain model grids. We discuss the performance speed-up and parallelization efficiency depending on the number of processor threads. The study showed a substantial increase in computation speeds on a standard multi-core computer while maintaining the accuracy of results in comparison to the output from original modules. The presented parallelization approach showed the simplicity and efficiency of the parallelization of open-source GRASS GIS modules using OpenMP, leading to an increased performance of this geospatial software on standard multi-core computers.
Prediction of sound radiated from different practical jet engine inlets

NASA Technical Reports Server (NTRS)

Zinn, B. T.; Meyer, W. L.

1980-01-01

Existing computer codes for calculating the far field radiation patterns surrounding various practical jet engine inlet configurations under different excitation conditions were upgraded. The computer codes were refined and expanded so that they are now more efficient computationally by a factor of about three and they are now capable of producing accurate results up to nondimensional wave numbers of twenty. Computer programs were also developed to help generate accurate geometrical representations of the inlets to be investigated. This data is required as input for the computer programs which calculate the sound fields. This new geometry generating computer program considerably reduces the time required to generate the input data which was one of the most time consuming steps in the process. The results of sample runs using the NASA-Lewis QCSEE inlet are presented and comparison of run times and accuracy are made between the old and upgraded computer codes. The overall accuracy of the computations is determined by comparison of the results of the computations with simple source solutions.
SAGE: The Self-Adaptive Grid Code. 3

NASA Technical Reports Server (NTRS)

Davies, Carol B.; Venkatapathy, Ethiraj

1999-01-01

The multi-dimensional self-adaptive grid code, SAGE, is an important tool in the field of computational fluid dynamics (CFD). It provides an efficient method to improve the accuracy of flow solutions while simultaneously reducing computer processing time. Briefly, SAGE enhances an initial computational grid by redistributing the mesh points into more appropriate locations. The movement of these points is driven by an equal-error-distribution algorithm that utilizes the relationship between high flow gradients and excessive solution errors. The method also provides a balance between clustering points in the high gradient regions and maintaining the smoothness and continuity of the adapted grid, The latest version, Version 3, includes the ability to change the boundaries of a given grid to more efficiently enclose flow structures and provides alternative redistribution algorithms.
A Computer Code for Swirling Turbulent Axisymmetric Recirculating Flows in Practical Isothermal Combustor Geometries

NASA Technical Reports Server (NTRS)

Lilley, D. G.; Rhode, D. L.

1982-01-01

A primitive pressure-velocity variable finite difference computer code was developed to predict swirling recirculating inert turbulent flows in axisymmetric combustors in general, and for application to a specific idealized combustion chamber with sudden or gradual expansion. The technique involves a staggered grid system for axial and radial velocities, a line relaxation procedure for efficient solution of the equations, a two-equation k-epsilon turbulence model, a stairstep boundary representation of the expansion flow, and realistic accommodation of swirl effects. A user's manual, dealing with the computational problem, showing how the mathematical basis and computational scheme may be translated into a computer program is presented. A flow chart, FORTRAN IV listing, notes about various subroutines and a user's guide are supplied as an aid to prospective users of the code.
Parametric bicubic spline and CAD tools for complex targets shape modelling in physical optics radar cross section prediction

NASA Astrophysics Data System (ADS)

Delogu, A.; Furini, F.

1991-09-01

Increasing interest in radar cross section (RCS) reduction is placing new demands on theoretical, computation, and graphic techniques for calculating scattering properties of complex targets. In particular, computer codes capable of predicting the RCS of an entire aircraft at high frequency and of achieving RCS control with modest structural changes, are becoming of paramount importance in stealth design. A computer code, evaluating the RCS of arbitrary shaped metallic objects that are computer aided design (CAD) generated, and its validation with measurements carried out using ALENIA RCS test facilities are presented. The code, based on the physical optics method, is characterized by an efficient integration algorithm with error control, in order to contain the computer time within acceptable limits, and by an accurate parametric representation of the target surface in terms of bicubic splines.
Overview of the NASA Glenn Flux Reconstruction Based High-Order Unstructured Grid Code

NASA Technical Reports Server (NTRS)

Spiegel, Seth C.; DeBonis, James R.; Huynh, H. T.

2016-01-01

A computational fluid dynamics code based on the flux reconstruction (FR) method is currently being developed at NASA Glenn Research Center to ultimately provide a large- eddy simulation capability that is both accurate and efficient for complex aeropropulsion flows. The FR approach offers a simple and efficient method that is easy to implement and accurate to an arbitrary order on common grid cell geometries. The governing compressible Navier-Stokes equations are discretized in time using various explicit Runge-Kutta schemes, with the default being the 3-stage/3rd-order strong stability preserving scheme. The code is written in modern Fortran (i.e., Fortran 2008) and parallelization is attained through MPI for execution on distributed-memory high-performance computing systems. An h- refinement study of the isentropic Euler vortex problem is able to empirically demonstrate the capability of the FR method to achieve super-accuracy for inviscid flows. Additionally, the code is applied to the Taylor-Green vortex problem, performing numerous implicit large-eddy simulations across a range of grid resolutions and solution orders. The solution found by a pseudo-spectral code is commonly used as a reference solution to this problem, and the FR code is able to reproduce this solution using approximately the same grid resolution. Finally, an examination of the code's performance demonstrates good parallel scaling, as well as an implementation of the FR method with a computational cost/degree- of-freedom/time-step that is essentially independent of the solution order of accuracy for structured geometries.
MINIVER: Miniature version of real/ideal gas aero-heating and ablation computer program

NASA Technical Reports Server (NTRS)

Hendler, D. R.

1976-01-01

Computer code is used to determine heat transfer multiplication factors, special flow field simulation techniques, different heat transfer methods, different transition criteria, crossflow simulation, and more efficient thin skin thickness optimization procedure.
An efficient HZETRN (a galactic cosmic ray transport code)

NASA Technical Reports Server (NTRS)

Shinn, Judy L.; Wilson, John W.

1992-01-01

An accurate and efficient engineering code for analyzing the shielding requirements against the high-energy galactic heavy ions is needed. The HZETRN is a deterministic code developed at Langley Research Center that is constantly under improvement both in physics and numerical computation and is targeted for such use. One problem area connected with the space-marching technique used in this code is the propagation of the local truncation error. By improving the numerical algorithms for interpolation, integration, and grid distribution formula, the efficiency of the code is increased by a factor of eight as the number of energy grid points is reduced. The numerical accuracy of better than 2 percent for a shield thickness of 150 g/cm(exp 2) is found when a 45 point energy grid is used. The propagating step size, which is related to the perturbation theory, is also reevaluated.
Final Technical Report for SBIR entitled Four-Dimensional Finite-Orbit-Width Fokker-Planck Code with Sources, for Neoclassical/Anomalous Transport Simulation of Ion and Electron Distributions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Harvey, R. W.; Petrov, Yu. V.

2013-12-03

Within the US Department of Energy/Office of Fusion Energy magnetic fusion research program, there is an important whole-plasma-modeling need for a radio-frequency/neutral-beam-injection (RF/NBI) transport-oriented finite-difference Fokker-Planck (FP) code with combined capabilities for 4D (2R2V) geometry near the fusion plasma periphery, and computationally less demanding 3D (1R2V) bounce-averaged capabilities for plasma in the core of fusion devices. Demonstration of proof-of-principle achievement of this goal has been carried out in research carried out under Phase I of the SBIR award. Two DOE-sponsored codes, the CQL3D bounce-average Fokker-Planck code in which CompX has specialized, and the COGENT 4D, plasma edge-oriented Fokker-Planck code whichmore » has been constructed by Lawrence Livermore National Laboratory and Lawrence Berkeley Laboratory scientists, where coupled. Coupling was achieved by using CQL3D calculated velocity distributions including an energetic tail resulting from NBI, as boundary conditions for the COGENT code over the two-dimensional velocity space on a spatial interface (flux) surface at a given radius near the plasma periphery. The finite-orbit-width fast ions from the CQL3D distributions penetrated into the peripheral plasma modeled by the COGENT code. This combined code demonstrates the feasibility of the proposed 3D/4D code. By combining these codes, the greatest computational efficiency is achieved subject to present modeling needs in toroidally symmetric magnetic fusion devices. The more efficient 3D code can be used in its regions of applicability, coupled to the more computationally demanding 4D code in higher collisionality edge plasma regions where that extended capability is necessary for accurate representation of the plasma. More efficient code leads to greater use and utility of the model. An ancillary aim of the project is to make the combined 3D/4D code user friendly. Achievement of full-coupling of these two Fokker-Planck codes will advance computational modeling of plasma devices important to the USDOE magnetic fusion energy program, in particular the DIII-D tokamak at General Atomics, San Diego, the NSTX spherical tokamak at Princeton, New Jersey, and the MST reversed-field-pinch Madison, Wisconsin. The validation studies of the code against the experiments will improve understanding of physics important for magnetic fusion, and will increase our design capabilities for achieving the goals of the International Tokamak Experimental Reactor (ITER) project in which the US is a participant and which seeks to demonstrate at least a factor of five in fusion power production divided by input power.« less
Ice Accretion Calculations for a Commercial Transport Using the LEWICE3D, ICEGRID3D and CMARC Programs

NASA Technical Reports Server (NTRS)

Bidwell, Colin S.; Pinella, David; Garrison, Peter

1999-01-01

Collection efficiency and ice accretion calculations were made for a commercial transport using the NASA Lewis LEWICE3D ice accretion code, the ICEGRID3D grid code and the CMARC panel code. All of the calculations were made on a Windows 95 based personal computer. The ice accretion calculations were made for the nose, wing, horizontal tail and vertical tail surfaces. Ice shapes typifying those of a 30 minute hold were generated. Collection efficiencies were also generated for the entire aircraft using the newly developed unstructured collection efficiency method. The calculations highlight the flexibility and cost effectiveness of the LEWICE3D, ICEGRID3D, CMARC combination.
A GPU-accelerated implicit meshless method for compressible flows

NASA Astrophysics Data System (ADS)

Zhang, Jia-Le; Ma, Zhi-Hua; Chen, Hong-Quan; Cao, Cheng

2018-05-01

This paper develops a recently proposed GPU based two-dimensional explicit meshless method (Ma et al., 2014) by devising and implementing an efficient parallel LU-SGS implicit algorithm to further improve the computational efficiency. The capability of the original 2D meshless code is extended to deal with 3D complex compressible flow problems. To resolve the inherent data dependency of the standard LU-SGS method, which causes thread-racing conditions destabilizing numerical computation, a generic rainbow coloring method is presented and applied to organize the computational points into different groups by painting neighboring points with different colors. The original LU-SGS method is modified and parallelized accordingly to perform calculations in a color-by-color manner. The CUDA Fortran programming model is employed to develop the key kernel functions to apply boundary conditions, calculate time steps, evaluate residuals as well as advance and update the solution in the temporal space. A series of two- and three-dimensional test cases including compressible flows over single- and multi-element airfoils and a M6 wing are carried out to verify the developed code. The obtained solutions agree well with experimental data and other computational results reported in the literature. Detailed analysis on the performance of the developed code reveals that the developed CPU based implicit meshless method is at least four to eight times faster than its explicit counterpart. The computational efficiency of the implicit method could be further improved by ten to fifteen times on the GPU.
Tally and geometry definition influence on the computing time in radiotherapy treatment planning with MCNP Monte Carlo code.

PubMed

Juste, B; Miro, R; Gallardo, S; Santos, A; Verdu, G

2006-01-01

The present work has simulated the photon and electron transport in a Theratron 780 (MDS Nordion) (60)Co radiotherapy unit, using the Monte Carlo transport code, MCNP (Monte Carlo N-Particle), version 5. In order to become computationally more efficient in view of taking part in the practical field of radiotherapy treatment planning, this work is focused mainly on the analysis of dose results and on the required computing time of different tallies applied in the model to speed up calculations.
AREVA Developments for an Efficient and Reliable use of Monte Carlo codes for Radiation Transport Applications

NASA Astrophysics Data System (ADS)

Chapoutier, Nicolas; Mollier, François; Nolin, Guillaume; Culioli, Matthieu; Mace, Jean-Reynald

2017-09-01

In the context of the rising of Monte Carlo transport calculations for any kind of application, AREVA recently improved its suite of engineering tools in order to produce efficient Monte Carlo workflow. Monte Carlo codes, such as MCNP or TRIPOLI, are recognized as reference codes to deal with a large range of radiation transport problems. However the inherent drawbacks of theses codes - laboring input file creation and long computation time - contrast with the maturity of the treatment of the physical phenomena. The goals of the recent AREVA developments were to reach similar efficiency as other mature engineering sciences such as finite elements analyses (e.g. structural or fluid dynamics). Among the main objectives, the creation of a graphical user interface offering CAD tools for geometry creation and other graphical features dedicated to the radiation field (source definition, tally definition) has been reached. The computations times are drastically reduced compared to few years ago thanks to the use of massive parallel runs, and above all, the implementation of hybrid variance reduction technics. From now engineering teams are capable to deliver much more prompt support to any nuclear projects dealing with reactors or fuel cycle facilities from conceptual phase to decommissioning.
Solar Proton Transport Within an ICRU Sphere Surrounded by a Complex Shield: Ray-trace Geometry

NASA Technical Reports Server (NTRS)

Slaba, Tony C.; Wilson, John W.; Badavi, Francis F.; Reddell, Brandon D.; Bahadori, Amir A.

2015-01-01

A computationally efficient 3DHZETRN code with enhanced neutron and light ion (Z is less than or equal to 2) propagation was recently developed for complex, inhomogeneous shield geometry described by combinatorial objects. Comparisons were made between 3DHZETRN results and Monte Carlo (MC) simulations at locations within the combinatorial geometry, and it was shown that 3DHZETRN agrees with the MC codes to the extent they agree with each other. In the present report, the 3DHZETRN code is extended to enable analysis in ray-trace geometry. This latest extension enables the code to be used within current engineering design practices utilizing fully detailed vehicle and habitat geometries. Through convergence testing, it is shown that fidelity in an actual shield geometry can be maintained in the discrete ray-trace description by systematically increasing the number of discrete rays used. It is also shown that this fidelity is carried into transport procedures and resulting exposure quantities without sacrificing computational efficiency.
Solar proton exposure of an ICRU sphere within a complex structure part II: Ray-trace geometry.

PubMed

Slaba, Tony C; Wilson, John W; Badavi, Francis F; Reddell, Brandon D; Bahadori, Amir A

2016-06-01

A computationally efficient 3DHZETRN code with enhanced neutron and light ion (Z ≤ 2) propagation was recently developed for complex, inhomogeneous shield geometry described by combinatorial objects. Comparisons were made between 3DHZETRN results and Monte Carlo (MC) simulations at locations within the combinatorial geometry, and it was shown that 3DHZETRN agrees with the MC codes to the extent they agree with each other. In the present report, the 3DHZETRN code is extended to enable analysis in ray-trace geometry. This latest extension enables the code to be used within current engineering design practices utilizing fully detailed vehicle and habitat geometries. Through convergence testing, it is shown that fidelity in an actual shield geometry can be maintained in the discrete ray-trace description by systematically increasing the number of discrete rays used. It is also shown that this fidelity is carried into transport procedures and resulting exposure quantities without sacrificing computational efficiency. Published by Elsevier Ltd.
A high-speed linear algebra library with automatic parallelism

NASA Technical Reports Server (NTRS)

Boucher, Michael L.

1994-01-01

Parallel or distributed processing is key to getting highest performance workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely small even though there are numerous computationally demanding programs that would significantly benefit from application of parallel processing. This paper describes DSSLIB, which is a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.
Global Magnetohydrodynamic Simulation Using High Performance FORTRAN on Parallel Computers

NASA Astrophysics Data System (ADS)

Ogino, T.

High Performance Fortran (HPF) is one of modern and common techniques to achieve high performance parallel computation. We have translated a 3-dimensional magnetohydrodynamic (MHD) simulation code of the Earth's magnetosphere from VPP Fortran to HPF/JA on the Fujitsu VPP5000/56 vector-parallel supercomputer and the MHD code was fully vectorized and fully parallelized in VPP Fortran. The entire performance and capability of the HPF MHD code could be shown to be almost comparable to that of VPP Fortran. A 3-dimensional global MHD simulation of the earth's magnetosphere was performed at a speed of over 400 Gflops with an efficiency of 76.5 VPP5000/56 in vector and parallel computation that permitted comparison with catalog values. We have concluded that fluid and MHD codes that are fully vectorized and fully parallelized in VPP Fortran can be translated with relative ease to HPF/JA, and a code in HPF/JA may be expected to perform comparably to the same code written in VPP Fortran.
Efficient computation of photonic crystal waveguide modes with dispersive material.

PubMed

Schmidt, Kersten; Kappeler, Roman

2010-03-29

The optimization of PhC waveguides is a key issue for successfully designing PhC devices. Since this design task is computationally expensive, efficient methods are demanded. The available codes for computing photonic bands are also applied to PhC waveguides. They are reliable but not very efficient, which is even more pronounced for dispersive material. We present a method based on higher order finite elements with curved cells, which allows to solve for the band structure taking directly into account the dispersiveness of the materials. This is accomplished by reformulating the wave equations as a linear eigenproblem in the complex wave-vectors k. For this method, we demonstrate the high efficiency for the computation of guided PhC waveguide modes by a convergence analysis.
Development of the Tensoral Computer Language

NASA Technical Reports Server (NTRS)

Ferziger, Joel; Dresselhaus, Eliot

1996-01-01

The research scientist or engineer wishing to perform large scale simulations or to extract useful information from existing databases is required to have expertise in the details of the particular database, the numerical methods and the computer architecture to be used. This poses a significant practical barrier to the use of simulation data. The goal of this research was to develop a high-level computer language called Tensoral, designed to remove this barrier. The Tensoral language provides a framework in which efficient generic data manipulations can be easily coded and implemented. First of all, Tensoral is general. The fundamental objects in Tensoral represent tensor fields and the operators that act on them. The numerical implementation of these tensors and operators is completely and flexibly programmable. New mathematical constructs and operators can be easily added to the Tensoral system. Tensoral is compatible with existing languages. Tensoral tensor operations co-exist in a natural way with a host language, which may be any sufficiently powerful computer language such as Fortran, C, or Vectoral. Tensoral is very-high-level. Tensor operations in Tensoral typically act on entire databases (i.e., arrays) at one time and may, therefore, correspond to many lines of code in a conventional language. Tensoral is efficient. Tensoral is a compiled language. Database manipulations are simplified optimized and scheduled by the compiler eventually resulting in efficient machine code to implement them.
Performance Analysis and Optimization on the UCLA Parallel Atmospheric General Circulation Model Code

NASA Technical Reports Server (NTRS)

Lou, John; Ferraro, Robert; Farrara, John; Mechoso, Carlos

1996-01-01

An analysis is presented of several factors influencing the performance of a parallel implementation of the UCLA atmospheric general circulation model (AGCM) on massively parallel computer systems. Several modificaitons to the original parallel AGCM code aimed at improving its numerical efficiency, interprocessor communication cost, load-balance and issues affecting single-node code performance are discussed.

Analysis of PANDA Passive Containment Cooling Steady-State Tests with the Spectra Code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stempniewicz, Marek M

2000-07-15

Results of post test simulation of the PANDA passive containment cooling (PCC) steady-state tests (S-series tests), performed at the PANDA facility at the Paul Scherrer Institute, Switzerland, are presented. The simulation has been performed using the computer code SPECTRA, a thermal-hydraulic code, designed specifically for analyzing containment behavior of nuclear power plants.Results of the present calculations are compared to the measurement data as well as the results obtained earlier with the codes MELCOR, TRAC-BF1, and TRACG. The calculated PCC efficiencies are somewhat lower than the measured values. Similar underestimation of PCC efficiencies had been obtained in the past, with themore » other computer codes. To explain this difference, it is postulated that condensate coming into the tubes forms a stream of liquid in one or two tubes, leaving most of the tubes unaffected. The condensate entering the water box is assumed to fall down in the form of droplets. With these assumptions, the results calculated with SPECTRA are close to the experimental data.It is concluded that the SPECTRA code is a suitable tool for analyzing containments of advanced reactors, equipped with passive containment cooling systems.« less
Step-by-step magic state encoding for efficient fault-tolerant quantum computation

PubMed Central

Goto, Hayato

2014-01-01

Quantum error correction allows one to make quantum computers fault-tolerant against unavoidable errors due to decoherence and imperfect physical gate operations. However, the fault-tolerant quantum computation requires impractically large computational resources for useful applications. This is a current major obstacle to the realization of a quantum computer. In particular, magic state distillation, which is a standard approach to universality, consumes the most resources in fault-tolerant quantum computation. For the resource problem, here we propose step-by-step magic state encoding for concatenated quantum codes, where magic states are encoded step by step from the physical level to the logical one. To manage errors during the encoding, we carefully use error detection. Since the sizes of intermediate codes are small, it is expected that the resource overheads will become lower than previous approaches based on the distillation at the logical level. Our simulation results suggest that the resource requirements for a logical magic state will become comparable to those for a single logical controlled-NOT gate. Thus, the present method opens a new possibility for efficient fault-tolerant quantum computation. PMID:25511387
Step-by-step magic state encoding for efficient fault-tolerant quantum computation.

PubMed

Goto, Hayato

2014-12-16

Quantum error correction allows one to make quantum computers fault-tolerant against unavoidable errors due to decoherence and imperfect physical gate operations. However, the fault-tolerant quantum computation requires impractically large computational resources for useful applications. This is a current major obstacle to the realization of a quantum computer. In particular, magic state distillation, which is a standard approach to universality, consumes the most resources in fault-tolerant quantum computation. For the resource problem, here we propose step-by-step magic state encoding for concatenated quantum codes, where magic states are encoded step by step from the physical level to the logical one. To manage errors during the encoding, we carefully use error detection. Since the sizes of intermediate codes are small, it is expected that the resource overheads will become lower than previous approaches based on the distillation at the logical level. Our simulation results suggest that the resource requirements for a logical magic state will become comparable to those for a single logical controlled-NOT gate. Thus, the present method opens a new possibility for efficient fault-tolerant quantum computation.
Coding efficiency of AVS 2.0 for CBAC and CABAC engines

NASA Astrophysics Data System (ADS)

Cui, Jing; Choi, Youngkyu; Chae, Soo-Ik

2015-12-01

In this paper we compare the coding efficiency of AVS 2.0[1] for engines of the Context-based Binary Arithmetic Coding (CBAC)[2] in the AVS 2.0 and the Context-Adaptive Binary Arithmetic Coder (CABAC)[3] in the HEVC[4]. For fair comparison, the CABAC is embedded in the reference code RD10.1 because the CBAC is in the HEVC in our previous work[5]. The rate estimation table is employed only for RDOQ in the RD code. To reduce the computation complexity of the video encoder, therefore we modified the RD code so that the rate estimation table is employed for all RDO decision. Furthermore, we also simplify the complexity of rate estimation table by reducing the bit depth of its fractional part to 2 from 8. The simulation result shows that the CABAC has the BD-rate loss of about 0.7% compared to the CBAC. It seems that the CBAC is a little more efficient than that the CABAC in the AVS 2.0.
SIMD Optimization of Linear Expressions for Programmable Graphics Hardware

PubMed Central

Bajaj, Chandrajit; Ihm, Insung; Min, Jungki; Oh, Jinsang

2009-01-01

The increased programmability of graphics hardware allows efficient graphical processing unit (GPU) implementations of a wide range of general computations on commodity PCs. An important factor in such implementations is how to fully exploit the SIMD computing capacities offered by modern graphics processors. Linear expressions in the form of ȳ = Ax̄ + b̄, where A is a matrix, and x̄, ȳ and b̄ are vectors, constitute one of the most basic operations in many scientific computations. In this paper, we propose a SIMD code optimization technique that enables efficient shader codes to be generated for evaluating linear expressions. It is shown that performance can be improved considerably by efficiently packing arithmetic operations into four-wide SIMD instructions through reordering of the operations in linear expressions. We demonstrate that the presented technique can be used effectively for programming both vertex and pixel shaders for a variety of mathematical applications, including integrating differential equations and solving a sparse linear system of equations using iterative methods. PMID:19946569
Development of Reduced-Order Models for Aeroelastic and Flutter Prediction Using the CFL3Dv6.0 Code

NASA Technical Reports Server (NTRS)

Silva, Walter A.; Bartels, Robert E.

2002-01-01

A reduced-order model (ROM) is developed for aeroelastic analysis using the CFL3D version 6.0 computational fluid dynamics (CFD) code, recently developed at the NASA Langley Research Center. This latest version of the flow solver includes a deforming mesh capability, a modal structural definition for nonlinear aeroelastic analyses, and a parallelization capability that provides a significant increase in computational efficiency. Flutter results for the AGARD 445.6 Wing computed using CFL3D v6.0 are presented, including discussion of associated computational costs. Modal impulse responses of the unsteady aerodynamic system are then computed using the CFL3Dv6 code and transformed into state-space form. Important numerical issues associated with the computation of the impulse responses are presented. The unsteady aerodynamic state-space ROM is then combined with a state-space model of the structure to create an aeroelastic simulation using the MATLAB/SIMULINK environment. The MATLAB/SIMULINK ROM is used to rapidly compute aeroelastic transients including flutter. The ROM shows excellent agreement with the aeroelastic analyses computed using the CFL3Dv6.0 code directly.
Enhancement of the CAVE computer code

NASA Astrophysics Data System (ADS)

Rathjen, K. A.; Burk, H. O.

1983-12-01

The computer code CAVE (Conduction Analysis via Eigenvalues) is a convenient and efficient computer code for predicting two dimensional temperature histories within thermal protection systems for hypersonic vehicles. The capabilities of CAVE were enhanced by incorporation of the following features into the code: real gas effects in the aerodynamic heating predictions, geometry and aerodynamic heating package for analyses of cone shaped bodies, input option to change from laminar to turbulent heating predictions on leading edges, modification to account for reduction in adiabatic wall temperature with increase in leading sweep, geometry package for two dimensional scramjet engine sidewall, with an option for heat transfer to external and internal surfaces, print out modification to provide tables of select temperatures for plotting and storage, and modifications to the radiation calculation procedure to eliminate temperature oscillations induced by high heating rates. These new features are described.
Capacity Maximizing Constellations

NASA Technical Reports Server (NTRS)

Barsoum, Maged; Jones, Christopher

2010-01-01

Some non-traditional signal constellations have been proposed for transmission of data over the Additive White Gaussian Noise (AWGN) channel using such channel-capacity-approaching codes as low-density parity-check (LDPC) or turbo codes. Computational simulations have shown performance gains of more than 1 dB over traditional constellations. These gains could be translated to bandwidth- efficient communications, variously, over longer distances, using less power, or using smaller antennas. The proposed constellations have been used in a bit-interleaved coded modulation system employing state-ofthe-art LDPC codes. In computational simulations, these constellations were shown to afford performance gains over traditional constellations as predicted by the gap between the parallel decoding capacity of the constellations and the Gaussian capacity
Network Coding on Heterogeneous Multi-Core Processors for Wireless Sensor Networks

PubMed Central

Kim, Deokho; Park, Karam; Ro, Won W.

2011-01-01

While network coding is well known for its efficiency and usefulness in wireless sensor networks, the excessive costs associated with decoding computation and complexity still hinder its adoption into practical use. On the other hand, high-performance microprocessors with heterogeneous multi-cores would be used as processing nodes of the wireless sensor networks in the near future. To this end, this paper introduces an efficient network coding algorithm developed for the heterogenous multi-core processors. The proposed idea is fully tested on one of the currently available heterogeneous multi-core processors referred to as the Cell Broadband Engine. PMID:22164053
Design and optimization of a portable LQCD Monte Carlo code using OpenACC

NASA Astrophysics Data System (ADS)

Bonati, Claudio; Coscetti, Simone; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Calore, Enrico; Schifano, Sebastiano Fabio; Silvi, Giorgio; Tripiccione, Raffaele

The present panorama of HPC architectures is extremely heterogeneous, ranging from traditional multi-core CPU processors, supporting a wide class of applications but delivering moderate computing performance, to many-core Graphics Processor Units (GPUs), exploiting aggressive data-parallelism and delivering higher performances for streaming computing applications. In this scenario, code portability (and performance portability) become necessary for easy maintainability of applications; this is very relevant in scientific computing where code changes are very frequent, making it tedious and prone to error to keep different code versions aligned. In this work, we present the design and optimization of a state-of-the-art production-level LQCD Monte Carlo application, using the directive-based OpenACC programming model. OpenACC abstracts parallel programming to a descriptive level, relieving programmers from specifying how codes should be mapped onto the target architecture. We describe the implementation of a code fully written in OpenAcc, and show that we are able to target several different architectures, including state-of-the-art traditional CPUs and GPUs, with the same code. We also measure performance, evaluating the computing efficiency of our OpenACC code on several architectures, comparing with GPU-specific implementations and showing that a good level of performance-portability can be reached.
CFD code evaluation for internal flow modeling

NASA Technical Reports Server (NTRS)

Chung, T. J.

1990-01-01

Research on the computational fluid dynamics (CFD) code evaluation with emphasis on supercomputing in reacting flows is discussed. Advantages of unstructured grids, multigrids, adaptive methods, improved flow solvers, vector processing, parallel processing, and reduction of memory requirements are discussed. As examples, researchers include applications of supercomputing to reacting flow Navier-Stokes equations including shock waves and turbulence and combustion instability problems associated with solid and liquid propellants. Evaluation of codes developed by other organizations are not included. Instead, the basic criteria for accuracy and efficiency have been established, and some applications on rocket combustion have been made. Research toward an ultimate goal, the most accurate and efficient CFD code, is in progress and will continue for years to come.
Application of CHAD hydrodynamics to shock-wave problems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Trease, H.E.; O`Rourke, P.J.; Sahota, M.S.

1997-12-31

CHAD is the latest in a sequence of continually evolving computer codes written to effectively utilize massively parallel computer architectures and the latest grid generators for unstructured meshes. Its applications range from automotive design issues such as in-cylinder and manifold flows of internal combustion engines, vehicle aerodynamics, underhood cooling and passenger compartment heating, ventilation, and air conditioning to shock hydrodynamics and materials modeling. CHAD solves the full unsteady Navier-Stoke equations with the k-epsilon turbulence model in three space dimensions. The code has four major features that distinguish it from the earlier KIVA code, also developed at Los Alamos. First, itmore » is based on a node-centered, finite-volume method in which, like finite element methods, all fluid variables are located at computational nodes. The computational mesh efficiently and accurately handles all element shapes ranging from tetrahedra to hexahedra. Second, it is written in standard Fortran 90 and relies on automatic domain decomposition and a universal communication library written in standard C and MPI for unstructured grids to effectively exploit distributed-memory parallel architectures. Thus the code is fully portable to a variety of computing platforms such as uniprocessor workstations, symmetric multiprocessors, clusters of workstations, and massively parallel platforms. Third, CHAD utilizes a variable explicit/implicit upwind method for convection that improves computational efficiency in flows that have large velocity Courant number variations due to velocity of mesh size variations. Fourth, CHAD is designed to also simulate shock hydrodynamics involving multimaterial anisotropic behavior under high shear. The authors will discuss CHAD capabilities and show several sample calculations showing the strengths and weaknesses of CHAD.« less
Energy scaling advantages of resistive memory crossbar based computation and its application to sparse coding

DOE Office of Scientific and Technical Information (OSTI.GOV)

Agarwal, Sapan; Quach, Tu -Thach; Parekh, Ojas

In this study, the exponential increase in data over the last decade presents a significant challenge to analytics efforts that seek to process and interpret such data for various applications. Neural-inspired computing approaches are being developed in order to leverage the computational properties of the analog, low-power data processing observed in biological systems. Analog resistive memory crossbars can perform a parallel read or a vector-matrix multiplication as well as a parallel write or a rank-1 update with high computational efficiency. For an N × N crossbar, these two kernels can be O(N) more energy efficient than a conventional digital memory-basedmore » architecture. If the read operation is noise limited, the energy to read a column can be independent of the crossbar size (O(1)). These two kernels form the basis of many neuromorphic algorithms such as image, text, and speech recognition. For instance, these kernels can be applied to a neural sparse coding algorithm to give an O(N) reduction in energy for the entire algorithm when run with finite precision. Sparse coding is a rich problem with a host of applications including computer vision, object tracking, and more generally unsupervised learning.« less
Energy scaling advantages of resistive memory crossbar based computation and its application to sparse coding

DOE PAGES

Agarwal, Sapan; Quach, Tu -Thach; Parekh, Ojas; ...

2016-01-06

In this study, the exponential increase in data over the last decade presents a significant challenge to analytics efforts that seek to process and interpret such data for various applications. Neural-inspired computing approaches are being developed in order to leverage the computational properties of the analog, low-power data processing observed in biological systems. Analog resistive memory crossbars can perform a parallel read or a vector-matrix multiplication as well as a parallel write or a rank-1 update with high computational efficiency. For an N × N crossbar, these two kernels can be O(N) more energy efficient than a conventional digital memory-basedmore » architecture. If the read operation is noise limited, the energy to read a column can be independent of the crossbar size (O(1)). These two kernels form the basis of many neuromorphic algorithms such as image, text, and speech recognition. For instance, these kernels can be applied to a neural sparse coding algorithm to give an O(N) reduction in energy for the entire algorithm when run with finite precision. Sparse coding is a rich problem with a host of applications including computer vision, object tracking, and more generally unsupervised learning.« less
Aerothermal modeling program, phase 2. Element B: Flow interaction experiment

NASA Technical Reports Server (NTRS)

Nikjooy, M.; Mongia, H. C.; Murthy, S. N. B.; Sullivan, J. P.

1986-01-01

The design process was improved and the efficiency, life, and maintenance costs of the turbine engine hot section was enhanced. Recently, there has been much emphasis on the need for improved numerical codes for the design of efficient combustors. For the development of improved computational codes, there is a need for an experimentally obtained data base to be used at test cases for the accuracy of the computations. The purpose of Element-B is to establish a benchmark quality velocity and scalar measurements of the flow interaction of circular jets with swirling flow typical of that in the dome region of annular combustor. In addition to the detailed experimental effort, extensive computations of the swirling flows are to be compared with the measurements for the purpose of assessing the accuracy of current and advanced turbulence and scalar transport models.
Study of shock-induced combustion using an implicit TVD scheme

NASA Technical Reports Server (NTRS)

Yungster, Shayne

1992-01-01

The supersonic combustion flowfields associated with various hypersonic propulsion systems, such as the ram accelerator, the oblique detonation wave engine, and the scramjet, are being investigated using a new computational fluid dynamics (CFD) code. The code solves the fully coupled Reynolds-averaged Navier-Stokes equations and species continuity equations in an efficient manner. It employs an iterative method and a second order differencing scheme to improve computational efficiency. The code is currently being applied to study shock wave/boundary layer interactions in premixed combustible gases, and to investigate the ram accelerator concept. Results obtained for a ram accelerator configuration indicate a new combustion mechanism in which a shock wave induces combustion in the boundary layer, which then propagates outward and downstream. The combustion process creates a high pressure region over the back of the projectile resulting in a net positive thrust forward.
Validation of hydrogen gas stratification and mixing models

DOE PAGES

Wu, Hsingtzu; Zhao, Haihua

2015-05-26

Two validation benchmarks confirm that the BMIX++ code is capable of simulating unintended hydrogen release scenarios efficiently. The BMIX++ (UC Berkeley mechanistic MIXing code in C++) code has been developed to accurately and efficiently predict the fluid mixture distribution and heat transfer in large stratified enclosures for accident analyses and design optimizations. The BMIX++ code uses a scaling based one-dimensional method to achieve large reduction in computational effort compared to a 3-D computational fluid dynamics (CFD) simulation. Two BMIX++ benchmark models have been developed. One is for a single buoyant jet in an open space and another is for amore » large sealed enclosure with both a jet source and a vent near the floor. Both of them have been validated by comparisons with experimental data. Excellent agreements are observed. The entrainment coefficients of 0.09 and 0.08 are found to fit the experimental data for hydrogen leaks with the Froude number of 99 and 268 best, respectively. In addition, the BIX++ simulation results of the average helium concentration for an enclosure with a vent and a single jet agree with the experimental data within a margin of about 10% for jet flow rates ranging from 1.21 × 10⁻⁴ to 3.29 × 10⁻⁴ m³/s. In conclusion, computing time for each BMIX++ model with a normal desktop computer is less than 5 min.« less
Parallel computing techniques for rotorcraft aerodynamics

NASA Astrophysics Data System (ADS)

Ekici, Kivanc

The modification of unsteady three-dimensional Navier-Stokes codes for application on massively parallel and distributed computing environments is investigated. The Euler/Navier-Stokes code TURNS (Transonic Unsteady Rotor Navier-Stokes) was chosen as a test bed because of its wide use by universities and industry. For the efficient implementation of TURNS on parallel computing systems, two algorithmic changes are developed. First, main modifications to the implicit operator, Lower-Upper Symmetric Gauss Seidel (LU-SGS) originally used in TURNS, is performed. Second, application of an inexact Newton method, coupled with a Krylov subspace iterative method (Newton-Krylov method) is carried out. Both techniques have been tried previously for the Euler equations mode of the code. In this work, we have extended the methods to the Navier-Stokes mode. Several new implicit operators were tried because of convergence problems of traditional operators with the high cell aspect ratio (CAR) grids needed for viscous calculations on structured grids. Promising results for both Euler and Navier-Stokes cases are presented for these operators. For the efficient implementation of Newton-Krylov methods to the Navier-Stokes mode of TURNS, efficient preconditioners must be used. The parallel implicit operators used in the previous step are employed as preconditioners and the results are compared. The Message Passing Interface (MPI) protocol has been used because of its portability to various parallel architectures. It should be noted that the proposed methodology is general and can be applied to several other CFD codes (e.g. OVERFLOW).
New Developments in Modeling MHD Systems on High Performance Computing Architectures

NASA Astrophysics Data System (ADS)

Germaschewski, K.; Raeder, J.; Larson, D. J.; Bhattacharjee, A.

2009-04-01

Modeling the wide range of time and length scales present even in fluid models of plasmas like MHD and X-MHD (Extended MHD including two fluid effects like Hall term, electron inertia, electron pressure gradient) is challenging even on state-of-the-art supercomputers. In the last years, HPC capacity has continued to grow exponentially, but at the expense of making the computer systems more and more difficult to program in order to get maximum performance. In this paper, we will present a new approach to managing the complexity caused by the need to write efficient codes: Separating the numerical description of the problem, in our case a discretized right hand side (r.h.s.), from the actual implementation of efficiently evaluating it. An automatic code generator is used to describe the r.h.s. in a quasi-symbolic form while leaving the translation into efficient and parallelized code to a computer program itself. We implemented this approach for OpenGGCM (Open General Geospace Circulation Model), a model of the Earth's magnetosphere, which was accelerated by a factor of three on regular x86 architecture and a factor of 25 on the Cell BE architecture (commonly known for its deployment in Sony's PlayStation 3).
Experimental and Computational Analysis of Unidirectional Flow Through Stirling Engine Heater Head

NASA Technical Reports Server (NTRS)

Wilson, Scott D.; Dyson, Rodger W.; Tew, Roy C.; Demko, Rikako

2006-01-01

A high efficiency Stirling Radioisotope Generator (SRG) is being developed for possible use in long-duration space science missions. NASA s advanced technology goals for next generation Stirling convertors include increasing the Carnot efficiency and percent of Carnot efficiency. To help achieve these goals, a multi-dimensional Computational Fluid Dynamics (CFD) code is being developed to numerically model unsteady fluid flow and heat transfer phenomena of the oscillating working gas inside Stirling convertors. In the absence of transient pressure drop data for the zero mean oscillating multi-dimensional flows present in the Technology Demonstration Convertors on test at NASA Glenn Research Center, unidirectional flow pressure drop test data is used to compare against 2D and 3D computational solutions. This study focuses on tracking pressure drop and mass flow rate data for unidirectional flow though a Stirling heater head using a commercial CFD code (CFD-ACE). The commercial CFD code uses a porous-media model which is dependent on permeability and the inertial coefficient present in the linear and nonlinear terms of the Darcy-Forchheimer equation. Permeability and inertial coefficient were calculated from unidirectional flow test data. CFD simulations of the unidirectional flow test were validated using the porous-media model input parameters which increased simulation accuracy by 14 percent on average.

Distributed Estimation, Coding, and Scheduling in Wireless Visual Sensor Networks

ERIC Educational Resources Information Center

Yu, Chao

2013-01-01

In this thesis, we consider estimation, coding, and sensor scheduling for energy efficient operation of wireless visual sensor networks (VSN), which consist of battery-powered wireless sensors with sensing (imaging), computation, and communication capabilities. The competing requirements for applications of these wireless sensor networks (WSN)…
New Parallel computing framework for radiation transport codes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kostin, M.A.; /Michigan State U., NSCL; Mokhov, N.V.

A new parallel computing framework has been developed to use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. The module is significantly independent of radiation transport codes it can be used with, and is connected to the codes by means of a number of interface functions. The framework was integrated with the MARS15 code, and an effort is under way to deploy it in PHITS. Besides the parallel computing functionality, the framework offers a checkpoint facility that allows restarting calculations with a saved checkpoint file. The checkpoint facility canmore » be used in single process calculations as well as in the parallel regime. Several checkpoint files can be merged into one thus combining results of several calculations. The framework also corrects some of the known problems with the scheduling and load balancing found in the original implementations of the parallel computing functionality in MARS15 and PHITS. The framework can be used efficiently on homogeneous systems and networks of workstations, where the interference from the other users is possible.« less
Multiple grid problems on concurrent-processing computers

NASA Technical Reports Server (NTRS)

Eberhardt, D. S.; Baganoff, D.

1986-01-01

Three computer codes were studied which make use of concurrent processing computer architectures in computational fluid dynamics (CFD). The three parallel codes were tested on a two processor multiple-instruction/multiple-data (MIMD) facility at NASA Ames Research Center, and are suggested for efficient parallel computations. The first code is a well-known program which makes use of the Beam and Warming, implicit, approximate factored algorithm. This study demonstrates the parallelism found in a well-known scheme and it achieved speedups exceeding 1.9 on the two processor MIMD test facility. The second code studied made use of an embedded grid scheme which is used to solve problems having complex geometries. The particular application for this study considered an airfoil/flap geometry in an incompressible flow. The scheme eliminates some of the inherent difficulties found in adapting approximate factorization techniques onto MIMD machines and allows the use of chaotic relaxation and asynchronous iteration techniques. The third code studied is an application of overset grids to a supersonic blunt body problem. The code addresses the difficulties encountered when using embedded grids on a compressible, and therefore nonlinear, problem. The complex numerical boundary system associated with overset grids is discussed and several boundary schemes are suggested. A boundary scheme based on the method of characteristics achieved the best results.
Observations on computational methodologies for use in large-scale, gradient-based, multidisciplinary design incorporating advanced CFD codes

NASA Technical Reports Server (NTRS)

Newman, P. A.; Hou, G. J.-W.; Jones, H. E.; Taylor, A. C., III; Korivi, V. M.

1992-01-01

How a combination of various computational methodologies could reduce the enormous computational costs envisioned in using advanced CFD codes in gradient based optimized multidisciplinary design (MdD) procedures is briefly outlined. Implications of these MdD requirements upon advanced CFD codes are somewhat different than those imposed by a single discipline design. A means for satisfying these MdD requirements for gradient information is presented which appear to permit: (1) some leeway in the CFD solution algorithms which can be used; (2) an extension to 3-D problems; and (3) straightforward use of other computational methodologies. Many of these observations have previously been discussed as possibilities for doing parts of the problem more efficiently; the contribution here is observing how they fit together in a mutually beneficial way.
Multiprocessing on supercomputers for computational aerodynamics

NASA Technical Reports Server (NTRS)

Yarrow, Maurice; Mehta, Unmeel B.

1991-01-01

Little use is made of multiple processors available on current supercomputers (computers with a theoretical peak performance capability equal to 100 MFLOPS or more) to improve turnaround time in computational aerodynamics. The productivity of a computer user is directly related to this turnaround time. In a time-sharing environment, such improvement in this speed is achieved when multiple processors are used efficiently to execute an algorithm. The concept of multiple instructions and multiple data (MIMD) is applied through multitasking via a strategy that requires relatively minor modifications to an existing code for a single processor. This approach maps the available memory to multiple processors, exploiting the C-Fortran-Unix interface. The existing code is mapped without the need for developing a new algorithm. The procedure for building a code utilizing this approach is automated with the Unix stream editor.
Development and application of unified algorithms for problems in computational science

NASA Technical Reports Server (NTRS)

Shankar, Vijaya; Chakravarthy, Sukumar

1987-01-01

A framework is presented for developing computationally unified numerical algorithms for solving nonlinear equations that arise in modeling various problems in mathematical physics. The concept of computational unification is an attempt to encompass efficient solution procedures for computing various nonlinear phenomena that may occur in a given problem. For example, in Computational Fluid Dynamics (CFD), a unified algorithm will be one that allows for solutions to subsonic (elliptic), transonic (mixed elliptic-hyperbolic), and supersonic (hyperbolic) flows for both steady and unsteady problems. The objectives are: development of superior unified algorithms emphasizing accuracy and efficiency aspects; development of codes based on selected algorithms leading to validation; application of mature codes to realistic problems; and extension/application of CFD-based algorithms to problems in other areas of mathematical physics. The ultimate objective is to achieve integration of multidisciplinary technologies to enhance synergism in the design process through computational simulation. Specific unified algorithms for a hierarchy of gas dynamics equations and their applications to two other areas: electromagnetic scattering, and laser-materials interaction accounting for melting.
User's Guide for TOUGH2-MP - A Massively Parallel Version of the TOUGH2 Code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Earth Sciences Division; Zhang, Keni; Zhang, Keni

TOUGH2-MP is a massively parallel (MP) version of the TOUGH2 code, designed for computationally efficient parallel simulation of isothermal and nonisothermal flows of multicomponent, multiphase fluids in one, two, and three-dimensional porous and fractured media. In recent years, computational requirements have become increasingly intensive in large or highly nonlinear problems for applications in areas such as radioactive waste disposal, CO2 geological sequestration, environmental assessment and remediation, reservoir engineering, and groundwater hydrology. The primary objective of developing the parallel-simulation capability is to significantly improve the computational performance of the TOUGH2 family of codes. The particular goal for the parallel simulator ismore » to achieve orders-of-magnitude improvement in computational time for models with ever-increasing complexity. TOUGH2-MP is designed to perform parallel simulation on multi-CPU computational platforms. An earlier version of TOUGH2-MP (V1.0) was based on the TOUGH2 Version 1.4 with EOS3, EOS9, and T2R3D modules, a software previously qualified for applications in the Yucca Mountain project, and was designed for execution on CRAY T3E and IBM SP supercomputers. The current version of TOUGH2-MP (V2.0) includes all fluid property modules of the standard version TOUGH2 V2.0. It provides computationally efficient capabilities using supercomputers, Linux clusters, or multi-core PCs, and also offers many user-friendly features. The parallel simulator inherits all process capabilities from V2.0 together with additional capabilities for handling fractured media from V1.4. This report provides a quick starting guide on how to set up and run the TOUGH2-MP program for users with a basic knowledge of running the (standard) version TOUGH2 code, The report also gives a brief technical description of the code, including a discussion of parallel methodology, code structure, as well as mathematical and numerical methods used. To familiarize users with the parallel code, illustrative sample problems are presented.« less
Development of Efficient Real-Fluid Model in Simulating Liquid Rocket Injector Flows

NASA Technical Reports Server (NTRS)

Cheng, Gary; Farmer, Richard

2003-01-01

The characteristics of propellant mixing near the injector have a profound effect on the liquid rocket engine performance. However, the flow features near the injector of liquid rocket engines are extremely complicated, for example supercritical-pressure spray, turbulent mixing, and chemical reactions are present. Previously, a homogeneous spray approach with a real-fluid property model was developed to account for the compressibility and evaporation effects such that thermodynamics properties of a mixture at a wide range of pressures and temperatures can be properly calculated, including liquid-phase, gas- phase, two-phase, and dense fluid regions. The developed homogeneous spray model demonstrated a good success in simulating uni- element shear coaxial injector spray combustion flows. However, the real-fluid model suffered a computational deficiency when applied to a pressure-based computational fluid dynamics (CFD) code. The deficiency is caused by the pressure and enthalpy being the independent variables in the solution procedure of a pressure-based code, whereas the real-fluid model utilizes density and temperature as independent variables. The objective of the present research work is to improve the computational efficiency of the real-fluid property model in computing thermal properties. The proposed approach is called an efficient real-fluid model, and the improvement of computational efficiency is achieved by using a combination of a liquid species and a gaseous species to represent a real-fluid species.
Extending R packages to support 64-bit compiled code: An illustration with spam64 and GIMMS NDVI3g data

NASA Astrophysics Data System (ADS)

Gerber, Florian; Mösinger, Kaspar; Furrer, Reinhard

2017-07-01

Software packages for spatial data often implement a hybrid approach of interpreted and compiled programming languages. The compiled parts are usually written in C, C++, or Fortran, and are efficient in terms of computational speed and memory usage. Conversely, the interpreted part serves as a convenient user-interface and calls the compiled code for computationally demanding operations. The price paid for the user friendliness of the interpreted component is-besides performance-the limited access to low level and optimized code. An example of such a restriction is the 64-bit vector support of the widely used statistical language R. On the R side, users do not need to change existing code and may not even notice the extension. On the other hand, interfacing 64-bit compiled code efficiently is challenging. Since many R packages for spatial data could benefit from 64-bit vectors, we investigate strategies to efficiently pass 64-bit vectors to compiled languages. More precisely, we show how to simply extend existing R packages using the foreign function interface to seamlessly support 64-bit vectors. This extension is shown with the sparse matrix algebra R package spam. The new capabilities are illustrated with an example of GIMMS NDVI3g data featuring a parametric modeling approach for a non-stationary covariance matrix.
Shielding from space radiations

NASA Technical Reports Server (NTRS)

Chang, C. Ken; Badavi, Forooz F.; Tripathi, Ram K.

1993-01-01

This Progress Report covering the period of December 1, 1992 to June 1, 1993 presents the development of an analytical solution to the heavy ion transport equation in terms of Green's function formalism. The mathematical development results are recasted into a highly efficient computer code for space applications. The efficiency of this algorithm is accomplished by a nonperturbative technique of extending the Green's function over the solution domain. The code may also be applied to accelerator boundary conditions to allow code validation in laboratory experiments. Results from the isotopic version of the code with 59 isotopes present for a single layer target material, for the case of an iron beam projectile at 600 MeV/nucleon in water is presented. A listing of the single layer isotopic version of the code is included.
An installed nacelle design code using a multiblock Euler solver. Volume 1: Theory document

NASA Technical Reports Server (NTRS)

Chen, H. C.

1992-01-01

An efficient multiblock Euler design code was developed for designing a nacelle installed on geometrically complex airplane configurations. This approach employed a design driver based on a direct iterative surface curvature method developed at LaRC. A general multiblock Euler flow solver was used for computing flow around complex geometries. The flow solver used a finite-volume formulation with explicit time-stepping to solve the Euler Equations. It used a multiblock version of the multigrid method to accelerate the convergence of the calculations. The design driver successively updated the surface geometry to reduce the difference between the computed and target pressure distributions. In the flow solver, the change in surface geometry was simulated by applying surface transpiration boundary conditions to avoid repeated grid generation during design iterations. Smoothness of the designed surface was ensured by alternate application of streamwise and circumferential smoothings. The capability and efficiency of the code was demonstrated through the design of both an isolated nacelle and an installed nacelle at various flow conditions. Information on the execution of the computer program is provided in volume 2.
Validation of a pair of computer codes for estimation and optimization of subsonic aerodynamic performance of simple hinged-flap systems for thin swept wings

NASA Technical Reports Server (NTRS)

Carlson, Harry W.; Darden, Christine M.

1988-01-01

Extensive correlations of computer code results with experimental data are employed to illustrate the use of linearized theory attached flow methods for the estimation and optimization of the aerodynamic performance of simple hinged flap systems. Use of attached flow methods is based on the premise that high levels of aerodynamic efficiency require a flow that is as nearly attached as circumstances permit. A variety of swept wing configurations are considered ranging from fighters to supersonic transports, all with leading- and trailing-edge flaps for enhancement of subsonic aerodynamic efficiency. The results indicate that linearized theory attached flow computer code methods provide a rational basis for the estimation and optimization of flap system aerodynamic performance at subsonic speeds. The analysis also indicates that vortex flap design is not an opposing approach but is closely related to attached flow design concepts. The successful vortex flap design actually suppresses the formation of detached vortices to produce a small vortex which is restricted almost entirely to the leading edge flap itself.
Collection Efficiency and Ice Accretion Characteristics of Two Full Scale and One 1/4 Scale Business Jet Horizontal Tails

NASA Technical Reports Server (NTRS)

Bidwell, Colin S.; Papadakis, Michael

2005-01-01

Collection efficiency and ice accretion calculations have been made for a series of business jet horizontal tail configurations using a three-dimensional panel code, an adaptive grid code, and the NASA Glenn LEWICE3D grid based ice accretion code. The horizontal tail models included two full scale wing tips and a 25 percent scale model. Flow solutions for the horizontal tails were generated using the PMARC panel code. Grids used in the ice accretion calculations were generated using the adaptive grid code ICEGRID. The LEWICE3D grid based ice accretion program was used to calculate impingement efficiency and ice shapes. Ice shapes typifying rime and mixed icing conditions were generated for a 30 minute hold condition. All calculations were performed on an SGI Octane computer. The results have been compared to experimental flow and impingement data. In general, the calculated flow and collection efficiencies compared well with experiment, and the ice shapes appeared representative of the rime and mixed icing conditions for which they were calculated.
Fundamental differences between optimization code test problems in engineering applications

NASA Technical Reports Server (NTRS)

Eason, E. D.

1984-01-01

The purpose here is to suggest that there is at least one fundamental difference between the problems used for testing optimization codes and the problems that engineers often need to solve; in particular, the level of precision that can be practically achieved in the numerical evaluation of the objective function, derivatives, and constraints. This difference affects the performance of optimization codes, as illustrated by two examples. Two classes of optimization problem were defined. Class One functions and constraints can be evaluated to a high precision that depends primarily on the word length of the computer. Class Two functions and/or constraints can only be evaluated to a moderate or a low level of precision for economic or modeling reasons, regardless of the computer word length. Optimization codes have not been adequately tested on Class Two problems. There are very few Class Two test problems in the literature, while there are literally hundreds of Class One test problems. The relative performance of two codes may be markedly different for Class One and Class Two problems. Less sophisticated direct search type codes may be less likely to be confused or to waste many function evaluations on Class Two problems. The analysis accuracy and minimization performance are related in a complex way that probably varies from code to code. On a problem where the analysis precision was varied over a range, the simple Hooke and Jeeves code was more efficient at low precision while the Powell code was more efficient at high precision.
Implementation and Characterization of Three-Dimensional Particle-in-Cell Codes on Multiple-Instruction-Multiple-Data Massively Parallel Supercomputers

NASA Technical Reports Server (NTRS)

Lyster, P. M.; Liewer, P. C.; Decyk, V. K.; Ferraro, R. D.

1995-01-01

A three-dimensional electrostatic particle-in-cell (PIC) plasma simulation code has been developed on coarse-grain distributed-memory massively parallel computers with message passing communications. Our implementation is the generalization to three-dimensions of the general concurrent particle-in-cell (GCPIC) algorithm. In the GCPIC algorithm, the particle computation is divided among the processors using a domain decomposition of the simulation domain. In a three-dimensional simulation, the domain can be partitioned into one-, two-, or three-dimensional subdomains ("slabs," "rods," or "cubes") and we investigate the efficiency of the parallel implementation of the push for all three choices. The present implementation runs on the Intel Touchstone Delta machine at Caltech; a multiple-instruction-multiple-data (MIMD) parallel computer with 512 nodes. We find that the parallel efficiency of the push is very high, with the ratio of communication to computation time in the range 0.3%-10.0%. The highest efficiency (> 99%) occurs for a large, scaled problem with 64(sup 3) particles per processing node (approximately 134 million particles of 512 nodes) which has a push time of about 250 ns per particle per time step. We have also developed expressions for the timing of the code which are a function of both code parameters (number of grid points, particles, etc.) and machine-dependent parameters (effective FLOP rate, and the effective interprocessor bandwidths for the communication of particles and grid points). These expressions can be used to estimate the performance of scaled problems--including those with inhomogeneous plasmas--to other parallel machines once the machine-dependent parameters are known.
Inter-view prediction of intra mode decision for high-efficiency video coding-based multiview video coding

NASA Astrophysics Data System (ADS)

da Silva, Thaísa Leal; Agostini, Luciano Volcan; da Silva Cruz, Luis A.

2014-05-01

Intra prediction is a very important tool in current video coding standards. High-efficiency video coding (HEVC) intra prediction presents relevant gains in encoding efficiency when compared to previous standards, but with a very important increase in the computational complexity since 33 directional angular modes must be evaluated. Motivated by this high complexity, this article presents a complexity reduction algorithm developed to reduce the HEVC intra mode decision complexity targeting multiview videos. The proposed algorithm presents an efficient fast intra prediction compliant with singleview and multiview video encoding. This fast solution defines a reduced subset of intra directions according to the video texture and it exploits the relationship between prediction units (PUs) of neighbor depth levels of the coding tree. This fast intra coding procedure is used to develop an inter-view prediction method, which exploits the relationship between the intra mode directions of adjacent views to further accelerate the intra prediction process in multiview video encoding applications. When compared to HEVC simulcast, our method achieves a complexity reduction of up to 47.77%, at the cost of an average BD-PSNR loss of 0.08 dB.
Interfacing Computer Aided Parallelization and Performance Analysis

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Biegel, Bryan A. (Technical Monitor)

2003-01-01

When porting sequential applications to parallel computer architectures, the program developer will typically go through several cycles of source code optimization and performance analysis. We have started a project to develop an environment where the user can jointly navigate through program structure and performance data information in order to make efficient optimization decisions. In a prototype implementation we have interfaced the CAPO computer aided parallelization tool with the Paraver performance analysis tool. We describe both tools and their interface and give an example for how the interface helps within the program development cycle of a benchmark code.
Self-Scheduling Parallel Methods for Multiple Serial Codes with Application to WOPWOP

NASA Technical Reports Server (NTRS)

Long, Lyle N.; Brentner, Kenneth S.

2000-01-01

This paper presents a scheme for efficiently running a large number of serial jobs on parallel computers. Two examples are given of computer programs that run relatively quickly, but often they must be run numerous times to obtain all the results needed. It is very common in science and engineering to have codes that are not massive computing challenges in themselves, but due to the number of instances that must be run, they do become large-scale computing problems. The two examples given here represent common problems in aerospace engineering: aerodynamic panel methods and aeroacoustic integral methods. The first example simply solves many systems of linear equations. This is representative of an aerodynamic panel code where someone would like to solve for numerous angles of attack. The complete code for this first example is included in the appendix so that it can be readily used by others as a template. The second example is an aeroacoustics code (WOPWOP) that solves the Ffowcs Williams Hawkings equation to predict the far-field sound due to rotating blades. In this example, one quite often needs to compute the sound at numerous observer locations, hence parallelization is utilized to automate the noise computation for a large number of observers.
TU-AB-BRC-12: Optimized Parallel MonteCarlo Dose Calculations for Secondary MU Checks

DOE Office of Scientific and Technical Information (OSTI.GOV)

French, S; Nazareth, D; Bellor, M

Purpose: Secondary MU checks are an important tool used during a physics review of a treatment plan. Commercial software packages offer varying degrees of theoretical dose calculation accuracy, depending on the modality involved. Dose calculations of VMAT plans are especially prone to error due to the large approximations involved. Monte Carlo (MC) methods are not commonly used due to their long run times. We investigated two methods to increase the computational efficiency of MC dose simulations with the BEAMnrc code. Distributed computing resources, along with optimized code compilation, will allow for accurate and efficient VMAT dose calculations. Methods: The BEAMnrcmore » package was installed on a high performance computing cluster accessible to our clinic. MATLAB and PYTHON scripts were developed to convert a clinical VMAT DICOM plan into BEAMnrc input files. The BEAMnrc installation was optimized by running the VMAT simulations through profiling tools which indicated the behavior of the constituent routines in the code, e.g. the bremsstrahlung splitting routine, and the specified random number generator. This information aided in determining the most efficient compiling parallel configuration for the specific CPU’s available on our cluster, resulting in the fastest VMAT simulation times. Our method was evaluated with calculations involving 10{sup 8} – 10{sup 9} particle histories which are sufficient to verify patient dose using VMAT. Results: Parallelization allowed the calculation of patient dose on the order of 10 – 15 hours with 100 parallel jobs. Due to the compiler optimization process, further speed increases of 23% were achieved when compared with the open-source compiler BEAMnrc packages. Conclusion: Analysis of the BEAMnrc code allowed us to optimize the compiler configuration for VMAT dose calculations. In future work, the optimized MC code, in conjunction with the parallel processing capabilities of BEAMnrc, will be applied to provide accurate and efficient secondary MU checks.« less
Enhancement of the CAVE computer code. [aerodynamic heating package for nose cones and scramjet engine sidewalls

NASA Technical Reports Server (NTRS)

Rathjen, K. A.; Burk, H. O.

1983-01-01

The computer code CAVE (Conduction Analysis via Eigenvalues) is a convenient and efficient computer code for predicting two dimensional temperature histories within thermal protection systems for hypersonic vehicles. The capabilities of CAVE were enhanced by incorporation of the following features into the code: real gas effects in the aerodynamic heating predictions, geometry and aerodynamic heating package for analyses of cone shaped bodies, input option to change from laminar to turbulent heating predictions on leading edges, modification to account for reduction in adiabatic wall temperature with increase in leading sweep, geometry package for two dimensional scramjet engine sidewall, with an option for heat transfer to external and internal surfaces, print out modification to provide tables of select temperatures for plotting and storage, and modifications to the radiation calculation procedure to eliminate temperature oscillations induced by high heating rates. These new features are described.

Performance of a parallel code for the Euler equations on hypercube computers

NASA Technical Reports Server (NTRS)

Barszcz, Eric; Chan, Tony F.; Jesperson, Dennis C.; Tuminaro, Raymond S.

1990-01-01

The performance of hypercubes were evaluated on a computational fluid dynamics problem and the parallel environment issues were considered that must be addressed, such as algorithm changes, implementation choices, programming effort, and programming environment. The evaluation focuses on a widely used fluid dynamics code, FLO52, which solves the two dimensional steady Euler equations describing flow around the airfoil. The code development experience is described, including interacting with the operating system, utilizing the message-passing communication system, and code modifications necessary to increase parallel efficiency. Results from two hypercube parallel computers (a 16-node iPSC/2, and a 512-node NCUBE/ten) are discussed and compared. In addition, a mathematical model of the execution time was developed as a function of several machine and algorithm parameters. This model accurately predicts the actual run times obtained and is used to explore the performance of the code in interesting but yet physically realizable regions of the parameter space. Based on this model, predictions about future hypercubes are made.
An evaluation of TRAC-PF1/MOD1 computer code performance during posttest simulations of Semiscale MOD-2C feedwater line break transients

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hall, D.G.: Watkins, J.C.

This report documents an evaluation of the TRAC-PF1/MOD1 reactor safety analysis computer code during computer simulations of feedwater line break transients. The experimental data base for the evaluation included the results of three bottom feedwater line break tests performed in the Semiscale Mod-2C test facility. The tests modeled 14.3% (S-FS-7), 50% (S-FS-11), and 100% (S-FS-6B) breaks. The test facility and the TRAC-PF1/MOD1 model used in the calculations are described. Evaluations of the accuracy of the calculations are presented in the form of comparisons of measured and calculated histories of selected parameters associated with the primary and secondary systems. In additionmore » to evaluating the accuracy of the code calculations, the computational performance of the code during the simulations was assessed. A conclusion was reached that the code is capable of making feedwater line break transient calculations efficiently, but there is room for significant improvements in the simulations that were performed. Recommendations are made for follow-on investigations to determine how to improve future feedwater line break calculations and for code improvements to make the code easier to use.« less
Development of Parallel Computing Framework to Enhance Radiation Transport Code Capabilities for Rare Isotope Beam Facility Design

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kostin, Mikhail; Mokhov, Nikolai; Niita, Koji

A parallel computing framework has been developed to use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. It is intended to be used with older radiation transport codes implemented in Fortran77, Fortran 90 or C. The module is significantly independent of radiation transport codes it can be used with, and is connected to the codes by means of a number of interface functions. The framework was developed and tested in conjunction with the MARS15 code. It is possible to use it with other codes such as PHITS, FLUKA andmore » MCNP after certain adjustments. Besides the parallel computing functionality, the framework offers a checkpoint facility that allows restarting calculations with a saved checkpoint file. The checkpoint facility can be used in single process calculations as well as in the parallel regime. The framework corrects some of the known problems with the scheduling and load balancing found in the original implementations of the parallel computing functionality in MARS15 and PHITS. The framework can be used efficiently on homogeneous systems and networks of workstations, where the interference from the other users is possible.« less
PHoToNs–A parallel heterogeneous and threads oriented code for cosmological N-body simulation

NASA Astrophysics Data System (ADS)

Wang, Qiao; Cao, Zong-Yan; Gao, Liang; Chi, Xue-Bin; Meng, Chen; Wang, Jie; Wang, Long

2018-06-01

We introduce a new code for cosmological simulations, PHoToNs, which incorporates features for performing massive cosmological simulations on heterogeneous high performance computer (HPC) systems and threads oriented programming. PHoToNs adopts a hybrid scheme to compute gravitational force, with the conventional Particle-Mesh (PM) algorithm to compute the long-range force, the Tree algorithm to compute the short range force and the direct summation Particle-Particle (PP) algorithm to compute gravity from very close particles. A self-similar space filling a Peano-Hilbert curve is used to decompose the computing domain. Threads programming is advantageously used to more flexibly manage the domain communication, PM calculation and synchronization, as well as Dual Tree Traversal on the CPU+MIC platform. PHoToNs scales well and efficiency of the PP kernel achieves 68.6% of peak performance on MIC and 74.4% on CPU platforms. We also test the accuracy of the code against the much used Gadget-2 in the community and found excellent agreement.
Computational fluid dynamics combustion analysis evaluation

NASA Technical Reports Server (NTRS)

Kim, Y. M.; Shang, H. M.; Chen, C. P.; Ziebarth, J. P.

1992-01-01

This study involves the development of numerical modelling in spray combustion. These modelling efforts are mainly motivated to improve the computational efficiency in the stochastic particle tracking method as well as to incorporate the physical submodels of turbulence, combustion, vaporization, and dense spray effects. The present mathematical formulation and numerical methodologies can be casted in any time-marching pressure correction methodologies (PCM) such as FDNS code and MAST code. A sequence of validation cases involving steady burning sprays and transient evaporating sprays will be included.
General Monte Carlo reliability simulation code including common mode failures and HARP fault/error-handling

NASA Technical Reports Server (NTRS)

Platt, M. E.; Lewis, E. E.; Boehm, F.

1991-01-01

A Monte Carlo Fortran computer program was developed that uses two variance reduction techniques for computing system reliability applicable to solving very large highly reliable fault-tolerant systems. The program is consistent with the hybrid automated reliability predictor (HARP) code which employs behavioral decomposition and complex fault-error handling models. This new capability is called MC-HARP which efficiently solves reliability models with non-constant failures rates (Weibull). Common mode failure modeling is also a specialty.
Efficient Network Coding-Based Loss Recovery for Reliable Multicast in Wireless Networks

NASA Astrophysics Data System (ADS)

Chi, Kaikai; Jiang, Xiaohong; Ye, Baoliu; Horiguchi, Susumu

Recently, network coding has been applied to the loss recovery of reliable multicast in wireless networks [19], where multiple lost packets are XOR-ed together as one packet and forwarded via single retransmission, resulting in a significant reduction of bandwidth consumption. In this paper, we first prove that maximizing the number of lost packets for XOR-ing, which is the key part of the available network coding-based reliable multicast schemes, is actually a complex NP-complete problem. To address this limitation, we then propose an efficient heuristic algorithm for finding an approximately optimal solution of this optimization problem. Furthermore, we show that the packet coding principle of maximizing the number of lost packets for XOR-ing sometimes cannot fully exploit the potential coding opportunities, and we then further propose new heuristic-based schemes with a new coding principle. Simulation results demonstrate that the heuristic-based schemes have very low computational complexity and can achieve almost the same transmission efficiency as the current coding-based high-complexity schemes. Furthermore, the heuristic-based schemes with the new coding principle not only have very low complexity, but also slightly outperform the current high-complexity ones.
A hardware-oriented concurrent TZ search algorithm for High-Efficiency Video Coding

NASA Astrophysics Data System (ADS)

Doan, Nghia; Kim, Tae Sung; Rhee, Chae Eun; Lee, Hyuk-Jae

2017-12-01

High-Efficiency Video Coding (HEVC) is the latest video coding standard, in which the compression performance is double that of its predecessor, the H.264/AVC standard, while the video quality remains unchanged. In HEVC, the test zone (TZ) search algorithm is widely used for integer motion estimation because it effectively searches the good-quality motion vector with a relatively small amount of computation. However, the complex computation structure of the TZ search algorithm makes it difficult to implement it in the hardware. This paper proposes a new integer motion estimation algorithm which is designed for hardware execution by modifying the conventional TZ search to allow parallel motion estimations of all prediction unit (PU) partitions. The algorithm consists of the three phases of zonal, raster, and refinement searches. At the beginning of each phase, the algorithm obtains the search points required by the original TZ search for all PU partitions in a coding unit (CU). Then, all redundant search points are removed prior to the estimation of the motion costs, and the best search points are then selected for all PUs. Compared to the conventional TZ search algorithm, experimental results show that the proposed algorithm significantly decreases the Bjøntegaard Delta bitrate (BD-BR) by 0.84%, and it also reduces the computational complexity by 54.54%.
Nested polynomial trends for the improvement of Gaussian process-based predictors

NASA Astrophysics Data System (ADS)

Perrin, G.; Soize, C.; Marque-Pucheu, S.; Garnier, J.

2017-10-01

The role of simulation keeps increasing for the sensitivity analysis and the uncertainty quantification of complex systems. Such numerical procedures are generally based on the processing of a huge amount of code evaluations. When the computational cost associated with one particular evaluation of the code is high, such direct approaches based on the computer code only, are not affordable. Surrogate models have therefore to be introduced to interpolate the information given by a fixed set of code evaluations to the whole input space. When confronted to deterministic mappings, the Gaussian process regression (GPR), or kriging, presents a good compromise between complexity, efficiency and error control. Such a method considers the quantity of interest of the system as a particular realization of a Gaussian stochastic process, whose mean and covariance functions have to be identified from the available code evaluations. In this context, this work proposes an innovative parametrization of this mean function, which is based on the composition of two polynomials. This approach is particularly relevant for the approximation of strongly non linear quantities of interest from very little information. After presenting the theoretical basis of this method, this work compares its efficiency to alternative approaches on a series of examples.
Comprehensive silicon solar cell computer modeling

NASA Technical Reports Server (NTRS)

Lamorte, M. F.

1984-01-01

The development of an efficient, comprehensive Si solar cell modeling program that has the capability of simulation accuracy of 5 percent or less is examined. A general investigation of computerized simulation is provided. Computer simulation programs are subdivided into a number of major tasks: (1) analytical method used to represent the physical system; (2) phenomena submodels that comprise the simulation of the system; (3) coding of the analysis and the phenomena submodels; (4) coding scheme that results in efficient use of the CPU so that CPU costs are low; and (5) modularized simulation program with respect to structures that may be analyzed, addition and/or modification of phenomena submodels as new experimental data become available, and the addition of other photovoltaic materials.
Hardware-efficient bosonic quantum error-correcting codes based on symmetry operators

NASA Astrophysics Data System (ADS)

Niu, Murphy Yuezhen; Chuang, Isaac L.; Shapiro, Jeffrey H.

2018-03-01

We establish a symmetry-operator framework for designing quantum error-correcting (QEC) codes based on fundamental properties of the underlying system dynamics. Based on this framework, we propose three hardware-efficient bosonic QEC codes that are suitable for χ(2 )-interaction based quantum computation in multimode Fock bases: the χ(2 ) parity-check code, the χ(2 ) embedded error-correcting code, and the χ(2 ) binomial code. All of these QEC codes detect photon-loss or photon-gain errors by means of photon-number parity measurements, and then correct them via χ(2 ) Hamiltonian evolutions and linear-optics transformations. Our symmetry-operator framework provides a systematic procedure for finding QEC codes that are not stabilizer codes, and it enables convenient extension of a given encoding to higher-dimensional qudit bases. The χ(2 ) binomial code is of special interest because, with m ≤N identified from channel monitoring, it can correct m -photon-loss errors, or m -photon-gain errors, or (m -1 )th -order dephasing errors using logical qudits that are encoded in O (N ) photons. In comparison, other bosonic QEC codes require O (N2) photons to correct the same degree of bosonic errors. Such improved photon efficiency underscores the additional error-correction power that can be provided by channel monitoring. We develop quantum Hamming bounds for photon-loss errors in the code subspaces associated with the χ(2 ) parity-check code and the χ(2 ) embedded error-correcting code, and we prove that these codes saturate their respective bounds. Our χ(2 ) QEC codes exhibit hardware efficiency in that they address the principal error mechanisms and exploit the available physical interactions of the underlying hardware, thus reducing the physical resources required for implementing their encoding, decoding, and error-correction operations, and their universal encoded-basis gate sets.
Computer-based coding of free-text job descriptions to efficiently identify occupations in epidemiological studies

PubMed Central

Russ, Daniel E.; Ho, Kwan-Yuet; Colt, Joanne S.; Armenti, Karla R.; Baris, Dalsu; Chow, Wong-Ho; Davis, Faith; Johnson, Alison; Purdue, Mark P.; Karagas, Margaret R.; Schwartz, Kendra; Schwenn, Molly; Silverman, Debra T.; Johnson, Calvin A.; Friesen, Melissa C.

2016-01-01

Background Mapping job titles to standardized occupation classification (SOC) codes is an important step in identifying occupational risk factors in epidemiologic studies. Because manual coding is time-consuming and has moderate reliability, we developed an algorithm called SOCcer (Standardized Occupation Coding for Computer-assisted Epidemiologic Research) to assign SOC-2010 codes based on free-text job description components. Methods Job title and task-based classifiers were developed by comparing job descriptions to multiple sources linking job and task descriptions to SOC codes. An industry-based classifier was developed based on the SOC prevalence within an industry. These classifiers were used in a logistic model trained using 14,983 jobs with expert-assigned SOC codes to obtain empirical weights for an algorithm that scored each SOC/job description. We assigned the highest scoring SOC code to each job. SOCcer was validated in two occupational data sources by comparing SOC codes obtained from SOCcer to expert assigned SOC codes and lead exposure estimates obtained by linking SOC codes to a job-exposure matrix. Results For 11,991 case-control study jobs, SOCcer-assigned codes agreed with 44.5% and 76.3% of manually assigned codes at the 6- and 2-digit level, respectively. Agreement increased with the score, providing a mechanism to identify assignments needing review. Good agreement was observed between lead estimates based on SOCcer and manual SOC assignments (kappa: 0.6–0.8). Poorer performance was observed for inspection job descriptions, which included abbreviations and worksite-specific terminology. Conclusions Although some manual coding will remain necessary, using SOCcer may improve the efficiency of incorporating occupation into large-scale epidemiologic studies. PMID:27102331
Development of a computer code for calculating the steady super/hypersonic inviscid flow around real configurations. Volume 2: Code description

NASA Technical Reports Server (NTRS)

Marconi, F.; Yaeger, L.

1976-01-01

A numerical procedure was developed to compute the inviscid super/hypersonic flow field about complex vehicle geometries accurately and efficiently. A second-order accurate finite difference scheme is used to integrate the three-dimensional Euler equations in regions of continuous flow, while all shock waves are computed as discontinuities via the Rankine-Hugoniot jump conditions. Conformal mappings are used to develop a computational grid. The effects of blunt nose entropy layers are computed in detail. Real gas effects for equilibrium air are included using curve fits of Mollier charts. Typical calculated results for shuttle orbiter, hypersonic transport, and supersonic aircraft configurations are included to demonstrate the usefulness of this tool.
Computation of unsteady transonic aerodynamics with steady state fixed by truncation error injection

NASA Technical Reports Server (NTRS)

Fung, K.-Y.; Fu, J.-K.

1985-01-01

A novel technique is introduced for efficient computations of unsteady transonic aerodynamics. The steady flow corresponding to body shape is maintained by truncation error injection while the perturbed unsteady flows corresponding to unsteady body motions are being computed. This allows the use of different grids comparable to the characteristic length scales of the steady and unsteady flows and, hence, allows efficient computation of the unsteady perturbations. An example of typical unsteady computation of flow over a supercritical airfoil shows that substantial savings in computation time and storage without loss of solution accuracy can easily be achieved. This technique is easy to apply and requires very few changes to existing codes.
A computer code for three-dimensional incompressible flows using nonorthogonal body-fitted coordinate systems

NASA Technical Reports Server (NTRS)

Chen, Y. S.

1986-01-01

In this report, a numerical method for solving the equations of motion of three-dimensional incompressible flows in nonorthogonal body-fitted coordinate (BFC) systems has been developed. The equations of motion are transformed to a generalized curvilinear coordinate system from which the transformed equations are discretized using finite difference approximations in the transformed domain. The hybrid scheme is used to approximate the convection terms in the governing equations. Solutions of the finite difference equations are obtained iteratively by using a pressure-velocity correction algorithm (SIMPLE-C). Numerical examples of two- and three-dimensional, laminar and turbulent flow problems are employed to evaluate the accuracy and efficiency of the present computer code. The user's guide and computer program listing of the present code are also included.
Computationally efficient methods for modelling laser wakefield acceleration in the blowout regime

NASA Astrophysics Data System (ADS)

Cowan, B. M.; Kalmykov, S. Y.; Beck, A.; Davoine, X.; Bunkers, K.; Lifschitz, A. F.; Lefebvre, E.; Bruhwiler, D. L.; Shadwick, B. A.; Umstadter, D. P.; Umstadter

2012-08-01

Electron self-injection and acceleration until dephasing in the blowout regime is studied for a set of initial conditions typical of recent experiments with 100-terawatt-class lasers. Two different approaches to computationally efficient, fully explicit, 3D particle-in-cell modelling are examined. First, the Cartesian code vorpal (Nieter, C. and Cary, J. R. 2004 VORPAL: a versatile plasma simulation code. J. Comput. Phys. 196, 538) using a perfect-dispersion electromagnetic solver precisely describes the laser pulse and bubble dynamics, taking advantage of coarser resolution in the propagation direction, with a proportionally larger time step. Using third-order splines for macroparticles helps suppress the sampling noise while keeping the usage of computational resources modest. The second way to reduce the simulation load is using reduced-geometry codes. In our case, the quasi-cylindrical code calder-circ (Lifschitz, A. F. et al. 2009 Particle-in-cell modelling of laser-plasma interaction using Fourier decomposition. J. Comput. Phys. 228(5), 1803-1814) uses decomposition of fields and currents into a set of poloidal modes, while the macroparticles move in the Cartesian 3D space. Cylindrical symmetry of the interaction allows using just two modes, reducing the computational load to roughly that of a planar Cartesian simulation while preserving the 3D nature of the interaction. This significant economy of resources allows using fine resolution in the direction of propagation and a small time step, making numerical dispersion vanishingly small, together with a large number of particles per cell, enabling good particle statistics. Quantitative agreement of two simulations indicates that these are free of numerical artefacts. Both approaches thus retrieve the physically correct evolution of the plasma bubble, recovering the intrinsic connection of electron self-injection to the nonlinear optical evolution of the driver.
Probabilistic Structural Analysis Methods (PSAM) for select space propulsion system components, part 2

NASA Technical Reports Server (NTRS)

1991-01-01

The technical effort and computer code enhancements performed during the sixth year of the Probabilistic Structural Analysis Methods program are summarized. Various capabilities are described to probabilistically combine structural response and structural resistance to compute component reliability. A library of structural resistance models is implemented in the Numerical Evaluations of Stochastic Structures Under Stress (NESSUS) code that included fatigue, fracture, creep, multi-factor interaction, and other important effects. In addition, a user interface was developed for user-defined resistance models. An accurate and efficient reliability method was developed and was successfully implemented in the NESSUS code to compute component reliability based on user-selected response and resistance models. A risk module was developed to compute component risk with respect to cost, performance, or user-defined criteria. The new component risk assessment capabilities were validated and demonstrated using several examples. Various supporting methodologies were also developed in support of component risk assessment.
The Plasma Simulation Code: A modern particle-in-cell code with patch-based load-balancing

NASA Astrophysics Data System (ADS)

Germaschewski, Kai; Fox, William; Abbott, Stephen; Ahmadi, Narges; Maynard, Kristofor; Wang, Liang; Ruhl, Hartmut; Bhattacharjee, Amitava

2016-08-01

This work describes the Plasma Simulation Code (PSC), an explicit, electromagnetic particle-in-cell code with support for different order particle shape functions. We review the basic components of the particle-in-cell method as well as the computational architecture of the PSC code that allows support for modular algorithms and data structure in the code. We then describe and analyze in detail a distinguishing feature of PSC: patch-based load balancing using space-filling curves which is shown to lead to major efficiency gains over unbalanced methods and a previously used simpler balancing method.
Computer code for the optimization of performance parameters of mixed explosive formulations.

PubMed

Muthurajan, H; Sivabalan, R; Talawar, M B; Venugopalan, S; Gandhe, B R

2006-08-25

LOTUSES is a novel computer code, which has been developed for the prediction of various thermodynamic properties such as heat of formation, heat of explosion, volume of explosion gaseous products and other related performance parameters. In this paper, we report LOTUSES (Version 1.4) code which has been utilized for the optimization of various high explosives in different combinations to obtain maximum possible velocity of detonation. LOTUSES (Version 1.4) code will vary the composition of mixed explosives automatically in the range of 1-100% and computes the oxygen balance as well as the velocity of detonation for various compositions in preset steps. Further, the code suggests the compositions for which least oxygen balance and the higher velocity of detonation could be achieved. Presently, the code can be applied for two component explosive compositions. The code has been validated with well-known explosives like, TNT, HNS, HNF, TATB, RDX, HMX, AN, DNA, CL-20 and TNAZ in different combinations. The new algorithm incorporated in LOTUSES (Version 1.4) enhances the efficiency and makes it a more powerful tool for the scientists/researches working in the field of high energy materials/hazardous materials.
Observations Regarding Use of Advanced CFD Analysis, Sensitivity Analysis, and Design Codes in MDO

NASA Technical Reports Server (NTRS)

Newman, Perry A.; Hou, Gene J. W.; Taylor, Arthur C., III

1996-01-01

Observations regarding the use of advanced computational fluid dynamics (CFD) analysis, sensitivity analysis (SA), and design codes in gradient-based multidisciplinary design optimization (MDO) reflect our perception of the interactions required of CFD and our experience in recent aerodynamic design optimization studies using CFD. Sample results from these latter studies are summarized for conventional optimization (analysis - SA codes) and simultaneous analysis and design optimization (design code) using both Euler and Navier-Stokes flow approximations. The amount of computational resources required for aerodynamic design using CFD via analysis - SA codes is greater than that required for design codes. Thus, an MDO formulation that utilizes the more efficient design codes where possible is desired. However, in the aerovehicle MDO problem, the various disciplines that are involved have different design points in the flight envelope; therefore, CFD analysis - SA codes are required at the aerodynamic 'off design' points. The suggested MDO formulation is a hybrid multilevel optimization procedure that consists of both multipoint CFD analysis - SA codes and multipoint CFD design codes that perform suboptimizations.

Particle trajectory computation on a 3-dimensional engine inlet. Final Report Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Kim, J. J.

1986-01-01

A 3-dimensional particle trajectory computer code was developed to compute the distribution of water droplet impingement efficiency on a 3-dimensional engine inlet. The computed results provide the essential droplet impingement data required for the engine inlet anti-icing system design and analysis. The droplet trajectories are obtained by solving the trajectory equation using the fourth order Runge-Kutta and Adams predictor-corrector schemes. A compressible 3-D full potential flow code is employed to obtain a cylindrical grid definition of the flowfield on and about the engine inlet. The inlet surface is defined mathematically through a system of bi-cubic parametric patches in order to compute the droplet impingement points accurately. Analysis results of the 3-D trajectory code obtained for an axisymmetric droplet impingement problem are in good agreement with NACA experimental data. Experimental data are not yet available for the engine inlet impingement problem analyzed. Applicability of the method to solid particle impingement problems, such as engine sand ingestion, is also demonstrated.
Modern multicore and manycore architectures: Modelling, optimisation and benchmarking a multiblock CFD code

NASA Astrophysics Data System (ADS)

Hadade, Ioan; di Mare, Luca

2016-08-01

Modern multicore and manycore processors exhibit multiple levels of parallelism through a wide range of architectural features such as SIMD for data parallel execution or threads for core parallelism. The exploitation of multi-level parallelism is therefore crucial for achieving superior performance on current and future processors. This paper presents the performance tuning of a multiblock CFD solver on Intel SandyBridge and Haswell multicore CPUs and the Intel Xeon Phi Knights Corner coprocessor. Code optimisations have been applied on two computational kernels exhibiting different computational patterns: the update of flow variables and the evaluation of the Roe numerical fluxes. We discuss at great length the code transformations required for achieving efficient SIMD computations for both kernels across the selected devices including SIMD shuffles and transpositions for flux stencil computations and global memory transformations. Core parallelism is expressed through threading based on a number of domain decomposition techniques together with optimisations pertaining to alleviating NUMA effects found in multi-socket compute nodes. Results are correlated with the Roofline performance model in order to assert their efficiency for each distinct architecture. We report significant speedups for single thread execution across both kernels: 2-5X on the multicore CPUs and 14-23X on the Xeon Phi coprocessor. Computations at full node and chip concurrency deliver a factor of three speedup on the multicore processors and up to 24X on the Xeon Phi manycore coprocessor.
ICCE/ICCAI 2000 Full & Short Papers (Teaching and Learning Processes).

ERIC Educational Resources Information Center

2000

This document contains the full and short papers on teaching and learning processes from ICCE/ICCAI 2000 (International Conference on Computers in Education/International Conference on Computer-Assisted Instruction) covering the following topics: a code restructuring tool to help scaffold novice programmers; efficient study of Kanji using…
A transient FETI methodology for large-scale parallel implicit computations in structural mechanics

NASA Technical Reports Server (NTRS)

Farhat, Charbel; Crivelli, Luis; Roux, Francois-Xavier

1992-01-01

Explicit codes are often used to simulate the nonlinear dynamics of large-scale structural systems, even for low frequency response, because the storage and CPU requirements entailed by the repeated factorizations traditionally found in implicit codes rapidly overwhelm the available computing resources. With the advent of parallel processing, this trend is accelerating because explicit schemes are also easier to parallelize than implicit ones. However, the time step restriction imposed by the Courant stability condition on all explicit schemes cannot yet -- and perhaps will never -- be offset by the speed of parallel hardware. Therefore, it is essential to develop efficient and robust alternatives to direct methods that are also amenable to massively parallel processing because implicit codes using unconditionally stable time-integration algorithms are computationally more efficient when simulating low-frequency dynamics. Here we present a domain decomposition method for implicit schemes that requires significantly less storage than factorization algorithms, that is several times faster than other popular direct and iterative methods, that can be easily implemented on both shared and local memory parallel processors, and that is both computationally and communication-wise efficient. The proposed transient domain decomposition method is an extension of the method of Finite Element Tearing and Interconnecting (FETI) developed by Farhat and Roux for the solution of static problems. Serial and parallel performance results on the CRAY Y-MP/8 and the iPSC-860/128 systems are reported and analyzed for realistic structural dynamics problems. These results establish the superiority of the FETI method over both the serial/parallel conjugate gradient algorithm with diagonal scaling and the serial/parallel direct method, and contrast the computational power of the iPSC-860/128 parallel processor with that of the CRAY Y-MP/8 system.
Efficient computation of turbulent flow in ribbed passages using a non-overlapping near-wall domain decomposition method

NASA Astrophysics Data System (ADS)

Jones, Adam; Utyuzhnikov, Sergey

2017-08-01

Turbulent flow in a ribbed channel is studied using an efficient near-wall domain decomposition (NDD) method. The NDD approach is formulated by splitting the computational domain into an inner and outer region, with an interface boundary between the two. The computational mesh covers the outer region, and the flow in this region is solved using the open-source CFD code Code_Saturne with special boundary conditions on the interface boundary, called interface boundary conditions (IBCs). The IBCs are of Robin type and incorporate the effect of the inner region on the flow in the outer region. IBCs are formulated in terms of the distance from the interface boundary to the wall in the inner region. It is demonstrated that up to 90% of the region between the ribs in the ribbed passage can be removed from the computational mesh with an error on the friction factor within 2.5%. In addition, computations with NDD are faster than computations based on low Reynolds number (LRN) models by a factor of five. Different rib heights can be studied with the same mesh in the outer region without affecting the accuracy of the friction factor. This is tested with six different rib heights in an example of a design optimisation study. It is found that the friction factors computed with NDD are almost identical to the fully-resolved results. When used for inverse problems, NDD is considerably more efficient than LRN computations because only one computation needs to be performed and only one mesh needs to be generated.
A high performance scientific cloud computing environment for materials simulations

NASA Astrophysics Data System (ADS)

Jorissen, K.; Vila, F. D.; Rehr, J. J.

2012-09-01

We describe the development of a scientific cloud computing (SCC) platform that offers high performance computation capability. The platform consists of a scientific virtual machine prototype containing a UNIX operating system and several materials science codes, together with essential interface tools (an SCC toolset) that offers functionality comparable to local compute clusters. In particular, our SCC toolset provides automatic creation of virtual clusters for parallel computing, including tools for execution and monitoring performance, as well as efficient I/O utilities that enable seamless connections to and from the cloud. Our SCC platform is optimized for the Amazon Elastic Compute Cloud (EC2). We present benchmarks for prototypical scientific applications and demonstrate performance comparable to local compute clusters. To facilitate code execution and provide user-friendly access, we have also integrated cloud computing capability in a JAVA-based GUI. Our SCC platform may be an alternative to traditional HPC resources for materials science or quantum chemistry applications.
Error Correction using Quantum Quasi-Cyclic Low-Density Parity-Check(LDPC) Codes

NASA Astrophysics Data System (ADS)

Jing, Lin; Brun, Todd; Quantum Research Team

Quasi-cyclic LDPC codes can approach the Shannon capacity and have efficient decoders. Manabu Hagiwara et al., 2007 presented a method to calculate parity check matrices with high girth. Two distinct, orthogonal matrices Hc and Hd are used. Using submatrices obtained from Hc and Hd by deleting rows, we can alter the code rate. The submatrix of Hc is used to correct Pauli X errors, and the submatrix of Hd to correct Pauli Z errors. We simulated this system for depolarizing noise on USC's High Performance Computing Cluster, and obtained the block error rate (BER) as a function of the error weight and code rate. From the rates of uncorrectable errors under different error weights we can extrapolate the BER to any small error probability. Our results show that this code family can perform reasonably well even at high code rates, thus considerably reducing the overhead compared to concatenated and surface codes. This makes these codes promising as storage blocks in fault-tolerant quantum computation. Error Correction using Quantum Quasi-Cyclic Low-Density Parity-Check(LDPC) Codes.
Quantum computation with realistic magic-state factories

NASA Astrophysics Data System (ADS)

O'Gorman, Joe; Campbell, Earl T.

2017-03-01

Leading approaches to fault-tolerant quantum computation dedicate a significant portion of the hardware to computational factories that churn out high-fidelity ancillas called magic states. Consequently, efficient and realistic factory design is of paramount importance. Here we present the most detailed resource assessment to date of magic-state factories within a surface code quantum computer, along the way introducing a number of techniques. We show that the block codes of Bravyi and Haah [Phys. Rev. A 86, 052329 (2012), 10.1103/PhysRevA.86.052329] have been systematically undervalued; we track correlated errors both numerically and analytically, providing fidelity estimates without appeal to the union bound. We also introduce a subsystem code realization of these protocols with constant time and low ancilla cost. Additionally, we confirm that magic-state factories have space-time costs that scale as a constant factor of surface code costs. We find that the magic-state factory required for postclassical factoring can be as small as 6.3 million data qubits, ignoring ancilla qubits, assuming 10-4 error gates and the availability of long-range interactions.
A general panel sizing computer code and its application to composite structural panels

NASA Technical Reports Server (NTRS)

Anderson, M. S.; Stroud, W. J.

1978-01-01

A computer code for obtaining the dimensions of optimum (least mass) stiffened composite structural panels is described. The procedure, which is based on nonlinear mathematical programming and a rigorous buckling analysis, is applicable to general cross sections under general loading conditions causing buckling. A simplified method of accounting for bow-type imperfections is also included. Design studies in the form of structural efficiency charts for axial compression loading are made with the code for blade and hat stiffened panels. The effects on panel mass of imperfections, material strength limitations, and panel stiffness requirements are also examined. Comparisons with previously published experimental data show that accounting for imperfections improves correlation between theory and experiment.
Extensions and improvements on XTRAN3S

NASA Technical Reports Server (NTRS)

Borland, C. J.

1989-01-01

Improvements to the XTRAN3S computer program are summarized. Work on this code, for steady and unsteady aerodynamic and aeroelastic analysis in the transonic flow regime has concentrated on the following areas: (1) Maintenance of the XTRAN3S code, including correction of errors, enhancement of operational capability, and installation on the Cray X-MP system; (2) Extension of the vectorization concepts in XTRAN3S to include additional areas of the code for improved execution speed; (3) Modification of the XTRAN3S algorithm for improved numerical stability for swept, tapered wing cases and improved computational efficiency; and (4) Extension of the wing-only version of XTRAN3S to include pylon and nacelle or external store capability.
PARVMEC: An Efficient, Scalable Implementation of the Variational Moments Equilibrium Code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Seal, Sudip K; Hirshman, Steven Paul; Wingen, Andreas

The ability to sustain magnetically confined plasma in a state of stable equilibrium is crucial for optimal and cost-effective operations of fusion devices like tokamaks and stellarators. The Variational Moments Equilibrium Code (VMEC) is the de-facto serial application used by fusion scientists to compute magnetohydrodynamics (MHD) equilibria and study the physics of three dimensional plasmas in confined configurations. Modern fusion energy experiments have larger system scales with more interactive experimental workflows, both demanding faster analysis turnaround times on computational workloads that are stressing the capabilities of sequential VMEC. In this paper, we present PARVMEC, an efficient, parallel version of itsmore » sequential counterpart, capable of scaling to thousands of processors on distributed memory machines. PARVMEC is a non-linear code, with multiple numerical physics modules, each with its own computational complexity. A detailed speedup analysis supported by scaling results on 1,024 cores of a Cray XC30 supercomputer is presented. Depending on the mode of PARVMEC execution, speedup improvements of one to two orders of magnitude are reported. PARVMEC equips fusion scientists for the first time with a state-of-theart capability for rapid, high fidelity analyses of magnetically confined plasmas at unprecedented scales.« less
Evaluation of Proteus as a Tool for the Rapid Development of Models of Hydrologic Systems

NASA Astrophysics Data System (ADS)

Weigand, T. M.; Farthing, M. W.; Kees, C. E.; Miller, C. T.

2013-12-01

Models of modern hydrologic systems can be complex and involve a variety of operators with varying character. The goal is to implement approximations of such models that are both efficient for the developer and computationally efficient, which is a set of naturally competing objectives. Proteus is a Python-based toolbox that supports prototyping of model formulations as well as a wide variety of modern numerical methods and parallel computing. We used Proteus to develop numerical approximations for three models: Richards' equation, a brine flow model derived using the Thermodynamically Constrained Averaging Theory (TCAT), and a multiphase TCAT-based tumor growth model. For Richards' equation, we investigated discontinuous Galerkin solutions with higher order time integration based on the backward difference formulas. The TCAT brine flow model was implemented using Proteus and a variety of numerical methods were compared to hand coded solutions. Finally, an existing tumor growth model was implemented in Proteus to introduce more advanced numerics and allow the code to be run in parallel. From these three example models, Proteus was found to be an attractive open-source option for rapidly developing high quality code for solving existing and evolving computational science models.
Application of augmented-Lagrangian methods in meteorology: Comparison of different conjugate-gradient codes for large-scale minimization

NASA Technical Reports Server (NTRS)

Navon, I. M.

1984-01-01

A Lagrange multiplier method using techniques developed by Bertsekas (1982) was applied to solving the problem of enforcing simultaneous conservation of the nonlinear integral invariants of the shallow water equations on a limited area domain. This application of nonlinear constrained optimization is of the large dimensional type and the conjugate gradient method was found to be the only computationally viable method for the unconstrained minimization. Several conjugate-gradient codes were tested and compared for increasing accuracy requirements. Robustness and computational efficiency were the principal criteria.
Hypersonic simulations using open-source CFD and DSMC solvers

NASA Astrophysics Data System (ADS)

Casseau, V.; Scanlon, T. J.; John, B.; Emerson, D. R.; Brown, R. E.

2016-11-01

Hypersonic hybrid hydrodynamic-molecular gas flow solvers are required to satisfy the two essential requirements of any high-speed reacting code, these being physical accuracy and computational efficiency. The James Weir Fluids Laboratory at the University of Strathclyde is currently developing an open-source hybrid code which will eventually reconcile the direct simulation Monte-Carlo method, making use of the OpenFOAM application called dsmcFoam, and the newly coded open-source two-temperature computational fluid dynamics solver named hy2Foam. In conjunction with employing the CVDV chemistry-vibration model in hy2Foam, novel use is made of the QK rates in a CFD solver. In this paper, further testing is performed, in particular with the CFD solver, to ensure its efficacy before considering more advanced test cases. The hy2Foam and dsmcFoam codes have shown to compare reasonably well, thus providing a useful basis for other codes to compare against.
Development of a Model and Computer Code to Describe Solar Grade Silicon Production Processes

NASA Technical Reports Server (NTRS)

Srivastava, R.; Gould, R. K.

1979-01-01

Mathematical models and computer codes based on these models, which allow prediction of the product distribution in chemical reactors for converting gaseous silicon compounds to condensed-phase silicon were developed. The following tasks were accomplished: (1) formulation of a model for silicon vapor separation/collection from the developing turbulent flow stream within reactors of the Westinghouse (2) modification of an available general parabolic code to achieve solutions to the governing partial differential equations (boundary layer type) which describe migration of the vapor to the reactor walls, (3) a parametric study using the boundary layer code to optimize the performance characteristics of the Westinghouse reactor, (4) calculations relating to the collection efficiency of the new AeroChem reactor, and (5) final testing of the modified LAPP code for use as a method of predicting Si(1) droplet sizes in these reactors.
Finite-difference simulation of transonic separated flow using a full potential boundary layer interaction approach

NASA Technical Reports Server (NTRS)

Van Dalsem, W. R.; Steger, J. L.

1983-01-01

A new, fast, direct-inverse, finite-difference boundary-layer code has been developed and coupled with a full-potential transonic airfoil analysis code via new inviscid-viscous interaction algorithms. The resulting code has been used to calculate transonic separated flows. The results are in good agreement with Navier-Stokes calculations and experimental data. Solutions are obtained in considerably less computer time than Navier-Stokes solutions of equal resolution. Because efficient inviscid and viscous algorithms are used, it is expected this code will also compare favorably with other codes of its type as they become available.
Finite element analysis of inviscid subsonic boattail flow

NASA Technical Reports Server (NTRS)

Chima, R. V.; Gerhart, P. M.

1981-01-01

A finite element code for analysis of inviscid subsonic flows over arbitrary nonlifting planar or axisymmetric bodies is described. The code solves a novel primitive variable formulation of the coupled irrotationality and compressible continuity equations. Results for flow over a cylinder, a sphere, and a NACA 0012 airfoil verify the code. Computed subcritical flows over an axisymmetric boattailed afterbody compare well with finite difference results and experimental data. Interative coupling with an integral turbulent boundary layer code shows strong viscous effects on the inviscid flow. Improvements in code efficiency and extensions to transonic flows are discussed.
Numerical Study of Boundary Layer Interaction with Shocks: Method Improvement and Test Computation

NASA Technical Reports Server (NTRS)

Adams, N. A.

1995-01-01

The objective is the development of a high-order and high-resolution method for the direct numerical simulation of shock turbulent-boundary-layer interaction. Details concerning the spatial discretization of the convective terms can be found in Adams and Shariff (1995). The computer code based on this method as introduced in Adams (1994) was formulated in Cartesian coordinates and thus has been limited to simple rectangular domains. For more general two-dimensional geometries, as a compression corner, an extension to generalized coordinates is necessary. To keep the requirements or limitations for grid generation low, the extended formulation should allow for non-orthogonal grids. Still, for simplicity and cost efficiency, periodicity can be assumed in one cross-flow direction. For easy vectorization, the compact-ENO coupling algorithm as used in Adams (1994) treated whole planes normal to the derivative direction with the ENO scheme whenever at least one point of this plane satisfied the detection criterion. This is apparently too restrictive for more general geometries and more complex shock patterns. Here we introduce a localized compact-ENO coupling algorithm, which is efficient as long as the overall number of grid points treated by the ENO scheme is small compared to the total number of grid points. Validation and test computations with the final code are performed to assess the efficiency and suitability of the computer code for the problems of interest. We define a set of parameters where a direct numerical simulation of a turbulent boundary layer along a compression corner with reasonably fine resolution is affordable.
Trellises and Trellis-Based Decoding Algorithms for Linear Block Codes. Part 3; A Recursive Maximum Likelihood Decoding

NASA Technical Reports Server (NTRS)

Lin, Shu; Fossorier, Marc

1998-01-01

The Viterbi algorithm is indeed a very simple and efficient method of implementing the maximum likelihood decoding. However, if we take advantage of the structural properties in a trellis section, other efficient trellis-based decoding algorithms can be devised. Recently, an efficient trellis-based recursive maximum likelihood decoding (RMLD) algorithm for linear block codes has been proposed. This algorithm is more efficient than the conventional Viterbi algorithm in both computation and hardware requirements. Most importantly, the implementation of this algorithm does not require the construction of the entire code trellis, only some special one-section trellises of relatively small state and branch complexities are needed for constructing path (or branch) metric tables recursively. At the end, there is only one table which contains only the most likely code-word and its metric for a given received sequence r = (r(sub 1), r(sub 2),...,r(sub n)). This algorithm basically uses the divide and conquer strategy. Furthermore, it allows parallel/pipeline processing of received sequences to speed up decoding.
A soft X-ray source based on a low divergence, high repetition rate ultraviolet laser

NASA Astrophysics Data System (ADS)

Crawford, E. A.; Hoffman, A. L.; Milroy, R. D.; Quimby, D. C.; Albrecht, G. F.

The CORK code is utilized to evaluate the applicability of low divergence ultraviolet lasers for efficient production of soft X-rays. The use of the axial hydrodynamic code wih one ozone radial expansion to estimate radial motion and laser energy is examined. The calculation of ionization levels of the plasma and radiation rates by employing the atomic physics and radiation model included in the CORK code is described. Computations using the hydrodynamic code to determine the effect of laser intensity, spot size, and wavelength on plasma electron temperature are provided. The X-ray conversion efficiencies of the lasers are analyzed. It is observed that for a 1 GW laser power the X-ray conversion efficiency is a function of spot size, only weakly dependent on pulse length for time scales exceeding 100 psec, and better conversion efficiencies are obtained at shorter wavelengths. It is concluded that these small lasers focused to 30 micron spot sizes and 10 to the 14th W/sq cm intensities are useful sources of 1-2 keV radiation.

Seismic imaging using finite-differences and parallel computers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ober, C.C.

1997-12-31

A key to reducing the risks and costs of associated with oil and gas exploration is the fast, accurate imaging of complex geologies, such as salt domes in the Gulf of Mexico and overthrust regions in US onshore regions. Prestack depth migration generally yields the most accurate images, and one approach to this is to solve the scalar wave equation using finite differences. As part of an ongoing ACTI project funded by the US Department of Energy, a finite difference, 3-D prestack, depth migration code has been developed. The goal of this work is to demonstrate that massively parallel computersmore » can be used efficiently for seismic imaging, and that sufficient computing power exists (or soon will exist) to make finite difference, prestack, depth migration practical for oil and gas exploration. Several problems had to be addressed to get an efficient code for the Intel Paragon. These include efficient I/O, efficient parallel tridiagonal solves, and high single-node performance. Furthermore, to provide portable code the author has been restricted to the use of high-level programming languages (C and Fortran) and interprocessor communications using MPI. He has been using the SUNMOS operating system, which has affected many of his programming decisions. He will present images created from two verification datasets (the Marmousi Model and the SEG/EAEG 3D Salt Model). Also, he will show recent images from real datasets, and point out locations of improved imaging. Finally, he will discuss areas of current research which will hopefully improve the image quality and reduce computational costs.« less
From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation

DOE PAGES

Blazewicz, Marek; Hinder, Ian; Koppelman, David M.; ...

2013-01-01

Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization ismore » based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms.« less
A Multiphysics and Multiscale Software Environment for Modeling Astrophysical Systems

NASA Astrophysics Data System (ADS)

Portegies Zwart, Simon; McMillan, Steve; O'Nualláin, Breanndán; Heggie, Douglas; Lombardi, James; Hut, Piet; Banerjee, Sambaran; Belkus, Houria; Fragos, Tassos; Fregeau, John; Fuji, Michiko; Gaburov, Evghenii; Glebbeek, Evert; Groen, Derek; Harfst, Stefan; Izzard, Rob; Jurić, Mario; Justham, Stephen; Teuben, Peter; van Bever, Joris; Yaron, Ofer; Zemp, Marcel

We present MUSE, a software framework for tying together existing computational tools for different astrophysical domains into a single multiphysics, multiscale workload. MUSE facilitates the coupling of existing codes written in different languages by providing inter-language tools and by specifying an interface between each module and the framework that represents a balance between generality and computational efficiency. This approach allows scientists to use combinations of codes to solve highly-coupled problems without the need to write new codes for other domains or significantly alter their existing codes. MUSE currently incorporates the domains of stellar dynamics, stellar evolution and stellar hydrodynamics for a generalized stellar systems workload. MUSE has now reached a "Noah's Ark" milestone, with two available numerical solvers for each domain. MUSE can treat small stellar associations, galaxies and everything in between, including planetary systems, dense stellar clusters and galactic nuclei. Here we demonstrate an examples calculated with MUSE: the merger of two galaxies. In addition we demonstrate the working of MUSE on a distributed computer. The current MUSE code base is publicly available as open source at http://muse.li.
Parallelized direct execution simulation of message-passing parallel programs

NASA Technical Reports Server (NTRS)

Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

1994-01-01

As massively parallel computers proliferate, there is growing interest in findings ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing computers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
Trinary signed-digit arithmetic using an efficient encoding scheme

NASA Astrophysics Data System (ADS)

Salim, W. Y.; Alam, M. S.; Fyath, R. S.; Ali, S. A.

2000-09-01

The trinary signed-digit (TSD) number system is of interest for ultrafast optoelectronic computing systems since it permits parallel carry-free addition and borrow-free subtraction of two arbitrary length numbers in constant time. In this paper, a simple coding scheme is proposed to encode the decimal number directly into the TSD form. The coding scheme enables one to perform parallel one-step TSD arithmetic operation. The proposed coding scheme uses only a 5-combination coding table instead of the 625-combination table reported recently for recoded TSD arithmetic technique.
One-step trinary signed-digit arithmetic using an efficient encoding scheme

NASA Astrophysics Data System (ADS)

Salim, W. Y.; Fyath, R. S.; Ali, S. A.; Alam, Mohammad S.

2000-11-01

The trinary signed-digit (TSD) number system is of interest for ultra fast optoelectronic computing systems since it permits parallel carry-free addition and borrow-free subtraction of two arbitrary length numbers in constant time. In this paper, a simple coding scheme is proposed to encode the decimal number directly into the TSD form. The coding scheme enables one to perform parallel one-step TSD arithmetic operation. The proposed coding scheme uses only a 5-combination coding table instead of the 625-combination table reported recently for recoded TSD arithmetic technique.
Tail Biting Trellis Representation of Codes: Decoding and Construction

NASA Technical Reports Server (NTRS)

Shao. Rose Y.; Lin, Shu; Fossorier, Marc

1999-01-01

This paper presents two new iterative algorithms for decoding linear codes based on their tail biting trellises, one is unidirectional and the other is bidirectional. Both algorithms are computationally efficient and achieves virtually optimum error performance with a small number of decoding iterations. They outperform all the previous suboptimal decoding algorithms. The bidirectional algorithm also reduces decoding delay. Also presented in the paper is a method for constructing tail biting trellises for linear block codes.
Numerical Analysis of Dusty-Gas Flows

NASA Astrophysics Data System (ADS)

Saito, T.

2002-02-01

This paper presents the development of a numerical code for simulating unsteady dusty-gas flows including shock and rarefaction waves. The numerical results obtained for a shock tube problem are used for validating the accuracy and performance of the code. The code is then extended for simulating two-dimensional problems. Since the interactions between the gas and particle phases are calculated with the operator splitting technique, we can choose numerical schemes independently for the different phases. A semi-analytical method is developed for the dust phase, while the TVD scheme of Harten and Yee is chosen for the gas phase. Throughout this study, computations are carried out on SGI Origin2000, a parallel computer with multiple of RISC based processors. The efficient use of the parallel computer system is an important issue and the code implementation on Origin2000 is also described. Flow profiles of both the gas and solid particles behind the steady shock wave are calculated by integrating the steady conservation equations. The good agreement between the pseudo-stationary solutions and those from the current numerical code validates the numerical approach and the actual coding. The pseudo-stationary shock profiles can also be used as initial conditions of unsteady multidimensional simulations.
Scalability study of parallel spatial direct numerical simulation code on IBM SP1 parallel supercomputer

NASA Technical Reports Server (NTRS)

Hanebutte, Ulf R.; Joslin, Ronald D.; Zubair, Mohammad

1994-01-01

The implementation and the performance of a parallel spatial direct numerical simulation (PSDNS) code are reported for the IBM SP1 supercomputer. The spatially evolving disturbances that are associated with laminar-to-turbulent in three-dimensional boundary-layer flows are computed with the PS-DNS code. By remapping the distributed data structure during the course of the calculation, optimized serial library routines can be utilized that substantially increase the computational performance. Although the remapping incurs a high communication penalty, the parallel efficiency of the code remains above 40% for all performed calculations. By using appropriate compile options and optimized library routines, the serial code achieves 52-56 Mflops on a single node of the SP1 (45% of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a 'real world' simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP for the same simulation. The scalability information provides estimated computational costs that match the actual costs relative to changes in the number of grid points.
Simulation of 2D Kinetic Effects in Plasmas using the Grid Based Continuum Code LOKI

NASA Astrophysics Data System (ADS)

Banks, Jeffrey; Berger, Richard; Chapman, Tom; Brunner, Stephan

2016-10-01

Kinetic simulation of multi-dimensional plasma waves through direct discretization of the Vlasov equation is a useful tool to study many physical interactions and is particularly attractive for situations where minimal fluctuation levels are desired, for instance, when measuring growth rates of plasma wave instabilities. However, direct discretization of phase space can be computationally expensive, and as a result there are few examples of published results using Vlasov codes in more than a single configuration space dimension. In an effort to fill this gap we have developed the Eulerian-based kinetic code LOKI that evolves the Vlasov-Poisson system in 2+2-dimensional phase space. The code is designed to reduce the cost of phase-space computation by using fully 4th order accurate conservative finite differencing, while retaining excellent parallel scalability that efficiently uses large scale computing resources. In this poster I will discuss the algorithms used in the code as well as some aspects of their parallel implementation using MPI. I will also overview simulation results of basic plasma wave instabilities relevant to laser plasma interaction, which have been obtained using the code.
FAST: framework for heterogeneous medical image computing and visualization.

PubMed

Smistad, Erik; Bozorgi, Mohammadmehdi; Lindseth, Frank

2015-11-01

Computer systems are becoming increasingly heterogeneous in the sense that they consist of different processors, such as multi-core CPUs and graphic processing units. As the amount of medical image data increases, it is crucial to exploit the computational power of these processors. However, this is currently difficult due to several factors, such as driver errors, processor differences, and the need for low-level memory handling. This paper presents a novel FrAmework for heterogeneouS medical image compuTing and visualization (FAST). The framework aims to make it easier to simultaneously process and visualize medical images efficiently on heterogeneous systems. FAST uses common image processing programming paradigms and hides the details of memory handling from the user, while enabling the use of all processors and cores on a system. The framework is open-source, cross-platform and available online. Code examples and performance measurements are presented to show the simplicity and efficiency of FAST. The results are compared to the insight toolkit (ITK) and the visualization toolkit (VTK) and show that the presented framework is faster with up to 20 times speedup on several common medical imaging algorithms. FAST enables efficient medical image computing and visualization on heterogeneous systems. Code examples and performance evaluations have demonstrated that the toolkit is both easy to use and performs better than existing frameworks, such as ITK and VTK.
Fast Acceleration of 2D Wave Propagation Simulations Using Modern Computational Accelerators

PubMed Central

Wang, Wei; Xu, Lifan; Cavazos, John; Huang, Howie H.; Kay, Matthew

2014-01-01

Recent developments in modern computational accelerators like Graphics Processing Units (GPUs) and coprocessors provide great opportunities for making scientific applications run faster than ever before. However, efficient parallelization of scientific code using new programming tools like CUDA requires a high level of expertise that is not available to many scientists. This, plus the fact that parallelized code is usually not portable to different architectures, creates major challenges for exploiting the full capabilities of modern computational accelerators. In this work, we sought to overcome these challenges by studying how to achieve both automated parallelization using OpenACC and enhanced portability using OpenCL. We applied our parallelization schemes using GPUs as well as Intel Many Integrated Core (MIC) coprocessor to reduce the run time of wave propagation simulations. We used a well-established 2D cardiac action potential model as a specific case-study. To the best of our knowledge, we are the first to study auto-parallelization of 2D cardiac wave propagation simulations using OpenACC. Our results identify several approaches that provide substantial speedups. The OpenACC-generated GPU code achieved more than speedup above the sequential implementation and required the addition of only a few OpenACC pragmas to the code. An OpenCL implementation provided speedups on GPUs of at least faster than the sequential implementation and faster than a parallelized OpenMP implementation. An implementation of OpenMP on Intel MIC coprocessor provided speedups of with only a few code changes to the sequential implementation. We highlight that OpenACC provides an automatic, efficient, and portable approach to achieve parallelization of 2D cardiac wave simulations on GPUs. Our approach of using OpenACC, OpenCL, and OpenMP to parallelize this particular model on modern computational accelerators should be applicable to other computational models of wave propagation in multi-dimensional media. PMID:24497950
Multi-stage decoding for multi-level block modulation codes

NASA Technical Reports Server (NTRS)

Lin, Shu

1991-01-01

In this paper, we investigate various types of multi-stage decoding for multi-level block modulation codes, in which the decoding of a component code at each stage can be either soft-decision or hard-decision, maximum likelihood or bounded-distance. Error performance of codes is analyzed for a memoryless additive channel based on various types of multi-stage decoding, and upper bounds on the probability of an incorrect decoding are derived. Based on our study and computation results, we find that, if component codes of a multi-level modulation code and types of decoding at various stages are chosen properly, high spectral efficiency and large coding gain can be achieved with reduced decoding complexity. In particular, we find that the difference in performance between the suboptimum multi-stage soft-decision maximum likelihood decoding of a modulation code and the single-stage optimum decoding of the overall code is very small: only a fraction of dB loss in SNR at the probability of an incorrect decoding for a block of 10(exp -6). Multi-stage decoding of multi-level modulation codes really offers a way to achieve the best of three worlds, bandwidth efficiency, coding gain, and decoding complexity.
ACCESS 1: Approximation Concepts Code for Efficient Structural Synthesis program documentation and user's guide

NASA Technical Reports Server (NTRS)

Miura, H.; Schmit, L. A., Jr.

1976-01-01

The program documentation and user's guide for the ACCESS-1 computer program is presented. ACCESS-1 is a research oriented program which implements a collection of approximation concepts to achieve excellent efficiency in structural synthesis. The finite element method is used for structural analysis and general mathematical programming algorithms are applied in the design optimization procedure. Implementation of the computer program, preparation of input data and basic program structure are described, and three illustrative examples are given.
HERCULES: A Pattern Driven Code Transformation System

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kartsaklis, Christos; Hernandez, Oscar R; Hsu, Chung-Hsing

2012-01-01

New parallel computers are emerging, but developing efficient scientific code for them remains difficult. A scientist must manage not only the science-domain complexity but also the performance-optimization complexity. HERCULES is a code transformation system designed to help the scientist to separate the two concerns, which improves code maintenance, and facilitates performance optimization. The system combines three technologies, code patterns, transformation scripts and compiler plugins, to provide the scientist with an environment to quickly implement code transformations that suit his needs. Unlike existing code optimization tools, HERCULES is unique in its focus on user-level accessibility. In this paper we discuss themore » design, implementation and an initial evaluation of HERCULES.« less
Hard decoding algorithm for optimizing thresholds under general Markovian noise

NASA Astrophysics Data System (ADS)

Chamberland, Christopher; Wallman, Joel; Beale, Stefanie; Laflamme, Raymond

2017-04-01

Quantum error correction is instrumental in protecting quantum systems from noise in quantum computing and communication settings. Pauli channels can be efficiently simulated and threshold values for Pauli error rates under a variety of error-correcting codes have been obtained. However, realistic quantum systems can undergo noise processes that differ significantly from Pauli noise. In this paper, we present an efficient hard decoding algorithm for optimizing thresholds and lowering failure rates of an error-correcting code under general completely positive and trace-preserving (i.e., Markovian) noise. We use our hard decoding algorithm to study the performance of several error-correcting codes under various non-Pauli noise models by computing threshold values and failure rates for these codes. We compare the performance of our hard decoding algorithm to decoders optimized for depolarizing noise and show improvements in thresholds and reductions in failure rates by several orders of magnitude. Our hard decoding algorithm can also be adapted to take advantage of a code's non-Pauli transversal gates to further suppress noise. For example, we show that using the transversal gates of the 5-qubit code allows arbitrary rotations around certain axes to be perfectly corrected. Furthermore, we show that Pauli twirling can increase or decrease the threshold depending upon the code properties. Lastly, we show that even if the physical noise model differs slightly from the hypothesized noise model used to determine an optimized decoder, failure rates can still be reduced by applying our hard decoding algorithm.
The Advanced Software Development and Commercialization Project

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gallopoulos, E.; Canfield, T.R.; Minkoff, M.

1990-09-01

This is the first of a series of reports pertaining to progress in the Advanced Software Development and Commercialization Project, a joint collaborative effort between the Center for Supercomputing Research and Development of the University of Illinois and the Computing and Telecommunications Division of Argonne National Laboratory. The purpose of this work is to apply techniques of parallel computing that were pioneered by University of Illinois researchers to mature computational fluid dynamics (CFD) and structural dynamics (SD) computer codes developed at Argonne. The collaboration in this project will bring this unique combination of expertise to bear, for the first time,more » on industrially important problems. By so doing, it will expose the strengths and weaknesses of existing techniques for parallelizing programs and will identify those problems that need to be solved in order to enable wide spread production use of parallel computers. Secondly, the increased efficiency of the CFD and SD codes themselves will enable the simulation of larger, more accurate engineering models that involve fluid and structural dynamics. In order to realize the above two goals, we are considering two production codes that have been developed at ANL and are widely used by both industry and Universities. These are COMMIX and WHAMS-3D. The first is a computational fluid dynamics code that is used for both nuclear reactor design and safety and as a design tool for the casting industry. The second is a three-dimensional structural dynamics code used in nuclear reactor safety as well as crashworthiness studies. These codes are currently available for both sequential and vector computers only. Our main goal is to port and optimize these two codes on shared memory multiprocessors. In so doing, we shall establish a process that can be followed in optimizing other sequential or vector engineering codes for parallel processors.« less
Performance of a Block Structured, Hierarchical Adaptive MeshRefinement Code on the 64k Node IBM BlueGene/L Computer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Greenough, Jeffrey A.; de Supinski, Bronis R.; Yates, Robert K.

2005-04-25

We describe the performance of the block-structured Adaptive Mesh Refinement (AMR) code Raptor on the 32k node IBM BlueGene/L computer. This machine represents a significant step forward towards petascale computing. As such, it presents Raptor with many challenges for utilizing the hardware efficiently. In terms of performance, Raptor shows excellent weak and strong scaling when running in single level mode (no adaptivity). Hardware performance monitors show Raptor achieves an aggregate performance of 3:0 Tflops in the main integration kernel on the 32k system. Results from preliminary AMR runs on a prototype astrophysical problem demonstrate the efficiency of the current softwaremore » when running at large scale. The BG/L system is enabling a physics problem to be considered that represents a factor of 64 increase in overall size compared to the largest ones of this type computed to date. Finally, we provide a description of the development work currently underway to address our inefficiencies.« less
ADAPSO Computer Services Industry Directory of Members, 1972-1973.

ERIC Educational Resources Information Center

Association of Data Processing Service Organizations, New York, NY.

The 1972-73 directory of the Association of Data Processing Service Organizations was designed to provide a list of those members subscribe to the Code of Ethical Standards and can be expected to provide reliable and efficient services to the users in the community. The Code is presented, and then full member firms are listed for states in the…
Fast space-varying convolution using matrix source coding with applications to camera stray light reduction.

PubMed

Wei, Jianing; Bouman, Charles A; Allebach, Jan P

2014-05-01

Many imaging applications require the implementation of space-varying convolution for accurate restoration and reconstruction of images. Here, we use the term space-varying convolution to refer to linear operators whose impulse response has slow spatial variation. In addition, these space-varying convolution operators are often dense, so direct implementation of the convolution operator is typically computationally impractical. One such example is the problem of stray light reduction in digital cameras, which requires the implementation of a dense space-varying deconvolution operator. However, other inverse problems, such as iterative tomographic reconstruction, can also depend on the implementation of dense space-varying convolution. While space-invariant convolution can be efficiently implemented with the fast Fourier transform, this approach does not work for space-varying operators. So direct convolution is often the only option for implementing space-varying convolution. In this paper, we develop a general approach to the efficient implementation of space-varying convolution, and demonstrate its use in the application of stray light reduction. Our approach, which we call matrix source coding, is based on lossy source coding of the dense space-varying convolution matrix. Importantly, by coding the transformation matrix, we not only reduce the memory required to store it; we also dramatically reduce the computation required to implement matrix-vector products. Our algorithm is able to reduce computation by approximately factoring the dense space-varying convolution operator into a product of sparse transforms. Experimental results show that our method can dramatically reduce the computation required for stray light reduction while maintaining high accuracy.

The DoD's High Performance Computing Modernization Program - Ensuing the National Earth Systems Prediction Capability Becomes Operational

NASA Astrophysics Data System (ADS)

Burnett, W.

2016-12-01

The Department of Defense's (DoD) High Performance Computing Modernization Program (HPCMP) provides high performance computing to address the most significant challenges in computational resources, software application support and nationwide research and engineering networks. Today, the HPCMP has a critical role in ensuring the National Earth System Prediction Capability (N-ESPC) achieves initial operational status in 2019. A 2015 study commissioned by the HPCMP found that N-ESPC computational requirements will exceed interconnect bandwidth capacity due to the additional load from data assimilation and passing connecting data between ensemble codes. Memory bandwidth and I/O bandwidth will continue to be significant bottlenecks for the Navy's Hybrid Coordinate Ocean Model (HYCOM) scalability - by far the major driver of computing resource requirements in the N-ESPC. The study also found that few of the N-ESPC model developers have detailed plans to ensure their respective codes scale through 2024. Three HPCMP initiatives are designed to directly address and support these issues: Productivity Enhancement, Technology, Transfer and Training (PETTT), the HPCMP Applications Software Initiative (HASI), and Frontier Projects. PETTT supports code conversion by providing assistance, expertise and training in scalable and high-end computing architectures. HASI addresses the continuing need for modern application software that executes effectively and efficiently on next-generation high-performance computers. Frontier Projects enable research and development that could not be achieved using typical HPCMP resources by providing multi-disciplinary teams access to exceptional amounts of high performance computing resources. Finally, the Navy's DoD Supercomputing Resource Center (DSRC) currently operates a 6 Petabyte system, of which Naval Oceanography receives 15% of operational computational system use, or approximately 1 Petabyte of the processing capability. The DSRC will provide the DoD with future computing assets to initially operate the N-ESPC in 2019. This talk will further describe how DoD's HPCMP will ensure N-ESPC becomes operational, efficiently and effectively, using next-generation high performance computing.
PIC codes for plasma accelerators on emerging computer architectures (GPUS, Multicore/Manycore CPUS)

NASA Astrophysics Data System (ADS)

Vincenti, Henri

2016-03-01

The advent of exascale computers will enable 3D simulations of a new laser-plasma interaction regimes that were previously out of reach of current Petasale computers. However, the paradigm used to write current PIC codes will have to change in order to fully exploit the potentialities of these new computing architectures. Indeed, achieving Exascale computing facilities in the next decade will be a great challenge in terms of energy consumption and will imply hardware developments directly impacting our way of implementing PIC codes. As data movement (from die to network) is by far the most energy consuming part of an algorithm future computers will tend to increase memory locality at the hardware level and reduce energy consumption related to data movement by using more and more cores on each compute nodes (''fat nodes'') that will have a reduced clock speed to allow for efficient cooling. To compensate for frequency decrease, CPU machine vendors are making use of long SIMD instruction registers that are able to process multiple data with one arithmetic operator in one clock cycle. SIMD register length is expected to double every four years. GPU's also have a reduced clock speed per core and can process Multiple Instructions on Multiple Datas (MIMD). At the software level Particle-In-Cell (PIC) codes will thus have to achieve both good memory locality and vectorization (for Multicore/Manycore CPU) to fully take advantage of these upcoming architectures. In this talk, we present the portable solutions we implemented in our high performance skeleton PIC code PICSAR to both achieve good memory locality and cache reuse as well as good vectorization on SIMD architectures. We also present the portable solutions used to parallelize the Pseudo-sepctral quasi-cylindrical code FBPIC on GPUs using the Numba python compiler.
Fast methods to numerically integrate the Reynolds equation for gas fluid films

NASA Technical Reports Server (NTRS)

Dimofte, Florin

1992-01-01

The alternating direction implicit (ADI) method is adopted, modified, and applied to the Reynolds equation for thin, gas fluid films. An efficient code is developed to predict both the steady-state and dynamic performance of an aerodynamic journal bearing. An alternative approach is shown for hybrid journal gas bearings by using Liebmann's iterative solution (LIS) for elliptic partial differential equations. The results are compared with known design criteria from experimental data. The developed methods show good accuracy and very short computer running time in comparison with methods based on an inverting of a matrix. The computer codes need a small amount of memory and can be run on either personal computers or on mainframe systems.
Optimization of large matrix calculations for execution on the Cray X-MP vector supercomputer

NASA Technical Reports Server (NTRS)

Hornfeck, William A.

1988-01-01

A considerable volume of large computational computer codes were developed for NASA over the past twenty-five years. This code represents algorithms developed for machines of earlier generation. With the emergence of the vector supercomputer as a viable, commercially available machine, an opportunity exists to evaluate optimization strategies to improve the efficiency of existing software. This result is primarily due to architectural differences in the latest generation of large-scale machines and the earlier, mostly uniprocessor, machines. A sofware package being used by NASA to perform computations on large matrices is described, and a strategy for conversion to the Cray X-MP vector supercomputer is also described.
Local coding based matching kernel method for image classification.

PubMed

Song, Yan; McLoughlin, Ian Vince; Dai, Li-Rong

2014-01-01

This paper mainly focuses on how to effectively and efficiently measure visual similarity for local feature based representation. Among existing methods, metrics based on Bag of Visual Word (BoV) techniques are efficient and conceptually simple, at the expense of effectiveness. By contrast, kernel based metrics are more effective, but at the cost of greater computational complexity and increased storage requirements. We show that a unified visual matching framework can be developed to encompass both BoV and kernel based metrics, in which local kernel plays an important role between feature pairs or between features and their reconstruction. Generally, local kernels are defined using Euclidean distance or its derivatives, based either explicitly or implicitly on an assumption of Gaussian noise. However, local features such as SIFT and HoG often follow a heavy-tailed distribution which tends to undermine the motivation behind Euclidean metrics. Motivated by recent advances in feature coding techniques, a novel efficient local coding based matching kernel (LCMK) method is proposed. This exploits the manifold structures in Hilbert space derived from local kernels. The proposed method combines advantages of both BoV and kernel based metrics, and achieves a linear computational complexity. This enables efficient and scalable visual matching to be performed on large scale image sets. To evaluate the effectiveness of the proposed LCMK method, we conduct extensive experiments with widely used benchmark datasets, including 15-Scenes, Caltech101/256, PASCAL VOC 2007 and 2011 datasets. Experimental results confirm the effectiveness of the relatively efficient LCMK method.
Viterbi decoding for satellite and space communication.

NASA Technical Reports Server (NTRS)

Heller, J. A.; Jacobs, I. M.

1971-01-01

Convolutional coding and Viterbi decoding, along with binary phase-shift keyed modulation, is presented as an efficient system for reliable communication on power limited satellite and space channels. Performance results, obtained theoretically and through computer simulation, are given for optimum short constraint length codes for a range of code constraint lengths and code rates. System efficiency is compared for hard receiver quantization and 4 and 8 level soft quantization. The effects on performance of varying of certain parameters relevant to decoder complexity and cost are examined. Quantitative performance degradation due to imperfect carrier phase coherence is evaluated and compared to that of an uncoded system. As an example of decoder performance versus complexity, a recently implemented 2-Mbit/sec constraint length 7 Viterbi decoder is discussed. Finally a comparison is made between Viterbi and sequential decoding in terms of suitability to various system requirements.
Parallel Implementation of Triangular Cellular Automata for Computing Two-Dimensional Elastodynamic Response on Arbitrary Domains

NASA Astrophysics Data System (ADS)

Leamy, Michael J.; Springer, Adam C.

In this research we report parallel implementation of a Cellular Automata-based simulation tool for computing elastodynamic response on complex, two-dimensional domains. Elastodynamic simulation using Cellular Automata (CA) has recently been presented as an alternative, inherently object-oriented technique for accurately and efficiently computing linear and nonlinear wave propagation in arbitrarily-shaped geometries. The local, autonomous nature of the method should lead to straight-forward and efficient parallelization. We address this notion on symmetric multiprocessor (SMP) hardware using a Java-based object-oriented CA code implementing triangular state machines (i.e., automata) and the MPI bindings written in Java (MPJ Express). We use MPJ Express to reconfigure our existing CA code to distribute a domain's automata to cores present on a dual quad-core shared-memory system (eight total processors). We note that this message passing parallelization strategy is directly applicable to computer clustered computing, which will be the focus of follow-on research. Results on the shared memory platform indicate nearly-ideal, linear speed-up. We conclude that the CA-based elastodynamic simulator is easily configured to run in parallel, and yields excellent speed-up on SMP hardware.
Analytical determination of propeller performance degradation due to ice accretion

NASA Technical Reports Server (NTRS)

Miller, T. L.

1986-01-01

A computer code has been developed which is capable of computing propeller performance for clean, glaze, or rime iced propeller configurations, thereby providing a mechanism for determining the degree of performance degradation which results from a given icing encounter. The inviscid, incompressible flow field at each specified propeller radial location is first computed using the Theodorsen transformation method of conformal mapping. A droplet trajectory computation then calculates droplet impingement points and airfoil collection efficiency for each radial location, at which point several user-selectable empirical correlations are available for determining the aerodynamic penalities which arise due to the ice accretion. Propeller performance is finally computed using strip analysis for either the clean or iced propeller. In the iced mode, the differential thrust and torque coefficient equations are modified by the drag and lift coefficient increments due to ice to obtain the appropriate iced values. Comparison with available experimental propeller icing data shows good agreement in several cases. The code's capability to properly predict iced thrust coefficient, power coefficient, and propeller efficiency is shown to be dependent on the choice of empirical correlation employed as well as proper specification of radial icing extent.
A critical analysis of the accuracy of several numerical techniques for combustion kinetic rate equations

NASA Technical Reports Server (NTRS)

Radhadrishnan, Krishnan

1993-01-01

A detailed analysis of the accuracy of several techniques recently developed for integrating stiff ordinary differential equations is presented. The techniques include two general-purpose codes EPISODE and LSODE developed for an arbitrary system of ordinary differential equations, and three specialized codes CHEMEQ, CREK1D, and GCKP4 developed specifically to solve chemical kinetic rate equations. The accuracy study is made by application of these codes to two practical combustion kinetics problems. Both problems describe adiabatic, homogeneous, gas-phase chemical reactions at constant pressure, and include all three combustion regimes: induction, heat release, and equilibration. To illustrate the error variation in the different combustion regimes the species are divided into three types (reactants, intermediates, and products), and error versus time plots are presented for each species type and the temperature. These plots show that CHEMEQ is the most accurate code during induction and early heat release. During late heat release and equilibration, however, the other codes are more accurate. A single global quantity, a mean integrated root-mean-square error, that measures the average error incurred in solving the complete problem is used to compare the accuracy of the codes. Among the codes examined, LSODE is the most accurate for solving chemical kinetics problems. It is also the most efficient code, in the sense that it requires the least computational work to attain a specified accuracy level. An important finding is that use of the algebraic enthalpy conservation equation to compute the temperature can be more accurate and efficient than integrating the temperature differential equation.
Development of an efficient computer code to solve the time-dependent Navier-Stokes equations. [for predicting viscous flow fields about lifting bodies

NASA Technical Reports Server (NTRS)

Harp, J. L., Jr.; Oatway, T. P.

1975-01-01

A research effort was conducted with the goal of reducing computer time of a Navier Stokes Computer Code for prediction of viscous flow fields about lifting bodies. A two-dimensional, time-dependent, laminar, transonic computer code (STOKES) was modified to incorporate a non-uniform timestep procedure. The non-uniform time-step requires updating of a zone only as often as required by its own stability criteria or that of its immediate neighbors. In the uniform timestep scheme each zone is updated as often as required by the least stable zone of the finite difference mesh. Because of less frequent update of program variables it was expected that the nonuniform timestep would result in a reduction of execution time by a factor of five to ten. Available funding was exhausted prior to successful demonstration of the benefits to be derived from the non-uniform time-step method.
Dynamic Elasto-Plastic Response of Shells in an Acoustic Medium - Theoretical Development for the EPSA Code

DTIC Science & Technology

1978-07-01

TECHNOLOGY OFFICE OF NAVAL RESEARCH ARLINGTON* VA 22217 ATTN CODE 200 NAVAL. UNDERWATER SYSTEMS COMMAND NEWPORT. RI 02840 ATTN DRo AZRIEL HARARI/ 3 .b 311...ANAOST.FIT THEORETICAL DEVELOPMENT FOR THE EPSA CODE ~/ R/Atkatsh, M.P./Bieniek. -AM M.L.,/aron OFF NAVAL RESEARCH CONTRACT N/ 3 14-72-C-19~. TRACT 7_...the report, both procedures result In a marked increase in computational efficiency, parti- cularly for cases in which large systems are to be
Algorithm 782 : codes for rank-revealing QR factorizations of dense matrices.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bischof, C. H.; Quintana-Orti, G.; Mathematics and Computer Science

1998-06-01

This article describes a suite of codes as well as associated testing and timing drivers for computing rank-revealing QR (RRQR) factorizations of dense matrices. The main contribution is an efficient block algorithm for approximating an RRQR factorization, employing a windowed version of the commonly used Golub pivoting strategy and improved versions of the RRQR algorithms for triangular matrices originally suggested by Chandrasekaran and Ipsen and by Pan and Tang, respectively, We highlight usage and features of these codes.
Computer-based coding of free-text job descriptions to efficiently identify occupations in epidemiological studies.

PubMed

Russ, Daniel E; Ho, Kwan-Yuet; Colt, Joanne S; Armenti, Karla R; Baris, Dalsu; Chow, Wong-Ho; Davis, Faith; Johnson, Alison; Purdue, Mark P; Karagas, Margaret R; Schwartz, Kendra; Schwenn, Molly; Silverman, Debra T; Johnson, Calvin A; Friesen, Melissa C

2016-06-01

Mapping job titles to standardised occupation classification (SOC) codes is an important step in identifying occupational risk factors in epidemiological studies. Because manual coding is time-consuming and has moderate reliability, we developed an algorithm called SOCcer (Standardized Occupation Coding for Computer-assisted Epidemiologic Research) to assign SOC-2010 codes based on free-text job description components. Job title and task-based classifiers were developed by comparing job descriptions to multiple sources linking job and task descriptions to SOC codes. An industry-based classifier was developed based on the SOC prevalence within an industry. These classifiers were used in a logistic model trained using 14 983 jobs with expert-assigned SOC codes to obtain empirical weights for an algorithm that scored each SOC/job description. We assigned the highest scoring SOC code to each job. SOCcer was validated in 2 occupational data sources by comparing SOC codes obtained from SOCcer to expert assigned SOC codes and lead exposure estimates obtained by linking SOC codes to a job-exposure matrix. For 11 991 case-control study jobs, SOCcer-assigned codes agreed with 44.5% and 76.3% of manually assigned codes at the 6-digit and 2-digit level, respectively. Agreement increased with the score, providing a mechanism to identify assignments needing review. Good agreement was observed between lead estimates based on SOCcer and manual SOC assignments (κ 0.6-0.8). Poorer performance was observed for inspection job descriptions, which included abbreviations and worksite-specific terminology. Although some manual coding will remain necessary, using SOCcer may improve the efficiency of incorporating occupation into large-scale epidemiological studies. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
GPU implementation of the linear scaling three dimensional fragment method for large scale electronic structure calculations

NASA Astrophysics Data System (ADS)

Jia, Weile; Wang, Jue; Chi, Xuebin; Wang, Lin-Wang

2017-02-01

LS3DF, namely linear scaling three-dimensional fragment method, is an efficient linear scaling ab initio total energy electronic structure calculation code based on a divide-and-conquer strategy. In this paper, we present our GPU implementation of the LS3DF code. Our test results show that the GPU code can calculate systems with about ten thousand atoms fully self-consistently in the order of 10 min using thousands of computing nodes. This makes the electronic structure calculations of 10,000-atom nanosystems routine work. This speed is 4.5-6 times faster than the CPU calculations using the same number of nodes on the Titan machine in the Oak Ridge leadership computing facility (OLCF). Such speedup is achieved by (a) carefully re-designing of the computationally heavy kernels; (b) redesign of the communication pattern for heterogeneous supercomputers.
Study of Two-Dimensional Compressible Non-Acoustic Modeling of Stirling Machine Type Components

NASA Technical Reports Server (NTRS)

Tew, Roy C., Jr.; Ibrahim, Mounir B.

2001-01-01

A two-dimensional (2-D) computer code was developed for modeling enclosed volumes of gas with oscillating boundaries, such as Stirling machine components. An existing 2-D incompressible flow computer code, CAST, was used as the starting point for the project. CAST was modified to use the compressible non-acoustic Navier-Stokes equations to model an enclosed volume including an oscillating piston. The devices modeled have low Mach numbers and are sufficiently small that the time required for acoustics to propagate across them is negligible. Therefore, acoustics were excluded to enable more time efficient computation. Background information about the project is presented. The compressible non-acoustic flow assumptions are discussed. The governing equations used in the model are presented in transport equation format. A brief description is given of the numerical methods used. Comparisons of code predictions with experimental data are then discussed.
A Parallel Numerical Algorithm To Solve Linear Systems Of Equations Emerging From 3D Radiative Transfer

NASA Astrophysics Data System (ADS)

Wichert, Viktoria; Arkenberg, Mario; Hauschildt, Peter H.

2016-10-01

Highly resolved state-of-the-art 3D atmosphere simulations will remain computationally extremely expensive for years to come. In addition to the need for more computing power, rethinking coding practices is necessary. We take a dual approach by introducing especially adapted, parallel numerical methods and correspondingly parallelizing critical code passages. In the following, we present our respective work on PHOENIX/3D. With new parallel numerical algorithms, there is a big opportunity for improvement when iteratively solving the system of equations emerging from the operator splitting of the radiative transfer equation J = ΛS. The narrow-banded approximate Λ-operator Λ* , which is used in PHOENIX/3D, occurs in each iteration step. By implementing a numerical algorithm which takes advantage of its characteristic traits, the parallel code's efficiency is further increased and a speed-up in computational time can be achieved.
Finite-difference solution of the compressible stability eigenvalue problem

NASA Technical Reports Server (NTRS)

Malik, M. R.

1982-01-01

A compressible stability analysis computer code is developed. The code uses a matrix finite difference method for local eigenvalue solution when a good guess for the eigenvalue is available and is significantly more computationally efficient than the commonly used initial value approach. The local eigenvalue search procedure also results in eigenfunctions and, at little extra work, group velocities. A globally convergent eigenvalue procedure is also developed which may be used when no guess for the eigenvalue is available. The global problem is formulated in such a way that no unstable spurious modes appear so that the method is suitable for use in a black box stability code. Sample stability calculations are presented for the boundary layer profiles of a Laminar Flow Control (LFC) swept wing.
Chemical reacting flows

NASA Astrophysics Data System (ADS)

Lezberg, Erwin A.; Mularz, Edward J.; Liou, Meng-Sing

1991-03-01

The objectives and accomplishments of research in chemical reacting flows, including both experimental and computational problems are described. The experimental research emphasizes the acquisition of reliable reacting-flow data for code validation, the development of chemical kinetics mechanisms, and the understanding of two-phase flow dynamics. Typical results from two nonreacting spray studies are presented. The computational fluid dynamics (CFD) research emphasizes the development of efficient and accurate algorithms and codes, as well as validation of methods and modeling (turbulence and kinetics) for reacting flows. Major developments of the RPLUS code and its application to mixing concepts, the General Electric combustor, and the Government baseline engine for the National Aerospace Plane are detailed. Finally, the turbulence research in the newly established Center for Modeling of Turbulence and Transition (CMOTT) is described.
Numerical studies of the deposition of material released from fixed and rotary wing aircraft

NASA Technical Reports Server (NTRS)

Bilanin, A. J.; Teske, M. E.

1984-01-01

The computer code AGDISP (AGricultural DISPersal) has been developed to predict the deposition of material released from fixed and rotary wing aircraft in a single-pass, computationally efficient manner. The formulation of the code is novel in that the mean particle trajectory and the variance about the mean resulting from turbulent fluid fluctuations are simultaneously predicted. The code presently includes the capability of assessing the influence of neutral atmospheric conditions, inviscid wake vortices, particle evaporation, plant canopy and terrain on the deposition pattern. In this report, the equations governing the motion of aerially released particles are developed, including a description of the evaporation model used. A series of case studies, using AGDISP, are included.
Development and application of GASP 2.0

NASA Technical Reports Server (NTRS)

Mcgrory, W. D.; Huebner, L. D.; Slack, D. C.; Walters, R. W.

1992-01-01

GASP 2.0 represents a major new release of the computational fluid dynamics code in wide use by the aerospace community. The authors have spent the last two years analyzing the strengths and weaknesses of the previous version of the finite-rate chemistry, Navier Stokes solution algorithm. What has resulted is a completely redesigned computer code that offers two to four times the performance of previous versions while requiring as little as one quarter of the memory requirements. In addition to the improvements in efficiency over the original code, Version 2.0 contains many new features. A brief discussion of the improvements made to GASP, and an application using GASP 2.0 which demonstrates some of the new features are presented.

Efficient parallelization of analytic bond-order potentials for large-scale atomistic simulations

NASA Astrophysics Data System (ADS)

Teijeiro, C.; Hammerschmidt, T.; Drautz, R.; Sutmann, G.

2016-07-01

Analytic bond-order potentials (BOPs) provide a way to compute atomistic properties with controllable accuracy. For large-scale computations of heterogeneous compounds at the atomistic level, both the computational efficiency and memory demand of BOP implementations have to be optimized. Since the evaluation of BOPs is a local operation within a finite environment, the parallelization concepts known from short-range interacting particle simulations can be applied to improve the performance of these simulations. In this work, several efficient parallelization methods for BOPs that use three-dimensional domain decomposition schemes are described. The schemes are implemented into the bond-order potential code BOPfox, and their performance is measured in a series of benchmarks. Systems of up to several millions of atoms are simulated on a high performance computing system, and parallel scaling is demonstrated for up to thousands of processors.
Computational Particle Dynamic Simulations on Multicore Processors (CPDMu) Final Report Phase I

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schmalz, Mark S

2011-07-24

Statement of Problem - Department of Energy has many legacy codes for simulation of computational particle dynamics and computational fluid dynamics applications that are designed to run on sequential processors and are not easily parallelized. Emerging high-performance computing architectures employ massively parallel multicore architectures (e.g., graphics processing units) to increase throughput. Parallelization of legacy simulation codes is a high priority, to achieve compatibility, efficiency, accuracy, and extensibility. General Statement of Solution - A legacy simulation application designed for implementation on mainly-sequential processors has been represented as a graph G. Mathematical transformations, applied to G, produce a graph representation {und G}more » for a high-performance architecture. Key computational and data movement kernels of the application were analyzed/optimized for parallel execution using the mapping G {yields} {und G}, which can be performed semi-automatically. This approach is widely applicable to many types of high-performance computing systems, such as graphics processing units or clusters comprised of nodes that contain one or more such units. Phase I Accomplishments - Phase I research decomposed/profiled computational particle dynamics simulation code for rocket fuel combustion into low and high computational cost regions (respectively, mainly sequential and mainly parallel kernels), with analysis of space and time complexity. Using the research team's expertise in algorithm-to-architecture mappings, the high-cost kernels were transformed, parallelized, and implemented on Nvidia Fermi GPUs. Measured speedups (GPU with respect to single-core CPU) were approximately 20-32X for realistic model parameters, without final optimization. Error analysis showed no loss of computational accuracy. Commercial Applications and Other Benefits - The proposed research will constitute a breakthrough in solution of problems related to efficient parallel computation of particle and fluid dynamics simulations. These problems occur throughout DOE, military and commercial sectors: the potential payoff is high. We plan to license or sell the solution to contractors for military and domestic applications such as disaster simulation (aerodynamic and hydrodynamic), Government agencies (hydrological and environmental simulations), and medical applications (e.g., in tomographic image reconstruction). Keywords - High-performance Computing, Graphic Processing Unit, Fluid/Particle Simulation. Summary for Members of Congress - Department of Energy has many simulation codes that must compute faster, to be effective. The Phase I research parallelized particle/fluid simulations for rocket combustion, for high-performance computing systems.« less
Global magnetosphere simulations using constrained-transport Hall-MHD with CWENO reconstruction

NASA Astrophysics Data System (ADS)

Lin, L.; Germaschewski, K.; Maynard, K. M.; Abbott, S.; Bhattacharjee, A.; Raeder, J.

2013-12-01

We present a new CWENO (Centrally-Weighted Essentially Non-Oscillatory) reconstruction based MHD solver for the OpenGGCM global magnetosphere code. The solver was built using libMRC, a library for creating efficient parallel PDE solvers on structured grids. The use of libMRC gives us access to its core functionality of providing an automated code generation framework which takes a user provided PDE right hand side in symbolic form to generate an efficient, computer architecture specific, parallel code. libMRC also supports block-structured adaptive mesh refinement and implicit-time stepping through integration with the PETSc library. We validate the new CWENO Hall-MHD solver against existing solvers both in standard test problems as well as in global magnetosphere simulations.
Some practical universal noiseless coding techniques, part 2

NASA Technical Reports Server (NTRS)

Rice, R. F.; Lee, J. J.

1983-01-01

This report is an extension of earlier work (Part 1) which provided practical adaptive techniques for the efficient noiseless coding of a broad class of data sources characterized by only partially known and varying statistics (JPL Publication 79-22). The results here, while still claiming such general applicability, focus primarily on the noiseless coding of image data. A fairly complete and self-contained treatment is provided. Particular emphasis is given to the requirements of the forthcoming Voyager II encounters of Uranus and Neptune. Performance evaluations are supported both graphically and pictorially. Expanded definitions of the algorithms in Part 1 yield a computationally improved set of options for applications requiring efficient performance at entropies above 4 bits/sample. These expanded definitions include as an important subset, a somewhat less efficient but extremely simple "FAST' compressor which will be used at the Voyager Uranus encounter. Additionally, options are provided which enhance performance when atypical data spikes may be present.
An efficient and reliable geographic routing protocol based on partial network coding for underwater sensor networks.

PubMed

Hao, Kun; Jin, Zhigang; Shen, Haifeng; Wang, Ying

2015-05-28

Efficient routing protocols for data packet delivery are crucial to underwater sensor networks (UWSNs). However, communication in UWSNs is a challenging task because of the characteristics of the acoustic channel. Network coding is a promising technique for efficient data packet delivery thanks to the broadcast nature of acoustic channels and the relatively high computation capabilities of the sensor nodes. In this work, we present GPNC, a novel geographic routing protocol for UWSNs that incorporates partial network coding to encode data packets and uses sensor nodes' location information to greedily forward data packets to sink nodes. GPNC can effectively reduce network delays and retransmissions of redundant packets causing additional network energy consumption. Simulation results show that GPNC can significantly improve network throughput and packet delivery ratio, while reducing energy consumption and network latency when compared with other routing protocols.
Performance evaluation of the intra compression in the video coding standards

NASA Astrophysics Data System (ADS)

Abramowski, Andrzej

2015-09-01

The article presents a comparison of the Intra prediction algorithms in the current state-of-the-art video coding standards, including MJPEG 2000, VP8, VP9, H.264/AVC and H.265/HEVC. The effectiveness of techniques employed by each standard is evaluated in terms of compression efficiency and average encoding time. The compression efficiency is measured using BD-PSNR and BD-RATE metrics with H.265/HEVC results as an anchor. Tests are performed on a set of video sequences, composed of sequences gathered by Joint Collaborative Team on Video Coding during the development of the H.265/HEVC standard and 4K sequences provided by Ultra Video Group. According to results, H.265/HEVC provides significant bit-rate savings at the expense of computational complexity, while VP9 may be regarded as a compromise between the efficiency and required encoding time.
Recent Developments in the Application of Biologically Inspired Computation to Chemical Sensing

NASA Astrophysics Data System (ADS)

Marco, S.; Gutierrez-Gálvez, A.

2009-05-01

Biological olfaction outperforms chemical instrumentation in specificity, response time, detection limit, coding capacity, time stability, robustness, size, power consumption, and portability. This biological function provides outstanding performance due, to a large extent, to the unique architecture of the olfactory pathway, which combines a high degree of redundancy, an efficient combinatorial coding along with unmatched chemical information processing mechanisms. The last decade has witnessed important advances in the understanding of the computational primitives underlying the functioning of the olfactory system. In this work, the state of the art concerning biologically inspired computation for chemical sensing will be reviewed. Instead of reviewing the whole body of computational neuroscience of olfaction, we restrict this review to the application of models to the processing of real chemical sensor data.
The investigation of tethered satellite system dynamics

NASA Technical Reports Server (NTRS)

Lorenzini, E.

1985-01-01

Progress in tethered satellite system dynamics research is reported. A retrieval rate control law with no angular feedback to investigate the system's dynamic response was studied. The initial conditions for the computer code which simulates the satellite's rotational dynamics were extended to a generic orbit. The model of the satellite thrusters was modified to simulate a pulsed thrust, by making the SKYHOOK integrator suitable for dealing with delta functions without loosing computational efficiency. Tether breaks were simulated with the high resolution computer code SLACK3. Shuttle's maneuvers were tested. The electric potential around a severed conductive tether with insulator, in the case of a tether breakage at 20 km from the Shuttle, was computed. The electrodynamic hazards due to the breakage of the TSS electrodynamic tether in a plasma are evaluated.
Gyrokinetic micro-turbulence simulations on the NERSC 16-way SMP IBM SP computer: experiences and performance results

NASA Astrophysics Data System (ADS)

Ethier, Stephane; Lin, Zhihong

2001-10-01

Earlier this year, the National Energy Research Scientific Computing center (NERSC) took delivery of the second most powerful computer in the world. With its 2,528 processors running at a peak performance of 1.5 GFlops, this IBM SP machine has a theoretical performance of almost 3.8 TFlops. To efficiently harness such computing power in one single code is not an easy task and requires a good knowledge of the computer's architecture. Here we present the steps that we followed to improve our gyrokinetic micro-turbulence code GTC in order to take advantage of the new 16-way shared memory nodes of the NERSC IBM SP. Performance results are shown as well as details about the improved mixed-mode MPI-OpenMP model that we use. The enhancements to the code allowed us to tackle much bigger problem sizes, getting closer to our goal of simulating an ITER-size tokamak with both kinetic ions and electrons.(This work is supported by DOE Contract No. DE-AC02-76CH03073 (PPPL), and in part by the DOE Fusion SciDAC Project.)
Maestro and Castro: Simulation Codes for Astrophysical Flows

NASA Astrophysics Data System (ADS)

Zingale, Michael; Almgren, Ann; Beckner, Vince; Bell, John; Friesen, Brian; Jacobs, Adam; Katz, Maximilian P.; Malone, Christopher; Nonaka, Andrew; Zhang, Weiqun

2017-01-01

Stellar explosions are multiphysics problems—modeling them requires the coordinated input of gravity solvers, reaction networks, radiation transport, and hydrodynamics together with microphysics recipes to describe the physics of matter under extreme conditions. Furthermore, these models involve following a wide range of spatial and temporal scales, which puts tough demands on simulation codes. We developed the codes Maestro and Castro to meet the computational challenges of these problems. Maestro uses a low Mach number formulation of the hydrodynamics to efficiently model convection. Castro solves the fully compressible radiation hydrodynamics equations to capture the explosive phases of stellar phenomena. Both codes are built upon the BoxLib adaptive mesh refinement library, which prepares them for next-generation exascale computers. Common microphysics shared between the codes allows us to transfer a problem from the low Mach number regime in Maestro to the explosive regime in Castro. Importantly, both codes are freely available (https://github.com/BoxLib-Codes). We will describe the design of the codes and some of their science applications, as well as future development directions.Support for development was provided by NSF award AST-1211563 and DOE/Office of Nuclear Physics grant DE-FG02-87ER40317 to Stony Brook and by the Applied Mathematics Program of the DOE Office of Advance Scientific Computing Research under US DOE contract DE-AC02-05CH11231 to LBNL.
Correlated activity supports efficient cortical processing

PubMed Central

Hung, Chou P.; Cui, Ding; Chen, Yueh-peng; Lin, Chia-pei; Levine, Matthew R.

2015-01-01

Visual recognition is a computational challenge that is thought to occur via efficient coding. An important concept is sparseness, a measure of coding efficiency. The prevailing view is that sparseness supports efficiency by minimizing redundancy and correlations in spiking populations. Yet, we recently reported that “choristers”, neurons that behave more similarly (have correlated stimulus preferences and spontaneous coincident spiking), carry more generalizable object information than uncorrelated neurons (“soloists”) in macaque inferior temporal (IT) cortex. The rarity of choristers (as low as 6% of IT neurons) indicates that they were likely missed in previous studies. Here, we report that correlation strength is distinct from sparseness (choristers are not simply broadly tuned neurons), that choristers are located in non-granular output layers, and that correlated activity predicts human visual search efficiency. These counterintuitive results suggest that a redundant correlational structure supports efficient processing and behavior. PMID:25610392
Implementing controlled-unitary operations over the butterfly network

NASA Astrophysics Data System (ADS)

Soeda, Akihito; Kinjo, Yoshiyuki; Turner, Peter S.; Murao, Mio

2014-12-01

We introduce a multiparty quantum computation task over a network in a situation where the capacities of both the quantum and classical communication channels of the network are limited and a bottleneck occurs. Using a resource setting introduced by Hayashi [1], we present an efficient protocol for performing controlled-unitary operations between two input nodes and two output nodes over the butterfly network, one of the most fundamental networks exhibiting the bottleneck problem. This result opens the possibility of developing a theory of quantum network coding for multiparty quantum computation, whereas the conventional network coding only treats multiparty quantum communication.
Computing observables in curved multifield models of inflation—A guide (with code) to the transport method

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dias, Mafalda; Seery, David; Frazer, Jonathan, E-mail: m.dias@sussex.ac.uk, E-mail: j.frazer@sussex.ac.uk, E-mail: a.liddle@sussex.ac.uk

We describe how to apply the transport method to compute inflationary observables in a broad range of multiple-field models. The method is efficient and encompasses scenarios with curved field-space metrics, violations of slow-roll conditions and turns of the trajectory in field space. It can be used for an arbitrary mass spectrum, including massive modes and models with quasi-single-field dynamics. In this note we focus on practical issues. It is accompanied by a Mathematica code which can be used to explore suitable models, or as a basis for further development.
Implementing controlled-unitary operations over the butterfly network

DOE Office of Scientific and Technical Information (OSTI.GOV)

Soeda, Akihito; Kinjo, Yoshiyuki; Turner, Peter S.

2014-12-04

We introduce a multiparty quantum computation task over a network in a situation where the capacities of both the quantum and classical communication channels of the network are limited and a bottleneck occurs. Using a resource setting introduced by Hayashi [1], we present an efficient protocol for performing controlled-unitary operations between two input nodes and two output nodes over the butterfly network, one of the most fundamental networks exhibiting the bottleneck problem. This result opens the possibility of developing a theory of quantum network coding for multiparty quantum computation, whereas the conventional network coding only treats multiparty quantum communication.
Efficient Helicopter Aerodynamic and Aeroacoustic Predictions on Parallel Computers

NASA Technical Reports Server (NTRS)

Wissink, Andrew M.; Lyrintzis, Anastasios S.; Strawn, Roger C.; Oliker, Leonid; Biswas, Rupak

1996-01-01

This paper presents parallel implementations of two codes used in a combined CFD/Kirchhoff methodology to predict the aerodynamics and aeroacoustics properties of helicopters. The rotorcraft Navier-Stokes code, TURNS, computes the aerodynamic flowfield near the helicopter blades and the Kirchhoff acoustics code computes the noise in the far field, using the TURNS solution as input. The overall parallel strategy adds MPI message passing calls to the existing serial codes to allow for communication between processors. As a result, the total code modifications required for parallel execution are relatively small. The biggest bottleneck in running the TURNS code in parallel comes from the LU-SGS algorithm that solves the implicit system of equations. We use a new hybrid domain decomposition implementation of LU-SGS to obtain good parallel performance on the SP-2. TURNS demonstrates excellent parallel speedups for quasi-steady and unsteady three-dimensional calculations of a helicopter blade in forward flight. The execution rate attained by the code on 114 processors is six times faster than the same cases run on one processor of the Cray C-90. The parallel Kirchhoff code also shows excellent parallel speedups and fast execution rates. As a performance demonstration, unsteady acoustic pressures are computed at 1886 far-field observer locations for a sample acoustics problem. The calculation requires over two hundred hours of CPU time on one C-90 processor but takes only a few hours on 80 processors of the SP2. The resultant far-field acoustic field is analyzed with state of-the-art audio and video rendering of the propagating acoustic signals.
HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

PubMed

Wan, Shixiang; Zou, Quan

2017-01-01

Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. Extreme increase in next-generation sequencing results in shortage of efficient ultra-large biological sequence alignment approaches for coping with different sequence types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files more than 1 GB) sequence analyses. Based on HAlign and Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. The experiments in the DNA and protein large scale data sets, which are more than 1GB files, showed that HAlign II could save time and space. It outperformed the current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resource. THAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II with open-source codes and datasets was established at http://lab.malab.cn/soft/halign.
Linear-Algebra Programs

NASA Technical Reports Server (NTRS)

Lawson, C. L.; Krogh, F. T.; Gold, S. S.; Kincaid, D. R.; Sullivan, J.; Williams, E.; Hanson, R. J.; Haskell, K.; Dongarra, J.; Moler, C. B.

1982-01-01

The Basic Linear Algebra Subprograms (BLAS) library is a collection of 38 FORTRAN-callable routines for performing basic operations of numerical linear algebra. BLAS library is portable and efficient source of basic operations for designers of programs involving linear algebriac computations. BLAS library is supplied in portable FORTRAN and Assembler code versions for IBM 370, UNIVAC 1100 and CDC 6000 series computers.
Hierarchical parallelisation of functional renormalisation group calculations - hp-fRG

NASA Astrophysics Data System (ADS)

Rohe, Daniel

2016-10-01

The functional renormalisation group (fRG) has evolved into a versatile tool in condensed matter theory for studying important aspects of correlated electron systems. Practical applications of the method often involve a high numerical effort, motivating the question in how far High Performance Computing (HPC) can leverage the approach. In this work we report on a multi-level parallelisation of the underlying computational machinery and show that this can speed up the code by several orders of magnitude. This in turn can extend the applicability of the method to otherwise inaccessible cases. We exploit three levels of parallelisation: Distributed computing by means of Message Passing (MPI), shared-memory computing using OpenMP, and vectorisation by means of SIMD units (single-instruction-multiple-data). Results are provided for two distinct High Performance Computing (HPC) platforms, namely the IBM-based BlueGene/Q system JUQUEEN and an Intel Sandy-Bridge-based development cluster. We discuss how certain issues and obstacles were overcome in the course of adapting the code. Most importantly, we conclude that this vast improvement can actually be accomplished by introducing only moderate changes to the code, such that this strategy may serve as a guideline for other researcher to likewise improve the efficiency of their codes.
Moment method analysis of linearly tapered slot antennas: Low loss components for switched beam radiometers

NASA Technical Reports Server (NTRS)

Koeksal, Adnan; Trew, Robert J.; Kauffman, J. Frank

1992-01-01

A Moment Method Model for the radiation pattern characterization of single Linearly Tapered Slot Antennas (LTSA) in air or on a dielectric substrate is developed. This characterization consists of: (1) finding the radiated far-fields of the antenna; (2) determining the E-Plane and H-Plane beamwidths and sidelobe levels; and (3) determining the D-Plane beamwidth and cross polarization levels, as antenna parameters length, height, taper angle, substrate thickness, and the relative substrate permittivity vary. The LTSA geometry does not lend itself to analytical solution with the given parameter ranges. Therefore, a computer modeling scheme and a code are necessary to analyze the problem. This necessity imposes some further objectives or requirements on the solution method (modeling) and tool (computer code). These may be listed as follows: (1) a good approximation to the real antenna geometry; and (2) feasible computer storage and time requirements. According to these requirements, the work is concentrated on the development of efficient modeling schemes for these type of problems and on reducing the central processing unit (CPU) time required from the computer code. A Method of Moments (MoM) code is developed for the analysis of LTSA's within the parameter ranges given.
Code CUGEL: A code to unfold Ge(Li) spectrometer polyenergetic gamma photon experimental distributions

NASA Technical Reports Server (NTRS)

Steyn, J. J.; Born, U.

1970-01-01

A FORTRAN code was developed for the Univac 1108 digital computer to unfold lithium-drifted germanium semiconductor spectrometers, polyenergetic gamma photon experimental distributions. It was designed to analyze the combination continuous and monoenergetic gamma radiation field of radioisotope volumetric sources. The code generates the detector system response matrix function and applies it to monoenergetic spectral components discretely and to the continuum iteratively. It corrects for system drift, source decay, background, and detection efficiency. Results are presented in digital form for differential and integrated photon number and energy distributions, and for exposure dose.

Complementary Reliability-Based Decodings of Binary Linear Block Codes

NASA Technical Reports Server (NTRS)

Fossorier, Marc P. C.; Lin, Shu

1997-01-01

This correspondence presents a hybrid reliability-based decoding algorithm which combines the reprocessing method based on the most reliable basis and a generalized Chase-type algebraic decoder based on the least reliable positions. It is shown that reprocessing with a simple additional algebraic decoding effort achieves significant coding gain. For long codes, the order of reprocessing required to achieve asymptotic optimum error performance is reduced by approximately 1/3. This significantly reduces the computational complexity, especially for long codes. Also, a more efficient criterion for stopping the decoding process is derived based on the knowledge of the algebraic decoding solution.
Numerical convergence improvements for porflow unsaturated flow simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Flach, Greg

2017-08-14

Section 3.6 of SRNL (2016) discusses various PORFLOW code improvements to increase modeling efficiency, in preparation for the next E-Area Performance Assessment (WSRC 2008) revision. This memorandum documents interaction with Analytic & Computational Research, Inc. (http://www.acricfd.com/default.htm) to improve numerical convergence efficiency using PORFLOW version 6.42 for unsaturated flow simulations.
Reduced-Order Models for the Aeroelastic Analysis of Ares Launch Vehicles

NASA Technical Reports Server (NTRS)

Silva, Walter A.; Vatsa, Veer N.; Biedron, Robert T.

2010-01-01

This document presents the development and application of unsteady aerodynamic, structural dynamic, and aeroelastic reduced-order models (ROMs) for the ascent aeroelastic analysis of the Ares I-X flight test and Ares I crew launch vehicles using the unstructured-grid, aeroelastic FUN3D computational fluid dynamics (CFD) code. The purpose of this work is to perform computationally-efficient aeroelastic response calculations that would be prohibitively expensive via computation of multiple full-order aeroelastic FUN3D solutions. These efficient aeroelastic ROM solutions provide valuable insight regarding the aeroelastic sensitivity of the vehicles to various parameters over a range of dynamic pressures.
Xyce parallel electronic simulator users guide, version 6.1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R; Mei, Ting; Russo, Thomas V.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas; Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers; A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to developmore » new types of analysis without requiring the implementation of analysis-specific device models; Device models that are specifically tailored to meet Sandia's needs, including some radiationaware devices (for Sandia users only); and Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase-a message passing parallel implementation-which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.« less
Xyce parallel electronic simulator users' guide, Version 6.0.1.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R; Mei, Ting; Russo, Thomas V.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to developmore » new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandias needs, including some radiationaware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase a message passing parallel implementation which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.« less
Xyce parallel electronic simulator users guide, version 6.0.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R; Mei, Ting; Russo, Thomas V.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to developmore » new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandias needs, including some radiationaware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase a message passing parallel implementation which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.« less
An accurate and efficient laser-envelope solver for the modeling of laser-plasma accelerators

DOE PAGES

Benedetti, C.; Schroeder, C. B.; Geddes, C. G. R.; ...

2017-10-17

Detailed and reliable numerical modeling of laser-plasma accelerators (LPAs), where a short and intense laser pulse interacts with an underdense plasma over distances of up to a meter, is a formidably challenging task. This is due to the great disparity among the length scales involved in the modeling, ranging from the micron scale of the laser wavelength to the meter scale of the total laser-plasma interaction length. The use of the time-averaged ponderomotive force approximation, where the laser pulse is described by means of its envelope, enables efficient modeling of LPAs by removing the need to model the details ofmore » electron motion at the laser wavelength scale. Furthermore, it allows simulations in cylindrical geometry which captures relevant 3D physics at 2D computational cost. A key element of any code based on the time-averaged ponderomotive force approximation is the laser envelope solver. In this paper we present the accurate and efficient envelope solver used in the code INF & RNO (INtegrated Fluid & paRticle simulatioN cOde). The features of the INF & RNO laser solver enable an accurate description of the laser pulse evolution deep into depletion even at a reasonably low resolution, resulting in significant computational speed-ups.« less
An accurate and efficient laser-envelope solver for the modeling of laser-plasma accelerators

DOE Office of Scientific and Technical Information (OSTI.GOV)

Benedetti, C.; Schroeder, C. B.; Geddes, C. G. R.

Detailed and reliable numerical modeling of laser-plasma accelerators (LPAs), where a short and intense laser pulse interacts with an underdense plasma over distances of up to a meter, is a formidably challenging task. This is due to the great disparity among the length scales involved in the modeling, ranging from the micron scale of the laser wavelength to the meter scale of the total laser-plasma interaction length. The use of the time-averaged ponderomotive force approximation, where the laser pulse is described by means of its envelope, enables efficient modeling of LPAs by removing the need to model the details ofmore » electron motion at the laser wavelength scale. Furthermore, it allows simulations in cylindrical geometry which captures relevant 3D physics at 2D computational cost. A key element of any code based on the time-averaged ponderomotive force approximation is the laser envelope solver. In this paper we present the accurate and efficient envelope solver used in the code INF & RNO (INtegrated Fluid & paRticle simulatioN cOde). The features of the INF & RNO laser solver enable an accurate description of the laser pulse evolution deep into depletion even at a reasonably low resolution, resulting in significant computational speed-ups.« less
An accurate and efficient laser-envelope solver for the modeling of laser-plasma accelerators

NASA Astrophysics Data System (ADS)

Benedetti, C.; Schroeder, C. B.; Geddes, C. G. R.; Esarey, E.; Leemans, W. P.

2018-01-01

Detailed and reliable numerical modeling of laser-plasma accelerators (LPAs), where a short and intense laser pulse interacts with an underdense plasma over distances of up to a meter, is a formidably challenging task. This is due to the great disparity among the length scales involved in the modeling, ranging from the micron scale of the laser wavelength to the meter scale of the total laser-plasma interaction length. The use of the time-averaged ponderomotive force approximation, where the laser pulse is described by means of its envelope, enables efficient modeling of LPAs by removing the need to model the details of electron motion at the laser wavelength scale. Furthermore, it allows simulations in cylindrical geometry which captures relevant 3D physics at 2D computational cost. A key element of any code based on the time-averaged ponderomotive force approximation is the laser envelope solver. In this paper we present the accurate and efficient envelope solver used in the code INF&RNO (INtegrated Fluid & paRticle simulatioN cOde). The features of the INF&RNO laser solver enable an accurate description of the laser pulse evolution deep into depletion even at a reasonably low resolution, resulting in significant computational speed-ups.
GW/Bethe-Salpeter calculations for charged and model systems from real-space DFT

NASA Astrophysics Data System (ADS)

Strubbe, David A.

GW and Bethe-Salpeter (GW/BSE) calculations use mean-field input from density-functional theory (DFT) calculations to compute excited states of a condensed-matter system. Many parts of a GW/BSE calculation are efficiently performed in a plane-wave basis, and extensive effort has gone into optimizing and parallelizing plane-wave GW/BSE codes for large-scale computations. Most straightforwardly, plane-wave DFT can be used as a starting point, but real-space DFT is also an attractive starting point: it is systematically convergeable like plane waves, can take advantage of efficient domain parallelization for large systems, and is well suited physically for finite and especially charged systems. The flexibility of a real-space grid also allows convenient calculations on non-atomic model systems. I will discuss the interfacing of a real-space (TD)DFT code (Octopus, www.tddft.org/programs/octopus) with a plane-wave GW/BSE code (BerkeleyGW, www.berkeleygw.org), consider performance issues and accuracy, and present some applications to simple and paradigmatic systems that illuminate fundamental properties of these approximations in many-body perturbation theory.
GPUs, a New Tool of Acceleration in CFD: Efficiency and Reliability on Smoothed Particle Hydrodynamics Methods

PubMed Central

Crespo, Alejandro C.; Dominguez, Jose M.; Barreiro, Anxo; Gómez-Gesteira, Moncho; Rogers, Benedict D.

2011-01-01

Smoothed Particle Hydrodynamics (SPH) is a numerical method commonly used in Computational Fluid Dynamics (CFD) to simulate complex free-surface flows. Simulations with this mesh-free particle method far exceed the capacity of a single processor. In this paper, as part of a dual-functioning code for either central processing units (CPUs) or Graphics Processor Units (GPUs), a parallelisation using GPUs is presented. The GPU parallelisation technique uses the Compute Unified Device Architecture (CUDA) of nVidia devices. Simulations with more than one million particles on a single GPU card exhibit speedups of up to two orders of magnitude over using a single-core CPU. It is demonstrated that the code achieves different speedups with different CUDA-enabled GPUs. The numerical behaviour of the SPH code is validated with a standard benchmark test case of dam break flow impacting on an obstacle where good agreement with the experimental results is observed. Both the achieved speed-ups and the quantitative agreement with experiments suggest that CUDA-based GPU programming can be used in SPH methods with efficiency and reliability. PMID:21695185
Description and use of LSODE, the Livermore Solver for Ordinary Differential Equations

NASA Technical Reports Server (NTRS)

Radhakrishnan, Krishnan; Hindmarsh, Alan C.

1993-01-01

LSODE, the Livermore Solver for Ordinary Differential Equations, is a package of FORTRAN subroutines designed for the numerical solution of the initial value problem for a system of ordinary differential equations. It is particularly well suited for 'stiff' differential systems, for which the backward differentiation formula method of orders 1 to 5 is provided. The code includes the Adams-Moulton method of orders 1 to 12, so it can be used for nonstiff problems as well. In addition, the user can easily switch methods to increase computational efficiency for problems that change character. For both methods a variety of corrector iteration techniques is included in the code. Also, to minimize computational work, both the step size and method order are varied dynamically. This report presents complete descriptions of the code and integration methods, including their implementation. It also provides a detailed guide to the use of the code, as well as an illustrative example problem.
A 2D electrostatic PIC code for the Mark III Hypercube

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ferraro, R.D.; Liewer, P.C.; Decyk, V.K.

We have implemented a 2D electrostastic plasma particle in cell (PIC) simulation code on the Caltech/JPL Mark IIIfp Hypercube. The code simulates plasma effects by evolving in time the trajectories of thousands to millions of charged particles subject to their self-consistent fields. Each particle`s position and velocity is advanced in time using a leap frog method for integrating Newton`s equations of motion in electric and magnetic fields. The electric field due to these moving charged particles is calculated on a spatial grid at each time by solving Poisson`s equation in Fourier space. These two tasks represent the largest part ofmore » the computation. To obtain efficient operation on a distributed memory parallel computer, we are using the General Concurrent PIC (GCPIC) algorithm previously developed for a 1D parallel PIC code.« less
Sub-Selective Quantization for Learning Binary Codes in Large-Scale Image Search.

PubMed

Li, Yeqing; Liu, Wei; Huang, Junzhou

2018-06-01

Recently with the explosive growth of visual content on the Internet, large-scale image search has attracted intensive attention. It has been shown that mapping high-dimensional image descriptors to compact binary codes can lead to considerable efficiency gains in both storage and performing similarity computation of images. However, most existing methods still suffer from expensive training devoted to large-scale binary code learning. To address this issue, we propose a sub-selection based matrix manipulation algorithm, which can significantly reduce the computational cost of code learning. As case studies, we apply the sub-selection algorithm to several popular quantization techniques including cases using linear and nonlinear mappings. Crucially, we can justify the resulting sub-selective quantization by proving its theoretic properties. Extensive experiments are carried out on three image benchmarks with up to one million samples, corroborating the efficacy of the sub-selective quantization method in terms of image retrieval.
Gigaflop performance on a CRAY-2: Multitasking a computational fluid dynamics application

NASA Technical Reports Server (NTRS)

Tennille, Geoffrey M.; Overman, Andrea L.; Lambiotte, Jules J.; Streett, Craig L.

1991-01-01

The methodology is described for converting a large, long-running applications code that executed on a single processor of a CRAY-2 supercomputer to a version that executed efficiently on multiple processors. Although the conversion of every application is different, a discussion of the types of modification used to achieve gigaflop performance is included to assist others in the parallelization of applications for CRAY computers, especially those that were developed for other computers. An existing application, from the discipline of computational fluid dynamics, that had utilized over 2000 hrs of CPU time on CRAY-2 during the previous year was chosen as a test case to study the effectiveness of multitasking on a CRAY-2. The nature of dominant calculations within the application indicated that a sustained computational rate of 1 billion floating-point operations per second, or 1 gigaflop, might be achieved. The code was first analyzed and modified for optimal performance on a single processor in a batch environment. After optimal performance on a single CPU was achieved, the code was modified to use multiple processors in a dedicated environment. The results of these two efforts were merged into a single code that had a sustained computational rate of over 1 gigaflop on a CRAY-2. Timings and analysis of performance are given for both single- and multiple-processor runs.
BINGO: a code for the efficient computation of the scalar bi-spectrum

NASA Astrophysics Data System (ADS)

Hazra, Dhiraj Kumar; Sriramkumar, L.; Martin, Jérôme

2013-05-01

We present a new and accurate Fortran code, the BI-spectra and Non-Gaussianity Operator (BINGO), for the efficient numerical computation of the scalar bi-spectrum and the non-Gaussianity parameter fNL in single field inflationary models involving the canonical scalar field. The code can calculate all the different contributions to the bi-spectrum and the parameter fNL for an arbitrary triangular configuration of the wavevectors. Focusing firstly on the equilateral limit, we illustrate the accuracy of BINGO by comparing the results from the code with the spectral dependence of the bi-spectrum expected in power law inflation. Then, considering an arbitrary triangular configuration, we contrast the numerical results with the analytical expression available in the slow roll limit, for, say, the case of the conventional quadratic potential. Considering a non-trivial scenario involving deviations from slow roll, we compare the results from the code with the analytical results that have recently been obtained in the case of the Starobinsky model in the equilateral limit. As an immediate application, we utilize BINGO to examine of the power of the non-Gaussianity parameter fNL to discriminate between various inflationary models that admit departures from slow roll and lead to similar features in the scalar power spectrum. We close with a summary and discussion on the implications of the results we obtain.
Representing high-dimensional data to intelligent prostheses and other wearable assistive robots: A first comparison of tile coding and selective Kanerva coding.

PubMed

Travnik, Jaden B; Pilarski, Patrick M

2017-07-01

Prosthetic devices have advanced in their capabilities and in the number and type of sensors included in their design. As the space of sensorimotor data available to a conventional or machine learning prosthetic control system increases in dimensionality and complexity, it becomes increasingly important that this data be represented in a useful and computationally efficient way. Well structured sensory data allows prosthetic control systems to make informed, appropriate control decisions. In this study, we explore the impact that increased sensorimotor information has on current machine learning prosthetic control approaches. Specifically, we examine the effect that high-dimensional sensory data has on the computation time and prediction performance of a true-online temporal-difference learning prediction method as embedded within a resource-limited upper-limb prosthesis control system. We present results comparing tile coding, the dominant linear representation for real-time prosthetic machine learning, with a newly proposed modification to Kanerva coding that we call selective Kanerva coding. In addition to showing promising results for selective Kanerva coding, our results confirm potential limitations to tile coding as the number of sensory input dimensions increases. To our knowledge, this study is the first to explicitly examine representations for realtime machine learning prosthetic devices in general terms. This work therefore provides an important step towards forming an efficient prosthesis-eye view of the world, wherein prompt and accurate representations of high-dimensional data may be provided to machine learning control systems within artificial limbs and other assistive rehabilitation technologies.
Groundwater flow and heat transport for systems undergoing freeze-thaw: Intercomparison of numerical simulators for 2D test cases

NASA Astrophysics Data System (ADS)

Grenier, Christophe; Anbergen, Hauke; Bense, Victor; Chanzy, Quentin; Coon, Ethan; Collier, Nathaniel; Costard, François; Ferry, Michel; Frampton, Andrew; Frederick, Jennifer; Gonçalvès, Julio; Holmén, Johann; Jost, Anne; Kokh, Samuel; Kurylyk, Barret; McKenzie, Jeffrey; Molson, John; Mouche, Emmanuel; Orgogozo, Laurent; Pannetier, Romain; Rivière, Agnès; Roux, Nicolas; Rühaak, Wolfram; Scheidegger, Johanna; Selroos, Jan-Olof; Therrien, René; Vidstrand, Patrik; Voss, Clifford

2018-04-01

In high-elevation, boreal and arctic regions, hydrological processes and associated water bodies can be strongly influenced by the distribution of permafrost. Recent field and modeling studies indicate that a fully-coupled multidimensional thermo-hydraulic approach is required to accurately model the evolution of these permafrost-impacted landscapes and groundwater systems. However, the relatively new and complex numerical codes being developed for coupled non-linear freeze-thaw systems require verification. This issue is addressed by means of an intercomparison of thirteen numerical codes for two-dimensional test cases with several performance metrics (PMs). These codes comprise a wide range of numerical approaches, spatial and temporal discretization strategies, and computational efficiencies. Results suggest that the codes provide robust results for the test cases considered and that minor discrepancies are explained by computational precision. However, larger discrepancies are observed for some PMs resulting from differences in the governing equations, discretization issues, or in the freezing curve used by some codes.
Data Assimilation - Advances and Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Williams, Brian J.

2014-07-30

This presentation provides an overview of data assimilation (model calibration) for complex computer experiments. Calibration refers to the process of probabilistically constraining uncertain physics/engineering model inputs to be consistent with observed experimental data. An initial probability distribution for these parameters is updated using the experimental information. Utilization of surrogate models and empirical adjustment for model form error in code calibration form the basis for the statistical methodology considered. The role of probabilistic code calibration in supporting code validation is discussed. Incorporation of model form uncertainty in rigorous uncertainty quantification (UQ) analyses is also addressed. Design criteria used within a batchmore » sequential design algorithm are introduced for efficiently achieving predictive maturity and improved code calibration. Predictive maturity refers to obtaining stable predictive inference with calibrated computer codes. These approaches allow for augmentation of initial experiment designs for collecting new physical data. A standard framework for data assimilation is presented and techniques for updating the posterior distribution of the state variables based on particle filtering and the ensemble Kalman filter are introduced.« less
ANNarchy: a code generation approach to neural simulations on parallel hardware

PubMed Central

Vitay, Julien; Dinkelbach, Helge Ü.; Hamker, Fred H.

2015-01-01

Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into an efficient C++code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957

An Assessment of Artificial Compressibility and Pressure Projection Methods for Incompressible Flow Simulations

NASA Technical Reports Server (NTRS)

Kwak, Dochan; Kiris, C.; Smith, Charles A. (Technical Monitor)

1998-01-01

Performance of the two commonly used numerical procedures, one based on artificial compressibility method and the other pressure projection method, are compared. These formulations are selected primarily because they are designed for three-dimensional applications. The computational procedures are compared by obtaining steady state solutions of a wake vortex and unsteady solutions of a curved duct flow. For steady computations, artificial compressibility was very efficient in terms of computing time and robustness. For an unsteady flow which requires small physical time step, pressure projection method was found to be computationally more efficient than an artificial compressibility method. This comparison is intended to give some basis for selecting a method or a flow solution code for large three-dimensional applications where computing resources become a critical issue.
A solid reactor core thermal model for nuclear thermal rockets

NASA Astrophysics Data System (ADS)

Rider, William J.; Cappiello, Michael W.; Liles, Dennis R.

1991-01-01

A Helium/Hydrogen Cooled Reactor Analysis (HERA) computer code has been developed. HERA has the ability to model arbitrary geometries in three dimensions, which allows the user to easily analyze reactor cores constructed of prismatic graphite elements. The code accounts for heat generation in the fuel, control rods, and other structures; conduction and radiation across gaps; convection to the coolant; and a variety of boundary conditions. The numerical solution scheme has been optimized for vector computers, making long transient analyses economical. Time integration is either explicit or implicit, which allows the use of the model to accurately calculate both short- or long-term transients with an efficient use of computer time. Both the basic spatial and temporal integration schemes have been benchmarked against analytical solutions.
Unsteady three-dimensional thermal field prediction in turbine blades using nonlinear BEM

NASA Technical Reports Server (NTRS)

Martin, Thomas J.; Dulikravich, George S.

1993-01-01

A time-and-space accurate and computationally efficient fully three dimensional unsteady temperature field analysis computer code has been developed for truly arbitrary configurations. It uses boundary element method (BEM) formulation based on an unsteady Green's function approach, multi-point Gaussian quadrature spatial integration on each panel, and a highly clustered time-step integration. The code accepts either temperatures or heat fluxes as boundary conditions that can vary in time on a point-by-point basis. Comparisons of the BEM numerical results and known analytical unsteady results for simple shapes demonstrate very high accuracy and reliability of the algorithm. An example of computed three dimensional temperature and heat flux fields in a realistically shaped internally cooled turbine blade is also discussed.
Supersonic nonlinear potential analysis

NASA Technical Reports Server (NTRS)

Siclari, M. J.

1984-01-01

The NCOREL computer code was established to compute supersonic flow fields of wings and bodies. The method encompasses an implicit finite difference transonic relaxation method to solve the full potential equation in a spherical coordinate system. Two basic topic to broaden the applicability and usefulness of the present method which is encompassed within the computer code NCOREL for the treatment of supersonic flow problems were studied. The first topic is that of computing efficiency. Accelerated schemes are in use for transonic flow problems. One such scheme is the approximate factorization (AF) method and an AF scheme to the supersonic flow problem is developed. The second topic is the computation of wake flows. The proper modeling of wake flows is important for multicomponent configurations such as wing-body and multiple lifting surfaces where the wake of one lifting surface has a pronounced effect on a downstream body or other lifting surfaces.
Dynamic Load-Balancing for Distributed Heterogeneous Computing of Parallel CFD Problems

NASA Technical Reports Server (NTRS)

Ecer, A.; Chien, Y. P.; Boenisch, T.; Akay, H. U.

2000-01-01

The developed methodology is aimed at improving the efficiency of executing block-structured algorithms on parallel, distributed, heterogeneous computers. The basic approach of these algorithms is to divide the flow domain into many sub- domains called blocks, and solve the governing equations over these blocks. Dynamic load balancing problem is defined as the efficient distribution of the blocks among the available processors over a period of several hours of computations. In environments with computers of different architecture, operating systems, CPU speed, memory size, load, and network speed, balancing the loads and managing the communication between processors becomes crucial. Load balancing software tools for mutually dependent parallel processes have been created to efficiently utilize an advanced computation environment and algorithms. These tools are dynamic in nature because of the chances in the computer environment during execution time. More recently, these tools were extended to a second operating system: NT. In this paper, the problems associated with this application will be discussed. Also, the developed algorithms were combined with the load sharing capability of LSF to efficiently utilize workstation clusters for parallel computing. Finally, results will be presented on running a NASA based code ADPAC to demonstrate the developed tools for dynamic load balancing.
An efficient interpolation filter VLSI architecture for HEVC standard

NASA Astrophysics Data System (ADS)

Zhou, Wei; Zhou, Xin; Lian, Xiaocong; Liu, Zhenyu; Liu, Xiaoxiang

2015-12-01

The next-generation video coding standard of High-Efficiency Video Coding (HEVC) is especially efficient for coding high-resolution video such as 8K-ultra-high-definition (UHD) video. Fractional motion estimation in HEVC presents a significant challenge in clock latency and area cost as it consumes more than 40 % of the total encoding time and thus results in high computational complexity. With aims at supporting 8K-UHD video applications, an efficient interpolation filter VLSI architecture for HEVC is proposed in this paper. Firstly, a new interpolation filter algorithm based on the 8-pixel interpolation unit is proposed in this paper. It can save 19.7 % processing time on average with acceptable coding quality degradation. Based on the proposed algorithm, an efficient interpolation filter VLSI architecture, composed of a reused data path of interpolation, an efficient memory organization, and a reconfigurable pipeline interpolation filter engine, is presented to reduce the implement hardware area and achieve high throughput. The final VLSI implementation only requires 37.2k gates in a standard 90-nm CMOS technology at an operating frequency of 240 MHz. The proposed architecture can be reused for either half-pixel interpolation or quarter-pixel interpolation, which can reduce the area cost for about 131,040 bits RAM. The processing latency of our proposed VLSI architecture can support the real-time processing of 4:2:0 format 7680 × 4320@78fps video sequences.
Large-Scale Computation of Nuclear Magnetic Resonance Shifts for Paramagnetic Solids Using CP2K.

PubMed

Mondal, Arobendo; Gaultois, Michael W; Pell, Andrew J; Iannuzzi, Marcella; Grey, Clare P; Hutter, Jürg; Kaupp, Martin

2018-01-09

Large-scale computations of nuclear magnetic resonance (NMR) shifts for extended paramagnetic solids (pNMR) are reported using the highly efficient Gaussian-augmented plane-wave implementation of the CP2K code. Combining hyperfine couplings obtained with hybrid functionals with g-tensors and orbital shieldings computed using gradient-corrected functionals, contact, pseudocontact, and orbital-shift contributions to pNMR shifts are accessible. Due to the efficient and highly parallel performance of CP2K, a wide variety of materials with large unit cells can be studied with extended Gaussian basis sets. Validation of various approaches for the different contributions to pNMR shifts is done first for molecules in a large supercell in comparison with typical quantum-chemical codes. This is then extended to a detailed study of g-tensors for extended solid transition-metal fluorides and for a series of complex lithium vanadium phosphates. Finally, lithium pNMR shifts are computed for Li 3 V 2 (PO 4 ) 3 , for which detailed experimental data are available. This has allowed an in-depth study of different approaches (e.g., full periodic versus incremental cluster computations of g-tensors and different functionals and basis sets for hyperfine computations) as well as a thorough analysis of the different contributions to the pNMR shifts. This study paves the way for a more-widespread computational treatment of NMR shifts for paramagnetic materials.
Research in Computational Astrobiology

NASA Technical Reports Server (NTRS)

Chaban, Galina; Colombano, Silvano; Scargle, Jeff; New, Michael H.; Pohorille, Andrew; Wilson, Michael A.

2003-01-01

We report on several projects in the field of computational astrobiology, which is devoted to advancing our understanding of the origin, evolution and distribution of life in the Universe using theoretical and computational tools. Research projects included modifying existing computer simulation codes to use efficient, multiple time step algorithms, statistical methods for analysis of astrophysical data via optimal partitioning methods, electronic structure calculations on water-nuclei acid complexes, incorporation of structural information into genomic sequence analysis methods and calculations of shock-induced formation of polycylic aromatic hydrocarbon compounds.
Efficient preparation of large-block-code ancilla states for fault-tolerant quantum computation

NASA Astrophysics Data System (ADS)

Zheng, Yi-Cong; Lai, Ching-Yi; Brun, Todd A.

2018-03-01

Fault-tolerant quantum computation (FTQC) schemes that use multiqubit large block codes can potentially reduce the resource overhead to a great extent. A major obstacle is the requirement for a large number of clean ancilla states of different types without correlated errors inside each block. These ancilla states are usually logical stabilizer states of the data-code blocks, which are generally difficult to prepare if the code size is large. Previously, we have proposed an ancilla distillation protocol for Calderbank-Shor-Steane (CSS) codes by classical error-correcting codes. It was assumed that the quantum gates in the distillation circuit were perfect; however, in reality, noisy quantum gates may introduce correlated errors that are not treatable by the protocol. In this paper, we show that additional postselection by another classical error-detecting code can be applied to remove almost all correlated errors. Consequently, the revised protocol is fully fault tolerant and capable of preparing a large set of stabilizer states sufficient for FTQC using large block codes. At the same time, the yield rate can be boosted from O (t-2) to O (1 ) in practice for an [[n ,k ,d =2 t +1
Space coding for sensorimotor transformations can emerge through unsupervised learning.

PubMed

De Filippo De Grazia, Michele; Cutini, Simone; Lisi, Matteo; Zorzi, Marco

2012-08-01

The posterior parietal cortex (PPC) is fundamental for sensorimotor transformations because it combines multiple sensory inputs and posture signals into different spatial reference frames that drive motor programming. Here, we present a computational model mimicking the sensorimotor transformations occurring in the PPC. A recurrent neural network with one layer of hidden neurons (restricted Boltzmann machine) learned a stochastic generative model of the sensory data without supervision. After the unsupervised learning phase, the activity of the hidden neurons was used to compute a motor program (a population code on a bidimensional map) through a simple linear projection and delta rule learning. The average motor error, calculated as the difference between the expected and the computed output, was less than 3°. Importantly, analyses of the hidden neurons revealed gain-modulated visual receptive fields, thereby showing that space coding for sensorimotor transformations similar to that observed in the PPC can emerge through unsupervised learning. These results suggest that gain modulation is an efficient coding strategy to integrate visual and postural information toward the generation of motor commands.
NASA Lewis Stirling SPRE testing and analysis with reduced number of cooler tubes

NASA Technical Reports Server (NTRS)

Wong, Wayne A.; Cairelli, James E.; Swec, Diane M.; Doeberling, Thomas J.; Lakatos, Thomas F.; Madi, Frank J.

1992-01-01

Free-piston Stirling power converters are candidates for high capacity space power applications. The Space Power Research Engine (SPRE), a free-piston Stirling engine coupled with a linear alternator, is being tested at the NASA Lewis Research Center in support of the Civil Space Technology Initiative. The SPRE is used as a test bed for evaluating converter modifications which have the potential to improve the converter performance and for validating computer code predictions. Reducing the number of cooler tubes on the SPRE has been identified as a modification with the potential to significantly improve power and efficiency. Experimental tests designed to investigate the effects of reducing the number of cooler tubes on converter power, efficiency and dynamics are described. Presented are test results from the converter operating with a reduced number of cooler tubes and comparisons between this data and both baseline test data and computer code predictions.
User-Defined Data Distributions in High-Level Programming Languages

NASA Technical Reports Server (NTRS)

Diaconescu, Roxana E.; Zima, Hans P.

2006-01-01

One of the characteristic features of today s high performance computing systems is a physically distributed memory. Efficient management of locality is essential for meeting key performance requirements for these architectures. The standard technique for dealing with this issue has involved the extension of traditional sequential programming languages with explicit message passing, in the context of a processor-centric view of parallel computation. This has resulted in complex and error-prone assembly-style codes in which algorithms and communication are inextricably interwoven. This paper presents a high-level approach to the design and implementation of data distributions. Our work is motivated by the need to improve the current parallel programming methodology by introducing a paradigm supporting the development of efficient and reusable parallel code. This approach is currently being implemented in the context of a new programming language called Chapel, which is designed in the HPCS project Cascade.
Method and system for efficient video compression with low-complexity encoder

NASA Technical Reports Server (NTRS)

Chen, Jun (Inventor); He, Dake (Inventor); Sheinin, Vadim (Inventor); Jagmohan, Ashish (Inventor); Lu, Ligang (Inventor)

2012-01-01

Disclosed are a method and system for video compression, wherein the video encoder has low computational complexity and high compression efficiency. The disclosed system comprises a video encoder and a video decoder, wherein the method for encoding includes the steps of converting a source frame into a space-frequency representation; estimating conditional statistics of at least one vector of space-frequency coefficients; estimating encoding rates based on the said conditional statistics; and applying Slepian-Wolf codes with the said computed encoding rates. The preferred method for decoding includes the steps of; generating a side-information vector of frequency coefficients based on previously decoded source data, encoder statistics, and previous reconstructions of the source frequency vector; and performing Slepian-Wolf decoding of at least one source frequency vector based on the generated side-information, the Slepian-Wolf code bits and the encoder statistics.
Coding for parallel execution of hardware-in-the-loop millimeter-wave scene generation models on multicore SIMD processor architectures

NASA Astrophysics Data System (ADS)

Olson, Richard F.

2013-05-01

Rendering of point scatterer based radar scenes for millimeter wave (mmW) seeker tests in real-time hardware-in-the-loop (HWIL) scene generation requires efficient algorithms and vector-friendly computer architectures for complex signal synthesis. New processor technology from Intel implements an extended 256-bit vector SIMD instruction set (AVX, AVX2) in a multi-core CPU design providing peak execution rates of hundreds of GigaFLOPS (GFLOPS) on one chip. Real world mmW scene generation code can approach peak SIMD execution rates only after careful algorithm and source code design. An effective software design will maintain high computing intensity emphasizing register-to-register SIMD arithmetic operations over data movement between CPU caches or off-chip memories. Engineers at the U.S. Army Aviation and Missile Research, Development and Engineering Center (AMRDEC) applied two basic parallel coding methods to assess new 256-bit SIMD multi-core architectures for mmW scene generation in HWIL. These include use of POSIX threads built on vector library functions and more portable, highlevel parallel code based on compiler technology (e.g. OpenMP pragmas and SIMD autovectorization). Since CPU technology is rapidly advancing toward high processor core counts and TeraFLOPS peak SIMD execution rates, it is imperative that coding methods be identified which produce efficient and maintainable parallel code. This paper describes the algorithms used in point scatterer target model rendering, the parallelization of those algorithms, and the execution performance achieved on an AVX multi-core machine using the two basic parallel coding methods. The paper concludes with estimates for scale-up performance on upcoming multi-core technology.
Efficient modeling of laser-plasma accelerator staging experiments using INF&RNO

NASA Astrophysics Data System (ADS)

Benedetti, C.; Schroeder, C. B.; Geddes, C. G. R.; Esarey, E.; Leemans, W. P.

2017-03-01

The computational framework INF&RNO (INtegrated Fluid & paRticle simulatioN cOde) allows for fast and accurate modeling, in 2D cylindrical geometry, of several aspects of laser-plasma accelerator physics. In this paper, we present some of the new features of the code, including the quasistatic Particle-In-Cell (PIC)/fluid modality, and describe using different computational grids and time steps for the laser envelope and the plasma wake. These and other features allow for a speedup of several orders of magnitude compared to standard full 3D PIC simulations while still retaining physical fidelity. INF&RNO is used to support the experimental activity at the BELLA Center, and we will present an example of the application of the code to the laser-plasma accelerator staging experiment.
Development and Demonstration of a Computational Tool for the Analysis of Particle Vitiation Effects in Hypersonic Propulsion Test Facilities

NASA Technical Reports Server (NTRS)

Perkins, Hugh Douglas

2010-01-01

In order to improve the understanding of particle vitiation effects in hypersonic propulsion test facilities, a quasi-one dimensional numerical tool was developed to efficiently model reacting particle-gas flows over a wide range of conditions. Features of this code include gas-phase finite-rate kinetics, a global porous-particle combustion model, mass, momentum and energy interactions between phases, and subsonic and supersonic particle drag and heat transfer models. The basic capabilities of this tool were validated against available data or other validated codes. To demonstrate the capabilities of the code a series of computations were performed for a model hypersonic propulsion test facility and scramjet. Parameters studied were simulated flight Mach number, particle size, particle mass fraction and particle material.
Multiprocessing on supercomputers for computational aerodynamics

NASA Technical Reports Server (NTRS)

Yarrow, Maurice; Mehta, Unmeel B.

1990-01-01

Very little use is made of multiple processors available on current supercomputers (computers with a theoretical peak performance capability equal to 100 MFLOPs or more) in computational aerodynamics to significantly improve turnaround time. The productivity of a computer user is directly related to this turnaround time. In a time-sharing environment, the improvement in this speed is achieved when multiple processors are used efficiently to execute an algorithm. The concept of multiple instructions and multiple data (MIMD) through multi-tasking is applied via a strategy which requires relatively minor modifications to an existing code for a single processor. Essentially, this approach maps the available memory to multiple processors, exploiting the C-FORTRAN-Unix interface. The existing single processor code is mapped without the need for developing a new algorithm. The procedure for building a code utilizing this approach is automated with the Unix stream editor. As a demonstration of this approach, a Multiple Processor Multiple Grid (MPMG) code is developed. It is capable of using nine processors, and can be easily extended to a larger number of processors. This code solves the three-dimensional, Reynolds averaged, thin-layer and slender-layer Navier-Stokes equations with an implicit, approximately factored and diagonalized method. The solver is applied to generic oblique-wing aircraft problem on a four processor Cray-2 computer. A tricubic interpolation scheme is developed to increase the accuracy of coupling of overlapped grids. For the oblique-wing aircraft problem, a speedup of two in elapsed (turnaround) time is observed in a saturated time-sharing environment.
Multi-phase SPH modelling of violent hydrodynamics on GPUs

NASA Astrophysics Data System (ADS)

Mokos, Athanasios; Rogers, Benedict D.; Stansby, Peter K.; Domínguez, José M.

2015-11-01

This paper presents the acceleration of multi-phase smoothed particle hydrodynamics (SPH) using a graphics processing unit (GPU) enabling large numbers of particles (10-20 million) to be simulated on just a single GPU card. With novel hardware architectures such as a GPU, the optimum approach to implement a multi-phase scheme presents some new challenges. Many more particles must be included in the calculation and there are very different speeds of sound in each phase with the largest speed of sound determining the time step. This requires efficient computation. To take full advantage of the hardware acceleration provided by a single GPU for a multi-phase simulation, four different algorithms are investigated: conditional statements, binary operators, separate particle lists and an intermediate global function. Runtime results show that the optimum approach needs to employ separate cell and neighbour lists for each phase. The profiler shows that this approach leads to a reduction in both memory transactions and arithmetic operations giving significant runtime gains. The four different algorithms are compared to the efficiency of the optimised single-phase GPU code, DualSPHysics, for 2-D and 3-D simulations which indicate that the multi-phase functionality has a significant computational overhead. A comparison with an optimised CPU code shows a speed up of an order of magnitude over an OpenMP simulation with 8 threads and two orders of magnitude over a single thread simulation. A demonstration of the multi-phase SPH GPU code is provided by a 3-D dam break case impacting an obstacle. This shows better agreement with experimental results than an equivalent single-phase code. The multi-phase GPU code enables a convergence study to be undertaken on a single GPU with a large number of particles that otherwise would have required large high performance computing resources.
SCISEAL: A CFD Code for Analysis of Fluid Dynamic Forces in Seals

NASA Technical Reports Server (NTRS)

Althavale, Mahesh M.; Ho, Yin-Hsing; Przekwas, Andre J.

1996-01-01

A 3D CFD code, SCISEAL, has been developed and validated. Its capabilities include cylindrical seals, and it is employed on labyrinth seals, rim seals, and disc cavities. State-of-the-art numerical methods include colocated grids, high-order differencing, and turbulence models which account for wall roughness. SCISEAL computes efficient solutions for complicated flow geometries and seal-specific capabilities (rotor loads, torques, etc.).
Fast bi-directional prediction selection in H.264/MPEG-4 AVC temporal scalable video coding.

PubMed

Lin, Hung-Chih; Hang, Hsueh-Ming; Peng, Wen-Hsiao

2011-12-01

In this paper, we propose a fast algorithm that efficiently selects the temporal prediction type for the dyadic hierarchical-B prediction structure in the H.264/MPEG-4 temporal scalable video coding (SVC). We make use of the strong correlations in prediction type inheritance to eliminate the superfluous computations for the bi-directional (BI) prediction in the finer partitions, 16×8/8×16/8×8 , by referring to the best temporal prediction type of 16 × 16. In addition, we carefully examine the relationship in motion bit-rate costs and distortions between the BI and the uni-directional temporal prediction types. As a result, we construct a set of adaptive thresholds to remove the unnecessary BI calculations. Moreover, for the block partitions smaller than 8 × 8, either the forward prediction (FW) or the backward prediction (BW) is skipped based upon the information of their 8 × 8 partitions. Hence, the proposed schemes can efficiently reduce the extensive computational burden in calculating the BI prediction. As compared to the JSVM 9.11 software, our method saves the encoding time from 48% to 67% for a large variety of test videos over a wide range of coding bit-rates and has only a minor coding performance loss. © 2011 IEEE

LSENS: A General Chemical Kinetics and Sensitivity Analysis Code for homogeneous gas-phase reactions. Part 1: Theory and numerical solution procedures

NASA Technical Reports Server (NTRS)

Radhakrishnan, Krishnan

1994-01-01

LSENS, the Lewis General Chemical Kinetics and Sensitivity Analysis Code, has been developed for solving complex, homogeneous, gas-phase chemical kinetics problems and contains sensitivity analysis for a variety of problems, including nonisothermal situations. This report is part 1 of a series of three reference publications that describe LENS, provide a detailed guide to its usage, and present many example problems. Part 1 derives the governing equations and describes the numerical solution procedures for the types of problems that can be solved. The accuracy and efficiency of LSENS are examined by means of various test problems, and comparisons with other methods and codes are presented. LSENS is a flexible, convenient, accurate, and efficient solver for chemical reaction problems such as static system; steady, one-dimensional, inviscid flow; reaction behind incident shock wave, including boundary layer correction; and perfectly stirred (highly backmixed) reactor. In addition, the chemical equilibrium state can be computed for the following assigned states: temperature and pressure, enthalpy and pressure, temperature and volume, and internal energy and volume. For static problems the code computes the sensitivity coefficients of the dependent variables and their temporal derivatives with respect to the initial values of the dependent variables and/or the three rate coefficient parameters of the chemical reactions.
Transversal Clifford gates on folded surface codes

DOE PAGES

Moussa, Jonathan E.

2016-10-12

Surface and color codes are two forms of topological quantum error correction in two spatial dimensions with complementary properties. Surface codes have lower-depth error detection circuits and well-developed decoders to interpret and correct errors, while color codes have transversal Clifford gates and better code efficiency in the number of physical qubits needed to achieve a given code distance. A formal equivalence exists between color codes and folded surface codes, but it does not guarantee the transferability of any of these favorable properties. However, the equivalence does imply the existence of constant-depth circuit implementations of logical Clifford gates on folded surfacemore » codes. We achieve and improve this result by constructing two families of folded surface codes with transversal Clifford gates. This construction is presented generally for qudits of any dimension. Lastly, the specific application of these codes to universal quantum computation based on qubit fusion is also discussed.« less
Type Specialization in Aldor

NASA Astrophysics Data System (ADS)

Dragan, Laurentiu; Watt, Stephen M.

Computer algebra in scientific computation squarely faces the dilemma of natural mathematical expression versus efficiency. While higher-order programming constructs and parametric polymorphism provide a natural and expressive language for mathematical abstractions, they can come at a considerable cost. We investigate how deeply nested type constructions may be optimized to achieve performance similar to that of hand-tuned code written in lower-level languages.
An efficient, explicit finite-rate algorithm to compute flows in chemical nonequilibrium

NASA Technical Reports Server (NTRS)

Palmer, Grant

1989-01-01

An explicit finite-rate code was developed to compute hypersonic viscous chemically reacting flows about three-dimensional bodies. Equations describing the finite-rate chemical reactions were fully coupled to the gas dynamic equations using a new coupling technique. The new technique maintains stability in the explicit finite-rate formulation while permitting relatively large global time steps.
Efficient simulation of incompressible viscous flow over multi-element airfoils

NASA Technical Reports Server (NTRS)

Rogers, Stuart E.; Wiltberger, N. Lyn; Kwak, Dochan

1992-01-01

The incompressible, viscous, turbulent flow over single and multi-element airfoils is numerically simulated in an efficient manner by solving the incompressible Navier-Stokes equations. The computer code uses the method of pseudo-compressibility with an upwind-differencing scheme for the convective fluxes and an implicit line-relaxation solution algorithm. The motivation for this work includes interest in studying the high-lift take-off and landing configurations of various aircraft. In particular, accurate computation of lift and drag at various angles of attack, up to stall, is desired. Two different turbulence models are tested in computing the flow over an NACA 4412 airfoil; an accurate prediction of stall is obtained. The approach used for multi-element airfoils involves the use of multiple zones of structured grids fitted to each element. Two different approaches are compared: a patched system of grids, and an overlaid Chimera system of grids. Computational results are presented for two-element, three-element, and four-element airfoil configurations. Excellent agreement with experimental surface pressure coefficients is seen. The code converges in less than 200 iterations, requiring on the order of one minute of CPU time (on a CRAY YMP) per element in the airfoil configuration.
Message Passing vs. Shared Address Space on a Cluster of SMPs

NASA Technical Reports Server (NTRS)

Shan, Hongzhang; Singh, Jaswinder Pal; Oliker, Leonid; Biswas, Rupak

2000-01-01

The convergence of scalable computer architectures using clusters of PCs (or PC-SMPs) with commodity networking has become an attractive platform for high end scientific computing. Currently, message-passing and shared address space (SAS) are the two leading programming paradigms for these systems. Message-passing has been standardized with MPI, and is the most common and mature programming approach. However message-passing code development can be extremely difficult, especially for irregular structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality, and high protocol overhead. In this paper, we compare the performance of and programming effort, required for six applications under both programming models on a 32 CPU PC-SMP cluster. Our application suite consists of codes that typically do not exhibit high efficiency under shared memory programming. due to their high communication to computation ratios and complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications: however, on certain classes of problems SAS performance is competitive with MPI. We also present new algorithms for improving the PC cluster performance of MPI collective operations.
Workshop report - A validation study of Navier-Stokes codes for transverse injection into a Mach 2 flow

NASA Technical Reports Server (NTRS)

Eklund, Dean R.; Northam, G. B.; Mcdaniel, J. C.; Smith, Cliff

1992-01-01

A CFD (Computational Fluid Dynamics) competition was held at the Third Scramjet Combustor Modeling Workshop to assess the current state-of-the-art in CFD codes for the analysis of scramjet combustors. Solutions from six three-dimensional Navier-Stokes codes were compared for the case of staged injection of air behind a step into a Mach 2 flow. This case was investigated experimentally at the University of Virginia and extensive in-stream data was obtained. Code-to-code comparisons have been made with regard to both accuracy and efficiency. The turbulence models employed in the solutions are believed to be a major source of discrepancy between the six solutions.
Boltzmann Transport Code Update: Parallelization and Integrated Design Updates

NASA Technical Reports Server (NTRS)

Heinbockel, J. H.; Nealy, J. E.; DeAngelis, G.; Feldman, G. A.; Chokshi, S.

2003-01-01

The on going efforts at developing a web site for radiation analysis is expected to result in an increased usage of the High Charge and Energy Transport Code HZETRN. It would be nice to be able to do the requested calculations quickly and efficiently. Therefore the question arose, "Could the implementation of parallel processing speed up the calculations required?" To answer this question two modifications of the HZETRN computer code were created. The first modification selected the shield material of Al(2219) , then polyethylene and then Al(2219). The modified Fortran code was labeled 1SSTRN.F. The second modification considered the shield material of CO2 and Martian regolith. This modified Fortran code was labeled MARSTRN.F.
User's manual for PRESTO: A computer code for the performance of regenerative steam turbine cycles

NASA Technical Reports Server (NTRS)

Fuller, L. C.; Stovall, T. K.

1979-01-01

Standard turbine cycles for baseload power plants and cycles with such additional features as process steam extraction and induction and feedwater heating by external heat sources may be modeled. Peaking and high back pressure cycles are also included. The code's methodology is to use the expansion line efficiencies, exhaust loss, leakages, mechanical losses, and generator losses to calculate the heat rate and generator output. A general description of the code is given as well as the instructions for input data preparation. Appended are two complete example cases.
Automated Development of Accurate Algorithms and Efficient Codes for Computational Aeroacoustics

NASA Technical Reports Server (NTRS)

Goodrich, John W.; Dyson, Rodger W.

1999-01-01

The simulation of sound generation and propagation in three space dimensions with realistic aircraft components is a very large time dependent computation with fine details. Simulations in open domains with embedded objects require accurate and robust algorithms for propagation, for artificial inflow and outflow boundaries, and for the definition of geometrically complex objects. The development, implementation, and validation of methods for solving these demanding problems is being done to support the NASA pillar goals for reducing aircraft noise levels. Our goal is to provide algorithms which are sufficiently accurate and efficient to produce usable results rapidly enough to allow design engineers to study the effects on sound levels of design changes in propulsion systems, and in the integration of propulsion systems with airframes. There is a lack of design tools for these purposes at this time. Our technical approach to this problem combines the development of new, algorithms with the use of Mathematica and Unix utilities to automate the algorithm development, code implementation, and validation. We use explicit methods to ensure effective implementation by domain decomposition for SPMD parallel computing. There are several orders of magnitude difference in the computational efficiencies of the algorithms which we have considered. We currently have new artificial inflow and outflow boundary conditions that are stable, accurate, and unobtrusive, with implementations that match the accuracy and efficiency of the propagation methods. The artificial numerical boundary treatments have been proven to have solutions which converge to the full open domain problems, so that the error from the boundary treatments can be driven as low as is required. The purpose of this paper is to briefly present a method for developing highly accurate algorithms for computational aeroacoustics, the use of computer automation in this process, and a brief survey of the algorithms that have resulted from this work. A review of computational aeroacoustics has recently been given by Lele.
LSENS, The NASA Lewis Kinetics and Sensitivity Analysis Code

NASA Technical Reports Server (NTRS)

Radhakrishnan, K.

2000-01-01

A general chemical kinetics and sensitivity analysis code for complex, homogeneous, gas-phase reactions is described. The main features of the code, LSENS (the NASA Lewis kinetics and sensitivity analysis code), are its flexibility, efficiency and convenience in treating many different chemical reaction models. The models include: static system; steady, one-dimensional, inviscid flow; incident-shock initiated reaction in a shock tube; and a perfectly stirred reactor. In addition, equilibrium computations can be performed for several assigned states. An implicit numerical integration method (LSODE, the Livermore Solver for Ordinary Differential Equations), which works efficiently for the extremes of very fast and very slow reactions, is used to solve the "stiff" ordinary differential equation systems that arise in chemical kinetics. For static reactions, the code uses the decoupled direct method to calculate sensitivity coefficients of the dependent variables and their temporal derivatives with respect to the initial values of dependent variables and/or the rate coefficient parameters. Solution methods for the equilibrium and post-shock conditions and for perfectly stirred reactor problems are either adapted from or based on the procedures built into the NASA code CEA (Chemical Equilibrium and Applications).
Lean and Efficient Software: Whole-Program Optimization of Executables

DTIC Science & Technology

2015-06-30

Approved for public release; distribution is unlimited. Financial Data Contact: Krisztina Nagy T: (607) 273-7340 x.117 F : (607) 273-8752 knagy...grammatech.com Administrative Contact: Derek Burrows T: (607) 273-7340 x.113 F : (607) 273-8752 dburrows@grammatech.com Report Documentation Page...library subroutines, removing redundant argument checking and interface layers, eliminating dead code, and improving computational efficiency. In
Interface COMSOL-PHREEQC (iCP), an efficient numerical framework for the solution of coupled multiphysics and geochemistry

NASA Astrophysics Data System (ADS)

Nardi, Albert; Idiart, Andrés; Trinchero, Paolo; de Vries, Luis Manuel; Molinero, Jorge

2014-08-01

This paper presents the development, verification and application of an efficient interface, denoted as iCP, which couples two standalone simulation programs: the general purpose Finite Element framework COMSOL Multiphysics® and the geochemical simulator PHREEQC. The main goal of the interface is to maximize the synergies between the aforementioned codes, providing a numerical platform that can efficiently simulate a wide number of multiphysics problems coupled with geochemistry. iCP is written in Java and uses the IPhreeqc C++ dynamic library and the COMSOL Java-API. Given the large computational requirements of the aforementioned coupled models, special emphasis has been placed on numerical robustness and efficiency. To this end, the geochemical reactions are solved in parallel by balancing the computational load over multiple threads. First, a benchmark exercise is used to test the reliability of iCP regarding flow and reactive transport. Then, a large scale thermo-hydro-chemical (THC) problem is solved to show the code capabilities. The results of the verification exercise are successfully compared with those obtained using PHREEQC and the application case demonstrates the scalability of a large scale model, at least up to 32 threads.
Optimization of computations for adjoint field and Jacobian needed in 3D CSEM inversion

NASA Astrophysics Data System (ADS)

Dehiya, Rahul; Singh, Arun; Gupta, Pravin K.; Israil, M.

2017-01-01

We present the features and results of a newly developed code, based on Gauss-Newton optimization technique, for solving three-dimensional Controlled-Source Electromagnetic inverse problem. In this code a special emphasis has been put on representing the operations by block matrices for conjugate gradient iteration. We show how in the computation of Jacobian, the matrix formed by differentiation of system matrix can be made independent of frequency to optimize the operations at conjugate gradient step. The coarse level parallel computing, using OpenMP framework, is used primarily due to its simplicity in implementation and accessibility of shared memory multi-core computing machine to almost anyone. We demonstrate how the coarseness of modeling grid in comparison to source (comp`utational receivers) spacing can be exploited for efficient computing, without compromising the quality of the inverted model, by reducing the number of adjoint calls. It is also demonstrated that the adjoint field can even be computed on a grid coarser than the modeling grid without affecting the inversion outcome. These observations were reconfirmed using an experiment design where the deviation of source from straight tow line is considered. Finally, a real field data inversion experiment is presented to demonstrate robustness of the code.
Improved Algorithms Speed It Up for Codes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hazi, A

2005-09-20

Huge computers, huge codes, complex problems to solve. The longer it takes to run a code, the more it costs. One way to speed things up and save time and money is through hardware improvements--faster processors, different system designs, bigger computers. But another side of supercomputing can reap savings in time and speed: software improvements to make codes--particularly the mathematical algorithms that form them--run faster and more efficiently. Speed up math? Is that really possible? According to Livermore physicist Eugene Brooks, the answer is a resounding yes. ''Sure, you get great speed-ups by improving hardware,'' says Brooks, the deputy leadermore » for Computational Physics in N Division, which is part of Livermore's Physics and Advanced Technologies (PAT) Directorate. ''But the real bonus comes on the software side, where improvements in software can lead to orders of magnitude improvement in run times.'' Brooks knows whereof he speaks. Working with Laboratory physicist Abraham Szoeke and others, he has been instrumental in devising ways to shrink the running time of what has, historically, been a tough computational nut to crack: radiation transport codes based on the statistical or Monte Carlo method of calculation. And Brooks is not the only one. Others around the Laboratory, including physicists Andrew Williamson, Randolph Hood, and Jeff Grossman, have come up with innovative ways to speed up Monte Carlo calculations using pure mathematics.« less
A multiphysics and multiscale software environment for modeling astrophysical systems

NASA Astrophysics Data System (ADS)

Portegies Zwart, Simon; McMillan, Steve; Harfst, Stefan; Groen, Derek; Fujii, Michiko; Nualláin, Breanndán Ó.; Glebbeek, Evert; Heggie, Douglas; Lombardi, James; Hut, Piet; Angelou, Vangelis; Banerjee, Sambaran; Belkus, Houria; Fragos, Tassos; Fregeau, John; Gaburov, Evghenii; Izzard, Rob; Jurić, Mario; Justham, Stephen; Sottoriva, Andrea; Teuben, Peter; van Bever, Joris; Yaron, Ofer; Zemp, Marcel

2009-05-01

We present MUSE, a software framework for combining existing computational tools for different astrophysical domains into a single multiphysics, multiscale application. MUSE facilitates the coupling of existing codes written in different languages by providing inter-language tools and by specifying an interface between each module and the framework that represents a balance between generality and computational efficiency. This approach allows scientists to use combinations of codes to solve highly coupled problems without the need to write new codes for other domains or significantly alter their existing codes. MUSE currently incorporates the domains of stellar dynamics, stellar evolution and stellar hydrodynamics for studying generalized stellar systems. We have now reached a "Noah's Ark" milestone, with (at least) two available numerical solvers for each domain. MUSE can treat multiscale and multiphysics systems in which the time- and size-scales are well separated, like simulating the evolution of planetary systems, small stellar associations, dense stellar clusters, galaxies and galactic nuclei. In this paper we describe three examples calculated using MUSE: the merger of two galaxies, the merger of two evolving stars, and a hybrid N-body simulation. In addition, we demonstrate an implementation of MUSE on a distributed computer which may also include special-purpose hardware, such as GRAPEs or GPUs, to accelerate computations. The current MUSE code base is publicly available as open source at http://muse.li.
Status and future of MUSE

NASA Astrophysics Data System (ADS)

Harfst, S.; Portegies Zwart, S.; McMillan, S.

2008-12-01

We present MUSE, a software framework for combining existing computational tools from different astrophysical domains into a single multi-physics, multi-scale application. MUSE facilitates the coupling of existing codes written in different languages by providing inter-language tools and by specifying an interface between each module and the framework that represents a balance between generality and computational efficiency. This approach allows scientists to use combinations of codes to solve highly-coupled problems without the need to write new codes for other domains or significantly alter their existing codes. MUSE currently incorporates the domains of stellar dynamics, stellar evolution and stellar hydrodynamics for studying generalized stellar systems. We have now reached a ``Noah's Ark'' milestone, with (at least) two available numerical solvers for each domain. MUSE can treat multi-scale and multi-physics systems in which the time- and size-scales are well separated, like simulating the evolution of planetary systems, small stellar associations, dense stellar clusters, galaxies and galactic nuclei. In this paper we describe two examples calculated using MUSE: the merger of two galaxies and an N-body simulation with live stellar evolution. In addition, we demonstrate an implementation of MUSE on a distributed computer which may also include special-purpose hardware, such as GRAPEs or GPUs, to accelerate computations. The current MUSE code base is publicly available as open source at http://muse.li.
Manyscale Computing for Sensor Processing in Support of Space Situational Awareness

NASA Astrophysics Data System (ADS)

Schmalz, M.; Chapman, W.; Hayden, E.; Sahni, S.; Ranka, S.

2014-09-01

Increasing image and signal data burden associated with sensor data processing in support of space situational awareness implies continuing computational throughput growth beyond the petascale regime. In addition to growing applications data burden and diversity, the breadth, diversity and scalability of high performance computing architectures and their various organizations challenge the development of a single, unifying, practicable model of parallel computation. Therefore, models for scalable parallel processing have exploited architectural and structural idiosyncrasies, yielding potential misapplications when legacy programs are ported among such architectures. In response to this challenge, we have developed a concise, efficient computational paradigm and software called Manyscale Computing to facilitate efficient mapping of annotated application codes to heterogeneous parallel architectures. Our theory, algorithms, software, and experimental results support partitioning and scheduling of application codes for envisioned parallel architectures, in terms of work atoms that are mapped (for example) to threads or thread blocks on computational hardware. Because of the rigor, completeness, conciseness, and layered design of our manyscale approach, application-to-architecture mapping is feasible and scalable for architectures at petascales, exascales, and above. Further, our methodology is simple, relying primarily on a small set of primitive mapping operations and support routines that are readily implemented on modern parallel processors such as graphics processing units (GPUs) and hybrid multi-processors (HMPs). In this paper, we overview the opportunities and challenges of manyscale computing for image and signal processing in support of space situational awareness applications. We discuss applications in terms of a layered hardware architecture (laboratory > supercomputer > rack > processor > component hierarchy). Demonstration applications include performance analysis and results in terms of execution time as well as storage, power, and energy consumption for bus-connected and/or networked architectures. The feasibility of the manyscale paradigm is demonstrated by addressing four principal challenges: (1) architectural/structural diversity, parallelism, and locality, (2) masking of I/O and memory latencies, (3) scalability of design as well as implementation, and (4) efficient representation/expression of parallel applications. Examples will demonstrate how manyscale computing helps solve these challenges efficiently on real-world computing systems.
3-D inelastic analysis methods for hot section components. Volume 2: Advanced special functions models

NASA Technical Reports Server (NTRS)

Wilson, R. B.; Banerjee, P. K.

1987-01-01

This Annual Status Report presents the results of work performed during the third year of the 3-D Inelastic Analysis Methods for Hot Sections Components program (NASA Contract NAS3-23697). The objective of the program is to produce a series of computer codes that permit more accurate and efficient three-dimensional analyses of selected hot section components, i.e., combustor liners, turbine blades, and turbine vanes. The computer codes embody a progression of mathematical models and are streamlined to take advantage of geometrical features, loading conditions, and forms of material response that distinguish each group of selected components.
Scaling up Planetary Dynamo Modeling to Massively Parallel Computing Systems: The Rayleigh Code at ALCF

NASA Astrophysics Data System (ADS)

Featherstone, N. A.; Aurnou, J. M.; Yadav, R. K.; Heimpel, M. H.; Soderlund, K. M.; Matsui, H.; Stanley, S.; Brown, B. P.; Glatzmaier, G.; Olson, P.; Buffett, B. A.; Hwang, L.; Kellogg, L. H.

2017-12-01

In the past three years, CIG's Dynamo Working Group has successfully ported the Rayleigh Code to the Argonne Leadership Computer Facility's Mira BG/Q device. In this poster, we present some our first results, showing simulations of 1) convection in the solar convection zone; 2) dynamo action in Earth's core and 3) convection in the jovian deep atmosphere. These simulations have made efficient use of 131 thousand cores, 131 thousand cores and 232 thousand cores, respectively, on Mira. In addition to our novel results, the joys and logistical challenges of carrying out such large runs will also be discussed.

Evaluation of CFD to Determine Two-Dimensional Airfoil Characteristics for Rotorcraft Applications

NASA Technical Reports Server (NTRS)

Smith, Marilyn J.; Wong, Tin-Chee; Potsdam, Mark; Baeder, James; Phanse, Sujeet

2004-01-01

The efficient prediction of helicopter rotor performance, vibratory loads, and aeroelastic properties still relies heavily on the use of comprehensive analysis codes by the rotorcraft industry. These comprehensive codes utilize look-up tables to provide two-dimensional aerodynamic characteristics. Typically these tables are comprised of a combination of wind tunnel data, empirical data and numerical analyses. The potential to rely more heavily on numerical computations based on Computational Fluid Dynamics (CFD) simulations has become more of a reality with the advent of faster computers and more sophisticated physical models. The ability of five different CFD codes applied independently to predict the lift, drag and pitching moments of rotor airfoils is examined for the SC1095 airfoil, which is utilized in the UH-60A main rotor. Extensive comparisons with the results of ten wind tunnel tests are performed. These CFD computations are found to be as good as experimental data in predicting many of the aerodynamic performance characteristics. Four turbulence models were examined (Baldwin-Lomax, Spalart-Allmaras, Menter SST, and k-omega).
Comparative Evaluation of Different Optimization Algorithms for Structural Design Applications

NASA Technical Reports Server (NTRS)

Patnaik, Surya N.; Coroneos, Rula M.; Guptill, James D.; Hopkins, Dale A.

1996-01-01

Non-linear programming algorithms play an important role in structural design optimization. Fortunately, several algorithms with computer codes are available. At NASA Lewis Research Centre, a project was initiated to assess the performance of eight different optimizers through the development of a computer code CometBoards. This paper summarizes the conclusions of that research. CometBoards was employed to solve sets of small, medium and large structural problems, using the eight different optimizers on a Cray-YMP8E/8128 computer. The reliability and efficiency of the optimizers were determined from the performance of these problems. For small problems, the performance of most of the optimizers could be considered adequate. For large problems, however, three optimizers (two sequential quadratic programming routines, DNCONG of IMSL and SQP of IDESIGN, along with Sequential Unconstrained Minimizations Technique SUMT) outperformed others. At optimum, most optimizers captured an identical number of active displacement and frequency constraints but the number of active stress constraints differed among the optimizers. This discrepancy can be attributed to singularity conditions in the optimization and the alleviation of this discrepancy can improve the efficiency of optimizers.
Performance Trend of Different Algorithms for Structural Design Optimization

NASA Technical Reports Server (NTRS)

Patnaik, Surya N.; Coroneos, Rula M.; Guptill, James D.; Hopkins, Dale A.

1996-01-01

Nonlinear programming algorithms play an important role in structural design optimization. Fortunately, several algorithms with computer codes are available. At NASA Lewis Research Center, a project was initiated to assess performance of different optimizers through the development of a computer code CometBoards. This paper summarizes the conclusions of that research. CometBoards was employed to solve sets of small, medium and large structural problems, using different optimizers on a Cray-YMP8E/8128 computer. The reliability and efficiency of the optimizers were determined from the performance of these problems. For small problems, the performance of most of the optimizers could be considered adequate. For large problems however, three optimizers (two sequential quadratic programming routines, DNCONG of IMSL and SQP of IDESIGN, along with the sequential unconstrained minimizations technique SUMT) outperformed others. At optimum, most optimizers captured an identical number of active displacement and frequency constraints but the number of active stress constraints differed among the optimizers. This discrepancy can be attributed to singularity conditions in the optimization and the alleviation of this discrepancy can improve the efficiency of optimizers.
Recursive time-varying filter banks for subband image coding

NASA Technical Reports Server (NTRS)

Smith, Mark J. T.; Chung, Wilson C.

1992-01-01

Filter banks and wavelet decompositions that employ recursive filters have been considered previously and are recognized for their efficiency in partitioning the frequency spectrum. This paper presents an analysis of a new infinite impulse response (IIR) filter bank in which these computationally efficient filters may be changed adaptively in response to the input. The filter bank is presented and discussed in the context of finite-support signals with the intended application in subband image coding. In the absence of quantization errors, exact reconstruction can be achieved and by the proper choice of an adaptation scheme, it is shown that IIR time-varying filter banks can yield improvement over conventional ones.
Final Report from The University of Texas at Austin for DEGAS: Dynamic Global Address Space programming environments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Erez, Mattan; Yelick, Katherine; Sarkar, Vivek

The Dynamic, Exascale Global Address Space programming environment (DEGAS) project will develop the next generation of programming models and runtime systems to meet the challenges of Exascale computing. Our approach is to provide an efficient and scalable programming model that can be adapted to application needs through the use of dynamic runtime features and domain-specific languages for computational kernels. We address the following technical challenges: Programmability: Rich set of programming constructs based on a Hierarchical Partitioned Global Address Space (HPGAS) model, demonstrated in UPC++. Scalability: Hierarchical locality control, lightweight communication (extended GASNet), and ef- ficient synchronization mechanisms (Phasers). Performance Portability:more » Just-in-time specialization (SEJITS) for generating hardware-specific code and scheduling libraries for domain-specific adaptive runtimes (Habanero). Energy Efficiency: Communication-optimal code generation to optimize energy efficiency by re- ducing data movement. Resilience: Containment Domains for flexible, domain-specific resilience, using state capture mechanisms and lightweight, asynchronous recovery mechanisms. Interoperability: Runtime and language interoperability with MPI and OpenMP to encourage broad adoption.« less
Fast computation of quadrupole and hexadecapole approximations in microlensing with a single point-source evaluation

NASA Astrophysics Data System (ADS)

Cassan, Arnaud

2017-07-01

The exoplanet detection rate from gravitational microlensing has grown significantly in recent years thanks to a great enhancement of resources and improved observational strategy. Current observatories include ground-based wide-field and/or robotic world-wide networks of telescopes, as well as space-based observatories such as satellites Spitzer or Kepler/K2. This results in a large quantity of data to be processed and analysed, which is a challenge for modelling codes because of the complexity of the parameter space to be explored and the intensive computations required to evaluate the models. In this work, I present a method that allows to compute the quadrupole and hexadecapole approximations of the finite-source magnification with more efficiency than previously available codes, with routines about six times and four times faster, respectively. The quadrupole takes just about twice the time of a point-source evaluation, which advocates for generalizing its use to large portions of the light curves. The corresponding routines are available as open-source python codes.
Validation of a Computational Fluid Dynamics (CFD) Code for Supersonic Axisymmetric Base Flow

NASA Technical Reports Server (NTRS)

Tucker, P. Kevin

1993-01-01

The ability to accurately and efficiently calculate the flow structure in the base region of bodies of revolution in supersonic flight is a significant step in CFD code validation for applications ranging from base heating for rockets to drag for protectives. The FDNS code is used to compute such a flow and the results are compared to benchmark quality experimental data. Flowfield calculations are presented for a cylindrical afterbody at M = 2.46 and angle of attack a = O. Grid independent solutions are compared to mean velocity profiles in the separated wake area and downstream of the reattachment point. Additionally, quantities such as turbulent kinetic energy and shear layer growth rates are compared to the data. Finally, the computed base pressures are compared to the measured values. An effort is made to elucidate the role of turbulence models in the flowfield predictions. The level of turbulent eddy viscosity, and its origin, are used to contrast the various turbulence models and compare the results to the experimental data.
An efficient nonlinear finite-difference approach in the computational modeling of the dynamics of a nonlinear diffusion-reaction equation in microbial ecology.

PubMed

Macías-Díaz, J E; Macías, Siegfried; Medina-Ramírez, I E

2013-12-01

In this manuscript, we present a computational model to approximate the solutions of a partial differential equation which describes the growth dynamics of microbial films. The numerical technique reported in this work is an explicit, nonlinear finite-difference methodology which is computationally implemented using Newton's method. Our scheme is compared numerically against an implicit, linear finite-difference discretization of the same partial differential equation, whose computer coding requires an implementation of the stabilized bi-conjugate gradient method. Our numerical results evince that the nonlinear approach results in a more efficient approximation to the solutions of the biofilm model considered, and demands less computer memory. Moreover, the positivity of initial profiles is preserved in the practice by the nonlinear scheme proposed. Copyright © 2013 Elsevier Ltd. All rights reserved.
Experimental investigations, modeling, and analyses of high-temperature devices for space applications: Part 1. Final report, June 1996--December 1998

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tournier, J.; El-Genk, M.S.; Huang, L.

1999-01-01

The Institute of Space and Nuclear Power Studies at the University of New Mexico has developed a computer simulation of cylindrical geometry alkali metal thermal-to-electric converter cells using a standard Fortran 77 computer code. The objective and use of this code was to compare the experimental measurements with computer simulations, upgrade the model as appropriate, and conduct investigations of various methods to improve the design and performance of the devices for improved efficiency, durability, and longer operational lifetime. The Institute of Space and Nuclear Power Studies participated in vacuum testing of PX series alkali metal thermal-to-electric converter cells and developedmore » the alkali metal thermal-to-electric converter Performance Evaluation and Analysis Model. This computer model consisted of a sodium pressure loss model, a cell electrochemical and electric model, and a radiation/conduction heat transfer model. The code closely predicted the operation and performance of a wide variety of PX series cells which led to suggestions for improvements to both lifetime and performance. The code provides valuable insight into the operation of the cell, predicts parameters of components within the cell, and is a useful tool for predicting both the transient and steady state performance of systems of cells.« less
Experimental investigations, modeling, and analyses of high-temperature devices for space applications: Part 2. Final report, June 1996--December 1998

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tournier, J.; El-Genk, M.S.; Huang, L.

1999-01-01

The Institute of Space and Nuclear Power Studies at the University of New Mexico has developed a computer simulation of cylindrical geometry alkali metal thermal-to-electric converter cells using a standard Fortran 77 computer code. The objective and use of this code was to compare the experimental measurements with computer simulations, upgrade the model as appropriate, and conduct investigations of various methods to improve the design and performance of the devices for improved efficiency, durability, and longer operational lifetime. The Institute of Space and Nuclear Power Studies participated in vacuum testing of PX series alkali metal thermal-to-electric converter cells and developedmore » the alkali metal thermal-to-electric converter Performance Evaluation and Analysis Model. This computer model consisted of a sodium pressure loss model, a cell electrochemical and electric model, and a radiation/conduction heat transfer model. The code closely predicted the operation and performance of a wide variety of PX series cells which led to suggestions for improvements to both lifetime and performance. The code provides valuable insight into the operation of the cell, predicts parameters of components within the cell, and is a useful tool for predicting both the transient and steady state performance of systems of cells.« less
An Energy-Efficient Compressive Image Coding for Green Internet of Things (IoT).

PubMed

Li, Ran; Duan, Xiaomeng; Li, Xu; He, Wei; Li, Yanling

2018-04-17

Aimed at a low-energy consumption of Green Internet of Things (IoT), this paper presents an energy-efficient compressive image coding scheme, which provides compressive encoder and real-time decoder according to Compressive Sensing (CS) theory. The compressive encoder adaptively measures each image block based on the block-based gradient field, which models the distribution of block sparse degree, and the real-time decoder linearly reconstructs each image block through a projection matrix, which is learned by Minimum Mean Square Error (MMSE) criterion. Both the encoder and decoder have a low computational complexity, so that they only consume a small amount of energy. Experimental results show that the proposed scheme not only has a low encoding and decoding complexity when compared with traditional methods, but it also provides good objective and subjective reconstruction qualities. In particular, it presents better time-distortion performance than JPEG. Therefore, the proposed compressive image coding is a potential energy-efficient scheme for Green IoT.
MATIN: a random network coding based framework for high quality peer-to-peer live video streaming.

PubMed

Barekatain, Behrang; Khezrimotlagh, Dariush; Aizaini Maarof, Mohd; Ghaeini, Hamid Reza; Salleh, Shaharuddin; Quintana, Alfonso Ariza; Akbari, Behzad; Cabrera, Alicia Triviño

2013-01-01

In recent years, Random Network Coding (RNC) has emerged as a promising solution for efficient Peer-to-Peer (P2P) video multicasting over the Internet. This probably refers to this fact that RNC noticeably increases the error resiliency and throughput of the network. However, high transmission overhead arising from sending large coefficients vector as header has been the most important challenge of the RNC. Moreover, due to employing the Gauss-Jordan elimination method, considerable computational complexity can be imposed on peers in decoding the encoded blocks and checking linear dependency among the coefficients vectors. In order to address these challenges, this study introduces MATIN which is a random network coding based framework for efficient P2P video streaming. The MATIN includes a novel coefficients matrix generation method so that there is no linear dependency in the generated coefficients matrix. Using the proposed framework, each peer encapsulates one instead of n coefficients entries into the generated encoded packet which results in very low transmission overhead. It is also possible to obtain the inverted coefficients matrix using a bit number of simple arithmetic operations. In this regard, peers sustain very low computational complexities. As a result, the MATIN permits random network coding to be more efficient in P2P video streaming systems. The results obtained from simulation using OMNET++ show that it substantially outperforms the RNC which uses the Gauss-Jordan elimination method by providing better video quality on peers in terms of the four important performance metrics including video distortion, dependency distortion, End-to-End delay and Initial Startup delay.
Hybrid discrete ordinates and characteristics method for solving the linear Boltzmann equation

NASA Astrophysics Data System (ADS)

Yi, Ce

With the ability of computer hardware and software increasing rapidly, deterministic methods to solve the linear Boltzmann equation (LBE) have attracted some attention for computational applications in both the nuclear engineering and medical physics fields. Among various deterministic methods, the discrete ordinates method (SN) and the method of characteristics (MOC) are two of the most widely used methods. The SN method is the traditional approach to solve the LBE for its stability and efficiency. While the MOC has some advantages in treating complicated geometries. However, in 3-D problems requiring a dense discretization grid in phase space (i.e., a large number of spatial meshes, directions, or energy groups), both methods could suffer from the need for large amounts of memory and computation time. In our study, we developed a new hybrid algorithm by combing the two methods into one code, TITAN. The hybrid approach is specifically designed for application to problems containing low scattering regions. A new serial 3-D time-independent transport code has been developed. Under the hybrid approach, the preferred method can be applied in different regions (blocks) within the same problem model. Since the characteristics method is numerically more efficient in low scattering media, the hybrid approach uses a block-oriented characteristics solver in low scattering regions, and a block-oriented SN solver in the remainder of the physical model. In the TITAN code, a physical problem model is divided into a number of coarse meshes (blocks) in Cartesian geometry. Either the characteristics solver or the SN solver can be chosen to solve the LBE within a coarse mesh. A coarse mesh can be filled with fine meshes or characteristic rays depending on the solver assigned to the coarse mesh. Furthermore, with its object-oriented programming paradigm and layered code structure, TITAN allows different individual spatial meshing schemes and angular quadrature sets for each coarse mesh. Two quadrature types (level-symmetric and Legendre-Chebyshev quadrature) along with the ordinate splitting techniques (rectangular splitting and PN-TN splitting) are implemented. In the S N solver, we apply a memory-efficient 'front-line' style paradigm to handle the fine mesh interface fluxes. In the characteristics solver, we have developed a novel 'backward' ray-tracing approach, in which a bi-linear interpolation procedure is used on the incoming boundaries of a coarse mesh. A CPU-efficient scattering kernel is shared in both solvers within the source iteration scheme. Angular and spatial projection techniques are developed to transfer the angular fluxes on the interfaces of coarse meshes with different discretization grids. The performance of the hybrid algorithm is tested in a number of benchmark problems in both nuclear engineering and medical physics fields. Among them are the Kobayashi benchmark problems and a computational tomography (CT) device model. We also developed an extra sweep procedure with the fictitious quadrature technique to calculate angular fluxes along directions of interest. The technique is applied in a single photon emission computed tomography (SPECT) phantom model to simulate the SPECT projection images. The accuracy and efficiency of the TITAN code are demonstrated in these benchmarks along with its scalability. A modified version of the characteristics solver is integrated in the PENTRAN code and tested within the parallel engine of PENTRAN. The limitations on the hybrid algorithm are also studied.
Scalability improvements to NRLMOL for DFT calculations of large molecules

NASA Astrophysics Data System (ADS)

Diaz, Carlos Manuel

Advances in high performance computing (HPC) have provided a way to treat large, computationally demanding tasks using thousands of processors. With the development of more powerful HPC architectures, the need to create efficient and scalable code has grown more important. Electronic structure calculations are valuable in understanding experimental observations and are routinely used for new materials predictions. For the electronic structure calculations, the memory and computation time are proportional to the number of atoms. Memory requirements for these calculations scale as N2, where N is the number of atoms. While the recent advances in HPC offer platforms with large numbers of cores, the limited amount of memory available on a given node and poor scalability of the electronic structure code hinder their efficient usage of these platforms. This thesis will present some developments to overcome these bottlenecks in order to study large systems. These developments, which are implemented in the NRLMOL electronic structure code, involve the use of sparse matrix storage formats and the use of linear algebra using sparse and distributed matrices. These developments along with other related development now allow ground state density functional calculations using up to 25,000 basis functions and the excited state calculations using up to 17,000 basis functions while utilizing all cores on a node. An example on a light-harvesting triad molecule is described. Finally, future plans to further improve the scalability will be presented.
Implementing Molecular Dynamics for Hybrid High Performance Computers - 1. Short Range Forces

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brown, W Michael; Wang, Peng; Plimpton, Steven J

The use of accelerators such as general-purpose graphics processing units (GPGPUs) have become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high performance computers, machines with more than one type of floating-point processor, are now becoming more prevalent due to these advantages. In this work, we discuss several important issues in porting a large molecular dynamics code for use on parallel hybrid machines - 1) choosing a hybrid parallel decomposition that works on central processing units (CPUs) with distributed memory and accelerator cores with shared memory,more » 2) minimizing the amount of code that must be ported for efficient acceleration, 3) utilizing the available processing power from both many-core CPUs and accelerators, and 4) choosing a programming model for acceleration. We present our solution to each of these issues for short-range force calculation in the molecular dynamics package LAMMPS. We describe algorithms for efficient short range force calculation on hybrid high performance machines. We describe a new approach for dynamic load balancing of work between CPU and accelerator cores. We describe the Geryon library that allows a single code to compile with both CUDA and OpenCL for use on a variety of accelerators. Finally, we present results on a parallel test cluster containing 32 Fermi GPGPUs and 180 CPU cores.« less
Recurrence time statistics: versatile tools for genomic DNA sequence analysis.

PubMed

Cao, Yinhe; Tung, Wen-Wen; Gao, J B

2004-01-01

With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important structures in a DNA sequence is repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cervisivae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale by a PC.
WARP

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bergmann, Ryan M.; Rowland, Kelly L.

2017-04-12

WARP, which can stand for ``Weaving All the Random Particles,'' is a three-dimensional (3D) continuous energy Monte Carlo neutron transport code developed at UC Berkeley to efficiently execute on NVIDIA graphics processing unit (GPU) platforms. WARP accelerates Monte Carlo simulations while preserving the benefits of using the Monte Carlo method, namely, that very few physical and geometrical simplifications are applied. WARP is able to calculate multiplication factors, neutron flux distributions (in both space and energy), and fission source distributions for time-independent neutron transport problems. It can run in both criticality or fixed source modes, but fixed source mode is currentlymore » not robust, optimized, or maintained in the newest version. WARP can transport neutrons in unrestricted arrangements of parallelepipeds, hexagonal prisms, cylinders, and spheres. The goal of developing WARP is to investigate algorithms that can grow into a full-featured, continuous energy, Monte Carlo neutron transport code that is accelerated by running on GPUs. The crux of the effort is to make Monte Carlo calculations faster while producing accurate results. Modern supercomputers are commonly being built with GPU coprocessor cards in their nodes to increase their computational efficiency and performance. GPUs execute efficiently on data-parallel problems, but most CPU codes, including those for Monte Carlo neutral particle transport, are predominantly task-parallel. WARP uses a data-parallel neutron transport algorithm to take advantage of the computing power GPUs offer.« less
MPI_XSTAR: MPI-based Parallelization of the XSTAR Photoionization Program

NASA Astrophysics Data System (ADS)

Danehkar, Ashkbiz; Nowak, Michael A.; Lee, Julia C.; Smith, Randall K.

2018-02-01

We describe a program for the parallel implementation of multiple runs of XSTAR, a photoionization code that is used to predict the physical properties of an ionized gas from its emission and/or absorption lines. The parallelization program, called MPI_XSTAR, has been developed and implemented in the C++ language by using the Message Passing Interface (MPI) protocol, a conventional standard of parallel computing. We have benchmarked parallel multiprocessing executions of XSTAR, using MPI_XSTAR, against a serial execution of XSTAR, in terms of the parallelization speedup and the computing resource efficiency. Our experience indicates that the parallel execution runs significantly faster than the serial execution, however, the efficiency in terms of the computing resource usage decreases with increasing the number of processors used in the parallel computing.
Programming Probabilistic Structural Analysis for Parallel Processing Computer

NASA Technical Reports Server (NTRS)

Sues, Robert H.; Chen, Heh-Chyun; Twisdale, Lawrence A.; Chamis, Christos C.; Murthy, Pappu L. N.

1991-01-01

The ultimate goal of this research program is to make Probabilistic Structural Analysis (PSA) computationally efficient and hence practical for the design environment by achieving large scale parallelism. The paper identifies the multiple levels of parallelism in PSA, identifies methodologies for exploiting this parallelism, describes the development of a parallel stochastic finite element code, and presents results of two example applications. It is demonstrated that speeds within five percent of those theoretically possible can be achieved. A special-purpose numerical technique, the stochastic preconditioned conjugate gradient method, is also presented and demonstrated to be extremely efficient for certain classes of PSA problems.
A 3DHZETRN Code in a Spherical Uniform Sphere with Monte Carlo Verification

NASA Technical Reports Server (NTRS)

Wilson, John W.; Slaba, Tony C.; Badavi, Francis F.; Reddell, Brandon D.; Bahadori, Amir A.

2014-01-01

The computationally efficient HZETRN code has been used in recent trade studies for lunar and Martian exploration and is currently being used in the engineering development of the next generation of space vehicles, habitats, and extra vehicular activity equipment. A new version (3DHZETRN) capable of transporting High charge (Z) and Energy (HZE) and light ions (including neutrons) under space-like boundary conditions with enhanced neutron and light ion propagation is under development. In the present report, new algorithms for light ion and neutron propagation with well-defined convergence criteria in 3D objects is developed and tested against Monte Carlo simulations to verify the solution methodology. The code will be available through the software system, OLTARIS, for shield design and validation and provides a basis for personal computer software capable of space shield analysis and optimization.

Adjusting process count on demand for petascale global optimization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sosonkina, Masha; Watson, Layne T.; Radcliffe, Nicholas R.

2012-11-23

There are many challenges that need to be met before efficient and reliable computation at the petascale is possible. Many scientific and engineering codes running at the petascale are likely to be memory intensive, which makes thrashing a serious problem for many petascale applications. One way to overcome this challenge is to use a dynamic number of processes, so that the total amount of memory available for the computation can be increased on demand. This paper describes modifications made to the massively parallel global optimization code pVTdirect in order to allow for a dynamic number of processes. In particular, themore » modified version of the code monitors memory use and spawns new processes if the amount of available memory is determined to be insufficient. The primary design challenges are discussed, and performance results are presented and analyzed.« less
MUTILS - a set of efficient modeling tools for multi-core CPUs implemented in MEX

NASA Astrophysics Data System (ADS)

Krotkiewski, Marcin; Dabrowski, Marcin

2013-04-01

The need for computational performance is common in scientific applications, and in particular in numerical simulations, where high resolution models require efficient processing of large amounts of data. Especially in the context of geological problems the need to increase the model resolution to resolve physical and geometrical complexities seems to have no limits. Alas, the performance of new generations of CPUs does not improve any longer by simply increasing clock speeds. Current industrial trends are to increase the number of computational cores. As a result, parallel implementations are required in order to fully utilize the potential of new processors, and to study more complex models. We target simulations on small to medium scale shared memory computers: laptops and desktop PCs with ~8 CPU cores and up to tens of GB of memory to high-end servers with ~50 CPU cores and hundereds of GB of memory. In this setting MATLAB is often the environment of choice for scientists that want to implement their own models with little effort. It is a useful general purpose mathematical software package, but due to its versatility some of its functionality is not as efficient as it could be. In particular, the challanges of modern multi-core architectures are not fully addressed. We have developed MILAMIN 2 - an efficient FEM modeling environment written in native MATLAB. Amongst others, MILAMIN provides functions to define model geometry, generate and convert structured and unstructured meshes (also through interfaces to external mesh generators), compute element and system matrices, apply boundary conditions, solve the system of linear equations, address non-linear and transient problems, and perform post-processing. MILAMIN strives to combine the ease of code development and the computational efficiency. Where possible, the code is optimized and/or parallelized within the MATLAB framework. Native MATLAB is augmented with the MUTILS library - a set of MEX functions that implement the computationally intensive, performance critical parts of the code, which we have identified to be bottlenecks. Here, we discuss the functionality and performance of the MUTILS library. Currently, it includes: 1. time and memory efficient assembly of sparse matrices for FEM simulations 2. parallel sparse matrix - vector product with optimizations speficic to symmetric matrices and multiple degrees of freedom per node 3. parallel point in triangle location and point in tetrahedron location for unstructured, adaptive 2D and 3D meshes (useful for 'marker in cell' type of methods) 4. parallel FEM interpolation for 2D and 3D meshes of elements of different types and orders, and for different number of degrees of freedom per node 5. a stand-alone, MEX implementation of the Conjugate Gradients iterative solver 6. interface to METIS graph partitioning and a fast implementation of RCM reordering
Remote control system for high-perfomance computer simulation of crystal growth by the PFC method

NASA Astrophysics Data System (ADS)

Pavlyuk, Evgeny; Starodumov, Ilya; Osipov, Sergei

2017-04-01

Modeling of crystallization process by the phase field crystal method (PFC) - one of the important directions of modern computational materials science. In this paper, the practical side of the computer simulation of the crystallization process by the PFC method is investigated. To solve problems using this method, it is necessary to use high-performance computing clusters, data storage systems and other often expensive complex computer systems. Access to such resources is often limited, unstable and accompanied by various administrative problems. In addition, the variety of software and settings of different computing clusters sometimes does not allow researchers to use unified program code. There is a need to adapt the program code for each configuration of the computer complex. The practical experience of the authors has shown that the creation of a special control system for computing with the possibility of remote use can greatly simplify the implementation of simulations and increase the performance of scientific research. In current paper we show the principal idea of such a system and justify its efficiency.
ALEGRA -- A massively parallel h-adaptive code for solid dynamics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Summers, R.M.; Wong, M.K.; Boucheron, E.A.

1997-12-31

ALEGRA is a multi-material, arbitrary-Lagrangian-Eulerian (ALE) code for solid dynamics designed to run on massively parallel (MP) computers. It combines the features of modern Eulerian shock codes, such as CTH, with modern Lagrangian structural analysis codes using an unstructured grid. ALEGRA is being developed for use on the teraflop supercomputers to conduct advanced three-dimensional (3D) simulations of shock phenomena important to a variety of systems. ALEGRA was designed with the Single Program Multiple Data (SPMD) paradigm, in which the mesh is decomposed into sub-meshes so that each processor gets a single sub-mesh with approximately the same number of elements. Usingmore » this approach the authors have been able to produce a single code that can scale from one processor to thousands of processors. A current major effort is to develop efficient, high precision simulation capabilities for ALEGRA, without the computational cost of using a global highly resolved mesh, through flexible, robust h-adaptivity of finite elements. H-adaptivity is the dynamic refinement of the mesh by subdividing elements, thus changing the characteristic element size and reducing numerical error. The authors are working on several major technical challenges that must be met to make effective use of HAMMER on MP computers.« less
Zero-block mode decision algorithm for H.264/AVC.

PubMed

Lee, Yu-Ming; Lin, Yinyi

2009-03-01

In the previous paper , we proposed a zero-block intermode decision algorithm for H.264 video coding based upon the number of zero-blocks of 4 x 4 DCT coefficients between the current macroblock and the co-located macroblock. The proposed algorithm can achieve significant improvement in computation, but the computation performance is limited for high bit-rate coding. To improve computation efficiency, in this paper, we suggest an enhanced zero-block decision algorithm, which uses an early zero-block detection method to compute the number of zero-blocks instead of direct DCT and quantization (DCT/Q) calculation and incorporates two adequate decision methods into semi-stationary and nonstationary regions of a video sequence. In addition, the zero-block decision algorithm is also applied to the intramode prediction in the P frame. The enhanced zero-block decision algorithm brings out a reduction of average 27% of total encoding time compared to the zero-block decision algorithm.
Progress of High Efficiency Centrifugal Compressor Simulations Using TURBO

NASA Technical Reports Server (NTRS)

Kulkarni, Sameer; Beach, Timothy A.

2017-01-01

Three-dimensional, time-accurate, and phase-lagged computational fluid dynamics (CFD) simulations of the High Efficiency Centrifugal Compressor (HECC) stage were generated using the TURBO solver. Changes to the TURBO Parallel Version 4 source code were made in order to properly model the no-slip boundary condition along the spinning hub region for centrifugal impellers. A startup procedure was developed to generate a converged flow field in TURBO. This procedure initialized computations on a coarsened mesh generated by the Turbomachinery Gridding System (TGS) and relied on a method of systematically increasing wheel speed and backpressure. Baseline design-speed TURBO results generally overpredicted total pressure ratio, adiabatic efficiency, and the choking flow rate of the HECC stage as compared with the design-intent CFD results of Code Leo. Including diffuser fillet geometry in the TURBO computation resulted in a 0.6 percent reduction in the choking flow rate and led to a better match with design-intent CFD. Diffuser fillets reduced annulus cross-sectional area but also reduced corner separation, and thus blockage, in the diffuser passage. It was found that the TURBO computations are somewhat insensitive to inlet total pressure changing from the TURBO default inlet pressure of 14.7 pounds per square inch (101.35 kilopascals) down to 11.0 pounds per square inch (75.83 kilopascals), the inlet pressure of the component test. Off-design tip clearance was modeled in TURBO in two computations: one in which the blade tip geometry was trimmed by 12 mils (0.3048 millimeters), and another in which the hub flow path was moved to reflect a 12-mil axial shift in the impeller hub, creating a step at the hub. The one-dimensional results of these two computations indicate non-negligible differences between the two modeling approaches.
On the linear programming bound for linear Lee codes.

PubMed

Astola, Helena; Tabus, Ioan

2016-01-01

Based on an invariance-type property of the Lee-compositions of a linear Lee code, additional equality constraints can be introduced to the linear programming problem of linear Lee codes. In this paper, we formulate this property in terms of an action of the multiplicative group of the field [Formula: see text] on the set of Lee-compositions. We show some useful properties of certain sums of Lee-numbers, which are the eigenvalues of the Lee association scheme, appearing in the linear programming problem of linear Lee codes. Using the additional equality constraints, we formulate the linear programming problem of linear Lee codes in a very compact form, leading to a fast execution, which allows to efficiently compute the bounds for large parameter values of the linear codes.
Development of a Web Based Simulating System for Earthquake Modeling on the Grid

NASA Astrophysics Data System (ADS)

Seber, D.; Youn, C.; Kaiser, T.

2007-12-01

Existing cyberinfrastructure-based information, data and computational networks now allow development of state- of-the-art, user-friendly simulation environments that democratize access to high-end computational environments and provide new research opportunities for many research and educational communities. Within the Geosciences cyberinfrastructure network, GEON, we have developed the SYNSEIS (SYNthetic SEISmogram) toolkit to enable efficient computations of 2D and 3D seismic waveforms for a variety of research purposes especially for helping to analyze the EarthScope's USArray seismic data in a speedy and efficient environment. The underlying simulation software in SYNSEIS is a finite difference code, E3D, developed by LLNL (S. Larsen). The code is embedded within the SYNSEIS portlet environment and it is used by our toolkit to simulate seismic waveforms of earthquakes at regional distances (<1000km). Architecturally, SYNSEIS uses both Web Service and Grid computing resources in a portal-based work environment and has a built in access mechanism to connect to national supercomputer centers as well as to a dedicated, small-scale compute cluster for its runs. Even though Grid computing is well-established in many computing communities, its use among domain scientists still is not trivial because of multiple levels of complexities encountered. We grid-enabled E3D using our own dialect XML inputs that include geological models that are accessible through standard Web services within the GEON network. The XML inputs for this application contain structural geometries, source parameters, seismic velocity, density, attenuation values, number of time steps to compute, and number of stations. By enabling a portal based access to a such computational environment coupled with its dynamic user interface we enable a large user community to take advantage of such high end calculations in their research and educational activities. Our system can be used to promote an efficient and effective modeling environment to help scientists as well as educators in their daily activities and speed up the scientific discovery process.
PEGASUS 5: An Automated Pre-Processor for Overset-Grid CFD

NASA Technical Reports Server (NTRS)

Suhs, Norman E.; Rogers, Stuart E.; Dietz, William E.; Kwak, Dochan (Technical Monitor)

2002-01-01

An all new, automated version of the PEGASUS software has been developed and tested. PEGASUS provides the hole-cutting and connectivity information between overlapping grids, and is used as the final part of the grid generation process for overset-grid computational fluid dynamics approaches. The new PEGASUS code (Version 5) has many new features: automated hole cutting; a projection scheme for fixing gaps in overset surfaces; more efficient interpolation search methods using an alternating digital tree; hole-size optimization based on adding additional layers of fringe points; and an automatic restart capability. The new code has also been parallelized using the Message Passing Interface standard. The parallelization performance provides efficient speed-up of the execution time by an order of magnitude, and up to a factor of 30 for very large problems. The results of three example cases are presented: a three-element high-lift airfoil, a generic business jet configuration, and a complete Boeing 777-200 aircraft in a high-lift landing configuration. Comparisons of the computed flow fields for the airfoil and 777 test cases between the old and new versions of the PEGASUS codes show excellent agreement with each other and with experimental results.
Efficient full wave code for the coupling of large multirow multijunction LH grills

NASA Astrophysics Data System (ADS)

Preinhaelter, Josef; Hillairet, Julien; Milanesio, Daniele; Maggiora, Riccardo; Urban, Jakub; Vahala, Linda; Vahala, George

2017-11-01

The full wave code OLGA, for determining the coupling of a single row lower hybrid launcher (waveguide grills) to the plasma, is extended to handle multirow multijunction active passive structures (like the C3 and C4 launchers on TORE SUPRA) by implementing the scattering matrix formalism. The extended code is still computationally fast because of the use of (i) 2D splines of the plasma surface admittance in the accessibility region of the k-space, (ii) high order Gaussian quadrature rules for the integration of the coupling elements and (iii) utilizing the symmetries of the coupling elements in the multiperiodic structures. The extended OLGA code is benchmarked against the ALOHA-1D, ALOHA-2D and TOPLHA codes for the coupling of the C3 and C4 TORE SUPRA launchers for several plasma configurations derived from reflectometry and interferometery. Unlike nearly all codes (except the ALOHA-1D code), OLGA does not require large computational resources and can be used for everyday usage in planning experimental runs. In particular, it is shown that the OLGA code correctly handles the coupling of the C3 and C4 launchers over a very wide range of plasma densities in front of the grill.
HZETRN: A heavy ion/nucleon transport code for space radiations

NASA Technical Reports Server (NTRS)

Wilson, John W.; Chun, Sang Y.; Badavi, Forooz F.; Townsend, Lawrence W.; Lamkin, Stanley L.

1991-01-01

The galactic heavy ion transport code (GCRTRN) and the nucleon transport code (BRYNTRN) are integrated into a code package (HZETRN). The code package is computer efficient and capable of operating in an engineering design environment for manned deep space mission studies. The nuclear data set used by the code is discussed including current limitations. Although the heavy ion nuclear cross sections are assumed constant, the nucleon-nuclear cross sections of BRYNTRN with full energy dependence are used. The relation of the final code to the Boltzmann equation is discussed in the context of simplifying assumptions. Error generation and propagation is discussed, and comparison is made with simplified analytic solutions to test numerical accuracy of the final results. A brief discussion of biological issues and their impact on fundamental developments in shielding technology is given.
Comparison of DAC and MONACO DSMC Codes with Flat Plate Simulation

NASA Technical Reports Server (NTRS)

Padilla, Jose F.

2010-01-01

Various implementations of the direct simulation Monte Carlo (DSMC) method exist in academia, government and industry. By comparing implementations, deficiencies and merits of each can be discovered. This document reports comparisons between DSMC Analysis Code (DAC) and MONACO. DAC is NASA's standard DSMC production code and MONACO is a research DSMC code developed in academia. These codes have various differences; in particular, they employ distinct computational grid definitions. In this study, DAC and MONACO are compared by having each simulate a blunted flat plate wind tunnel test, using an identical volume mesh. Simulation expense and DSMC metrics are compared. In addition, flow results are compared with available laboratory data. Overall, this study revealed that both codes, excluding grid adaptation, performed similarly. For parallel processing, DAC was generally more efficient. As expected, code accuracy was mainly dependent on physical models employed.
A Fast Optimization Method for General Binary Code Learning.

PubMed

Shen, Fumin; Zhou, Xiang; Yang, Yang; Song, Jingkuan; Shen, Heng; Tao, Dacheng

2016-09-22

Hashing or binary code learning has been recognized to accomplish efficient near neighbor search, and has thus attracted broad interests in recent retrieval, vision and learning studies. One main challenge of learning to hash arises from the involvement of discrete variables in binary code optimization. While the widely-used continuous relaxation may achieve high learning efficiency, the pursued codes are typically less effective due to accumulated quantization error. In this work, we propose a novel binary code optimization method, dubbed Discrete Proximal Linearized Minimization (DPLM), which directly handles the discrete constraints during the learning process. Specifically, the discrete (thus nonsmooth nonconvex) problem is reformulated as minimizing the sum of a smooth loss term with a nonsmooth indicator function. The obtained problem is then efficiently solved by an iterative procedure with each iteration admitting an analytical discrete solution, which is thus shown to converge very fast. In addition, the proposed method supports a large family of empirical loss functions, which is particularly instantiated in this work by both a supervised and an unsupervised hashing losses, together with the bits uncorrelation and balance constraints. In particular, the proposed DPLM with a supervised `2 loss encodes the whole NUS-WIDE database into 64-bit binary codes within 10 seconds on a standard desktop computer. The proposed approach is extensively evaluated on several large-scale datasets and the generated binary codes are shown to achieve very promising results on both retrieval and classification tasks.
IMAT (Integrated Multidisciplinary Analysis Tool) user's guide for the VAX/VMS computer

NASA Technical Reports Server (NTRS)

Meissner, Frances T. (Editor)

1988-01-01

The Integrated Multidisciplinary Analysis Tool (IMAT) is a computer software system for the VAX/VMS computer developed at the Langley Research Center. IMAT provides researchers and analysts with an efficient capability to analyze satellite control systems influenced by structural dynamics. Using a menu-driven executive system, IMAT leads the user through the program options. IMAT links a relational database manager to commercial and in-house structural and controls analysis codes. This paper describes the IMAT software system and how to use it.
Text Exchange System

NASA Technical Reports Server (NTRS)

Snyder, W. V.; Hanson, R. J.

1986-01-01

Text Exchange System (TES) exchanges and maintains organized textual information including source code, documentation, data, and listings. System consists of two computer programs and definition of format for information storage. Comprehensive program used to create, read, and maintain TES files. TES developed to meet three goals: First, easy and efficient exchange of programs and other textual data between similar and dissimilar computer systems via magnetic tape. Second, provide transportable management system for textual information. Third, provide common user interface, over wide variety of computing systems, for all activities associated with text exchange.
Study of the modifications needed for efficient operation of NASTRAN on the Control Data Corporation STAR-100 computer

NASA Technical Reports Server (NTRS)

1975-01-01

NASA structural analysis (NASTRAN) computer program is operational on three series of third generation computers. The problem and difficulties involved in adapting NASTRAN to a fourth generation computer, namely, the Control Data STAR-100, are discussed. The salient features which distinguish Control Data STAR-100 from third generation computers are hardware vector processing capability and virtual memory. A feasible method is presented for transferring NASTRAN to Control Data STAR-100 system while retaining much of the machine-independent code. Basic matrix operations are noted for optimization for vector processing.
Groundwater flow and heat transport for systems undergoing freeze-thaw: Intercomparison of numerical simulators for 2D test cases

DOE PAGES

Grenier, Christophe; Anbergen, Hauke; Bense, Victor; ...

2018-02-26

In high-elevation, boreal and arctic regions, hydrological processes and associated water bodies can be strongly influenced by the distribution of permafrost. Recent field and modeling studies indicate that a fully-coupled multidimensional thermo-hydraulic approach is required to accurately model the evolution of these permafrost-impacted landscapes and groundwater systems. However, the relatively new and complex numerical codes being developed for coupled non-linear freeze-thaw systems require verification. Here in this paper, this issue is addressed by means of an intercomparison of thirteen numerical codes for two-dimensional test cases with several performance metrics (PMs). These codes comprise a wide range of numerical approaches, spatialmore » and temporal discretization strategies, and computational efficiencies. Results suggest that the codes provide robust results for the test cases considered and that minor discrepancies are explained by computational precision. However, larger discrepancies are observed for some PMs resulting from differences in the governing equations, discretization issues, or in the freezing curve used by some codes.« less
Development of full wave code for modeling RF fields in hot non-uniform plasmas

NASA Astrophysics Data System (ADS)

Zhao, Liangji; Svidzinski, Vladimir; Spencer, Andrew; Kim, Jin-Soo

2016-10-01

FAR-TECH, Inc. is developing a full wave RF modeling code to model RF fields in fusion devices and in general plasma applications. As an important component of the code, an adaptive meshless technique is introduced to solve the wave equations, which allows resolving plasma resonances efficiently and adapting to the complexity of antenna geometry and device boundary. The computational points are generated using either a point elimination method or a force balancing method based on the monitor function, which is calculated by solving the cold plasma dispersion equation locally. Another part of the code is the conductivity kernel calculation, used for modeling the nonlocal hot plasma dielectric response. The conductivity kernel is calculated on a coarse grid of test points and then interpolated linearly onto the computational points. All the components of the code are parallelized using MPI and OpenMP libraries to optimize the execution speed and memory. The algorithm and the results of our numerical approach to solving 2-D wave equations in a tokamak geometry will be presented. Work is supported by the U.S. DOE SBIR program.
Groundwater flow and heat transport for systems undergoing freeze-thaw: Intercomparison of numerical simulators for 2D test cases

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grenier, Christophe; Anbergen, Hauke; Bense, Victor

In high-elevation, boreal and arctic regions, hydrological processes and associated water bodies can be strongly influenced by the distribution of permafrost. Recent field and modeling studies indicate that a fully-coupled multidimensional thermo-hydraulic approach is required to accurately model the evolution of these permafrost-impacted landscapes and groundwater systems. However, the relatively new and complex numerical codes being developed for coupled non-linear freeze-thaw systems require verification. Here in this paper, this issue is addressed by means of an intercomparison of thirteen numerical codes for two-dimensional test cases with several performance metrics (PMs). These codes comprise a wide range of numerical approaches, spatialmore » and temporal discretization strategies, and computational efficiencies. Results suggest that the codes provide robust results for the test cases considered and that minor discrepancies are explained by computational precision. However, larger discrepancies are observed for some PMs resulting from differences in the governing equations, discretization issues, or in the freezing curve used by some codes.« less
Web Service Model for Plasma Simulations with Automatic Post Processing and Generation of Visual Diagnostics*

NASA Astrophysics Data System (ADS)

Exby, J.; Busby, R.; Dimitrov, D. A.; Bruhwiler, D.; Cary, J. R.

2003-10-01

We present our design and initial implementation of a web service model for running particle-in-cell (PIC) codes remotely from a web browser interface. PIC codes have grown significantly in complexity and now often require parallel execution on multiprocessor computers, which in turn requires sophisticated post-processing and data analysis. A significant amount of time and effort is required for a physicist to develop all the necessary skills, at the expense of actually doing research. Moreover, parameter studies with a computationally intensive code justify the systematic management of results with an efficient way to communicate them among a group of remotely located collaborators. Our initial implementation uses the OOPIC Pro code [1], Linux, Apache, MySQL, Python, and PHP. The Interactive Data Language is used for visualization. [1] D.L. Bruhwiler et al., Phys. Rev. ST-AB 4, 101302 (2001). * This work is supported by DOE grant # DE-FG02-03ER83857 and by Tech-X Corp. ** Also University of Colorado.

Parallel Higher-order Finite Element Method for Accurate Field Computations in Wakefield and PIC Simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Candel, A.; Kabel, A.; Lee, L.

Over the past years, SLAC's Advanced Computations Department (ACD), under SciDAC sponsorship, has developed a suite of 3D (2D) parallel higher-order finite element (FE) codes, T3P (T2P) and Pic3P (Pic2P), aimed at accurate, large-scale simulation of wakefields and particle-field interactions in radio-frequency (RF) cavities of complex shape. The codes are built on the FE infrastructure that supports SLAC's frequency domain codes, Omega3P and S3P, to utilize conformal tetrahedral (triangular)meshes, higher-order basis functions and quadratic geometry approximation. For time integration, they adopt an unconditionally stable implicit scheme. Pic3P (Pic2P) extends T3P (T2P) to treat charged-particle dynamics self-consistently using the PIC (particle-in-cell)more » approach, the first such implementation on a conformal, unstructured grid using Whitney basis functions. Examples from applications to the International Linear Collider (ILC), Positron Electron Project-II (PEP-II), Linac Coherent Light Source (LCLS) and other accelerators will be presented to compare the accuracy and computational efficiency of these codes versus their counterparts using structured grids.« less
Efficient burst image compression using H.265/HEVC

NASA Astrophysics Data System (ADS)

Roodaki-Lavasani, Hoda; Lainema, Jani

2014-02-01

New imaging use cases are emerging as more powerful camera hardware is entering consumer markets. One family of such use cases is based on capturing multiple pictures instead of just one when taking a photograph. That kind of a camera operation allows e.g. selecting the most successful shot from a sequence of images, showing what happened right before or after the shot was taken or combining the shots by computational means to improve either visible characteristics of the picture (such as dynamic range or focus) or the artistic aspects of the photo (e.g. by superimposing pictures on top of each other). Considering that photographic images are typically of high resolution and quality and the fact that these kind of image bursts can consist of at least tens of individual pictures, an efficient compression algorithm is desired. However, traditional video coding approaches fail to provide the random access properties these use cases require to achieve near-instantaneous access to the pictures in the coded sequence. That feature is critical to allow users to browse the pictures in an arbitrary order or imaging algorithms to extract desired pictures from the sequence quickly. This paper proposes coding structures that provide such random access properties while achieving coding efficiency superior to existing image coders. The results indicate that using HEVC video codec with a single reference picture fixed for the whole sequence can achieve nearly as good compression as traditional IPPP coding structures. It is also shown that the selection of the reference frame can further improve the coding efficiency.
AutoBayes Program Synthesis System Users Manual

NASA Technical Reports Server (NTRS)

Schumann, Johann; Jafari, Hamed; Pressburger, Tom; Denney, Ewen; Buntine, Wray; Fischer, Bernd

2008-01-01

Program synthesis is the systematic, automatic construction of efficient executable code from high-level declarative specifications. AutoBayes is a fully automatic program synthesis system for the statistical data analysis domain; in particular, it solves parameter estimation problems. It has seen many successful applications at NASA and is currently being used, for example, to analyze simulation results for Orion. The input to AutoBayes is a concise description of a data analysis problem composed of a parameterized statistical model and a goal that is a probability term involving parameters and input data. The output is optimized and fully documented C/C++ code computing the values for those parameters that maximize the probability term. AutoBayes can solve many subproblems symbolically rather than having to rely on numeric approximation algorithms, thus yielding effective, efficient, and compact code. Statistical analysis is faster and more reliable, because effort can be focused on model development and validation rather than manual development of solution algorithms and code.
Efficient method for the calculation of mean extinction. II. Analyticity of the complex extinction efficiency of homogeneous spheroids and finite cylinders.

PubMed

Xing, Z F; Greenberg, J M

1994-08-20

The analyticity of the complex extinction efficiency is examined numerically in the size-parameter domain for homogeneous prolate and oblate spheroids and finite cylinders. The T-matrix code, which is the most efficient program available to date, is employed to calculate the individual particle-extinction efficiencies. Because of its computational limitations in the size-parameter range, a slightly modified Hilbert-transform algorithm is required to establish the analyticity numerically. The findings concerning analyticity that we reported for spheres (Astrophys. J. 399, 164-175, 1992) apply equally to these nonspherical particles.
A comprehensive study of MPI parallelism in three-dimensional discrete element method (DEM) simulation of complex-shaped granular particles

NASA Astrophysics Data System (ADS)

Yan, Beichuan; Regueiro, Richard A.

2018-02-01

A three-dimensional (3D) DEM code for simulating complex-shaped granular particles is parallelized using message-passing interface (MPI). The concepts of link-block, ghost/border layer, and migration layer are put forward for design of the parallel algorithm, and theoretical scalability function of 3-D DEM scalability and memory usage is derived. Many performance-critical implementation details are managed optimally to achieve high performance and scalability, such as: minimizing communication overhead, maintaining dynamic load balance, handling particle migrations across block borders, transmitting C++ dynamic objects of particles between MPI processes efficiently, eliminating redundant contact information between adjacent MPI processes. The code executes on multiple US Department of Defense (DoD) supercomputers and tests up to 2048 compute nodes for simulating 10 million three-axis ellipsoidal particles. Performance analyses of the code including speedup, efficiency, scalability, and granularity across five orders of magnitude of simulation scale (number of particles) are provided, and they demonstrate high speedup and excellent scalability. It is also discovered that communication time is a decreasing function of the number of compute nodes in strong scaling measurements. The code's capability of simulating a large number of complex-shaped particles on modern supercomputers will be of value in both laboratory studies on micromechanical properties of granular materials and many realistic engineering applications involving granular materials.
Clinical evaluation of BrainTree, a motor imagery hybrid BCI speller

NASA Astrophysics Data System (ADS)

Perdikis, S.; Leeb, R.; Williamson, J.; Ramsay, A.; Tavella, M.; Desideri, L.; Hoogerwerf, E.-J.; Al-Khodairy, A.; Murray-Smith, R.; Millán, J. d. R.

2014-06-01

Objective. While brain-computer interfaces (BCIs) for communication have reached considerable technical maturity, there is still a great need for state-of-the-art evaluation by the end-users outside laboratory environments. To achieve this primary objective, it is necessary to augment a BCI with a series of components that allow end-users to type text effectively. Approach. This work presents the clinical evaluation of a motor imagery (MI) BCI text-speller, called BrainTree, by six severely disabled end-users and ten able-bodied users. Additionally, we define a generic model of code-based BCI applications, which serves as an analytical tool for evaluation and design. Main results. We show that all users achieved remarkable usability and efficiency outcomes in spelling. Furthermore, our model-based analysis highlights the added value of human-computer interaction techniques and hybrid BCI error-handling mechanisms, and reveals the effects of BCI performances on usability and efficiency in code-based applications. Significance. This study demonstrates the usability potential of code-based MI spellers, with BrainTree being the first to be evaluated by a substantial number of end-users, establishing them as a viable, competitive alternative to other popular BCI spellers. Another major outcome of our model-based analysis is the derivation of a 80% minimum command accuracy requirement for successful code-based application control, revising upwards previous estimates attempted in the literature.
Clinical evaluation of BrainTree, a motor imagery hybrid BCI speller.

PubMed

Perdikis, S; Leeb, R; Williamson, J; Ramsay, A; Tavella, M; Desideri, L; Hoogerwerf, E-J; Al-Khodairy, A; Murray-Smith, R; Millán, J D R

2014-06-01

While brain-computer interfaces (BCIs) for communication have reached considerable technical maturity, there is still a great need for state-of-the-art evaluation by the end-users outside laboratory environments. To achieve this primary objective, it is necessary to augment a BCI with a series of components that allow end-users to type text effectively. This work presents the clinical evaluation of a motor imagery (MI) BCI text-speller, called BrainTree, by six severely disabled end-users and ten able-bodied users. Additionally, we define a generic model of code-based BCI applications, which serves as an analytical tool for evaluation and design. We show that all users achieved remarkable usability and efficiency outcomes in spelling. Furthermore, our model-based analysis highlights the added value of human-computer interaction techniques and hybrid BCI error-handling mechanisms, and reveals the effects of BCI performances on usability and efficiency in code-based applications. This study demonstrates the usability potential of code-based MI spellers, with BrainTree being the first to be evaluated by a substantial number of end-users, establishing them as a viable, competitive alternative to other popular BCI spellers. Another major outcome of our model-based analysis is the derivation of a 80% minimum command accuracy requirement for successful code-based application control, revising upwards previous estimates attempted in the literature.
An efficient and portable SIMD algorithm for charge/current deposition in Particle-In-Cell codes

NASA Astrophysics Data System (ADS)

Vincenti, H.; Lobet, M.; Lehe, R.; Sasanka, R.; Vay, J.-L.

2017-01-01

In current computer architectures, data movement (from die to network) is by far the most energy consuming part of an algorithm (≈ 20 pJ/word on-die to ≈10,000 pJ/word on the network). To increase memory locality at the hardware level and reduce energy consumption related to data movement, future exascale computers tend to use many-core processors on each compute nodes that will have a reduced clock speed to allow for efficient cooling. To compensate for frequency decrease, machine vendors are making use of long SIMD instruction registers that are able to process multiple data with one arithmetic operator in one clock cycle. SIMD register length is expected to double every four years. As a consequence, Particle-In-Cell (PIC) codes will have to achieve good vectorization to fully take advantage of these upcoming architectures. In this paper, we present a new algorithm that allows for efficient and portable SIMD vectorization of current/charge deposition routines that are, along with the field gathering routines, among the most time consuming parts of the PIC algorithm. Our new algorithm uses a particular data structure that takes into account memory alignment constraints and avoids gather/scatter instructions that can significantly affect vectorization performances on current CPUs. The new algorithm was successfully implemented in the 3D skeleton PIC code PICSAR and tested on Haswell Xeon processors (AVX2-256 bits wide data registers). Results show a factor of × 2 to × 2.5 speed-up in double precision for particle shape factor of orders 1- 3. The new algorithm can be applied as is on future KNL (Knights Landing) architectures that will include AVX-512 instruction sets with 512 bits register lengths (8 doubles/16 singles).
Benchmarking of Computational Models for NDE and SHM of Composites

NASA Technical Reports Server (NTRS)

Wheeler, Kevin; Leckey, Cara; Hafiychuk, Vasyl; Juarez, Peter; Timucin, Dogan; Schuet, Stefan; Hafiychuk, Halyna

2016-01-01

Ultrasonic wave phenomena constitute the leading physical mechanism for nondestructive evaluation (NDE) and structural health monitoring (SHM) of solid composite materials such as carbon-fiber-reinforced polymer (CFRP) laminates. Computational models of ultrasonic guided-wave excitation, propagation, scattering, and detection in quasi-isotropic laminates can be extremely valuable in designing practically realizable NDE and SHM hardware and software with desired accuracy, reliability, efficiency, and coverage. This paper presents comparisons of guided-wave simulations for CFRP composites implemented using three different simulation codes: two commercial finite-element analysis packages, COMSOL and ABAQUS, and a custom code implementing the Elastodynamic Finite Integration Technique (EFIT). Comparisons are also made to experimental laser Doppler vibrometry data and theoretical dispersion curves.
Design of hat-stiffened composite panels loaded in axial compression

NASA Astrophysics Data System (ADS)

Paul, T. K.; Sinha, P. K.

An integrated step-by-step analysis procedure for the design of axially compressed stiffened composite panels is outlined. The analysis makes use of the effective width concept. A computer code, BUSTCOP, is developed incorporating various aspects of buckling such as skin buckling, stiffener crippling and column buckling. Other salient features of the computer code include capabilities for generation of data based on micromechanics theories and hygrothermal analysis, and for prediction of strength failure. Parametric studies carried out on a hat-stiffened structural element indicate that, for all practical purposes, composite panels exhibit higher structural efficiency. Some hybrid laminates with outer layers made of aluminum alloy also show great promise for flight vehicle structural applications.
A domain specific language for performance portable molecular dynamics algorithms

NASA Astrophysics Data System (ADS)

Saunders, William Robert; Grant, James; Müller, Eike Hermann

2018-03-01

Developers of Molecular Dynamics (MD) codes face significant challenges when adapting existing simulation packages to new hardware. In a continuously diversifying hardware landscape it becomes increasingly difficult for scientists to be experts both in their own domain (physics/chemistry/biology) and specialists in the low level parallelisation and optimisation of their codes. To address this challenge, we describe a "Separation of Concerns" approach for the development of parallel and optimised MD codes: the science specialist writes code at a high abstraction level in a domain specific language (DSL), which is then translated into efficient computer code by a scientific programmer. In a related context, an abstraction for the solution of partial differential equations with grid based methods has recently been implemented in the (Py)OP2 library. Inspired by this approach, we develop a Python code generation system for molecular dynamics simulations on different parallel architectures, including massively parallel distributed memory systems and GPUs. We demonstrate the efficiency of the auto-generated code by studying its performance and scalability on different hardware and compare it to other state-of-the-art simulation packages. With growing data volumes the extraction of physically meaningful information from the simulation becomes increasingly challenging and requires equally efficient implementations. A particular advantage of our approach is the easy expression of such analysis algorithms. We consider two popular methods for deducing the crystalline structure of a material from the local environment of each atom, show how they can be expressed in our abstraction and implement them in the code generation framework.
Development of a CFD Code for Analysis of Fluid Dynamic Forces in Seals

NASA Technical Reports Server (NTRS)

Athavale, Mahesh M.; Przekwas, Andrzej J.; Singhal, Ashok K.

1991-01-01

The aim is to develop a 3-D computational fluid dynamics (CFD) code for the analysis of fluid flow in cylindrical seals and evaluation of the dynamic forces on the seals. This code is expected to serve as a scientific tool for detailed flow analysis as well as a check for the accuracy of the 2D industrial codes. The features necessary in the CFD code are outlined. The initial focus was to develop or modify and implement new techniques and physical models. These include collocated grid formulation, rotating coordinate frames and moving grid formulation. Other advanced numerical techniques include higher order spatial and temporal differencing and an efficient linear equation solver. These techniques were implemented in a 2D flow solver for initial testing. Several benchmark test cases were computed using the 2D code, and the results of these were compared to analytical solutions or experimental data to check the accuracy. Tests presented here include planar wedge flow, flow due to an enclosed rotor, and flow in a 2D seal with a whirling rotor. Comparisons between numerical and experimental results for an annular seal and a 7-cavity labyrinth seal are also included.
State-Chart Autocoder

NASA Technical Reports Server (NTRS)

Clark, Kenneth; Watney, Garth; Murray, Alexander; Benowitz, Edward

2007-01-01

A computer program translates Unified Modeling Language (UML) representations of state charts into source code in the C, C++, and Python computing languages. ( State charts signifies graphical descriptions of states and state transitions of a spacecraft or other complex system.) The UML representations constituting the input to this program are generated by using a UML-compliant graphical design program to draw the state charts. The generated source code is consistent with the "quantum programming" approach, which is so named because it involves discrete states and state transitions that have features in common with states and state transitions in quantum mechanics. Quantum programming enables efficient implementation of state charts, suitable for real-time embedded flight software. In addition to source code, the autocoder program generates a graphical-user-interface (GUI) program that, in turn, generates a display of state transitions in response to events triggered by the user. The GUI program is wrapped around, and can be used to exercise the state-chart behavior of, the generated source code. Once the expected state-chart behavior is confirmed, the generated source code can be augmented with a software interface to the rest of the software with which the source code is required to interact.
Efficient computation of the genomic relationship matrix and other matrices used in single-step evaluation.

PubMed

Aguilar, I; Misztal, I; Legarra, A; Tsuruta, S

2011-12-01

Genomic evaluations can be calculated using a unified procedure that combines phenotypic, pedigree and genomic information. Implementation of such a procedure requires the inverse of the relationship matrix based on pedigree and genomic relationships. The objective of this study was to investigate efficient computing options to create relationship matrices based on genomic markers and pedigree information as well as their inverses. SNP maker information was simulated for a panel of 40 K SNPs, with the number of genotyped animals up to 30 000. Matrix multiplication in the computation of the genomic relationship was by a simple 'do' loop, by two optimized versions of the loop, and by a specific matrix multiplication subroutine. Inversion was by a generalized inverse algorithm and by a LAPACK subroutine. With the most efficient choices and parallel processing, creation of matrices for 30 000 animals would take a few hours. Matrices required to implement a unified approach can be computed efficiently. Optimizations can be either by modifications of existing code or by the use of efficient automatic optimizations provided by open source or third-party libraries. © 2011 Blackwell Verlag GmbH.
Solar Proton Transport within an ICRU Sphere Surrounded by a Complex Shield: Combinatorial Geometry

NASA Technical Reports Server (NTRS)

Wilson, John W.; Slaba, Tony C.; Badavi, Francis F.; Reddell, Brandon D.; Bahadori, Amir A.

2015-01-01

The 3DHZETRN code, with improved neutron and light ion (Z (is) less than 2) transport procedures, was recently developed and compared to Monte Carlo (MC) simulations using simplified spherical geometries. It was shown that 3DHZETRN agrees with the MC codes to the extent they agree with each other. In the present report, the 3DHZETRN code is extended to enable analysis in general combinatorial geometry. A more complex shielding structure with internal parts surrounding a tissue sphere is considered and compared against MC simulations. It is shown that even in the more complex geometry, 3DHZETRN agrees well with the MC codes and maintains a high degree of computational efficiency.
An Efficient Downlink Scheduling Strategy Using Normal Graphs for Multiuser MIMO Wireless Systems

NASA Astrophysics Data System (ADS)

Chen, Jung-Chieh; Wu, Cheng-Hsuan; Lee, Yao-Nan; Wen, Chao-Kai

Inspired by the success of the low-density parity-check (LDPC) codes in the field of error-control coding, in this paper we propose transforming the downlink multiuser multiple-input multiple-output scheduling problem into an LDPC-like problem using the normal graph. Based on the normal graph framework, soft information, which indicates the probability that each user will be scheduled to transmit packets at the access point through a specified angle-frequency sub-channel, is exchanged among the local processors to iteratively optimize the multiuser transmission schedule. Computer simulations show that the proposed algorithm can efficiently schedule simultaneous multiuser transmission which then increases the overall channel utilization and reduces the average packet delay.
Validation of CFD/Heat Transfer Software for Turbine Blade Analysis

NASA Technical Reports Server (NTRS)

Kiefer, Walter D.

2004-01-01

I am an intern in the Turbine Branch of the Turbomachinery and Propulsion Systems Division. The division is primarily concerned with experimental and computational methods of calculating heat transfer effects of turbine blades during operation in jet engines and land-based power systems. These include modeling flow in internal cooling passages and film cooling, as well as calculating heat flux and peak temperatures to ensure safe and efficient operation. The branch is research-oriented, emphasizing the development of tools that may be used by gas turbine designers in industry. The branch has been developing a computational fluid dynamics (CFD) and heat transfer code called GlennHT to achieve the computational end of this analysis. The code was originally written in FORTRAN 77 and run on Silicon Graphics machines. However the code has been rewritten and compiled in FORTRAN 90 to take advantage of more modem computer memory systems. In addition the branch has made a switch in system architectures from SGI's to Linux PC's. The newly modified code therefore needs to be tested and validated. This is the primary goal of my internship. To validate the GlennHT code, it must be run using benchmark fluid mechanics and heat transfer test cases, for which there are either analytical solutions or widely accepted experimental data. From the solutions generated by the code, comparisons can be made to the correct solutions to establish the accuracy of the code. To design and create these test cases, there are many steps and programs that must be used. Before a test case can be run, pre-processing steps must be accomplished. These include generating a grid to describe the geometry, using a software package called GridPro. Also various files required by the GlennHT code must be created including a boundary condition file, a file for multi-processor computing, and a file to describe problem and algorithm parameters. A good deal of this internship will be to become familiar with these programs and the structure of the GlennHT code. Additional information is included in the original extended abstract.
PRECONDITIONED CONJUGATE-GRADIENT 2 (PCG2), a computer program for solving ground-water flow equations

USGS Publications Warehouse

Hill, Mary C.

1990-01-01

This report documents PCG2 : a numerical code to be used with the U.S. Geological Survey modular three-dimensional, finite-difference, ground-water flow model . PCG2 uses the preconditioned conjugate-gradient method to solve the equations produced by the model for hydraulic head. Linear or nonlinear flow conditions may be simulated. PCG2 includes two reconditioning options : modified incomplete Cholesky preconditioning, which is efficient on scalar computers; and polynomial preconditioning, which requires less computer storage and, with modifications that depend on the computer used, is most efficient on vector computers . Convergence of the solver is determined using both head-change and residual criteria. Nonlinear problems are solved using Picard iterations. This documentation provides a description of the preconditioned conjugate gradient method and the two preconditioners, detailed instructions for linking PCG2 to the modular model, sample data inputs, a brief description of PCG2, and a FORTRAN listing.
Efficient Parallelization of a Dynamic Unstructured Application on the Tera MTA

NASA Technical Reports Server (NTRS)

Oliker, Leonid; Biswas, Rupak

1999-01-01

The success of parallel computing in solving real-life computationally-intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures. Many important applications are both unstructured and dynamic in nature, making their efficient parallel implementation a daunting task. This paper presents the parallelization of a dynamic unstructured mesh adaptation algorithm using three popular programming paradigms on three leading supercomputers. We examine an MPI message-passing implementation on the Cray T3E and the SGI Origin2OOO, a shared-memory implementation using cache coherent nonuniform memory access (CC-NUMA) of the Origin2OOO, and a multi-threaded version on the newly-released Tera Multi-threaded Architecture (MTA). We compare several critical factors of this parallel code development, including runtime, scalability, programmability, and memory overhead. Our overall results demonstrate that multi-threaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.
Massively parallel multicanonical simulations

NASA Astrophysics Data System (ADS)

Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard

2018-03-01

Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with of the order of 104 parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as starting point and reference for practitioners in the field.

OFF, Open source Finite volume Fluid dynamics code: A free, high-order solver based on parallel, modular, object-oriented Fortran API

NASA Astrophysics Data System (ADS)

Zaghi, S.

2014-07-01

OFF, an open source (free software) code for performing fluid dynamics simulations, is presented. The aim of OFF is to solve, numerically, the unsteady (and steady) compressible Navier-Stokes equations of fluid dynamics by means of finite volume techniques: the research background is mainly focused on high-order (WENO) schemes for multi-fluids, multi-phase flows over complex geometries. To this purpose a highly modular, object-oriented application program interface (API) has been developed. In particular, the concepts of data encapsulation and inheritance available within Fortran language (from standard 2003) have been stressed in order to represent each fluid dynamics "entity" (e.g. the conservative variables of a finite volume, its geometry, etc…) by a single object so that a large variety of computational libraries can be easily (and efficiently) developed upon these objects. The main features of OFF can be summarized as follows: Programming LanguageOFF is written in standard (compliant) Fortran 2003; its design is highly modular in order to enhance simplicity of use and maintenance without compromising the efficiency; Parallel Frameworks Supported the development of OFF has been also targeted to maximize the computational efficiency: the code is designed to run on shared-memory multi-cores workstations and distributed-memory clusters of shared-memory nodes (supercomputers); the code's parallelization is based on Open Multiprocessing (OpenMP) and Message Passing Interface (MPI) paradigms; Usability, Maintenance and Enhancement in order to improve the usability, maintenance and enhancement of the code also the documentation has been carefully taken into account; the documentation is built upon comprehensive comments placed directly into the source files (no external documentation files needed): these comments are parsed by means of doxygen free software producing high quality html and latex documentation pages; the distributed versioning system referred as git has been adopted in order to facilitate the collaborative maintenance and improvement of the code; CopyrightsOFF is a free software that anyone can use, copy, distribute, study, change and improve under the GNU Public License version 3. The present paper is a manifesto of OFF code and presents the currently implemented features and ongoing developments. This work is focused on the computational techniques adopted and a detailed description of the main API characteristics is reported. OFF capabilities are demonstrated by means of one and two dimensional examples and a three dimensional real application.
Development of a GPU Compatible Version of the Fast Radiation Code RRTMG

NASA Astrophysics Data System (ADS)

Iacono, M. J.; Mlawer, E. J.; Berthiaume, D.; Cady-Pereira, K. E.; Suarez, M.; Oreopoulos, L.; Lee, D.

2012-12-01

The absorption of solar radiation and emission/absorption of thermal radiation are crucial components of the physics that drive Earth's climate and weather. Therefore, accurate radiative transfer calculations are necessary for realistic climate and weather simulations. Efficient radiation codes have been developed for this purpose, but their accuracy requirements still necessitate that as much as 30% of the computational time of a GCM is spent computing radiative fluxes and heating rates. The overall computational expense constitutes a limitation on a GCM's predictive ability if it becomes an impediment to adding new physics to or increasing the spatial and/or vertical resolution of the model. The emergence of Graphics Processing Unit (GPU) technology, which will allow the parallel computation of multiple independent radiative calculations in a GCM, will lead to a fundamental change in the competition between accuracy and speed. Processing time previously consumed by radiative transfer will now be available for the modeling of other processes, such as physics parameterizations, without any sacrifice in the accuracy of the radiative transfer. Furthermore, fast radiation calculations can be performed much more frequently and will allow the modeling of radiative effects of rapid changes in the atmosphere. The fast radiation code RRTMG, developed at Atmospheric and Environmental Research (AER), is utilized operationally in many dynamical models throughout the world. We will present the results from the first stage of an effort to create a version of the RRTMG radiation code designed to run efficiently in a GPU environment. This effort will focus on the RRTMG implementation in GEOS-5. RRTMG has an internal pseudo-spectral vector of length of order 100 that, when combined with the much greater length of the global horizontal grid vector from which the radiation code is called in GEOS-5, makes RRTMG/GEOS-5 particularly suited to achieving a significant speed improvement through GPU technology. This large number of independent cases will allow us to take full advantage of the computational power of the latest GPUs, ensuring that all thread cores in the GPU remain active, a key criterion for obtaining significant speedup. The CUDA (Compute Unified Device Architecture) Fortran compiler developed by PGI and Nvidia will allow us to construct this parallel implementation on the GPU while remaining in the Fortran language. This implementation will scale very well across various CUDA-supported GPUs such as the recently released Fermi Nvidia cards. We will present the computational speed improvements of the GPU-compatible code relative to the standard CPU-based RRTMG with respect to a very large and diverse suite of atmospheric profiles. This suite will also be utilized to demonstrate the minimal impact of the code restructuring on the accuracy of radiation calculations. The GPU-compatible version of RRTMG will be directly applicable to future versions of GEOS-5, but it is also likely to provide significant associated benefits for other GCMs that employ RRTMG.
Embedded DCT and wavelet methods for fine granular scalable video: analysis and comparison

NASA Astrophysics Data System (ADS)

van der Schaar-Mitrea, Mihaela; Chen, Yingwei; Radha, Hayder

2000-04-01

Video transmission over bandwidth-varying networks is becoming increasingly important due to emerging applications such as streaming of video over the Internet. The fundamental obstacle in designing such systems resides in the varying characteristics of the Internet (i.e. bandwidth variations and packet-loss patterns). In MPEG-4, a new SNR scalability scheme, called Fine-Granular-Scalability (FGS), is currently under standardization, which is able to adapt in real-time (i.e. at transmission time) to Internet bandwidth variations. The FGS framework consists of a non-scalable motion-predicted base-layer and an intra-coded fine-granular scalable enhancement layer. For example, the base layer can be coded using a DCT-based MPEG-4 compliant, highly efficient video compression scheme. Subsequently, the difference between the original and decoded base-layer is computed, and the resulting FGS-residual signal is intra-frame coded with an embedded scalable coder. In order to achieve high coding efficiency when compressing the FGS enhancement layer, it is crucial to analyze the nature and characteristics of residual signals common to the SNR scalability framework (including FGS). In this paper, we present a thorough analysis of SNR residual signals by evaluating its statistical properties, compaction efficiency and frequency characteristics. The signal analysis revealed that the energy compaction of the DCT and wavelet transforms is limited and the frequency characteristic of SNR residual signals decay rather slowly. Moreover, the blockiness artifacts of the low bit-rate coded base-layer result in artificial high frequencies in the residual signal. Subsequently, a variety of wavelet and embedded DCT coding techniques applicable to the FGS framework are evaluated and their results are interpreted based on the identified signal properties. As expected from the theoretical signal analysis, the rate-distortion performances of the embedded wavelet and DCT-based coders are very similar. However, improved results can be obtained for the wavelet coder by deblocking the base- layer prior to the FGS residual computation. Based on the theoretical analysis and our measurements, we can conclude that for an optimal complexity versus coding-efficiency trade- off, only limited wavelet decomposition (e.g. 2 stages) needs to be performed for the FGS-residual signal. Also, it was observed that the good rate-distortion performance of a coding technique for a certain image type (e.g. natural still-images) does not necessarily translate into similarly good performance for signals with different visual characteristics and statistical properties.
Energy efficient rateless codes for high speed data transfer over free space optical channels

NASA Astrophysics Data System (ADS)

Prakash, Geetha; Kulkarni, Muralidhar; Acharya, U. S.

2015-03-01

Terrestrial Free Space Optical (FSO) links transmit information by using the atmosphere (free space) as a medium. In this paper, we have investigated the use of Luby Transform (LT) codes as a means to mitigate the effects of data corruption induced by imperfect channel which usually takes the form of lost or corrupted packets. LT codes, which are a class of Fountain codes, can be used independent of the channel rate and as many code words as required can be generated to recover all the message bits irrespective of the channel performance. Achieving error free high data rates with limited energy resources is possible with FSO systems if error correction codes with minimal overheads on the power can be used. We also employ a combination of Binary Phase Shift Keying (BPSK) with provision for modification of threshold and optimized LT codes with belief propagation for decoding. These techniques provide additional protection even under strong turbulence regimes. Automatic Repeat Request (ARQ) is another method of improving link reliability. Performance of ARQ is limited by the number of retransmissions and the corresponding time delay. We prove through theoretical computations and simulations that LT codes consume less energy per bit. We validate the feasibility of using energy efficient LT codes over ARQ for FSO links to be used in optical wireless sensor networks within the eye safety limits.
A parallel computing engine for a class of time critical processes.

PubMed

Nabhan, T M; Zomaya, A Y

1997-01-01

This paper focuses on the efficient parallel implementation of systems of numerically intensive nature over loosely coupled multiprocessor architectures. These analytical models are of significant importance to many real-time systems that have to meet severe time constants. A parallel computing engine (PCE) has been developed in this work for the efficient simplification and the near optimal scheduling of numerical models over the different cooperating processors of the parallel computer. First, the analytical system is efficiently coded in its general form. The model is then simplified by using any available information (e.g., constant parameters). A task graph representing the interconnections among the different components (or equations) is generated. The graph can then be compressed to control the computation/communication requirements. The task scheduler employs a graph-based iterative scheme, based on the simulated annealing algorithm, to map the vertices of the task graph onto a Multiple-Instruction-stream Multiple-Data-stream (MIMD) type of architecture. The algorithm uses a nonanalytical cost function that properly considers the computation capability of the processors, the network topology, the communication time, and congestion possibilities. Moreover, the proposed technique is simple, flexible, and computationally viable. The efficiency of the algorithm is demonstrated by two case studies with good results.
Performance analysis of a parallel Monte Carlo code for simulating solar radiative transfer in cloudy atmospheres using CUDA-enabled NVIDIA GPU

NASA Astrophysics Data System (ADS)

Russkova, Tatiana V.

2017-11-01

One tool to improve the performance of Monte Carlo methods for numerical simulation of light transport in the Earth's atmosphere is the parallel technology. A new algorithm oriented to parallel execution on the CUDA-enabled NVIDIA graphics processor is discussed. The efficiency of parallelization is analyzed on the basis of calculating the upward and downward fluxes of solar radiation in both a vertically homogeneous and inhomogeneous models of the atmosphere. The results of testing the new code under various atmospheric conditions including continuous singlelayered and multilayered clouds, and selective molecular absorption are presented. The results of testing the code using video cards with different compute capability are analyzed. It is shown that the changeover of computing from conventional PCs to the architecture of graphics processors gives more than a hundredfold increase in performance and fully reveals the capabilities of the technology used.
ANL/RBC: A computer code for the analysis of Rankine bottoming cycles, including system cost evaluation and off-design performance

NASA Technical Reports Server (NTRS)

Mclennan, G. A.

1986-01-01

This report describes, and is a User's Manual for, a computer code (ANL/RBC) which calculates cycle performance for Rankine bottoming cycles extracting heat from a specified source gas stream. The code calculates cycle power and efficiency and the sizes for the heat exchangers, using tabular input of the properties of the cycle working fluid. An option is provided to calculate the costs of system components from user defined input cost functions. These cost functions may be defined in equation form or by numerical tabular data. A variety of functional forms have been included for these functions and they may be combined to create very general cost functions. An optional calculation mode can be used to determine the off-design performance of a system when operated away from the design-point, using the heat exchanger areas calculated for the design-point.
Collaborative Simulation Grid: Multiscale Quantum-Mechanical/Classical Atomistic Simulations on Distributed PC Clusters in the US and Japan

NASA Technical Reports Server (NTRS)

Kikuchi, Hideaki; Kalia, Rajiv; Nakano, Aiichiro; Vashishta, Priya; Iyetomi, Hiroshi; Ogata, Shuji; Kouno, Takahisa; Shimojo, Fuyuki; Tsuruta, Kanji; Saini, Subhash;

2002-01-01

A multidisciplinary, collaborative simulation has been performed on a Grid of geographically distributed PC clusters. The multiscale simulation approach seamlessly combines i) atomistic simulation backed on the molecular dynamics (MD) method and ii) quantum mechanical (QM) calculation based on the density functional theory (DFT), so that accurate but less scalable computations are performed only where they are needed. The multiscale MD/QM simulation code has been Grid-enabled using i) a modular, additive hybridization scheme, ii) multiple QM clustering, and iii) computation/communication overlapping. The Gridified MD/QM simulation code has been used to study environmental effects of water molecules on fracture in silicon. A preliminary run of the code has achieved a parallel efficiency of 94% on 25 PCs distributed over 3 PC clusters in the US and Japan, and a larger test involving 154 processors on 5 distributed PC clusters is in progress.

Advanced Applications of Adifor 3.0 for Efficient Calculation of First-and Second-Order CFD Sensitivity Derivatives

NASA Technical Reports Server (NTRS)

Taylor, Arthur C., III

2004-01-01

This final report will document the accomplishments of the work of this project. 1) The incremental-iterative (II) form of the reverse-mode (adjoint) method for computing first-order (FO) aerodynamic sensitivity derivatives (SDs) has been successfully implemented and tested in a 2D CFD code (called ANSERS) using the reverse-mode capability of ADIFOR 3.0. These preceding results compared very well with similar SDS computed via a black-box (BB) application of the reverse-mode capability of ADIFOR 3.0, and also with similar SDs calculated via the method of finite differences. 2) Second-order (SO) SDs have been implemented in the 2D ASNWERS code using the very efficient strategy that was originally proposed (but not previously tested) of Reference 3, Appendix A. Furthermore, these SO SOs have been validated for accuracy and computational efficiency. 3) Studies were conducted in Quasi-1D and 2D concerning the smoothness (or lack of smoothness) of the FO and SO SD's for flows with shock waves. The phenomenon is documented in the publications of this study (listed subsequently), however, the specific numerical mechanism which is responsible for this unsmoothness phenomenon was not discovered. 4) The FO and SO derivatives for Quasi-1D and 2D flows were applied to predict aerodynamic design uncertainties, and were also applied in robust design optimization studies.
A finite element solver for 3-D compressible viscous flows

NASA Technical Reports Server (NTRS)

Reddy, K. C.; Reddy, J. N.; Nayani, S.

1990-01-01

Computation of the flow field inside a space shuttle main engine (SSME) requires the application of state of the art computational fluid dynamic (CFD) technology. Several computer codes are under development to solve 3-D flow through the hot gas manifold. Some algorithms were designed to solve the unsteady compressible Navier-Stokes equations, either by implicit or explicit factorization methods, using several hundred or thousands of time steps to reach a steady state solution. A new iterative algorithm is being developed for the solution of the implicit finite element equations without assembling global matrices. It is an efficient iteration scheme based on a modified nonlinear Gauss-Seidel iteration with symmetric sweeps. The algorithm is analyzed for a model equation and is shown to be unconditionally stable. Results from a series of test problems are presented. The finite element code was tested for couette flow, which is flow under a pressure gradient between two parallel plates in relative motion. Another problem that was solved is viscous laminar flow over a flat plate. The general 3-D finite element code was used to compute the flow in an axisymmetric turnaround duct at low Mach numbers.
Robust Nonlinear Neural Codes

NASA Astrophysics Data System (ADS)

Yang, Qianli; Pitkow, Xaq

2015-03-01

Most interesting natural sensory stimuli are encoded in the brain in a form that can only be decoded nonlinearly. But despite being a core function of the brain, nonlinear population codes are rarely studied and poorly understood. Interestingly, the few existing models of nonlinear codes are inconsistent with known architectural features of the brain. In particular, these codes have information content that scales with the size of the cortical population, even if that violates the data processing inequality by exceeding the amount of information entering the sensory system. Here we provide a valid theory of nonlinear population codes by generalizing recent work on information-limiting correlations in linear population codes. Although these generalized, nonlinear information-limiting correlations bound the performance of any decoder, they also make decoding more robust to suboptimal computation, allowing many suboptimal decoders to achieve nearly the same efficiency as an optimal decoder. Although these correlations are extremely difficult to measure directly, particularly for nonlinear codes, we provide a simple, practical test by which one can use choice-related activity in small populations of neurons to determine whether decoding is suboptimal or optimal and limited by correlated noise. We conclude by describing an example computation in the vestibular system where this theory applies. QY and XP was supported by a grant from the McNair foundation.
A CFD Study on the Prediction of Cyclone Collection Efficiency

NASA Astrophysics Data System (ADS)

Gimbun, Jolius; Chuah, T. G.; Choong, Thomas S. Y.; Fakhru'L-Razi, A.

2005-09-01

This work presents a Computational Fluid Dynamics calculation to predict and to evaluate the effects of temperature, operating pressure and inlet velocity on the collection efficiency of gas cyclones. The numerical solutions were carried out using spreadsheet and commercial CFD code FLUENT 6.0. This paper also reviews four empirical models for the prediction of cyclone collection efficiency, namely Lapple [1], Koch and Licht [2], Li and Wang [3], and Iozia and Leith [4]. All the predictions proved to be satisfactory when compared with the presented experimental data. The CFD simulations predict the cyclone cut-off size for all operating conditions with a deviation of 3.7% from the experimental data. Specifically, results obtained from the computer modelling exercise have demonstrated that CFD model is the best method of modelling the cyclones collection efficiency.
A Study of Neutron Leakage in Finite Objects

NASA Technical Reports Server (NTRS)

Wilson, John W.; Slaba, Tony C.; Badavi, Francis F.; Reddell, Brandon D.; Bahadori, Amir A.

2015-01-01

A computationally efficient 3DHZETRN code capable of simulating High charge (Z) and Energy (HZE) and light ions (including neutrons) under space-like boundary conditions with enhanced neutron and light ion propagation was recently developed for simple shielded objects. Monte Carlo (MC) benchmarks were used to verify the 3DHZETRN methodology in slab and spherical geometry, and it was shown that 3DHZETRN agrees with MC codes to the degree that various MC codes agree among themselves. One limitation in the verification process is that all of the codes (3DHZETRN and three MC codes) utilize different nuclear models/databases. In the present report, the new algorithm, with well-defined convergence criteria, is used to quantify the neutron leakage from simple geometries to provide means of verifying 3D effects and to provide guidance for further code development.
Integrated Composite Analyzer (ICAN): Users and programmers manual

NASA Technical Reports Server (NTRS)

Murthy, P. L. N.; Chamis, C. C.

1986-01-01

The use of and relevant equations programmed in a computer code designed to carry out a comprehensive linear analysis of multilayered fiber composites is described. The analysis contains the essential features required to effectively design structural components made from fiber composites. The inputs to the code are constituent material properties, factors reflecting the fabrication process, and composite geometry. The code performs micromechanics, macromechanics, and laminate analysis, including the hygrothermal response of fiber composites. The code outputs are the various ply and composite properties, composite structural response, and composite stress analysis results with details on failure. The code is in Fortran IV and can be used efficiently as a package in complex structural analysis programs. The input-output format is described extensively through the use of a sample problem. The program listing is also included. The code manual consists of two parts.
Compression of computer generated phase-shifting hologram sequence using AVC and HEVC

NASA Astrophysics Data System (ADS)

Xing, Yafei; Pesquet-Popescu, Béatrice; Dufaux, Frederic

2013-09-01

With the capability of achieving twice the compression ratio of Advanced Video Coding (AVC) with similar reconstruction quality, High Efficiency Video Coding (HEVC) is expected to become the newleading technique of video coding. In order to reduce the storage and transmission burden of digital holograms, in this paper we propose to use HEVC for compressing the phase-shifting digital hologram sequences (PSDHS). By simulating phase-shifting digital holography (PSDH) interferometry, interference patterns between illuminated three dimensional( 3D) virtual objects and the stepwise phase changed reference wave are generated as digital holograms. The hologram sequences are obtained by the movement of the virtual objects and compressed by AVC and HEVC. The experimental results show that AVC and HEVC are efficient to compress PSDHS, with HEVC giving better performance. Good compression rate and reconstruction quality can be obtained with bitrate above 15000kbps.
A Simple Secure Hash Function Scheme Using Multiple Chaotic Maps

NASA Astrophysics Data System (ADS)

Ahmad, Musheer; Khurana, Shruti; Singh, Sushmita; AlSharari, Hamed D.

2017-06-01

The chaotic maps posses high parameter sensitivity, random-like behavior and one-way computations, which favor the construction of cryptographic hash functions. In this paper, we propose to present a novel hash function scheme which uses multiple chaotic maps to generate efficient variable-sized hash functions. The message is divided into four parts, each part is processed by a different 1D chaotic map unit yielding intermediate hash code. The four codes are concatenated to two blocks, then each block is processed through 2D chaotic map unit separately. The final hash value is generated by combining the two partial hash codes. The simulation analyses such as distribution of hashes, statistical properties of confusion and diffusion, message and key sensitivity, collision resistance and flexibility are performed. The results reveal that the proposed anticipated hash scheme is simple, efficient and holds comparable capabilities when compared with some recent chaos-based hash algorithms.
Efficiently modeling neural networks on massively parallel computers

NASA Technical Reports Server (NTRS)

Farber, Robert M.

1993-01-01

Neural networks are a very useful tool for analyzing and modeling complex real world systems. Applying neural network simulations to real world problems generally involves large amounts of data and massive amounts of computation. To efficiently handle the computational requirements of large problems, we have implemented at Los Alamos a highly efficient neural network compiler for serial computers, vector computers, vector parallel computers, and fine grain SIMD computers such as the CM-2 connection machine. This paper describes the mapping used by the compiler to implement feed-forward backpropagation neural networks for a SIMD (Single Instruction Multiple Data) architecture parallel computer. Thinking Machines Corporation has benchmarked our code at 1.3 billion interconnects per second (approximately 3 gigaflops) on a 64,000 processor CM-2 connection machine (Singer 1990). This mapping is applicable to other SIMD computers and can be implemented on MIMD computers such as the CM-5 connection machine. Our mapping has virtually no communications overhead with the exception of the communications required for a global summation across the processors (which has a sub-linear runtime growth on the order of O(log(number of processors)). We can efficiently model very large neural networks which have many neurons and interconnects and our mapping can extend to arbitrarily large networks (within memory limitations) by merging the memory space of separate processors with fast adjacent processor interprocessor communications. This paper will consider the simulation of only feed forward neural network although this method is extendable to recurrent networks.
MATIN: A Random Network Coding Based Framework for High Quality Peer-to-Peer Live Video Streaming

PubMed Central

Barekatain, Behrang; Khezrimotlagh, Dariush; Aizaini Maarof, Mohd; Ghaeini, Hamid Reza; Salleh, Shaharuddin; Quintana, Alfonso Ariza; Akbari, Behzad; Cabrera, Alicia Triviño

2013-01-01

In recent years, Random Network Coding (RNC) has emerged as a promising solution for efficient Peer-to-Peer (P2P) video multicasting over the Internet. This probably refers to this fact that RNC noticeably increases the error resiliency and throughput of the network. However, high transmission overhead arising from sending large coefficients vector as header has been the most important challenge of the RNC. Moreover, due to employing the Gauss-Jordan elimination method, considerable computational complexity can be imposed on peers in decoding the encoded blocks and checking linear dependency among the coefficients vectors. In order to address these challenges, this study introduces MATIN which is a random network coding based framework for efficient P2P video streaming. The MATIN includes a novel coefficients matrix generation method so that there is no linear dependency in the generated coefficients matrix. Using the proposed framework, each peer encapsulates one instead of n coefficients entries into the generated encoded packet which results in very low transmission overhead. It is also possible to obtain the inverted coefficients matrix using a bit number of simple arithmetic operations. In this regard, peers sustain very low computational complexities. As a result, the MATIN permits random network coding to be more efficient in P2P video streaming systems. The results obtained from simulation using OMNET++ show that it substantially outperforms the RNC which uses the Gauss-Jordan elimination method by providing better video quality on peers in terms of the four important performance metrics including video distortion, dependency distortion, End-to-End delay and Initial Startup delay. PMID:23940530
Searching for Physics Beyond the Standard Model: Strongly-Coupled Field Theories at the Intensity and Energy Frontiers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brower, Richard C.

This proposal is to develop the software and algorithmic infrastructure needed for the numerical study of quantum chromodynamics (QCD), and of theories that have been proposed to describe physics beyond the Standard Model (BSM) of high energy physics, on current and future computers. This infrastructure will enable users (1) to improve the accuracy of QCD calculations to the point where they no longer limit what can be learned from high-precision experiments that seek to test the Standard Model, and (2) to determine the predictions of BSM theories in order to understand which of them are consistent with the data thatmore » will soon be available from the LHC. Work will include the extension and optimizations of community codes for the next generation of leadership class computers, the IBM Blue Gene/Q and the Cray XE/XK, and for the dedicated hardware funded for our field by the Department of Energy. Members of our collaboration at Brookhaven National Laboratory and Columbia University worked on the design of the Blue Gene/Q, and have begun to develop software for it. Under this grant we will build upon their experience to produce high-efficiency production codes for this machine. Cray XE/XK computers with many thousands of GPU accelerators will soon be available, and the dedicated commodity clusters we obtain with DOE funding include growing numbers of GPUs. We will work with our partners in NVIDIA's Emerging Technology group to scale our existing software to thousands of GPUs, and to produce highly efficient production codes for these machines. Work under this grant will also include the development of new algorithms for the effective use of heterogeneous computers, and their integration into our codes. It will include improvements of Krylov solvers and the development of new multigrid methods in collaboration with members of the FASTMath SciDAC Institute, using their HYPRE framework, as well as work on improved symplectic integrators.« less
Parameters that affect parallel processing for computational electromagnetic simulation codes on high performance computing clusters

NASA Astrophysics Data System (ADS)

Moon, Hongsik

What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited by the increased computing power the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared with the performance using benchmark software and the metric was FLoting-point Operations Per Seconds (FLOPS) which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore system? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPs and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the changing computer hardware platforms in order to provide fast, accurate and efficient solutions to large, complex electromagnetic problems. The research in this dissertation proves that the performance of parallel code is intimately related to the configuration of the computer hardware and can be maximized for different hardware platforms. To benchmark and optimize the performance of parallel CEM software, a variety of large, complex projects are created and executed on a variety of computer platforms. The computer platforms used in this research are detailed in this dissertation. The projects run as benchmarks are also described in detail and results are presented. The parameters that affect parallel CEM software on High Performance Computing Clusters (HPCC) are investigated. This research demonstrates methods to maximize the performance of parallel CEM software code.

Agile Multi-Scale Decompositions for Automatic Image Registration

NASA Technical Reports Server (NTRS)

Murphy, James M.; Leija, Omar Navarro; Le Moigne, Jacqueline

2016-01-01

In recent works, the first and third authors developed an automatic image registration algorithm based on a multiscale hybrid image decomposition with anisotropic shearlets and isotropic wavelets. This prototype showed strong performance, improving robustness over registration with wavelets alone. However, this method imposed a strict hierarchy on the order in which shearlet and wavelet features were used in the registration process, and also involved an unintegrated mixture of MATLAB and C code. In this paper, we introduce a more agile model for generating features, in which a flexible and user-guided mix of shearlet and wavelet features are computed. Compared to the previous prototype, this method introduces a flexibility to the order in which shearlet and wavelet features are used in the registration process. Moreover, the present algorithm is now fully coded in C, making it more efficient and portable than the MATLAB and C prototype. We demonstrate the versatility and computational efficiency of this approach by performing registration experiments with the fully-integrated C algorithm. In particular, meaningful timing studies can now be performed, to give a concrete analysis of the computational costs of the flexible feature extraction. Examples of synthetically warped and real multi-modal images are analyzed.
An implicit higher-order spatially accurate scheme for solving time dependent flows on unstructured meshes

NASA Astrophysics Data System (ADS)

Tomaro, Robert F.

1998-07-01

The present research is aimed at developing a higher-order, spatially accurate scheme for both steady and unsteady flow simulations using unstructured meshes. The resulting scheme must work on a variety of general problems to ensure the creation of a flexible, reliable and accurate aerodynamic analysis tool. To calculate the flow around complex configurations, unstructured grids and the associated flow solvers have been developed. Efficient simulations require the minimum use of computer memory and computational times. Unstructured flow solvers typically require more computer memory than a structured flow solver due to the indirect addressing of the cells. The approach taken in the present research was to modify an existing three-dimensional unstructured flow solver to first decrease the computational time required for a solution and then to increase the spatial accuracy. The terms required to simulate flow involving non-stationary grids were also implemented. First, an implicit solution algorithm was implemented to replace the existing explicit procedure. Several test cases, including internal and external, inviscid and viscous, two-dimensional, three-dimensional and axi-symmetric problems, were simulated for comparison between the explicit and implicit solution procedures. The increased efficiency and robustness of modified code due to the implicit algorithm was demonstrated. Two unsteady test cases, a plunging airfoil and a wing undergoing bending and torsion, were simulated using the implicit algorithm modified to include the terms required for a moving and/or deforming grid. Secondly, a higher than second-order spatially accurate scheme was developed and implemented into the baseline code. Third- and fourth-order spatially accurate schemes were implemented and tested. The original dissipation was modified to include higher-order terms and modified near shock waves to limit pre- and post-shock oscillations. The unsteady cases were repeated using the higher-order spatially accurate code. The new solutions were compared with those obtained using the second-order spatially accurate scheme. Finally, the increased efficiency of using an implicit solution algorithm in a production Computational Fluid Dynamics flow solver was demonstrated for steady and unsteady flows. A third- and fourth-order spatially accurate scheme has been implemented creating a basis for a state-of-the-art aerodynamic analysis tool.
Photovoltaic conversion of laser power to electrical power

NASA Technical Reports Server (NTRS)

Walker, G. H.; Heinbockel, J. H.

1986-01-01

Photovoltaic laser to electric converters are attractive for use with a space-based laser power station. This paper presents the results of modeling studies for a silicon vertical junction converter used with a Nd laser. A computer code was developed for the model and this code was used to conduct a parametric study for a Si vertical junction converter consisting of one p-n junction irradiated with a Nd laser. These calculations predict an efficiency over 50 percent for an optimized converter.
YAP Version 4.0

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nelson, Eric M.

2004-05-20

The YAP software library computes (1) electromagnetic modes, (2) electrostatic fields, (3) magnetostatic fields and (4) particle trajectories in 2d and 3d models. The code employs finite element methods on unstructured grids of tetrahedral, hexahedral, prism and pyramid elements, with linear through cubic element shapes and basis functions to provide high accuracy. The novel particle tracker is robust, accurate and efficient, even on unstructured grids with discontinuous fields. This software library is a component of the MICHELLE 3d finite element gun code.
Recent Improvements in the FDNS CFD Code and its Associated Process

NASA Technical Reports Server (NTRS)

West, Jeff S.; Dorney, Suzanne M.; Turner, Jim (Technical Monitor)

2002-01-01

This viewgraph presentation gives an overview on recent improvements in the Finite Difference Navier Stokes (FDNS) computational fluid dynamics (CFD) code and its associated process. The development of a utility, PreViewer, has essentially eliminated the creeping of simple human error into the FDNS Solution process. Extension of PreViewer to encapsulate the Domain Decompression process has made practical the routine use of parallel processing. The combination of CVS source control and ATS consistency validation significantly increases the efficiency of the CFD process.
Final Report for "Implimentation and Evaluation of Multigrid Linear Solvers into Extended Magnetohydrodynamic Codes for Petascale Computing"

DOE Office of Scientific and Technical Information (OSTI.GOV)

Srinath Vadlamani; Scott Kruger; Travis Austin

Extended magnetohydrodynamic (MHD) codes are used to model the large, slow-growing instabilities that are projected to limit the performance of International Thermonuclear Experimental Reactor (ITER). The multiscale nature of the extended MHD equations requires an implicit approach. The current linear solvers needed for the implicit algorithm scale poorly because the resultant matrices are so ill-conditioned. A new solver is needed, especially one that scales to the petascale. The most successful scalable parallel processor solvers to date are multigrid solvers. Applying multigrid techniques to a set of equations whose fundamental modes are dispersive waves is a promising solution to CEMM problems.more » For the Phase 1, we implemented multigrid preconditioners from the HYPRE project of the Center for Applied Scientific Computing at LLNL via PETSc of the DOE SciDAC TOPS for the real matrix systems of the extended MHD code NIMROD which is a one of the primary modeling codes of the OFES-funded Center for Extended Magnetohydrodynamic Modeling (CEMM) SciDAC. We implemented the multigrid solvers on the fusion test problem that allows for real matrix systems with success, and in the process learned about the details of NIMROD data structures and the difficulties of inverting NIMROD operators. The further success of this project will allow for efficient usage of future petascale computers at the National Leadership Facilities: Oak Ridge National Laboratory, Argonne National Laboratory, and National Energy Research Scientific Computing Center. The project will be a collaborative effort between computational plasma physicists and applied mathematicians at Tech-X Corporation, applied mathematicians Front Range Scientific Computations, Inc. (who are collaborators on the HYPRE project), and other computational plasma physicists involved with the CEMM project.« less
Xyce™ Parallel Electronic Simulator Users' Guide, Version 6.5.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R.; Aadithya, Karthik V.; Mei, Ting

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to developmore » new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation- aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The information herein is subject to change without notice. Copyright © 2002-2016 Sandia Corporation. All rights reserved.« less
Extensible Computational Chemistry Environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

2012-08-09

ECCE provides a sophisticated graphical user interface, scientific visualization tools, and the underlying data management framework enabling scientists to efficiently set up calculations and store, retrieve, and analyze the rapidly growing volumes of data produced by computational chemistry studies. ECCE was conceived as part of the Environmental Molecular Sciences Laboratory construction to solve the problem of researchers being able to effectively utilize complex computational chemistry codes and massively parallel high performance compute resources. Bringing the power of these codes and resources to the desktops of researcher and thus enabling world class research without users needing a detailed understanding of themore » inner workings of either the theoretical codes or the supercomputers needed to run them was a grand challenge problem in the original version of the EMSL. ECCE allows collaboration among researchers using a web-based data repository where the inputs and results for all calculations done within ECCE are organized. ECCE is a first of kind end-to-end problem solving environment for all phases of computational chemistry research: setting up calculations with sophisticated GUI and direct manipulation visualization tools, submitting and monitoring calculations on remote high performance supercomputers without having to be familiar with the details of using these compute resources, and performing results visualization and analysis including creating publication quality images. ECCE is a suite of tightly integrated applications that are employed as the user moves through the modeling process.« less
Advanced Doubling Adding Method for Radiative Transfer in Planetary Atmospheres

NASA Astrophysics Data System (ADS)

Liu, Quanhua; Weng, Fuzhong

2006-12-01

The doubling adding method (DA) is one of the most accurate tools for detailed multiple-scattering calculations. The principle of the method goes back to the nineteenth century in a problem dealing with reflection and transmission by glass plates. Since then the doubling adding method has been widely used as a reference tool for other radiative transfer models. The method has never been used in operational applications owing to tremendous demand on computational resources from the model. This study derives an analytical expression replacing the most complicated thermal source terms in the doubling adding method. The new development is called the advanced doubling adding (ADA) method. Thanks also to the efficiency of matrix and vector manipulations in FORTRAN 90/95, the advanced doubling adding method is about 60 times faster than the doubling adding method. The radiance (i.e., forward) computation code of ADA is easily translated into tangent linear and adjoint codes for radiance gradient calculations. The simplicity in forward and Jacobian computation codes is very useful for operational applications and for the consistency between the forward and adjoint calculations in satellite data assimilation.
Creating a Simple Single Computational Approach to Modeling Rarefied and Continuum Flow About Aerospace Vehicles

NASA Technical Reports Server (NTRS)

Goldstein, David B.; Varghese, Philip L.

1997-01-01

We proposed to create a single computational code incorporating methods that can model both rarefied and continuum flow to enable the efficient simulation of flow about space craft and high altitude hypersonic aerospace vehicles. The code was to use a single grid structure that permits a smooth transition between the continuum and rarefied portions of the flow. Developing an appropriate computational boundary between the two regions represented a major challenge. The primary approach chosen involves coupling a four-speed Lattice Boltzmann model for the continuum flow with the DSMC method in the rarefied regime. We also explored the possibility of using a standard finite difference Navier Stokes solver for the continuum flow. With the resulting code we will ultimately investigate three-dimensional plume impingement effects, a subject of critical importance to NASA and related to the work of Drs. Forrest Lumpkin, Steve Fitzgerald and Jay Le Beau at Johnson Space Center. Below is a brief background on the project and a summary of the results as of the end of the grant.
Validation of a computer code for analysis of subsonic aerodynamic performance of wings with flaps in combination with a canard or horizontal tail and an application to optimization

NASA Technical Reports Server (NTRS)

Carlson, Harry W.; Darden, Christine M.; Mann, Michael J.

1990-01-01

Extensive correlations of computer code results with experimental data are employed to illustrate the use of a linearized theory, attached flow method for the estimation and optimization of the longitudinal aerodynamic performance of wing-canard and wing-horizontal tail configurations which may employ simple hinged flap systems. Use of an attached flow method is based on the premise that high levels of aerodynamic efficiency require a flow that is as nearly attached as circumstances permit. The results indicate that linearized theory, attached flow, computer code methods (modified to include estimated attainable leading-edge thrust and an approximate representation of vortex forces) provide a rational basis for the estimation and optimization of aerodynamic performance at subsonic speeds below the drag rise Mach number. Generally, good prediction of aerodynamic performance, as measured by the suction parameter, can be expected for near optimum combinations of canard or horizontal tail incidence and leading- and trailing-edge flap deflections at a given lift coefficient (conditions which tend to produce a predominantly attached flow).
A discrete Fourier transform for virtual memory machines

NASA Technical Reports Server (NTRS)

Galant, David C.

1992-01-01

An algebraic theory of the Discrete Fourier Transform is developed in great detail. Examination of the details of the theory leads to a computationally efficient fast Fourier transform for the use on computers with virtual memory. Such an algorithm is of great use on modern desktop machines. A FORTRAN coded version of the algorithm is given for the case when the sequence of numbers to be transformed is a power of two.
A Lossless Multichannel Bio-Signal Compression Based on Low-Complexity Joint Coding Scheme for Portable Medical Devices

PubMed Central

Kim, Dong-Sun; Kwon, Jin-San

2014-01-01

Research on real-time health systems have received great attention during recent years and the needs of high-quality personal multichannel medical signal compression for personal medical product applications are increasing. The international MPEG-4 audio lossless coding (ALS) standard supports a joint channel-coding scheme for improving compression performance of multichannel signals and it is very efficient compression method for multi-channel biosignals. However, the computational complexity of such a multichannel coding scheme is significantly greater than that of other lossless audio encoders. In this paper, we present a multichannel hardware encoder based on a low-complexity joint-coding technique and shared multiplier scheme for portable devices. A joint-coding decision method and a reference channel selection scheme are modified for a low-complexity joint coder. The proposed joint coding decision method determines the optimized joint-coding operation based on the relationship between the cross correlation of residual signals and the compression ratio. The reference channel selection is designed to select a channel for the entropy coding of the joint coding. The hardware encoder operates at a 40 MHz clock frequency and supports two-channel parallel encoding for the multichannel monitoring system. Experimental results show that the compression ratio increases by 0.06%, whereas the computational complexity decreases by 20.72% compared to the MPEG-4 ALS reference software encoder. In addition, the compression ratio increases by about 11.92%, compared to the single channel based bio-signal lossless data compressor. PMID:25237900
Learning-Based Just-Noticeable-Quantization- Distortion Modeling for Perceptual Video Coding.

PubMed

Ki, Sehwan; Bae, Sung-Ho; Kim, Munchurl; Ko, Hyunsuk

2018-07-01

Conventional predictive video coding-based approaches are reaching the limit of their potential coding efficiency improvements, because of severely increasing computation complexity. As an alternative approach, perceptual video coding (PVC) has attempted to achieve high coding efficiency by eliminating perceptual redundancy, using just-noticeable-distortion (JND) directed PVC. The previous JNDs were modeled by adding white Gaussian noise or specific signal patterns into the original images, which were not appropriate in finding JND thresholds due to distortion with energy reduction. In this paper, we present a novel discrete cosine transform-based energy-reduced JND model, called ERJND, that is more suitable for JND-based PVC schemes. Then, the proposed ERJND model is extended to two learning-based just-noticeable-quantization-distortion (JNQD) models as preprocessing that can be applied for perceptual video coding. The two JNQD models can automatically adjust JND levels based on given quantization step sizes. One of the two JNQD models, called LR-JNQD, is based on linear regression and determines the model parameter for JNQD based on extracted handcraft features. The other JNQD model is based on a convolution neural network (CNN), called CNN-JNQD. To our best knowledge, our paper is the first approach to automatically adjust JND levels according to quantization step sizes for preprocessing the input to video encoders. In experiments, both the LR-JNQD and CNN-JNQD models were applied to high efficiency video coding (HEVC) and yielded maximum (average) bitrate reductions of 38.51% (10.38%) and 67.88% (24.91%), respectively, with little subjective video quality degradation, compared with the input without preprocessing applied.
Intra Frame Coding In Advanced Video Coding Standard (H.264) to Obtain Consistent PSNR and Reduce Bit Rate for Diagonal Down Left Mode Using Gaussian Pulse

NASA Astrophysics Data System (ADS)

Manjanaik, N.; Parameshachari, B. D.; Hanumanthappa, S. N.; Banu, Reshma

2017-08-01

Intra prediction process of H.264 video coding standard used to code first frame i.e. Intra frame of video to obtain good coding efficiency compare to previous video coding standard series. More benefit of intra frame coding is to reduce spatial pixel redundancy with in current frame, reduces computational complexity and provides better rate distortion performance. To code Intra frame it use existing process Rate Distortion Optimization (RDO) method. This method increases computational complexity, increases in bit rate and reduces picture quality so it is difficult to implement in real time applications, so the many researcher has been developed fast mode decision algorithm for coding of intra frame. The previous work carried on Intra frame coding in H.264 standard using fast decision mode intra prediction algorithm based on different techniques was achieved increased in bit rate, degradation of picture quality(PSNR) for different quantization parameters. Many previous approaches of fast mode decision algorithms on intra frame coding achieved only reduction of computational complexity or it save encoding time and limitation was increase in bit rate with loss of quality of picture. In order to avoid increase in bit rate and loss of picture quality a better approach was developed. In this paper developed a better approach i.e. Gaussian pulse for Intra frame coding using diagonal down left intra prediction mode to achieve higher coding efficiency in terms of PSNR and bitrate. In proposed method Gaussian pulse is multiplied with each 4x4 frequency domain coefficients of 4x4 sub macro block of macro block of current frame before quantization process. Multiplication of Gaussian pulse for each 4x4 integer transformed coefficients at macro block levels scales the information of the coefficients in a reversible manner. The resulting signal would turn abstract. Frequency samples are abstract in a known and controllable manner without intermixing of coefficients, it avoids picture getting bad hit for higher values of quantization parameters. The proposed work was implemented using MATLAB and JM 18.6 reference software. The proposed work measure the performance parameters PSNR, bit rate and compression of intra frame of yuv video sequences in QCIF resolution under different values of quantization parameter with Gaussian value for diagonal down left intra prediction mode. The simulation results of proposed algorithm are tabulated and compared with previous algorithm i.e. Tian et al method. The proposed algorithm achieved reduced in bit rate averagely 30.98% and maintain consistent picture quality for QCIF sequences compared to previous algorithm i.e. Tian et al method.
PETSc Users Manual Revision 3.3

DOE Office of Scientific and Technical Information (OSTI.GOV)

Balay, S.; Brown, J.; Buschelman, K.

This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication. PETSc includes an expanding suite of parallel linear, nonlinear equation solvers and time integrators that may be used in application codes written in Fortran, C, C++, Python, and MATLAB (sequential). PETSc provides many of the mechanisms neededmore » within parallel application codes, such as parallel matrix and vector assembly routines. The library is organized hierarchically, enabling users to employ the level of abstraction that is most appropriate for a particular problem. By using techniques of object-oriented programming, PETSc provides enormous flexibility for users. PETSc is a sophisticated set of software tools; as such, for some users it initially has a much steeper learning curve than a simple subroutine library. In particular, for individuals without some computer science background, experience programming in C, C++ or Fortran and experience using a debugger such as gdb or dbx, it may require a significant amount of time to take full advantage of the features that enable efficient software use. However, the power of the PETSc design and the algorithms it incorporates may make the efficient implementation of many application codes simpler than “rolling them” yourself; For many tasks a package such as MATLAB is often the best tool; PETSc is not intended for the classes of problems for which effective MATLAB code can be written. PETSc also has a MATLAB interface, so portions of your code can be written in MATLAB to “try out” the PETSc solvers. The resulting code will not be scalable however because currently MATLAB is inherently not scalable; and PETSc should not be used to attempt to provide a “parallel linear solver” in an otherwise sequential code. Certainly all parts of a previously sequential code need not be parallelized but the matrix generation portion must be parallelized to expect any kind of reasonable performance. Do not expect to generate your matrix sequentially and then “use PETSc” to solve the linear system in parallel. Since PETSc is under continued development, small changes in usage and calling sequences of routines will occur. PETSc is supported; see the web site http://www.mcs.anl.gov/petsc for information on contacting support. A http://www.mcs.anl.gov/petsc/publications may be found a list of publications and web sites that feature work involving PETSc. We welcome any reports of corrections for this document.« less
PETSc Users Manual Revision 3.4

DOE Office of Scientific and Technical Information (OSTI.GOV)

Balay, S.; Brown, J.; Buschelman, K.

This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication. PETSc includes an expanding suite of parallel linear, nonlinear equation solvers and time integrators that may be used in application codes written in Fortran, C, C++, Python, and MATLAB (sequential). PETSc provides many of the mechanisms neededmore » within parallel application codes, such as parallel matrix and vector assembly routines. The library is organized hierarchically, enabling users to employ the level of abstraction that is most appropriate for a particular problem. By using techniques of object-oriented programming, PETSc provides enormous flexibility for users. PETSc is a sophisticated set of software tools; as such, for some users it initially has a much steeper learning curve than a simple subroutine library. In particular, for individuals without some computer science background, experience programming in C, C++ or Fortran and experience using a debugger such as gdb or dbx, it may require a significant amount of time to take full advantage of the features that enable efficient software use. However, the power of the PETSc design and the algorithms it incorporates may make the efficient implementation of many application codes simpler than “rolling them” yourself; For many tasks a package such as MATLAB is often the best tool; PETSc is not intended for the classes of problems for which effective MATLAB code can be written. PETSc also has a MATLAB interface, so portions of your code can be written in MATLAB to “try out” the PETSc solvers. The resulting code will not be scalable however because currently MATLAB is inherently not scalable; and PETSc should not be used to attempt to provide a “parallel linear solver” in an otherwise sequential code. Certainly all parts of a previously sequential code need not be parallelized but the matrix generation portion must be parallelized to expect any kind of reasonable performance. Do not expect to generate your matrix sequentially and then “use PETSc” to solve the linear system in parallel. Since PETSc is under continued development, small changes in usage and calling sequences of routines will occur. PETSc is supported; see the web site http://www.mcs.anl.gov/petsc for information on contacting support. A http://www.mcs.anl.gov/petsc/publications may be found a list of publications and web sites that feature work involving PETSc. We welcome any reports of corrections for this document.« less
PETSc Users Manual Revision 3.5

DOE Office of Scientific and Technical Information (OSTI.GOV)

Balay, S.; Abhyankar, S.; Adams, M.

This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication. PETSc includes an expanding suite of parallel linear, nonlinear equation solvers and time integrators that may be used in application codes written in Fortran, C, C++, Python, and MATLAB (sequential). PETSc provides many of the mechanisms neededmore » within parallel application codes, such as parallel matrix and vector assembly routines. The library is organized hierarchically, enabling users to employ the level of abstraction that is most appropriate for a particular problem. By using techniques of object-oriented programming, PETSc provides enormous flexibility for users. PETSc is a sophisticated set of software tools; as such, for some users it initially has a much steeper learning curve than a simple subroutine library. In particular, for individuals without some computer science background, experience programming in C, C++ or Fortran and experience using a debugger such as gdb or dbx, it may require a significant amount of time to take full advantage of the features that enable efficient software use. However, the power of the PETSc design and the algorithms it incorporates may make the efficient implementation of many application codes simpler than “rolling them” yourself. ;For many tasks a package such as MATLAB is often the best tool; PETSc is not intended for the classes of problems for which effective MATLAB code can be written. PETSc also has a MATLAB interface, so portions of your code can be written in MATLAB to “try out” the PETSc solvers. The resulting code will not be scalable however because currently MATLAB is inherently not scalable; and PETSc should not be used to attempt to provide a “parallel linear solver” in an otherwise sequential code. Certainly all parts of a previously sequential code need not be parallelized but the matrix generation portion must be parallelized to expect any kind of reasonable performance. Do not expect to generate your matrix sequentially and then “use PETSc” to solve the linear system in parallel. Since PETSc is under continued development, small changes in usage and calling sequences of routines will occur. PETSc is supported; see the web site http://www.mcs.anl.gov/petsc for information on contacting support. A http://www.mcs.anl.gov/petsc/publications may be found a list of publications and web sites that feature work involving PETSc. We welcome any reports of corrections for this document.« less
An efficient CU partition algorithm for HEVC based on improved Sobel operator

NASA Astrophysics Data System (ADS)

Sun, Xuebin; Chen, Xiaodong; Xu, Yong; Sun, Gang; Yang, Yunsheng

2018-04-01

As the latest video coding standard, High Efficiency Video Coding (HEVC) achieves over 50% bit rate reduction with similar video quality compared with previous standards H.264/AVC. However, the higher compression efficiency is attained at the cost of significantly increasing computational load. In order to reduce the complexity, this paper proposes a fast coding unit (CU) partition technique to speed up the process. To detect the edge features of each CU, a more accurate improved Sobel filtering is developed and performed By analyzing the textural features of CU, an early CU splitting termination is proposed to decide whether a CU should be decomposed into four lower-dimensions CUs or not. Compared with the reference software HM16.7, experimental results indicate the proposed algorithm can lessen the encoding time up to 44.09% on average, with a negligible bit rate increase of 0.24%, and quality losses lower 0.03 dB, respectively. In addition, the proposed algorithm gets a better trade-off between complexity and rate-distortion among the other proposed works.
Software for Brain Network Simulations: A Comparative Study

PubMed Central

Tikidji-Hamburyan, Ruben A.; Narayana, Vikram; Bozkus, Zeki; El-Ghazawi, Tarek A.

2017-01-01

Numerical simulations of brain networks are a critical part of our efforts in understanding brain functions under pathological and normal conditions. For several decades, the community has developed many software packages and simulators to accelerate research in computational neuroscience. In this article, we select the three most popular simulators, as determined by the number of models in the ModelDB database, such as NEURON, GENESIS, and BRIAN, and perform an independent evaluation of these simulators. In addition, we study NEST, one of the lead simulators of the Human Brain Project. First, we study them based on one of the most important characteristics, the range of supported models. Our investigation reveals that brain network simulators may be biased toward supporting a specific set of models. However, all simulators tend to expand the supported range of models by providing a universal environment for the computational study of individual neurons and brain networks. Next, our investigations on the characteristics of computational architecture and efficiency indicate that all simulators compile the most computationally intensive procedures into binary code, with the aim of maximizing their computational performance. However, not all simulators provide the simplest method for module development and/or guarantee efficient binary code. Third, a study of their amenability for high-performance computing reveals that NEST can almost transparently map an existing model on a cluster or multicore computer, while NEURON requires code modification if the model developed for a single computer has to be mapped on a computational cluster. Interestingly, parallelization is the weakest characteristic of BRIAN, which provides no support for cluster computations and limited support for multicore computers. Fourth, we identify the level of user support and frequency of usage for all simulators. Finally, we carry out an evaluation using two case studies: a large network with simplified neural and synaptic models and a small network with detailed models. These two case studies allow us to avoid any bias toward a particular software package. The results indicate that BRIAN provides the most concise language for both cases considered. Furthermore, as expected, NEST mostly favors large network models, while NEURON is better suited for detailed models. Overall, the case studies reinforce our general observation that simulators have a bias in the computational performance toward specific types of the brain network models. PMID:28775687

Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment.

PubMed

Meng, Bowen; Pratx, Guillem; Xing, Lei

2011-12-01

Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT∕CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. In this work, we accelerated the Feldcamp-Davis-Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from a 4DCT scans were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and Reduce function to aggregate those partial backprojection into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed on a commercial cloud computing environment using the proposed 4D CBCT∕CT reconstruction algorithm. Speedup of reconstruction time is found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. Root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10(-7). Our study also proved that cloud computing with MapReduce is fault tolerant: the reconstruction completed successfully with identical results even when half of the nodes were manually terminated in the middle of the process. An ultrafast, reliable and scalable 4D CBCT∕CT reconstruction method was developed using the MapReduce framework. Unlike other parallel computing approaches, the parallelization and speedup required little modification of the original reconstruction code. MapReduce provides an efficient and fault tolerant means of solving large-scale computing problems in a cloud computing environment.
Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment

PubMed Central

Meng, Bowen; Pratx, Guillem; Xing, Lei

2011-01-01

Purpose: Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT/CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. Methods: In this work, we accelerated the Feldcamp–Davis–Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from a 4DCT scans were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and Reduce function to aggregate those partial backprojection into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed on a commercial cloud computing environment using the proposed 4D CBCT/CT reconstruction algorithm. Results: Speedup of reconstruction time is found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. Root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10−7. Our study also proved that cloud computing with MapReduce is fault tolerant: the reconstruction completed successfully with identical results even when half of the nodes were manually terminated in the middle of the process. Conclusions: An ultrafast, reliable and scalable 4D CBCT/CT reconstruction method was developed using the MapReduce framework. Unlike other parallel computing approaches, the parallelization and speedup required little modification of the original reconstruction code. MapReduce provides an efficient and fault tolerant means of solving large-scale computing problems in a cloud computing environment. PMID:22149842
Inclusion of the fitness sharing technique in an evolutionary algorithm to analyze the fitness landscape of the genetic code adaptability.

PubMed

Santos, José; Monteagudo, Ángel

2017-03-27

The canonical code, although prevailing in complex genomes, is not universal. It was shown the canonical genetic code superior robustness compared to random codes, but it is not clearly determined how it evolved towards its current form. The error minimization theory considers the minimization of point mutation adverse effect as the main selection factor in the evolution of the code. We have used simulated evolution in a computer to search for optimized codes, which helps to obtain information about the optimization level of the canonical code in its evolution. A genetic algorithm searches for efficient codes in a fitness landscape that corresponds with the adaptability of possible hypothetical genetic codes. The lower the effects of errors or mutations in the codon bases of a hypothetical code, the more efficient or optimal is that code. The inclusion of the fitness sharing technique in the evolutionary algorithm allows the extent to which the canonical genetic code is in an area corresponding to a deep local minimum to be easily determined, even in the high dimensional spaces considered. The analyses show that the canonical code is not in a deep local minimum and that the fitness landscape is not a multimodal fitness landscape with deep and separated peaks. Moreover, the canonical code is clearly far away from the areas of higher fitness in the landscape. Given the non-presence of deep local minima in the landscape, although the code could evolve and different forces could shape its structure, the fitness landscape nature considered in the error minimization theory does not explain why the canonical code ended its evolution in a location which is not an area of a localized deep minimum of the huge fitness landscape.
FWT2D: A massively parallel program for frequency-domain full-waveform tomography of wide-aperture seismic data—Part 1: Algorithm

NASA Astrophysics Data System (ADS)

Sourbier, Florent; Operto, Stéphane; Virieux, Jean; Amestoy, Patrick; L'Excellent, Jean-Yves

2009-03-01

This is the first paper in a two-part series that describes a massively parallel code that performs 2D frequency-domain full-waveform inversion of wide-aperture seismic data for imaging complex structures. Full-waveform inversion methods, namely quantitative seismic imaging methods based on the resolution of the full wave equation, are computationally expensive. Therefore, designing efficient algorithms which take advantage of parallel computing facilities is critical for the appraisal of these approaches when applied to representative case studies and for further improvements. Full-waveform modelling requires the resolution of a large sparse system of linear equations which is performed with the massively parallel direct solver MUMPS for efficient multiple-shot simulations. Efficiency of the multiple-shot solution phase (forward/backward substitutions) is improved by using the BLAS3 library. The inverse problem relies on a classic local optimization approach implemented with a gradient method. The direct solver returns the multiple-shot wavefield solutions distributed over the processors according to a domain decomposition driven by the distribution of the LU factors. The domain decomposition of the wavefield solutions is used to compute in parallel the gradient of the objective function and the diagonal Hessian, this latter providing a suitable scaling of the gradient. The algorithm allows one to test different strategies for multiscale frequency inversion ranging from successive mono-frequency inversion to simultaneous multifrequency inversion. These different inversion strategies will be illustrated in the following companion paper. The parallel efficiency and the scalability of the code will also be quantified.
Maxis-A rezoning and remapping code in two dimensional cylindrical geometry

NASA Astrophysics Data System (ADS)

Lin, Zhiwei; Jiang, Shaoen; Zhang, Lu; Kuang, Longyu; Li, Hang

2018-06-01

This paper presents the new version of our code Maxis (Lin et al., 2011). Maxis is a local rezoning and remapping code in two dimensional cylindrical geometry, which can be employed to address the grid distortion problem of unstructured meshes. The new version of Maxis is mostly programmed in the C language which considerably improves its computational efficiency with respect to the former Matlab version. A new algorithm for determining the intersection of two arbitrary convex polygons is also incorporated into the new version. Some additional linking functions are further provided in the new version for the purpose of combining Maxis and MULTI2D.
Chemical reactivity and spectroscopy explored from QM/MM molecular dynamics simulations using the LIO code

NASA Astrophysics Data System (ADS)

Marcolongo, Juan P.; Zeida, Ari; Semelak, Jonathan A.; Foglia, Nicolás O.; Morzan, Uriel N.; Estrin, Dario A.; González Lebrero, Mariano C.; Scherlis, Damián A.

2018-03-01

In this work we present the current advances in the development and the applications of LIO, a lab-made code designed for density functional theory calculations in graphical processing units (GPU), that can be coupled with different classical molecular dynamics engines. This code has been thoroughly optimized to perform efficient molecular dynamics simulations at the QM/MM DFT level, allowing for an exhaustive sampling of the configurational space. Selected examples are presented for the description of chemical reactivity in terms of free energy profiles, and also for the computation of optical properties, such as vibrational and electronic spectra in solvent and protein environments.
Collaborative Software Development in Support of Fast Adaptive AeroSpace Tools (FAAST)

NASA Technical Reports Server (NTRS)

Kleb, William L.; Nielsen, Eric J.; Gnoffo, Peter A.; Park, Michael A.; Wood, William A.

2003-01-01

A collaborative software development approach is described. The software product is an adaptation of proven computational capabilities combined with new capabilities to form the Agency's next generation aerothermodynamic and aerodynamic analysis and design tools. To efficiently produce a cohesive, robust, and extensible software suite, the approach uses agile software development techniques; specifically, project retrospectives, the Scrum status meeting format, and a subset of Extreme Programming's coding practices are employed. Examples are provided which demonstrate the substantial benefits derived from employing these practices. Also included is a discussion of issues encountered when porting legacy Fortran 77 code to Fortran 95 and a Fortran 95 coding standard.
Network Coding Opportunities for Wireless Grids Formed by Mobile Devices

NASA Astrophysics Data System (ADS)

Nielsen, Karsten Fyhn; Madsen, Tatiana K.; Fitzek, Frank H. P.

Wireless grids have potential in sharing communication, computa-tional and storage resources making these networks more powerful, more robust, and less cost intensive. However, to enjoy the benefits of cooperative resource sharing, a number of issues should be addressed and the cost of the wireless link should be taken into account. We focus on the question how nodes can efficiently communicate and distribute data in a wireless grid. We show the potential of a network coding approach when nodes have the possibility to combine packets thus increasing the amount of information per transmission. Our implementation demonstrates the feasibility of network coding for wireless grids formed by mobile devices.
BALANCING THE LOAD: A VORONOI BASED SCHEME FOR PARALLEL COMPUTATIONS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Steinberg, Elad; Yalinewich, Almog; Sari, Re'em

2015-01-01

One of the key issues when running a simulation on multiple CPUs is maintaining a proper load balance throughout the run and minimizing communications between CPUs. We propose a novel method of utilizing a Voronoi diagram to achieve a nearly perfect load balance without the need of any global redistributions of data. As a show case, we implement our method in RICH, a two-dimensional moving mesh hydrodynamical code, but it can be extended trivially to other codes in two or three dimensions. Our tests show that this method is indeed efficient and can be used in a large variety ofmore » existing hydrodynamical codes.« less
Correlated Errors in the Surface Code

NASA Astrophysics Data System (ADS)

Lopez, Daniel; Mucciolo, E. R.; Novais, E.

2012-02-01

A milestone step into the development of quantum information technology would be the ability to design and operate a reliable quantum memory. The greatest obstacle to create such a device has been decoherence due to the unavoidable interaction between the quantum system and its environment. Quantum Error Correction is therefore an essential ingredient to any quantum computing information device. A great deal of attention has been given to surface codes, since it has very good scaling properties. In this seminar, we discuss the time evolution of a qubit encoded in the logical basis of a surface code. The system is interacting with a bosonic environment at zero temperature. Our results show how much spatial and time correlations can be detrimental to the efficiency of the code.
Solar proton exposure of an ICRU sphere within a complex structure Part I: Combinatorial geometry.

PubMed

Wilson, John W; Slaba, Tony C; Badavi, Francis F; Reddell, Brandon D; Bahadori, Amir A

2016-06-01

The 3DHZETRN code, with improved neutron and light ion (Z≤2) transport procedures, was recently developed and compared to Monte Carlo (MC) simulations using simplified spherical geometries. It was shown that 3DHZETRN agrees with the MC codes to the extent they agree with each other. In the present report, the 3DHZETRN code is extended to enable analysis in general combinatorial geometry. A more complex shielding structure with internal parts surrounding a tissue sphere is considered and compared against MC simulations. It is shown that even in the more complex geometry, 3DHZETRN agrees well with the MC codes and maintains a high degree of computational efficiency. Published by Elsevier Ltd.
WOMBAT: A Scalable and High-performance Astrophysical Magnetohydrodynamics Code

NASA Astrophysics Data System (ADS)

Mendygral, P. J.; Radcliffe, N.; Kandalla, K.; Porter, D.; O'Neill, B. J.; Nolting, C.; Edmon, P.; Donnert, J. M. F.; Jones, T. W.

2017-02-01

We present a new code for astrophysical magnetohydrodynamics specifically designed and optimized for high performance and scaling on modern and future supercomputers. We describe a novel hybrid OpenMP/MPI programming model that emerged from a collaboration between Cray, Inc. and the University of Minnesota. This design utilizes MPI-RMA optimized for thread scaling, which allows the code to run extremely efficiently at very high thread counts ideal for the latest generation of multi-core and many-core architectures. Such performance characteristics are needed in the era of “exascale” computing. We describe and demonstrate our high-performance design in detail with the intent that it may be used as a model for other, future astrophysical codes intended for applications demanding exceptional performance.
Translating an AI application from Lisp to Ada: A case study

NASA Technical Reports Server (NTRS)

Davis, Gloria J.

1991-01-01

A set of benchmarks was developed to test the performance of a newly designed computer executing both Lisp and Ada. Among these was AutoClassII -- a large Artificial Intelligence (AI) application written in Common Lisp. The extraction of a representative subset of this complex application was aided by a Lisp Code Analyzer (LCA). The LCA enabled rapid analysis of the code, putting it in a concise and functionally readable form. An equivalent benchmark was created in Ada through manual translation of the Lisp version. A comparison of the execution results of both programs across a variety of compiler-machine combinations indicate that line-by-line translation coupled with analysis of the initial code can produce relatively efficient and reusable target code.
On 3-D inelastic analysis methods for hot section components. Volume 1: Special finite element models

NASA Technical Reports Server (NTRS)

Nakazawa, S.

1988-01-01

This annual status report presents the results of work performed during the fourth year of the 3-D Inelastic Analysis Methods for Hot Section Components program (NASA Contract NAS3-23697). The objective of the program is to produce a series of new computer codes permitting more accurate and efficient 3-D analysis of selected hot section components, i.e., combustor liners, turbine blades and turbine vanes. The computer codes embody a progression of math models and are streamlined to take advantage of geometrical features, loading conditions, and forms of material response that distinguish each group of selected components. Volume 1 of this report discusses the special finite element models developed during the fourth year of the contract.
Analyzing and modeling gravity and magnetic anomalies using the SPHERE program and Magsat data

NASA Technical Reports Server (NTRS)

Braile, L. W.; Hinze, W. J.; Vonfrese, R. R. B. (Principal Investigator)

1981-01-01

Computer codes were completed, tested, and documented for analyzing magnetic anomaly vector components by equivalent point dipole inversion. The codes are intended for use in inverting the magnetic anomaly due to a spherical prism in a horizontal geomagnetic field and for recomputing the anomaly in a vertical geomagnetic field. Modeling of potential fields at satellite elevations that are derived from three dimensional sources by program SPHERE was made significantly more efficient by improving the input routines. A preliminary model of the Andean subduction zone was used to compute the anomaly at satellite elevations using both actual geomagnetic parameters and vertical polarization. Program SPHERE is also being used to calculate satellite level magnetic and gravity anomalies from the Amazon River Aulacogen.
Flood predictions using the parallel version of distributed numerical physical rainfall-runoff model TOPKAPI

NASA Astrophysics Data System (ADS)

Boyko, Oleksiy; Zheleznyak, Mark

2015-04-01

The original numerical code TOPKAPI-IMMS of the distributed rainfall-runoff model TOPKAPI ( Todini et al, 1996-2014) is developed and implemented in Ukraine. The parallel version of the code has been developed recently to be used on multiprocessors systems - multicore/processors PC and clusters. Algorithm is based on binary-tree decomposition of the watershed for the balancing of the amount of computation for all processors/cores. Message passing interface (MPI) protocol is used as a parallel computing framework. The numerical efficiency of the parallelization algorithms is demonstrated for the case studies for the flood predictions of the mountain watersheds of the Ukrainian Carpathian regions. The modeling results is compared with the predictions based on the lumped parameters models.
CoreTSAR: Core Task-Size Adapting Runtime

DOE PAGES

Scogland, Thomas R. W.; Feng, Wu-chun; Rountree, Barry; ...

2014-10-27

Heterogeneity continues to increase at all levels of computing, with the rise of accelerators such as GPUs, FPGAs, and other co-processors into everything from desktops to supercomputers. As a consequence, efficiently managing such disparate resources has become increasingly complex. CoreTSAR seeks to reduce this complexity by adaptively worksharing parallel-loop regions across compute resources without requiring any transformation of the code within the loop. Lastly, our results show performance improvements of up to three-fold over a current state-of-the-art heterogeneous task scheduler as well as linear performance scaling from a single GPU to four GPUs for many codes. In addition, CoreTSAR demonstratesmore » a robust ability to adapt to both a variety of workloads and underlying system configurations.« less
First- and Second-Order Sensitivity Analysis of a P-Version Finite Element Equation Via Automatic Differentiation

NASA Technical Reports Server (NTRS)

Hou, Gene

1998-01-01

Sensitivity analysis is a technique for determining derivatives of system responses with respect to design parameters. Among many methods available for sensitivity analysis, automatic differentiation has been proven through many applications in fluid dynamics and structural mechanics to be an accurate and easy method for obtaining derivatives. Nevertheless, the method can be computational expensive and can require a high memory space. This project will apply an automatic differentiation tool, ADIFOR, to a p-version finite element code to obtain first- and second- order then-nal derivatives, respectively. The focus of the study is on the implementation process and the performance of the ADIFOR-enhanced codes for sensitivity analysis in terms of memory requirement, computational efficiency, and accuracy.
Geometrical-optics code for computing the optical properties of large dielectric spheres.

PubMed

Zhou, Xiaobing; Li, Shusun; Stamnes, Knut

2003-07-20

Absorption of electromagnetic radiation by absorptive dielectric spheres such as snow grains in the near-infrared part of the solar spectrum cannot be neglected when radiative properties of snow are computed. Thus a new, to our knowledge, geometrical-optics code is developed to compute scattering and absorption cross sections of large dielectric particles of arbitrary complex refractive index. The number of internal reflections and transmissions are truncated on the basis of the ratio of the irradiance incident at the nth interface to the irradiance incident at the first interface for a specific optical ray. Thus the truncation number is a function of the angle of incidence. Phase functions for both near- and far-field absorption and scattering of electromagnetic radiation are calculated directly at any desired scattering angle by using a hybrid algorithm based on the bisection and Newton-Raphson methods. With these methods a large sphere's absorption and scattering properties of light can be calculated for any wavelength from the ultraviolet to the microwave regions. Assuming that large snow meltclusters (1-cm order), observed ubiquitously in the snow cover during summer, can be characterized as spheres, one may compute absorption and scattering efficiencies and the scattering phase function on the basis of this geometrical-optics method. A geometrical-optics method for sphere (GOMsphere) code is developed and tested against Wiscombe's Mie scattering code (MIE0) and a Monte Carlo code for a range of size parameters. GOMsphere can be combined with MIE0 to calculate the single-scattering properties of dielectric spheres of any size.
Metalevel programming in robotics: Some issues

NASA Technical Reports Server (NTRS)

Kumarn, A.; Parameswaran, N.

1987-01-01

Computing in robotics has two important requirements: efficiency and flexibility. Algorithms for robot actions are implemented usually in procedural languages such as VAL and AL. But, since their excessive bindings create inflexible structures of computation, it is proposed that Logic Programming is a more suitable language for robot programming due to its non-determinism, declarative nature, and provision for metalevel programming. Logic Programming, however, results in inefficient computations. As a solution to this problem, researchers discuss a framework in which controls can be described to improve efficiency. They have divided controls into: (1) in-code and (2) metalevel and discussed them with reference to selection of rules and dataflow. Researchers illustrated the merit of Logic Programming by modelling the motion of a robot from one point to another avoiding obstacles.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pastore, Giovanni; Rabiti, Cristian; Pizzocri, Davide

PolyPole is a numerical algorithm for the calculation of intra-granular fission gas release. In particular, the algorithm solves the gas diffusion problem in a fuel grain in time-varying conditions. The program has been extensively tested. PolyPole combines a high accuracy with a high computational efficiency and is ideally suited for application in fuel performance codes.
Parallelizing serial code for a distributed processing environment with an application to high frequency electromagnetic scattering

NASA Astrophysics Data System (ADS)

Work, Paul R.

1991-12-01

This thesis investigates the parallelization of existing serial programs in computational electromagnetics for use in a parallel environment. Existing algorithms for calculating the radar cross section of an object are covered, and a ray-tracing code is chosen for implementation on a parallel machine. Current parallel architectures are introduced and a suitable parallel machine is selected for the implementation of the chosen ray-tracing algorithm. The standard techniques for the parallelization of serial codes are discussed, including load balancing and decomposition considerations, and appropriate methods for the parallelization effort are selected. A load balancing algorithm is modified to increase the efficiency of the application, and a high level design of the structure of the serial program is presented. A detailed design of the modifications for the parallel implementation is also included, with both the high level and the detailed design specified in a high level design language called UNITY. The correctness of the design is proven using UNITY and standard logic operations. The theoretical and empirical results show that it is possible to achieve an efficient parallel application for a serial computational electromagnetic program where the characteristics of the algorithm and the target architecture critically influence the development of such an implementation.
Methodology for sensitivity analysis, approximate analysis, and design optimization in CFD for multidisciplinary applications. [computational fluid dynamics

NASA Technical Reports Server (NTRS)

Taylor, Arthur C., III; Hou, Gene W.

1992-01-01

Fundamental equations of aerodynamic sensitivity analysis and approximate analysis for the two dimensional thin layer Navier-Stokes equations are reviewed, and special boundary condition considerations necessary to apply these equations to isolated lifting airfoils on 'C' and 'O' meshes are discussed in detail. An efficient strategy which is based on the finite element method and an elastic membrane representation of the computational domain is successfully tested, which circumvents the costly 'brute force' method of obtaining grid sensitivity derivatives, and is also useful in mesh regeneration. The issue of turbulence modeling is addressed in a preliminary study. Aerodynamic shape sensitivity derivatives are efficiently calculated, and their accuracy is validated on two viscous test problems, including: (1) internal flow through a double throat nozzle, and (2) external flow over a NACA 4-digit airfoil. An automated aerodynamic design optimization strategy is outlined which includes the use of a design optimization program, an aerodynamic flow analysis code, an aerodynamic sensitivity and approximate analysis code, and a mesh regeneration and grid sensitivity analysis code. Application of the optimization methodology to the two test problems in each case resulted in a new design having a significantly improved performance in the aerodynamic response of interest.
Do humans make good decisions?

PubMed Central

Summerfield, Christopher; Tsetsos, Konstantinos

2014-01-01

Human performance on perceptual classification tasks approaches that of an ideal observer, but economic decisions are often inconsistent and intransitive, with preferences reversing according to the local context. We discuss the view that suboptimal choices may result from the efficient coding of decision-relevant information, a strategy that allows expected inputs to be processed with higher gain than unexpected inputs. Efficient coding leads to ‘robust’ decisions that depart from optimality but maximise the information transmitted by a limited-capacity system in a rapidly-changing world. We review recent work showing that when perceptual environments are variable or volatile, perceptual decisions exhibit the same suboptimal context-dependence as economic choices, and propose a general computational framework that accounts for findings across the two domains. PMID:25488076
Rapid solution of large-scale systems of equations

NASA Technical Reports Server (NTRS)

Storaasli, Olaf O.

1994-01-01

The analysis and design of complex aerospace structures requires the rapid solution of large systems of linear and nonlinear equations, eigenvalue extraction for buckling, vibration and flutter modes, structural optimization and design sensitivity calculation. Computers with multiple processors and vector capabilities can offer substantial computational advantages over traditional scalar computer for these analyses. These computers fall into two categories: shared memory computers and distributed memory computers. This presentation covers general-purpose, highly efficient algorithms for generation/assembly or element matrices, solution of systems of linear and nonlinear equations, eigenvalue and design sensitivity analysis and optimization. All algorithms are coded in FORTRAN for shared memory computers and many are adapted to distributed memory computers. The capability and numerical performance of these algorithms will be addressed.
Identification of Computational and Experimental Reduced-Order Models

NASA Technical Reports Server (NTRS)

Silva, Walter A.; Hong, Moeljo S.; Bartels, Robert E.; Piatak, David J.; Scott, Robert C.

2003-01-01

The identification of computational and experimental reduced-order models (ROMs) for the analysis of unsteady aerodynamic responses and for efficient aeroelastic analyses is presented. For the identification of a computational aeroelastic ROM, the CFL3Dv6.0 computational fluid dynamics (CFD) code is used. Flutter results for the AGARD 445.6 Wing and for a Rigid Semispan Model (RSM) computed using CFL3Dv6.0 are presented, including discussion of associated computational costs. Modal impulse responses of the unsteady aerodynamic system are computed using the CFL3Dv6.0 code and transformed into state-space form. The unsteady aerodynamic state-space ROM is then combined with a state-space model of the structure to create an aeroelastic simulation using the MATLAB/SIMULINK environment. The MATLAB/SIMULINK ROM is then used to rapidly compute aeroelastic transients, including flutter. The ROM shows excellent agreement with the aeroelastic analyses computed using the CFL3Dv6.0 code directly. For the identification of experimental unsteady pressure ROMs, results are presented for two configurations: the RSM and a Benchmark Supercritical Wing (BSCW). Both models were used to acquire unsteady pressure data due to pitching oscillations on the Oscillating Turntable (OTT) system at the Transonic Dynamics Tunnel (TDT). A deconvolution scheme involving a step input in pitch and the resultant step response in pressure, for several pressure transducers, is used to identify the unsteady pressure impulse responses. The identified impulse responses are then used to predict the pressure responses due to pitching oscillations at several frequencies. Comparisons with the experimental data are then presented.
Parallel Computation of the Regional Ocean Modeling System (ROMS)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, P; Song, Y T; Chao, Y

2005-04-05

The Regional Ocean Modeling System (ROMS) is a regional ocean general circulation modeling system solving the free surface, hydrostatic, primitive equations over varying topography. It is free software distributed world-wide for studying both complex coastal ocean problems and the basin-to-global scale ocean circulation. The original ROMS code could only be run on shared-memory systems. With the increasing need to simulate larger model domains with finer resolutions and on a variety of computer platforms, there is a need in the ocean-modeling community to have a ROMS code that can be run on any parallel computer ranging from 10 to hundreds ofmore » processors. Recently, we have explored parallelization for ROMS using the MPI programming model. In this paper, an efficient parallelization strategy for such a large-scale scientific software package, based on an existing shared-memory computing model, is presented. In addition, scientific applications and data-performance issues on a couple of SGI systems, including Columbia, the world's third-fastest supercomputer, are discussed.« less
Probabilistic Structural Analysis Theory Development

NASA Technical Reports Server (NTRS)

Burnside, O. H.

1985-01-01

The objective of the Probabilistic Structural Analysis Methods (PSAM) project is to develop analysis techniques and computer programs for predicting the probabilistic response of critical structural components for current and future space propulsion systems. This technology will play a central role in establishing system performance and durability. The first year's technical activity is concentrating on probabilistic finite element formulation strategy and code development. Work is also in progress to survey critical materials and space shuttle mian engine components. The probabilistic finite element computer program NESSUS (Numerical Evaluation of Stochastic Structures Under Stress) is being developed. The final probabilistic code will have, in the general case, the capability of performing nonlinear dynamic of stochastic structures. It is the goal of the approximate methods effort to increase problem solving efficiency relative to finite element methods by using energy methods to generate trial solutions which satisfy the structural boundary conditions. These approximate methods will be less computer intensive relative to the finite element approach.
Code Modernization of VPIC

NASA Astrophysics Data System (ADS)

Bird, Robert; Nystrom, David; Albright, Brian

2017-10-01

The ability of scientific simulations to effectively deliver performant computation is increasingly being challenged by successive generations of high-performance computing architectures. Code development to support efficient computation on these modern architectures is both expensive, and highly complex; if it is approached without due care, it may also not be directly transferable between subsequent hardware generations. Previous works have discussed techniques to support the process of adapting a legacy code for modern hardware generations, but despite the breakthroughs in the areas of mini-app development, portable-performance, and cache oblivious algorithms the problem still remains largely unsolved. In this work we demonstrate how a focus on platform agnostic modern code-development can be applied to Particle-in-Cell (PIC) simulations to facilitate effective scientific delivery. This work builds directly on our previous work optimizing VPIC, in which we replaced intrinsic based vectorisation with compile generated auto-vectorization to improve the performance and portability of VPIC. In this work we present the use of a specialized SIMD queue for processing some particle operations, and also preview a GPU capable OpenMP variant of VPIC. Finally we include a lessons learnt. Work performed under the auspices of the U.S. Dept. of Energy by the Los Alamos National Security, LLC Los Alamos National Laboratory under contract DE-AC52-06NA25396 and supported by the LANL LDRD program.
MCNP and GADRAS Comparisons

DOE Office of Scientific and Technical Information (OSTI.GOV)

Klasky, Marc Louis; Myers, Steven Charles; James, Michael R.

To facilitate the timely execution of System Threat Reviews (STRs) for DNDO, and also to develop a methodology for performing STRs, LANL performed comparisons of several radiation transport codes (MCNP, GADRAS, and Gamma-Designer) that have been previously utilized to compute radiation signatures. While each of these codes has strengths, it is of paramount interest to determine the limitations of each of the respective codes and also to identify the most time efficient means by which to produce computational results, given the large number of parametric cases that are anticipated in performing STR's. These comparisons serve to identify regions of applicabilitymore » for each code and provide estimates of uncertainty that may be anticipated. Furthermore, while performing these comparisons, examination of the sensitivity of the results to modeling assumptions was also examined. These investigations serve to enable the creation of the LANL methodology for performing STRs. Given the wide variety of radiation test sources, scenarios, and detectors, LANL calculated comparisons of the following parameters: decay data, multiplicity, device (n,γ) leakages, and radiation transport through representative scenes and shielding. This investigation was performed to understand potential limitations utilizing specific codes for different aspects of the STR challenges.« less
Adaptive Wavelet Coding Applied in a Wireless Control System.

PubMed

Gama, Felipe O S; Silveira, Luiz F Q; Salazar, Andrés O

2017-12-13

Wireless control systems can sense, control and act on the information exchanged between the wireless sensor nodes in a control loop. However, the exchanged information becomes susceptible to the degenerative effects produced by the multipath propagation. In order to minimize the destructive effects characteristic of wireless channels, several techniques have been investigated recently. Among them, wavelet coding is a good alternative for wireless communications for its robustness to the effects of multipath and its low computational complexity. This work proposes an adaptive wavelet coding whose parameters of code rate and signal constellation can vary according to the fading level and evaluates the use of this transmission system in a control loop implemented by wireless sensor nodes. The performance of the adaptive system was evaluated in terms of bit error rate (BER) versus E b / N 0 and spectral efficiency, considering a time-varying channel with flat Rayleigh fading, and in terms of processing overhead on a control system with wireless communication. The results obtained through computational simulations and experimental tests show performance gains obtained by insertion of the adaptive wavelet coding in a control loop with nodes interconnected by wireless link. These results enable the use of this technique in a wireless link control loop.
PROTEUS-SN User Manual

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shemon, Emily R.; Smith, Micheal A.; Lee, Changho

2016-02-16

PROTEUS-SN is a three-dimensional, highly scalable, high-fidelity neutron transport code developed at Argonne National Laboratory. The code is applicable to all spectrum reactor transport calculations, particularly those in which a high degree of fidelity is needed either to represent spatial detail or to resolve solution gradients. PROTEUS-SN solves the second order formulation of the transport equation using the continuous Galerkin finite element method in space, the discrete ordinates approximation in angle, and the multigroup approximation in energy. PROTEUS-SN’s parallel methodology permits the efficient decomposition of the problem by both space and angle, permitting large problems to run efficiently on hundredsmore » of thousands of cores. PROTEUS-SN can also be used in serial or on smaller compute clusters (10’s to 100’s of cores) for smaller homogenized problems, although it is generally more computationally expensive than traditional homogenized methodology codes. PROTEUS-SN has been used to model partially homogenized systems, where regions of interest are represented explicitly and other regions are homogenized to reduce the problem size and required computational resources. PROTEUS-SN solves forward and adjoint eigenvalue problems and permits both neutron upscattering and downscattering. An adiabatic kinetics option has recently been included for performing simple time-dependent calculations in addition to standard steady state calculations. PROTEUS-SN handles void and reflective boundary conditions. Multigroup cross sections can be generated externally using the MC2-3 fast reactor multigroup cross section generation code or internally using the cross section application programming interface (API) which can treat the subgroup or resonance table libraries. PROTEUS-SN is written in Fortran 90 and also includes C preprocessor definitions. The code links against the PETSc, METIS, HDF5, and MPICH libraries. It optionally links against the MOAB library and is a part of the SHARP multi-physics suite for coupled multi-physics analysis of nuclear reactors. This user manual describes how to set up a neutron transport simulation with the PROTEUS-SN code. A companion methodology manual describes the theory and algorithms within PROTEUS-SN.« less
Optimization of monitoring networks based on uncertainty quantification of model predictions of contaminant transport

NASA Astrophysics Data System (ADS)

Vesselinov, V. V.; Harp, D.

2010-12-01

The process of decision making to protect groundwater resources requires a detailed estimation of uncertainties in model predictions. Various uncertainties associated with modeling a natural system, such as: (1) measurement and computational errors; (2) uncertainties in the conceptual model and model-parameter estimates; (3) simplifications in model setup and numerical representation of governing processes, contribute to the uncertainties in the model predictions. Due to this combination of factors, the sources of predictive uncertainties are generally difficult to quantify individually. Decision support related to optimal design of monitoring networks requires (1) detailed analyses of existing uncertainties related to model predictions of groundwater flow and contaminant transport, (2) optimization of the proposed monitoring network locations in terms of their efficiency to detect contaminants and provide early warning. We apply existing and newly-proposed methods to quantify predictive uncertainties and to optimize well locations. An important aspect of the analysis is the application of newly-developed optimization technique based on coupling of Particle Swarm and Levenberg-Marquardt optimization methods which proved to be robust and computationally efficient. These techniques and algorithms are bundled in a software package called MADS. MADS (Model Analyses for Decision Support) is an object-oriented code that is capable of performing various types of model analyses and supporting model-based decision making. The code can be executed under different computational modes, which include (1) sensitivity analyses (global and local), (2) Monte Carlo analysis, (3) model calibration, (4) parameter estimation, (5) uncertainty quantification, and (6) model selection. The code can be externally coupled with any existing model simulator through integrated modules that read/write input and output files using a set of template and instruction files (consistent with the PEST I/O protocol). MADS can also be internally coupled with a series of built-in analytical simulators. MADS provides functionality to work directly with existing control files developed for the code PEST (Doherty 2009). To perform the computational modes mentioned above, the code utilizes (1) advanced Latin-Hypercube sampling techniques (including Improved Distributed Sampling), (2) various gradient-based Levenberg-Marquardt optimization methods, (3) advanced global optimization methods (including Particle Swarm Optimization), and (4) a selection of alternative objective functions. The code has been successfully applied to perform various model analyses related to environmental management of real contamination sites. Examples include source identification problems, quantification of uncertainty, model calibration, and optimization of monitoring networks. The methodology and software codes are demonstrated using synthetic and real case studies where monitoring networks are optimized taking into account the uncertainty in model predictions of contaminant transport.
Applications of Derandomization Theory in Coding

NASA Astrophysics Data System (ADS)

Cheraghchi, Mahdi

2011-07-01

Randomized techniques play a fundamental role in theoretical computer science and discrete mathematics, in particular for the design of efficient algorithms and construction of combinatorial objects. The basic goal in derandomization theory is to eliminate or reduce the need for randomness in such randomized constructions. In this thesis, we explore some applications of the fundamental notions in derandomization theory to problems outside the core of theoretical computer science, and in particular, certain problems related to coding theory. First, we consider the wiretap channel problem which involves a communication system in which an intruder can eavesdrop a limited portion of the transmissions, and construct efficient and information-theoretically optimal communication protocols for this model. Then we consider the combinatorial group testing problem. In this classical problem, one aims to determine a set of defective items within a large population by asking a number of queries, where each query reveals whether a defective item is present within a specified group of items. We use randomness condensers to explicitly construct optimal, or nearly optimal, group testing schemes for a setting where the query outcomes can be highly unreliable, as well as the threshold model where a query returns positive if the number of defectives pass a certain threshold. Finally, we design ensembles of error-correcting codes that achieve the information-theoretic capacity of a large class of communication channels, and then use the obtained ensembles for construction of explicit capacity achieving codes. [This is a shortened version of the actual abstract in the thesis.
Reed Solomon codes for error control in byte organized computer memory systems

NASA Technical Reports Server (NTRS)

Lin, S.; Costello, D. J., Jr.

1984-01-01

A problem in designing semiconductor memories is to provide some measure of error control without requiring excessive coding overhead or decoding time. In LSI and VLSI technology, memories are often organized on a multiple bit (or byte) per chip basis. For example, some 256K-bit DRAM's are organized in 32Kx8 bit-bytes. Byte oriented codes such as Reed Solomon (RS) codes can provide efficient low overhead error control for such memories. However, the standard iterative algorithm for decoding RS codes is too slow for these applications. Some special decoding techniques for extended single-and-double-error-correcting RS codes which are capable of high speed operation are presented. These techniques are designed to find the error locations and the error values directly from the syndrome without having to use the iterative algorithm to find the error locator polynomial.
Application of a Two-dimensional Unsteady Viscous Analysis Code to a Supersonic Throughflow Fan Stage

NASA Technical Reports Server (NTRS)

Steinke, Ronald J.

1989-01-01

The Rai ROTOR1 code for two-dimensional, unsteady viscous flow analysis was applied to a supersonic throughflow fan stage design. The axial Mach number for this fan design increases from 2.0 at the inlet to 2.9 at the outlet. The Rai code uses overlapped O- and H-grids that are appropriately packed. The Rai code was run on a Cray XMP computer; then data postprocessing and graphics were performed to obtain detailed insight into the stage flow. The large rotor wakes uniformly traversed the rotor-stator interface and dispersed as they passed through the stator passage. Only weak blade shock losses were computerd, which supports the design goals. High viscous effects caused large blade wakes and a low fan efficiency. Rai code flow predictions were essentially steady for the rotor, and they compared well with Chima rotor viscous code predictions based on a C-grid of similar density.
Global MHD simulation of magnetosphere using HPF

NASA Astrophysics Data System (ADS)

Ogino, T.

We have translated a 3-dimensional magnetohydrodynamic (MHD) simulation code of the Earth's magnetosphere from VPP Fortran to HPF/JA on the Fujitsu VPP5000/56 vector-parallel supercomputer and the MHD code was fully vectorized and fully parallelized in VPP Fortran. The entire performance and capability of the HPF MHD code could be shown to be almost comparable to that of VPP Fortran. A 3-dimensional global MHD simulation of the earth's magnetosphere was performed at a speed of over 400 Gflops with an efficiency of 76.5% using 56 PEs of Fujitsu VPP5000/56 in vector and parallel computation that permitted comparison with catalog values. We have concluded that fluid and MHD codes that are fully vectorized and fully parallelized in VPP Fortran can be translated with relative ease to HPF/JA, and a code in HPF/JA may be expected to perform comparably to the same code written in VPP Fortran.
FY17 Status Report on NEAMS Neutronics Activities

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lee, C. H.; Jung, Y. S.; Smith, M. A.

2017-09-30

Under the U.S. DOE NEAMS program, the high-fidelity neutronics code system has been developed to support the multiphysics modeling and simulation capability named SHARP. The neutronics code system includes the high-fidelity neutronics code PROTEUS, the cross section library and preprocessing tools, the multigroup cross section generation code MC2-3, the in-house meshing generation tool, the perturbation and sensitivity analysis code PERSENT, and post-processing tools. The main objectives of the NEAMS neutronics activities in FY17 are to continue development of an advanced nodal solver in PROTEUS for use in nuclear reactor design and analysis projects, implement a simplified sub-channel based thermal-hydraulic (T/H)more » capability into PROTEUS to efficiently compute the thermal feedback, improve the performance of PROTEUS-MOCEX using numerical acceleration and code optimization, improve the cross section generation tools including MC2-3, and continue to perform verification and validation tests for PROTEUS.« less
Ionisation Feedback in Star and Cluster Formation Simulations

NASA Astrophysics Data System (ADS)

Ercolano, Barbara; Gritschneder, Matthias

2011-04-01

Feedback from photoionisation may dominate on parsec scales in massive star-forming regions. Such feedback may inhibit or enhance the star formation efficiency and sustain or even drive turbulence in the parent molecular cloud. Photoionisation feedback may also provide a mechanism for the rapid expulsion of gas from young clusters' potentials, often invoked as the main cause of `infant mortality'. There is currently no agreement, however, with regards to the efficiency of this process and how environment may affect the direction (positive or negative) in which it proceeds. The study of the photoionisation process as part of hydrodynamical simulations is key to understanding these issues, however, due to the computational demand of the problem, crude approximations for the radiation transfer are often employed. We will briefly review some of the most commonly used approximations and discuss their major drawbacks. We will then present the results of detailed tests carried out using the detailed photoionisation code mocassin and the SPH+ionisation code iVINE code, aimed at understanding the error introduced by the simplified photoionisation algorithms. This is particularly relevant as a number of new codes have recently been developed along those lines. We will finally propose a new approach that should allow to efficiently and self-consistently treat the photoionisation problem for complex radiation and density fields.
Algorithm and code development for unsteady three-dimensional Navier-Stokes equations

NASA Technical Reports Server (NTRS)

Obayashi, Shigeru

1994-01-01

Aeroelastic tests require extensive cost and risk. An aeroelastic wind-tunnel experiment is an order of magnitude more expensive than a parallel experiment involving only aerodynamics. By complementing the wind-tunnel experiments with numerical simulations, the overall cost of the development of aircraft can be considerably reduced. In order to accurately compute aeroelastic phenomenon it is necessary to solve the unsteady Euler/Navier-Stokes equations simultaneously with the structural equations of motion. These equations accurately describe the flow phenomena for aeroelastic applications. At ARC a code, ENSAERO, is being developed for computing the unsteady aerodynamics and aeroelasticity of aircraft, and it solves the Euler/Navier-Stokes equations. The purpose of this cooperative agreement was to enhance ENSAERO in both algorithm and geometric capabilities. During the last five years, the algorithms of the code have been enhanced extensively by using high-resolution upwind algorithms and efficient implicit solvers. The zonal capability of the code has been extended from a one-to-one grid interface to a mismatching unsteady zonal interface. The geometric capability of the code has been extended from a single oscillating wing case to a full-span wing-body configuration with oscillating control surfaces. Each time a new capability was added, a proper validation case was simulated, and the capability of the code was demonstrated.

Efficient Skeletonization of Volumetric Objects.

PubMed

Zhou, Yong; Toga, Arthur W

1999-07-01

Skeletonization promises to become a powerful tool for compact shape description, path planning, and other applications. However, current techniques can seldom efficiently process real, complicated 3D data sets, such as MRI and CT data of human organs. In this paper, we present an efficient voxel-coding based algorithm for Skeletonization of 3D voxelized objects. The skeletons are interpreted as connected centerlines. consisting of sequences of medial points of consecutive clusters. These centerlines are initially extracted as paths of voxels, followed by medial point replacement, refinement, smoothness, and connection operations. The voxel-coding techniques have been proposed for each of these operations in a uniform and systematic fashion. In addition to preserving basic connectivity and centeredness, the algorithm is characterized by straightforward computation, no sensitivity to object boundary complexity, explicit extraction of ready-to-parameterize and branch-controlled skeletons, and efficient object hole detection. These issues are rarely discussed in traditional methods. A range of 3D medical MRI and CT data sets were used for testing the algorithm, demonstrating its utility.
A combined experimental-modelling method for the detection and analysis of pollution in coastal zones

NASA Astrophysics Data System (ADS)

Limić, Nedzad; Valković, Vladivoj

1996-04-01

Pollution of coastal seas with toxic substances can be efficiently detected by examining toxic materials in sediment samples. These samples contain information on the overall pollution from surrounding sources such as yacht anchorages, nearby industries, sewage systems, etc. In an efficient analysis of pollution one must determine the contribution from each individual source. In this work it is demonstrated that a modelling method can be utilized for solving this latter problem. The modelling method is based on a unique interpretation of concentrations in sediments from all sampling stations. The proposed method is a synthesis consisting of the utilization of PIXE as an efficient method of pollution concentration determination and the code ANCOPOL (N. Limic and R. Benis, The computer code ANCOPOL, SimTel/msdos/geology, 1994 [1]) for the calculation of contributions from the main polluters. The efficiency and limits of the proposed method are demonstrated by discussing trace element concentrations in sediments of Punat Bay on the island of Krk in Croatia.
Study of solid-conversion gaseous detector based on GEM for high energy X-ray industrial CT.

PubMed

Zhou, Rifeng; Zhou, Yaling

2014-01-01

The general gaseous ionization detectors are not suitable for high energy X-ray industrial computed tomography (HEICT) because of their inherent limitations, especially low detective efficiency and large volume. The goal of this study was to investigate a new type of gaseous detector to solve these problems. The novel detector was made by a metal foil as X-ray convertor to improve the conversion efficiency, and the Gas Electron Multiplier (hereinafter "GEM") was used as electron amplifier to lessen its volume. The detective mechanism and signal formation of the detector was discussed in detail. The conversion efficiency was calculated by using EGSnrc Monte Carlo code, and the transport course of photon and secondary electron avalanche in the detector was simulated with the Maxwell and Garfield codes. The result indicated that this detector has higher conversion efficiency as well as less volume. Theoretically this kind of detector could be a perfect candidate for replacing the conventional detector in HEICT.
Standardisation of the (129)I, (151)Sm and (166m)Ho activity concentration using the CIEMAT/NIST efficiency tracing method.

PubMed

Altzitzoglou, Timotheos; Rožkov, Andrej

2016-03-01

The (129)I, (151)Sm and (166m)Ho standardisations using the CIEMAT/NIST efficiency tracing method, that have been carried out in the frame of the European Metrology Research Program project "Metrology for Radioactive Waste Management" are described. The radionuclide beta counting efficiencies were calculated using two computer codes CN2005 and MICELLE2. The sensitivity analysis of the code input parameters (ionization quenching factor, beta shape factor) on the calculated efficiencies was performed, and the results are discussed. The combined relative standard uncertainty of the standardisations of the (129)I, (151)Sm and (166m)Ho solutions were 0.4%, 0.5% and 0.4%, respectively. The stated precision obtained using the CIEMAT/NIST method is better than that previously reported in the literature obtained by the TDCR ((129)I), the 4πγ-NaI ((166m)Ho) counting or the CIEMAT/NIST method ((151)Sm). Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.
The NEST Dry-Run Mode: Efficient Dynamic Analysis of Neuronal Network Simulation Code.

PubMed

Kunkel, Susanne; Schenck, Wolfram

2017-01-01

NEST is a simulator for spiking neuronal networks that commits to a general purpose approach: It allows for high flexibility in the design of network models, and its applications range from small-scale simulations on laptops to brain-scale simulations on supercomputers. Hence, developers need to test their code for various use cases and ensure that changes to code do not impair scalability. However, running a full set of benchmarks on a supercomputer takes up precious compute-time resources and can entail long queuing times. Here, we present the NEST dry-run mode, which enables comprehensive dynamic code analysis without requiring access to high-performance computing facilities. A dry-run simulation is carried out by a single process, which performs all simulation steps except communication as if it was part of a parallel environment with many processes. We show that measurements of memory usage and runtime of neuronal network simulations closely match the corresponding dry-run data. Furthermore, we demonstrate the successful application of the dry-run mode in the areas of profiling and performance modeling.
The NEST Dry-Run Mode: Efficient Dynamic Analysis of Neuronal Network Simulation Code

PubMed Central

Kunkel, Susanne; Schenck, Wolfram

2017-01-01

NEST is a simulator for spiking neuronal networks that commits to a general purpose approach: It allows for high flexibility in the design of network models, and its applications range from small-scale simulations on laptops to brain-scale simulations on supercomputers. Hence, developers need to test their code for various use cases and ensure that changes to code do not impair scalability. However, running a full set of benchmarks on a supercomputer takes up precious compute-time resources and can entail long queuing times. Here, we present the NEST dry-run mode, which enables comprehensive dynamic code analysis without requiring access to high-performance computing facilities. A dry-run simulation is carried out by a single process, which performs all simulation steps except communication as if it was part of a parallel environment with many processes. We show that measurements of memory usage and runtime of neuronal network simulations closely match the corresponding dry-run data. Furthermore, we demonstrate the successful application of the dry-run mode in the areas of profiling and performance modeling. PMID:28701946
A Fast Code for Jupiter Atmospheric Entry

NASA Technical Reports Server (NTRS)

Tauber, Michael E.; Wercinski, Paul; Yang, Lily; Chen, Yih-Kanq; Arnold, James (Technical Monitor)

1998-01-01

A fast code was developed to calculate the forebody heating environment and heat shielding that is required for Jupiter atmospheric entry probes. A carbon phenolic heat shield material was assumed and, since computational efficiency was a major goal, analytic expressions were used, primarily, to calculate the heating, ablation and the required insulation. The code was verified by comparison with flight measurements from the Galileo probe's entry; the calculation required 3.5 sec of CPU time on a work station. The computed surface recessions from ablation were compared with the flight values at six body stations. The average, absolute, predicted difference in the recession was 12.5% too high. The forebody's mass loss was overpredicted by 5.5% and the heat shield mass was calculated to be 15% less than the probe's actual heat shield. However, the calculated heat shield mass did not include contingencies for the various uncertainties that must be considered in the design of probes. Therefore, the agreement with the Galileo probe's values was considered satisfactory, especially in view of the code's fast running time and the methods' approximations.
2-Step scalar deadzone quantization for bitplane image coding.

PubMed

Auli-Llinas, Francesc

2013-12-01

Modern lossy image coding systems generate a quality progressive codestream that, truncated at increasing rates, produces an image with decreasing distortion. Quality progressivity is commonly provided by an embedded quantizer that employs uniform scalar deadzone quantization (USDQ) together with a bitplane coding strategy. This paper introduces a 2-step scalar deadzone quantization (2SDQ) scheme that achieves same coding performance as that of USDQ while reducing the coding passes and the emitted symbols of the bitplane coding engine. This serves to reduce the computational costs of the codec and/or to code high dynamic range images. The main insights behind 2SDQ are the use of two quantization step sizes that approximate wavelet coefficients with more or less precision depending on their density, and a rate-distortion optimization technique that adjusts the distortion decreases produced when coding 2SDQ indexes. The integration of 2SDQ in current codecs is straightforward. The applicability and efficiency of 2SDQ are demonstrated within the framework of JPEG2000.
Fast, Statistical Model of Surface Roughness for Ion-Solid Interaction Simulations and Efficient Code Coupling

NASA Astrophysics Data System (ADS)

Drobny, Jon; Curreli, Davide; Ruzic, David; Lasa, Ane; Green, David; Canik, John; Younkin, Tim; Blondel, Sophie; Wirth, Brian

2017-10-01

Surface roughness greatly impacts material erosion, and thus plays an important role in Plasma-Surface Interactions. Developing strategies for efficiently introducing rough surfaces into ion-solid interaction codes will be an important step towards whole-device modeling of plasma devices and future fusion reactors such as ITER. Fractal TRIDYN (F-TRIDYN) is an upgraded version of the Monte Carlo, BCA program TRIDYN developed for this purpose that includes an explicit fractal model of surface roughness and extended input and output options for file-based code coupling. Code coupling with both plasma and material codes has been achieved and allows for multi-scale, whole-device modeling of plasma experiments. These code coupling results will be presented. F-TRIDYN has been further upgraded with an alternative, statistical model of surface roughness. The statistical model is significantly faster than and compares favorably to the fractal model. Additionally, the statistical model compares well to alternative computational surface roughness models and experiments. Theoretical links between the fractal and statistical models are made, and further connections to experimental measurements of surface roughness are explored. This work was supported by the PSI-SciDAC Project funded by the U.S. Department of Energy through contract DOE-DE-SC0008658.
Neptune: An astrophysical smooth particle hydrodynamics code for massively parallel computer architectures

NASA Astrophysics Data System (ADS)

Sandalski, Stou

Smooth particle hydrodynamics is an efficient method for modeling the dynamics of fluids. It is commonly used to simulate astrophysical processes such as binary mergers. We present a newly developed GPU accelerated smooth particle hydrodynamics code for astrophysical simulations. The code is named neptune after the Roman god of water. It is written in OpenMP parallelized C++ and OpenCL and includes octree based hydrodynamic and gravitational acceleration. The design relies on object-oriented methodologies in order to provide a flexible and modular framework that can be easily extended and modified by the user. Several pre-built scenarios for simulating collisions of polytropes and black-hole accretion are provided. The code is released under the MIT Open Source license and publicly available at http://code.google.com/p/neptune-sph/.
New schemes for internally contracted multi-reference configuration interaction

NASA Astrophysics Data System (ADS)

Wang, Yubin; Han, Huixian; Lei, Yibo; Suo, Bingbing; Zhu, Haiyan; Song, Qi; Wen, Zhenyi

2014-10-01

In this work we present a new internally contracted multi-reference configuration interaction (MRCI) scheme by applying the graphical unitary group approach and the hole-particle symmetry. The latter allows a Distinct Row Table (DRT) to split into a number of sub-DRTs in the active space. In the new scheme a contraction is defined as a linear combination of arcs within a sub-DRT, and connected to the head and tail of the DRT through up-steps and down-steps to generate internally contracted configuration functions. The new scheme deals with the closed-shell (hole) orbitals and external orbitals in the same manner and thus greatly simplifies calculations of coupling coefficients and CI matrix elements. As a result, the number of internal orbitals is no longer a bottleneck of MRCI calculations. The validity and efficiency of the new ic-MRCI code are tested by comparing with the corresponding WK code of the MOLPRO package. The energies obtained from the two codes are essentially identical, and the computational efficiencies of the two codes have their own advantages.
Algorithms for Zonal Methods and Development of Three Dimensional Mesh Generation Procedures.

DTIC Science & Technology

1984-02-01

a r-re complete set of equations is used, but their effect is imposed by means of a right hand side forcing function, not by means of a zonal boundary...modifications of flow-simulation algorithms The explicit finite-difference code of Magnus and are discussed. Computational tests in two dimensions...used to simplify the task of grid generation without an adverse achieve computational efficiency. More recently, effect on flow-field algorithms and
Computer simulation of plasma and N-body problems

NASA Technical Reports Server (NTRS)

Harries, W. L.; Miller, J. B.

1975-01-01

The following FORTRAN language computer codes are presented: (1) efficient two- and three-dimensional central force potential solvers; (2) a three-dimensional simulator of an isolated galaxy which incorporates the potential solver; (3) a two-dimensional particle-in-cell simulator of the Jeans instability in an infinite self-gravitating compressible gas; and (4) a two-dimensional particle-in-cell simulator of a rotating self-gravitating compressible gaseous system of which rectangular coordinate and superior polar coordinate versions were written.
Performance of an Optimized Eta Model Code on the Cray T3E and a Network of PCs

NASA Technical Reports Server (NTRS)

Kouatchou, Jules; Rancic, Miodrag; Geiger, Jim

2000-01-01

In the year 2001, NASA will launch the satellite TRIANA that will be the first Earth observing mission to provide a continuous, full disk view of the sunlit Earth. As a part of the HPCC Program at NASA GSFC, we have started a project whose objectives are to develop and implement a 3D cloud data assimilation system, by combining TRIANA measurements with model simulation, and to produce accurate statistics of global cloud coverage as an important element of the Earth's climate. For simulation of the atmosphere within this project we are using the NCEP/NOAA operational Eta model. In order to compare TRIANA and the Eta model data on approximately the same grid without significant downscaling, the Eta model will be integrated at a resolution of about 15 km. The integration domain (from -70 to +70 deg in latitude and 150 deg in longitude) will cover most of the sunlit Earth disc and will continuously rotate around the globe following TRIANA. The cloud data assimilation is supposed to run and produce 3D clouds on a near real-time basis. Such a numerical setup and integration design is very ambitious and computationally demanding. Thus, though the Eta model code has been very carefully developed and its computational efficiency has been systematically polished during the years of operational implementation at NCEP, the current MPI version may still have problems with memory and efficiency for the TRIANA simulations. Within this work, we optimize a parallel version of the Eta model code on a Cray T3E and a network of PCs (theHIVE) in order to improve its overall efficiency. Our optimization procedure consists of introducing dynamically allocated arrays to reduce the size of static memory, and optimizing on a single processor by splitting loops to limit the number of streams. All the presented results are derived using an integration domain centered at the equator, with a size of 60 x 60 deg, and with horizontal resolutions of 1/2 and 1/3 deg, respectively. In accompanying charts we report the elapsed time, the speedup and the Mflops as a function of the number of processors for the non-optimized version of the code on the T3E and theHIVE. The large amount of communication required for model integration explains its poor performance on theHIVE. Our initial implementation of the dynamic memory allocation has contributed to about 12% reduction of memory but has introduced a 3% overhead in computing time. This overhead was removed by performing loop splitting in some of the high demanding subroutines. When the Eta code is fully optimized in order to meet the memory requirement for TRIANA simulations, a non-negligeable overhead may appear that may seriously affect the efficiency of the code. To alleviate this problem, we are considering implementation of a new algorithm for the horizontal advection that is computationally less expensive, and also a new approach for marching in time.
Noniterative MAP reconstruction using sparse matrix representations.

PubMed

Cao, Guangzhi; Bouman, Charles A; Webb, Kevin J

2009-09-01

We present a method for noniterative maximum a posteriori (MAP) tomographic reconstruction which is based on the use of sparse matrix representations. Our approach is to precompute and store the inverse matrix required for MAP reconstruction. This approach has generally not been used in the past because the inverse matrix is typically large and fully populated (i.e., not sparse). In order to overcome this problem, we introduce two new ideas. The first idea is a novel theory for the lossy source coding of matrix transformations which we refer to as matrix source coding. This theory is based on a distortion metric that reflects the distortions produced in the final matrix-vector product, rather than the distortions in the coded matrix itself. The resulting algorithms are shown to require orthonormal transformations of both the measurement data and the matrix rows and columns before quantization and coding. The second idea is a method for efficiently storing and computing the required orthonormal transformations, which we call a sparse-matrix transform (SMT). The SMT is a generalization of the classical FFT in that it uses butterflies to compute an orthonormal transform; but unlike an FFT, the SMT uses the butterflies in an irregular pattern, and is numerically designed to best approximate the desired transforms. We demonstrate the potential of the noniterative MAP reconstruction with examples from optical tomography. The method requires offline computation to encode the inverse transform. However, once these offline computations are completed, the noniterative MAP algorithm is shown to reduce both storage and computation by well over two orders of magnitude, as compared to a linear iterative reconstruction methods.
Heterogeneous Distributed Computing for Computational Aerosciences

NASA Technical Reports Server (NTRS)

Sunderam, Vaidy S.

1998-01-01

The research supported under this award focuses on heterogeneous distributed computing for high-performance applications, with particular emphasis on computational aerosciences. The overall goal of this project was to and investigate issues in, and develop solutions to, efficient execution of computational aeroscience codes in heterogeneous concurrent computing environments. In particular, we worked in the context of the PVM[1] system and, subsequent to detailed conversion efforts and performance benchmarking, devising novel techniques to increase the efficacy of heterogeneous networked environments for computational aerosciences. Our work has been based upon the NAS Parallel Benchmark suite, but has also recently expanded in scope to include the NAS I/O benchmarks as specified in the NHT-1 document. In this report we summarize our research accomplishments under the auspices of the grant.
Overall Traveling-Wave-Tube Efficiency Improved By Optimized Multistage Depressed Collector Design

NASA Technical Reports Server (NTRS)

Vaden, Karl R.

2002-01-01

Depressed Collector Design The microwave traveling wave tube (TWT) is used widely for space communications and high-power airborne transmitting sources. One of the most important features in designing a TWT is overall efficiency. Yet, overall TWT efficiency is strongly dependent on the efficiency of the electron beam collector, particularly for high values of collector efficiency. For these reasons, the NASA Glenn Research Center developed an optimization algorithm based on simulated annealing to quickly design highly efficient multistage depressed collectors (MDC's). Simulated annealing is a strategy for solving highly nonlinear combinatorial optimization problems. Its major advantage over other methods is its ability to avoid becoming trapped in local minima. Simulated annealing is based on an analogy to statistical thermodynamics, specifically the physical process of annealing: heating a material to a temperature that permits many atomic rearrangements and then cooling it carefully and slowly, until it freezes into a strong, minimum-energy crystalline structure. This minimum energy crystal corresponds to the optimal solution of a mathematical optimization problem. The TWT used as a baseline for optimization was the 32-GHz, 10-W, helical TWT developed for the Cassini mission to Saturn. The method of collector analysis and design used was a 2-1/2-dimensional computational procedure that employs two types of codes, a large signal analysis code and an electron trajectory code. The large signal analysis code produces the spatial, energetic, and temporal distributions of the spent beam entering the MDC. An electron trajectory code uses the resultant data to perform the actual collector analysis. The MDC was optimized for maximum MDC efficiency and minimum final kinetic energy of all collected electrons (to reduce heat transfer). The preceding figure shows the geometric and electrical configuration of an optimized collector with an efficiency of 93.8 percent. The results show the improvement in collector efficiency from 89.7 to 93.8 percent, resulting in an increase of three overall efficiency points. In addition, the time to design a highly efficient MDC was reduced from a month to a few days. All work was done in-house at Glenn for the High Rate Data Delivery Program. Future plans include optimizing the MDC and TWT interaction circuit in tandem to further improve overall TWT efficiency.
ABINIT: Plane-Wave-Based Density-Functional Theory on High Performance Computers

NASA Astrophysics Data System (ADS)

Torrent, Marc

2014-03-01

For several years, a continuous effort has been produced to adapt electronic structure codes based on Density-Functional Theory to the future computing architectures. Among these codes, ABINIT is based on a plane-wave description of the wave functions which allows to treat systems of any kind. Porting such a code on petascale architectures pose difficulties related to the many-body nature of the DFT equations. To improve the performances of ABINIT - especially for what concerns standard LDA/GGA ground-state and response-function calculations - several strategies have been followed: A full multi-level parallelisation MPI scheme has been implemented, exploiting all possible levels and distributing both computation and memory. It allows to increase the number of distributed processes and could not be achieved without a strong restructuring of the code. The core algorithm used to solve the eigen problem (``Locally Optimal Blocked Congugate Gradient''), a Blocked-Davidson-like algorithm, is based on a distribution of processes combining plane-waves and bands. In addition to the distributed memory parallelization, a full hybrid scheme has been implemented, using standard shared-memory directives (openMP/openACC) or porting some comsuming code sections to Graphics Processing Units (GPU). As no simple performance model exists, the complexity of use has been increased; the code efficiency strongly depends on the distribution of processes among the numerous levels. ABINIT is able to predict the performances of several process distributions and automatically choose the most favourable one. On the other hand, a big effort has been carried out to analyse the performances of the code on petascale architectures, showing which sections of codes have to be improved; they all are related to Matrix Algebra (diagonalisation, orthogonalisation). The different strategies employed to improve the code scalability will be described. They are based on an exploration of new diagonalization algorithm, as well as the use of external optimized librairies. Part of this work has been supported by the european Prace project (PaRtnership for Advanced Computing in Europe) in the framework of its workpackage 8.
TOUGH3: A new efficient version of the TOUGH suite of multiphase flow and transport simulators

NASA Astrophysics Data System (ADS)

Jung, Yoojin; Pau, George Shu Heng; Finsterle, Stefan; Pollyea, Ryan M.

2017-11-01

The TOUGH suite of nonisothermal multiphase flow and transport simulators has been updated by various developers over many years to address a vast range of challenging subsurface problems. The increasing complexity of the simulated processes as well as the growing size of model domains that need to be handled call for an improvement in the simulator's computational robustness and efficiency. Moreover, modifications have been frequently introduced independently, resulting in multiple versions of TOUGH that (1) led to inconsistencies in feature implementation and usage, (2) made code maintenance and development inefficient, and (3) caused confusion to users and developers. TOUGH3-a new base version of TOUGH-addresses these issues. It consolidates both the serial (TOUGH2 V2.1) and parallel (TOUGH2-MP V2.0) implementations, enabling simulations to be performed on desktop computers and supercomputers using a single code. New PETSc parallel linear solvers are added to the existing serial solvers of TOUGH2 and the Aztec solver used in TOUGH2-MP. The PETSc solvers generally perform better than the Aztec solvers in parallel and the internal TOUGH3 linear solver in serial. TOUGH3 also incorporates many new features, addresses bugs, and improves the flexibility of data handling. Due to the improved capabilities and usability, TOUGH3 is more robust and efficient for solving tough and computationally demanding problems in diverse scientific and practical applications related to subsurface flow modeling.
Video coding for 3D-HEVC based on saliency information

NASA Astrophysics Data System (ADS)

Yu, Fang; An, Ping; Yang, Chao; You, Zhixiang; Shen, Liquan

2016-11-01

As an extension of High Efficiency Video Coding ( HEVC), 3D-HEVC has been widely researched under the impetus of the new generation coding standard in recent years. Compared with H.264/AVC, its compression efficiency is doubled while keeping the same video quality. However, its higher encoding complexity and longer encoding time are not negligible. To reduce the computational complexity and guarantee the subjective quality of virtual views, this paper presents a novel video coding method for 3D-HEVC based on the saliency informat ion which is an important part of Human Visual System (HVS). First of all, the relationship between the current coding unit and its adjacent units is used to adjust the maximum depth of each largest coding unit (LCU) and determine the SKIP mode reasonably. Then, according to the saliency informat ion of each frame image, the texture and its corresponding depth map will be divided into three regions, that is, salient area, middle area and non-salient area. Afterwards, d ifferent quantization parameters will be assigned to different regions to conduct low complexity coding. Finally, the compressed video will generate new view point videos through the renderer tool. As shown in our experiments, the proposed method saves more bit rate than other approaches and achieves up to highest 38% encoding time reduction without subjective quality loss in compression or rendering.

LSENS, a general chemical kinetics and sensitivity analysis code for homogeneous gas-phase reactions. 2: Code description and usage

NASA Technical Reports Server (NTRS)

Radhakrishnan, Krishnan; Bittker, David A.

1994-01-01

LSENS, the Lewis General Chemical Kinetics Analysis Code, has been developed for solving complex, homogeneous, gas-phase chemical kinetics problems and contains sensitivity analysis for a variety of problems, including nonisothermal situations. This report is part 2 of a series of three reference publications that describe LSENS, provide a detailed guide to its usage, and present many example problems. Part 2 describes the code, how to modify it, and its usage, including preparation of the problem data file required to execute LSENS. Code usage is illustrated by several example problems, which further explain preparation of the problem data file and show how to obtain desired accuracy in the computed results. LSENS is a flexible, convenient, accurate, and efficient solver for chemical reaction problems such as static system; steady, one-dimensional, inviscid flow; reaction behind incident shock wave, including boundary layer correction; and perfectly stirred (highly backmixed) reactor. In addition, the chemical equilibrium state can be computed for the following assigned states: temperature and pressure, enthalpy and pressure, temperature and volume, and internal energy and volume. For static problems the code computes the sensitivity coefficients of the dependent variables and their temporal derivatives with respect to the initial values of the dependent variables and/or the three rate coefficient parameters of the chemical reactions. Part 1 (NASA RP-1328) derives the governing equations describes the numerical solution procedures for the types of problems that can be solved by lSENS. Part 3 (NASA RP-1330) explains the kinetics and kinetics-plus-sensitivity-analysis problems supplied with LSENS and presents sample results.
Computer Program for the Design and Off-Design Performance of Turbojet and Turbofan Engine Cycles

NASA Technical Reports Server (NTRS)

Morris, S. J.

1978-01-01

The rapid computer program is designed to be run in a stand-alone mode or operated within a larger program. The computation is based on a simplified one-dimensional gas turbine cycle. Each component in the engine is modeled thermo-dynamically. The component efficiencies used in the thermodynamic modeling are scaled for the off-design conditions from input design point values using empirical trends which are included in the computer code. The engine cycle program is capable of producing reasonable engine performance prediction with a minimum of computer execute time. The current computer execute time on the IBM 360/67 for one Mach number, one altitude, and one power setting is about 0.1 seconds. about 0.1 seconds. The principal assumption used in the calculation is that the compressor is operated along a line of maximum adiabatic efficiency on the compressor map. The fluid properties are computed for the combustion mixture, but dissociation is not included. The procedure included in the program is only for the combustion of JP-4, methane, or hydrogen.
A comparison between implicit and hybrid methods for the calculation of steady and unsteady inlet flows

NASA Technical Reports Server (NTRS)

Coakley, T. J.; Hsieh, T.

1985-01-01

Numerical simulation of steady and unsteady transonic diffuser flows using two different computer codes are discussed and compared with experimental data. The codes solve the Reynolds-averaged, compressible, Navier-Stokes equations using various turbulence models. One of the codes has been applied extensively to diffuser flows and uses the hybrid method of MacCormack. This code is relatively inefficient numerically. The second code, which was developed more recently, is fully implicit and is relatively efficient numerically. Simulations of steady flows using the implicit code are shown to be in good agreement with simulations using the hybrid code. Both simulations are in good agreement with experimental results. Simulations of unsteady flows using the two codes are in good qualitative agreement with each other, although the quantitative agreement is not as good as in the steady flow cases. The implicit code is shown to be eight times faster than the hybrid code for unsteady flow calculations and up to 32 times faster for steady flow calculations. Results of calculations using alternative turbulence models are also discussed.
Multifidelity-CMA: a multifidelity approach for efficient personalisation of 3D cardiac electromechanical models.

PubMed

Molléro, Roch; Pennec, Xavier; Delingette, Hervé; Garny, Alan; Ayache, Nicholas; Sermesant, Maxime

2018-02-01

Personalised computational models of the heart are of increasing interest for clinical applications due to their discriminative and predictive abilities. However, the simulation of a single heartbeat with a 3D cardiac electromechanical model can be long and computationally expensive, which makes some practical applications, such as the estimation of model parameters from clinical data (the personalisation), very slow. Here we introduce an original multifidelity approach between a 3D cardiac model and a simplified "0D" version of this model, which enables to get reliable (and extremely fast) approximations of the global behaviour of the 3D model using 0D simulations. We then use this multifidelity approximation to speed-up an efficient parameter estimation algorithm, leading to a fast and computationally efficient personalisation method of the 3D model. In particular, we show results on a cohort of 121 different heart geometries and measurements. Finally, an exploitable code of the 0D model with scripts to perform parameter estimation will be released to the community.
The EDIT-COMGEOM Code

DTIC Science & Technology

1975-09-01

This report assumes a familiarity with the GIFT and MAGIC computer codes. The EDIT-COMGEOM code is a FORTRAN computer code. The EDIT-COMGEOM code...converts the target description data which was used in the MAGIC computer code to the target description data which can be used in the GIFT computer code
Developing CORBA-Based Distributed Scientific Applications From Legacy Fortran Programs

NASA Technical Reports Server (NTRS)

Sang, Janche; Kim, Chan; Lopez, Isaac

2000-01-01

An efficient methodology is presented for integrating legacy applications written in Fortran into a distributed object framework. Issues and strategies regarding the conversion and decomposition of Fortran codes into Common Object Request Broker Architecture (CORBA) objects are discussed. Fortran codes are modified as little as possible as they are decomposed into modules and wrapped as objects. A new conversion tool takes the Fortran application as input and generates the C/C++ header file and Interface Definition Language (IDL) file. In addition, the performance of the client server computing is evaluated.
Development and Implementation of CFD-Informed Models for the Advanced Subchannel Code CTF

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blyth, Taylor S.; Avramova, Maria

The research described in this PhD thesis contributes to the development of efficient methods for utilization of high-fidelity models and codes to inform low-fidelity models and codes in the area of nuclear reactor core thermal-hydraulics. The objective is to increase the accuracy of predictions of quantities of interests using high-fidelity CFD models while preserving the efficiency of low-fidelity subchannel core calculations. An original methodology named Physics- based Approach for High-to-Low Model Information has been further developed and tested. The overall physical phenomena and corresponding localized effects, which are introduced by the presence of spacer grids in light water reactor (LWR)more » cores, are dissected in corresponding four building basic processes, and corresponding models are informed using high-fidelity CFD codes. These models are a spacer grid-directed cross-flow model, a grid-enhanced turbulent mixing model, a heat transfer enhancement model, and a spacer grid pressure loss model. The localized CFD-models are developed and tested using the CFD code STAR-CCM+, and the corresponding global model development and testing in sub-channel formulation is performed in the thermal- hydraulic subchannel code CTF. The improved CTF simulations utilize data-files derived from CFD STAR-CCM+ simulation results covering the spacer grid design desired for inclusion in the CTF calculation. The current implementation of these models is examined and possibilities for improvement and further development are suggested. The validation experimental database is extended by including the OECD/NRC PSBT benchmark data. The outcome is an enhanced accuracy of CTF predictions while preserving the computational efficiency of a low-fidelity subchannel code.« less
Development and Implementation of CFD-Informed Models for the Advanced Subchannel Code CTF

NASA Astrophysics Data System (ADS)

Blyth, Taylor S.

The research described in this PhD thesis contributes to the development of efficient methods for utilization of high-fidelity models and codes to inform low-fidelity models and codes in the area of nuclear reactor core thermal-hydraulics. The objective is to increase the accuracy of predictions of quantities of interests using high-fidelity CFD models while preserving the efficiency of low-fidelity subchannel core calculations. An original methodology named Physics-based Approach for High-to-Low Model Information has been further developed and tested. The overall physical phenomena and corresponding localized effects, which are introduced by the presence of spacer grids in light water reactor (LWR) cores, are dissected in corresponding four building basic processes, and corresponding models are informed using high-fidelity CFD codes. These models are a spacer grid-directed cross-flow model, a grid-enhanced turbulent mixing model, a heat transfer enhancement model, and a spacer grid pressure loss model. The localized CFD-models are developed and tested using the CFD code STAR-CCM+, and the corresponding global model development and testing in sub-channel formulation is performed in the thermal-hydraulic subchannel code CTF. The improved CTF simulations utilize data-files derived from CFD STAR-CCM+ simulation results covering the spacer grid design desired for inclusion in the CTF calculation. The current implementation of these models is examined and possibilities for improvement and further development are suggested. The validation experimental database is extended by including the OECD/NRC PSBT benchmark data. The outcome is an enhanced accuracy of CTF predictions while preserving the computational efficiency of a low-fidelity subchannel code.
Volumetric Medical Image Coding: An Object-based, Lossy-to-lossless and Fully Scalable Approach

PubMed Central

Danyali, Habibiollah; Mertins, Alfred

2011-01-01

In this article, an object-based, highly scalable, lossy-to-lossless 3D wavelet coding approach for volumetric medical image data (e.g., magnetic resonance (MR) and computed tomography (CT)) is proposed. The new method, called 3DOBHS-SPIHT, is based on the well-known set partitioning in the hierarchical trees (SPIHT) algorithm and supports both quality and resolution scalability. The 3D input data is grouped into groups of slices (GOS) and each GOS is encoded and decoded as a separate unit. The symmetric tree definition of the original 3DSPIHT is improved by introducing a new asymmetric tree structure. While preserving the compression efficiency, the new tree structure allows for a small size of each GOS, which not only reduces memory consumption during the encoding and decoding processes, but also facilitates more efficient random access to certain segments of slices. To achieve more compression efficiency, the algorithm only encodes the main object of interest in each 3D data set, which can have any arbitrary shape, and ignores the unnecessary background. The experimental results on some MR data sets show the good performance of the 3DOBHS-SPIHT algorithm for multi-resolution lossy-to-lossless coding. The compression efficiency, full scalability, and object-based features of the proposed approach, beside its lossy-to-lossless coding support, make it a very attractive candidate for volumetric medical image information archiving and transmission applications. PMID:22606653
Aircraft Survivability. Susceptibility Reduction. Fall 2010

DTIC Science & Technology

2010-01-01

limits flexibility when issues are encountered during development. Once a program enters Engineering, Manufacturing, and Development (EMD), the...using a flexible , efficient computational environment based on a credible set of components. Unfortunately, current survivability codes contain many...approach limits flexibility when issues are encountered during development. Once a program enters Engineering Manufacturing and Development (EMD), the
Assessment of Effectiveness of Geologic Isolation Systems. Variable thickness transient ground-water flow model. Volume 2. Users' manual

DOE Office of Scientific and Technical Information (OSTI.GOV)

Reisenauer, A.E.

1979-12-01

A system of computer codes to aid in the preparation and evaluation of ground-water model input, as well as in the computer codes and auxillary programs developed and adapted for use in modeling major ground-water aquifers is described. The ground-water model is interactive, rather than a batch-type model. Interactive models have been demonstrated to be superior to batch in the ground-water field. For example, looking through reams of numerical lists can be avoided with the much superior graphical output forms or summary type numerical output. The system of computer codes permits the flexibility to develop rapidly the model-required data filesmore » from engineering data and geologic maps, as well as efficiently manipulating the voluminous data generated. Central to these codes is the Ground-water Model, which given the boundary value problem, produces either the steady-state or transient time plane solutions. A sizeable part of the codes available provide rapid evaluation of the results. Besides contouring the new water potentials, the model allows graphical review of streamlines of flow, travel times, and detailed comparisons of surfaces or points at designated wells. Use of the graphics scopes provide immediate, but temporary displays which can be used for evaluation of input and output and which can be reproduced easily on hard copy devices, such as a line printer, Calcomp plotter and image photographs.« less
Water droplet impingement on airfoils and aircraft engine inlets for icing analysis

NASA Technical Reports Server (NTRS)

Papadakis, Michael; Elangovan, R.; Freund, George A., Jr.; Breer, Marlin D.

1991-01-01

This paper includes the results of a significant research program for verification of computer trajectory codes used in aircraft icing analysis. Experimental water droplet impingement data have been obtained in the NASA Lewis Research Center Icing Research Tunnel for a wide range of aircraft geometries and test conditions. The body whose impingement characteristics are required is covered at strategic locations by thin strips of moisture absorbing (blotter) paper and then exposed to an airstream containing a dyed-water spray cloud. Water droplet impingement data are extracted from the dyed blotter strips by measuring the optical reflectance of the dye deposit on the strips with an automated reflectometer. Impingement characteristics for all test geometries have also been calculated using two recently developed trajectory computer codes. Good agreement is obtained with experimental data. The experimental and analytical data show that maximum impingement efficiency and impingement limits increase with mean volumetric diameter for all geometries tested. For all inlet geometries tested, as the inlet mass flow is reduced, the maximum impingement efficiency is reduced and the location of the maximum impingement shifts toward the inlet inner cowl.
The HTM Spatial Pooler-A Neocortical Algorithm for Online Sparse Distributed Coding.

PubMed

Cui, Yuwei; Ahmad, Subutai; Hawkins, Jeff

2017-01-01

Hierarchical temporal memory (HTM) provides a theoretical framework that models several key computational principles of the neocortex. In this paper, we analyze an important component of HTM, the HTM spatial pooler (SP). The SP models how neurons learn feedforward connections and form efficient representations of the input. It converts arbitrary binary input patterns into sparse distributed representations (SDRs) using a combination of competitive Hebbian learning rules and homeostatic excitability control. We describe a number of key properties of the SP, including fast adaptation to changing input statistics, improved noise robustness through learning, efficient use of cells, and robustness to cell death. In order to quantify these properties we develop a set of metrics that can be directly computed from the SP outputs. We show how the properties are met using these metrics and targeted artificial simulations. We then demonstrate the value of the SP in a complete end-to-end real-world HTM system. We discuss the relationship with neuroscience and previous studies of sparse coding. The HTM spatial pooler represents a neurally inspired algorithm for learning sparse representations from noisy data streams in an online fashion.
Microsimulation Modeling for Health Decision Sciences Using R: A Tutorial.

PubMed

Krijkamp, Eline M; Alarid-Escudero, Fernando; Enns, Eva A; Jalal, Hawre J; Hunink, M G Myriam; Pechlivanoglou, Petros

2018-04-01

Microsimulation models are becoming increasingly common in the field of decision modeling for health. Because microsimulation models are computationally more demanding than traditional Markov cohort models, the use of computer programming languages in their development has become more common. R is a programming language that has gained recognition within the field of decision modeling. It has the capacity to perform microsimulation models more efficiently than software commonly used for decision modeling, incorporate statistical analyses within decision models, and produce more transparent models and reproducible results. However, no clear guidance for the implementation of microsimulation models in R exists. In this tutorial, we provide a step-by-step guide to build microsimulation models in R and illustrate the use of this guide on a simple, but transferable, hypothetical decision problem. We guide the reader through the necessary steps and provide generic R code that is flexible and can be adapted for other models. We also show how this code can be extended to address more complex model structures and provide an efficient microsimulation approach that relies on vectorization solutions.
Parallelization of sequential Gaussian, indicator and direct simulation algorithms

NASA Astrophysics Data System (ADS)

Nunes, Ruben; Almeida, José A.

2010-08-01

Improving the performance and robustness of algorithms on new high-performance parallel computing architectures is a key issue in efficiently performing 2D and 3D studies with large amount of data. In geostatistics, sequential simulation algorithms are good candidates for parallelization. When compared with other computational applications in geosciences (such as fluid flow simulators), sequential simulation software is not extremely computationally intensive, but parallelization can make it more efficient and creates alternatives for its integration in inverse modelling approaches. This paper describes the implementation and benchmarking of a parallel version of the three classic sequential simulation algorithms: direct sequential simulation (DSS), sequential indicator simulation (SIS) and sequential Gaussian simulation (SGS). For this purpose, the source used was GSLIB, but the entire code was extensively modified to take into account the parallelization approach and was also rewritten in the C programming language. The paper also explains in detail the parallelization strategy and the main modifications. Regarding the integration of secondary information, the DSS algorithm is able to perform simple kriging with local means, kriging with an external drift and collocated cokriging with both local and global correlations. SIS includes a local correction of probabilities. Finally, a brief comparison is presented of simulation results using one, two and four processors. All performance tests were carried out on 2D soil data samples. The source code is completely open source and easy to read. It should be noted that the code is only fully compatible with Microsoft Visual C and should be adapted for other systems/compilers.
A user's manual for the Electromagnetic Surface Patch code: ESP version 3

NASA Technical Reports Server (NTRS)

Newman, E. H.; Dilsavor, R. L.

1987-01-01

This report serves as a user's manual for Version III of the Electromagnetic Surface Patch Code or ESP code. ESP is user-oriented, based on the method of moments (MM) for treating geometries consisting of an interconnection of thin wires and perfectly conducting polygonal plates. Wire/plate junctions must be about 0.1 lambda or more from any plate edge. Several plates may intersect along a common edge. Excitation may be by either a delta-gap voltage generator or by a plane wave. The thin wires may have finite conductivity and also may contain lumped loads. The code computes most of the usual quantities of interest such as current distribution, input impedance, radiation efficiency, mutual coupling, far zone gain patterns (both polarizations) and radar-cross-section (both/cross polarizations).
Development of an Object-Oriented Turbomachinery Analysis Code within the NPSS Framework

NASA Technical Reports Server (NTRS)

Jones, Scott M.

2014-01-01

During the preliminary or conceptual design phase of an aircraft engine, the turbomachinery designer has a need to estimate the effects of a large number of design parameters such as flow size, stage count, blade count, radial position, etc. on the weight and efficiency of a turbomachine. Computer codes are invariably used to perform this task however, such codes are often very old, written in outdated languages with arcane input files, and rarely adaptable to new architectures or unconventional layouts. Given the need to perform these kinds of preliminary design trades, a modern 2-D turbomachinery design and analysis code has been written using the Numerical Propulsion System Simulation (NPSS) framework. This paper discusses the development of the governing equations and the structure of the primary objects used in OTAC.
WOMBAT: A Scalable and High-performance Astrophysical Magnetohydrodynamics Code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mendygral, P. J.; Radcliffe, N.; Kandalla, K.

2017-02-01

We present a new code for astrophysical magnetohydrodynamics specifically designed and optimized for high performance and scaling on modern and future supercomputers. We describe a novel hybrid OpenMP/MPI programming model that emerged from a collaboration between Cray, Inc. and the University of Minnesota. This design utilizes MPI-RMA optimized for thread scaling, which allows the code to run extremely efficiently at very high thread counts ideal for the latest generation of multi-core and many-core architectures. Such performance characteristics are needed in the era of “exascale” computing. We describe and demonstrate our high-performance design in detail with the intent that it maymore » be used as a model for other, future astrophysical codes intended for applications demanding exceptional performance.« less
Nonlinear to Linear Elastic Code Coupling in 2-D Axisymmetric Media.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Preston, Leiph

Explosions within the earth nonlinearly deform the local media, but at typical seismological observation distances, the seismic waves can be considered linear. Although nonlinear algorithms can simulate explosions in the very near field well, these codes are computationally expensive and inaccurate at propagating these signals to great distances. A linearized wave propagation code, coupled to a nonlinear code, provides an efficient mechanism to both accurately simulate the explosion itself and to propagate these signals to distant receivers. To this end we have coupled Sandia's nonlinear simulation algorithm CTH to a linearized elastic wave propagation code for 2-D axisymmetric media (axiElasti)more » by passing information from the nonlinear to the linear code via time-varying boundary conditions. In this report, we first develop the 2-D axisymmetric elastic wave equations in cylindrical coordinates. Next we show how we design the time-varying boundary conditions passing information from CTH to axiElasti, and finally we demonstrate the coupling code via a simple study of the elastic radius.« less
Quantum Monte Carlo Endstation for Petascale Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lubos Mitas

2011-01-26

NCSU research group has been focused on accomplising the key goals of this initiative: establishing new generation of quantum Monte Carlo (QMC) computational tools as a part of Endstation petaflop initiative for use at the DOE ORNL computational facilities and for use by computational electronic structure community at large; carrying out high accuracy quantum Monte Carlo demonstration projects in application of these tools to the forefront electronic structure problems in molecular and solid systems; expanding the impact of QMC methods and approaches; explaining and enhancing the impact of these advanced computational approaches. In particular, we have developed quantum Monte Carlomore » code (QWalk, www.qwalk.org) which was significantly expanded and optimized using funds from this support and at present became an actively used tool in the petascale regime by ORNL researchers and beyond. These developments have been built upon efforts undertaken by the PI's group and collaborators over the period of the last decade. The code was optimized and tested extensively on a number of parallel architectures including petaflop ORNL Jaguar machine. We have developed and redesigned a number of code modules such as evaluation of wave functions and orbitals, calculations of pfaffians and introduction of backflow coordinates together with overall organization of the code and random walker distribution over multicore architectures. We have addressed several bottlenecks such as load balancing and verified efficiency and accuracy of the calculations with the other groups of the Endstation team. The QWalk package contains about 50,000 lines of high quality object-oriented C++ and includes also interfaces to data files from other conventional electronic structure codes such as Gamess, Gaussian, Crystal and others. This grant supported PI for one month during summers, a full-time postdoc and partially three graduate students over the period of the grant duration, it has resulted in 13 published papers, 15 invited talks and lectures nationally and internationally. My former graduate student and postdoc Dr. Michal Bajdich, who was supported byt this grant, is currently a postdoc with ORNL in the group of Dr. F. Reboredo and Dr. P. Kent and is using the developed tools in a number of DOE projects. The QWalk package has become a truly important research tool used by the electronic structure community and has attracted several new developers in other research groups. Our tools use several types of correlated wavefunction approaches, variational, diffusion and reptation methods, large-scale optimization methods for wavefunctions and enables to calculate energy differences such as cohesion, electronic gaps, but also densities and other properties, using multiple runs one can obtain equations of state for given structures and beyond. Our codes use efficient numerical and Monte Carlo strategies (high accuracy numerical orbitals, multi-reference wave functions, highly accurate correlation factors, pairing orbitals, force biased and correlated sampling Monte Carlo), are robustly parallelized and enable to run on tens of thousands cores very efficiently. Our demonstration applications were focused on the challenging research problems in several fields of materials science such as transition metal solids. We note that our study of FeO solid was the first QMC calculation of transition metal oxides at high pressures.« less

Some links on this page may take you to non-federal websites. Their policies may differ from this site.