From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation
Blazewicz, Marek; Hinder, Ian; Koppelman, David M.; ...
2013-01-01
Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high-performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the efficient use of large-scale CPU/GPU systems for complex applications without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms.
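The abstract mentions discretization by higher-order finite differences generated from a high-level PDE description. The sketch below is not Chemora's code generator; it only illustrates, under simple assumptions (1-D periodic grid, centered stencils), how finite-difference weights of arbitrary even order can be derived from a Taylor-expansion linear system and applied to a grid function.

```python
import numpy as np
from math import factorial

def centered_fd_coefficients(order, derivative=1):
    """Solve the Taylor-expansion system for centered finite-difference weights.

    For a stencil of width 2m+1 (accuracy order = 2m), the weights w_j satisfy
    sum_j w_j * j**k = k! * delta(k, derivative) for k = 0..2m, so that
    sum_j w_j * u(x + j*dx) ~= u^(derivative)(x) * dx**derivative.
    """
    m = order // 2
    offsets = np.arange(-m, m + 1)
    A = np.vander(offsets, increasing=True).T.astype(float)  # A[k, j] = offsets[j]**k
    b = np.zeros(2 * m + 1)
    b[derivative] = factorial(derivative)
    return offsets, np.linalg.solve(A, b)

def first_derivative(u, dx, order=4):
    """First derivative of a periodic 1-D grid function u."""
    offsets, w = centered_fd_coefficients(order, derivative=1)
    du = np.zeros_like(u)
    for j, wj in zip(offsets, w):
        du += wj * np.roll(u, -j)   # np.roll(u, -j)[i] == u[i + j] on a periodic grid
    return du / dx

# quick check against an analytic derivative
x = np.linspace(0.0, 2.0 * np.pi, 128, endpoint=False)
u = np.sin(x)
err = np.max(np.abs(first_derivative(u, x[1] - x[0]) - np.cos(x)))
print(f"max error of the 4th-order stencil: {err:.2e}")
```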
Fundamental differences between optimization code test problems in engineering applications
NASA Technical Reports Server (NTRS)
Eason, E. D.
1984-01-01
The purpose here is to suggest that there is at least one fundamental difference between the problems used for testing optimization codes and the problems that engineers often need to solve; in particular, the level of precision that can be practically achieved in the numerical evaluation of the objective function, derivatives, and constraints. This difference affects the performance of optimization codes, as illustrated by two examples. Two classes of optimization problem were defined. Class One functions and constraints can be evaluated to a high precision that depends primarily on the word length of the computer. Class Two functions and/or constraints can only be evaluated to a moderate or a low level of precision for economic or modeling reasons, regardless of the computer word length. Optimization codes have not been adequately tested on Class Two problems. There are very few Class Two test problems in the literature, while there are literally hundreds of Class One test problems. The relative performance of two codes may be markedly different for Class One and Class Two problems. Less sophisticated direct search type codes may be less likely to be confused or to waste many function evaluations on Class Two problems. The analysis accuracy and minimization performance are related in a complex way that probably varies from code to code. On a problem where the analysis precision was varied over a range, the simple Hooke and Jeeves code was more efficient at low precision while the Powell code was more efficient at high precision.
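The study contrasts a direct-search method (Hooke and Jeeves) with the Powell code when function evaluations have limited precision. A toy sketch, not the original study's codes: a small Hooke-and-Jeeves-style pattern search versus SciPy's Powell method on a quadratic whose evaluations are corrupted by noise standing in for limited analysis precision; the test function and noise level are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def noisy_quadratic(x, noise=1e-2):
    """'Class Two'-style objective: true value plus evaluation noise of limited precision."""
    true = np.sum((x - 1.0) ** 2)
    return true + noise * rng.standard_normal()

def hooke_jeeves(f, x0, step=0.5, shrink=0.5, tol=1e-4, max_eval=2000):
    """Very small pattern-search (Hooke & Jeeves-style) minimizer."""
    x, fx, evals = np.asarray(x0, float), f(x0), 1
    while step > tol and evals < max_eval:
        improved = False
        for i in range(x.size):
            for d in (+step, -step):
                trial = x.copy()
                trial[i] += d
                ft = f(trial)
                evals += 1
                if ft < fx:
                    x, fx, improved = trial, ft, True
                    break
        if not improved:
            step *= shrink
    return x, evals

x0 = np.array([4.0, -3.0])
x_hj, n_hj = hooke_jeeves(lambda x: noisy_quadratic(x), x0)
res = minimize(lambda x: noisy_quadratic(x), x0, method="Powell")
print("pattern search :", x_hj, "evaluations:", n_hj)
print("Powell         :", res.x, "evaluations:", res.nfev)
```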
Multidisciplinary High-Fidelity Analysis and Optimization of Aerospace Vehicles. Part 1; Formulation
NASA Technical Reports Server (NTRS)
Walsh, J. L.; Townsend, J. C.; Salas, A. O.; Samareh, J. A.; Mukhopadhyay, V.; Barthelemy, J.-F.
2000-01-01
An objective of the High Performance Computing and Communication Program at the NASA Langley Research Center is to demonstrate multidisciplinary shape and sizing optimization of a complete aerospace vehicle configuration by using high-fidelity, finite element structural analysis and computational fluid dynamics aerodynamic analysis in a distributed, heterogeneous computing environment that includes high performance parallel computing. A software system has been designed and implemented to integrate a set of existing discipline analysis codes, some of them computationally intensive, into a distributed computational environment for the design of a high-speed civil transport configuration. The paper describes the engineering aspects of formulating the optimization by integrating these analysis codes and associated interface codes into the system. The discipline codes are integrated by using the Java programming language and a Common Object Request Broker Architecture (CORBA) compliant software product. A companion paper presents currently available results.
Novel Integration of Frame Rate Up Conversion and HEVC Coding Based on Rate-Distortion Optimization.
Guo Lu; Xiaoyun Zhang; Li Chen; Zhiyong Gao
2018-02-01
Frame rate up conversion (FRUC) can improve the visual quality by interpolating new intermediate frames. However, high-frame-rate videos produced by FRUC either consume more bitrate or suffer annoying artifacts in the interpolated frames. In this paper, a novel integration framework of FRUC and high efficiency video coding (HEVC) is proposed based on rate-distortion optimization, so that the interpolated frames can be reconstructed at the encoder side with low bitrate cost and high visual quality. First, a joint motion estimation (JME) algorithm is proposed to obtain robust motion vectors, which are shared between FRUC and video coding. Moreover, JME is embedded into the coding loop and employs the original motion search strategy of HEVC coding. Then, the frame interpolation is formulated as a rate-distortion optimization problem, where both the coding bitrate consumption and visual quality are taken into account. Due to the absence of the original frames, the distortion model for interpolated frames is established according to the motion vector reliability and the coding quantization error. Experimental results demonstrate that the proposed framework can achieve a 21%-42% reduction in BDBR, when compared with the traditional methods of FRUC cascaded with coding.
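The framework casts frame interpolation as a rate-distortion optimization. A generic sketch of the Lagrangian mode decision that underlies such R-D optimization, J = D + lambda * R, choosing the candidate with the lowest cost; the candidate list and numbers are made up for illustration and are not from the paper.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits

def best_mode(candidates, lam):
    """Pick the coding choice with the lowest R-D cost.

    `candidates` maps a mode name to (distortion, rate in bits).
    """
    return min(candidates, key=lambda m: rd_cost(*candidates[m], lam))

# illustrative numbers only: distortion (e.g. SSD) and bits per block
candidates = {
    "skip":        (950.0,   4),
    "inter_16x16": (410.0,  85),
    "interpolate": (520.0,  12),   # FRUC-style interpolated block, few bits
    "intra":       (300.0, 240),
}
for lam in (1.0, 10.0, 60.0):
    print(f"lambda={lam:5.1f} -> choose {best_mode(candidates, lam)}")
```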
Diversity-optimal power loading for intensity modulated MIMO optical wireless communications.
Zhang, Yan-Yu; Yu, Hong-Yi; Zhang, Jian-Kang; Zhu, Yi-Jun
2016-04-18
In this paper, we consider the design of a space code for an intensity modulated direct detection multi-input-multi-output optical wireless communication (IM/DD MIMO-OWC) system, in which channel coefficients are independent and non-identically log-normal distributed, with variances and means known at the transmitter and channel state information available at the receiver. Utilizing the existing space code design criterion for IM/DD MIMO-OWC with a maximum likelihood (ML) detector, we design a diversity-optimal space code (DOSC) that maximizes both large-scale diversity and small-scale diversity gains and prove that the spatial repetition code (RC) with a diversity-optimized power allocation is diversity-optimal among all the high dimensional nonnegative space code schemes under a commonly used optical power constraint. In addition, we show that one of the significant advantages of the DOSC is that it allows low-complexity ML detection. Simulation results indicate that in high signal-to-noise ratio (SNR) regimes, our proposed DOSC significantly outperforms RC, which is the best space code currently available for such systems.
WOMBAT: A Scalable and High-performance Astrophysical Magnetohydrodynamics Code
NASA Astrophysics Data System (ADS)
Mendygral, P. J.; Radcliffe, N.; Kandalla, K.; Porter, D.; O'Neill, B. J.; Nolting, C.; Edmon, P.; Donnert, J. M. F.; Jones, T. W.
2017-02-01
We present a new code for astrophysical magnetohydrodynamics specifically designed and optimized for high performance and scaling on modern and future supercomputers. We describe a novel hybrid OpenMP/MPI programming model that emerged from a collaboration between Cray, Inc. and the University of Minnesota. This design utilizes MPI-RMA optimized for thread scaling, which allows the code to run extremely efficiently at very high thread counts ideal for the latest generation of multi-core and many-core architectures. Such performance characteristics are needed in the era of “exascale” computing. We describe and demonstrate our high-performance design in detail with the intent that it may be used as a model for other, future astrophysical codes intended for applications demanding exceptional performance.
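WOMBAT's hybrid design relies on MPI-RMA (one-sided communication) for thread-scalable data exchange. A minimal mpi4py sketch of a one-sided Put into a neighbor's exposed window, synchronized with fences; this only illustrates the RMA primitive, not WOMBAT's actual packing or threading scheme, and the script name in the comment is hypothetical.

```python
# run with something like: mpiexec -n 2 python rma_demo.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# each rank exposes a small receive buffer as an RMA window
recv = np.zeros(4, dtype="d")
win = MPI.Win.Create(recv, comm=comm)

# data this rank wants to deposit into its right-hand neighbor
send = np.full(4, float(rank), dtype="d")
right = (rank + 1) % size

win.Fence()                        # open the access/exposure epoch
win.Put(send, target_rank=right)   # one-sided write, no matching receive call
win.Fence()                        # close the epoch; the data is now visible

print(f"rank {rank} received halo from rank {(rank - 1) % size}: {recv}")
win.Free()
```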
Multidisciplinary Analysis and Optimal Design: As Easy as it Sounds?
NASA Technical Reports Server (NTRS)
Moore, Greg; Chainyk, Mike; Schiermeier, John
2004-01-01
The viewgraph presentation examines optimal design for precision, large aperture structures. Discussion focuses on aspects of design optimization, code architecture and current capabilities, and planned activities and collaborative area suggestions. The discussion of design optimization examines design sensitivity analysis; practical considerations; and new analytical environments, including a finite element-based capability for high-fidelity multidisciplinary analysis, design sensitivity, and optimization. The discussion of code architecture and current capabilities includes basic thermal and structural elements, nonlinear heat transfer solutions and processes, and optical mode generation.
A stimulus-dependent spike threshold is an optimal neural coder
Jones, Douglas L.; Johnson, Erik C.; Ratnam, Rama
2015-01-01
A neural code based on sequences of spikes can consume a significant portion of the brain's energy budget. Thus, energy considerations would dictate that spiking activity be kept as low as possible. However, a high spike-rate improves the coding and representation of signals in spike trains, particularly in sensory systems. These are competing demands, and selective pressure has presumably worked to optimize coding by apportioning a minimum number of spikes so as to maximize coding fidelity. The mechanisms by which a neuron generates spikes while maintaining a fidelity criterion are not known. Here, we show that a signal-dependent neural threshold, similar to a dynamic or adapting threshold, optimizes the trade-off between spike generation (encoding) and fidelity (decoding). The threshold mimics a post-synaptic membrane (a low-pass filter) and serves as an internal decoder. Further, it sets the average firing rate (the energy constraint). The decoding process provides an internal copy of the coding error to the spike-generator which emits a spike when the error equals or exceeds a spike threshold. When optimized, the trade-off leads to a deterministic spike firing-rule that generates optimally timed spikes so as to maximize fidelity. The optimal coder is derived in closed-form in the limit of high spike-rates, when the signal can be approximated as a piece-wise constant signal. The predicted spike-times are close to those obtained experimentally in the primary electrosensory afferent neurons of weakly electric fish (Apteronotus leptorhynchus) and pyramidal neurons from the somatosensory cortex of the rat. We suggest that KCNQ/Kv7 channels (underlying the M-current) are good candidates for the decoder. They are widely coupled to metabolic processes and do not inactivate. We conclude that the neural threshold is optimized to generate an energy-efficient and high-fidelity neural code. PMID:26082710
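The paper describes an encoder that keeps an internal low-pass "decoder" copy of the signal and fires a spike whenever the coding error reaches a threshold. A small sketch of that adaptive-threshold loop with a leaky-integrator decoder; the time constant, threshold and input below are illustrative, not fitted to the electrosensory data in the paper.

```python
import numpy as np

def threshold_spike_encoder(signal, dt, tau=0.05, theta=0.2):
    """Emit a spike whenever the coding error reaches the threshold theta.

    The encoder runs an internal copy of the decoder: a leaky integrator
    (one-pole low-pass filter with time constant tau) driven by the spikes.
    """
    decoded = 0.0
    alpha = np.exp(-dt / tau)          # per-step decay of the low-pass filter
    spikes = np.zeros(signal.size, dtype=bool)
    recon = np.zeros(signal.size)
    for i, s in enumerate(signal):
        decoded *= alpha               # decoder decays between spikes
        if s - decoded >= theta:       # coding error hit the spike threshold
            spikes[i] = True
            decoded += theta           # each spike adds one quantum to the decoder
        recon[i] = decoded
    return spikes, recon

dt = 1e-3
t = np.arange(0.0, 1.0, dt)
stimulus = 0.6 + 0.4 * np.sin(2 * np.pi * 3 * t)
spikes, recon = threshold_spike_encoder(stimulus, dt)
print(f"{spikes.sum()} spikes, mean |error| = {np.mean(np.abs(stimulus - recon)):.3f}")
```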
Next-generation acceleration and code optimization for light transport in turbid media using GPUs
Alerstam, Erik; Lo, William Chun Yip; Han, Tianyi David; Rose, Jonathan; Andersson-Engels, Stefan; Lilge, Lothar
2010-01-01
A highly optimized Monte Carlo (MC) code package for simulating light transport is developed on the latest graphics processing unit (GPU) built for general-purpose computing from NVIDIA - the Fermi GPU. In biomedical optics, the MC method is the gold standard approach for simulating light transport in biological tissue, both due to its accuracy and its flexibility in modelling realistic, heterogeneous tissue geometry in 3-D. However, the widespread use of MC simulations in inverse problems, such as treatment planning for PDT, is limited by their long computation time. Despite its parallel nature, optimizing MC code on the GPU has been shown to be a challenge, particularly when the sharing of simulation result matrices among many parallel threads demands the frequent use of atomic instructions to access the slow GPU global memory. This paper proposes an optimization scheme that utilizes the fast shared memory to resolve the performance bottleneck caused by atomic access, and discusses numerous other optimization techniques needed to harness the full potential of the GPU. Using these techniques, a widely accepted MC code package in biophotonics, called MCML, was successfully accelerated on a Fermi GPU by approximately 600x compared to a state-of-the-art Intel Core i7 CPU. A skin model consisting of 7 layers was used as the standard simulation geometry. To demonstrate the possibility of GPU cluster computing, the same GPU code was executed on four GPUs, showing a linear improvement in performance with an increasing number of GPUs. The GPU-based MCML code package, named GPU-MCML, is compatible with a wide range of graphics cards and is released as an open-source software in two versions: an optimized version tuned for high performance and a simplified version for beginners (http://code.google.com/p/gpumcml). PMID:21258498
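The key optimization described is privatizing the accumulation array in fast shared memory so that contended atomic updates to slow global memory are replaced by per-block atomics plus a single merge. A Numba-CUDA sketch of that pattern on a simple histogram-style accumulator; it requires an NVIDIA GPU with Numba installed, and the array size and kernel are illustrative stand-ins, not GPU-MCML itself.

```python
import numpy as np
from numba import cuda, float32

NBINS = 128   # assumed: one private copy per block fits in shared memory

@cuda.jit
def accumulate_shared(values, out):
    # per-block private copy of the accumulator in fast shared memory
    local = cuda.shared.array(NBINS, dtype=float32)
    tid = cuda.threadIdx.x
    for b in range(tid, NBINS, cuda.blockDim.x):
        local[b] = 0.0
    cuda.syncthreads()

    i = cuda.grid(1)
    stride = cuda.gridsize(1)
    while i < values.size:
        b = int(values[i] * NBINS) % NBINS
        cuda.atomic.add(local, b, values[i])   # contention stays within one block
        i += stride
    cuda.syncthreads()

    # merge the block-private copy into slow global memory once per block
    for b in range(tid, NBINS, cuda.blockDim.x):
        cuda.atomic.add(out, b, local[b])

values = np.random.rand(1_000_000).astype(np.float32)
out = np.zeros(NBINS, dtype=np.float32)
accumulate_shared[256, 128](values, out)   # 256 blocks of 128 threads
print("total deposited:", out.sum())
```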
Overall Traveling-Wave-Tube Efficiency Improved By Optimized Multistage Depressed Collector Design
NASA Technical Reports Server (NTRS)
Vaden, Karl R.
2002-01-01
The microwave traveling wave tube (TWT) is used widely for space communications and high-power airborne transmitting sources. One of the most important features in designing a TWT is overall efficiency. Yet, overall TWT efficiency is strongly dependent on the efficiency of the electron beam collector, particularly for high values of collector efficiency. For these reasons, the NASA Glenn Research Center developed an optimization algorithm based on simulated annealing to quickly design highly efficient multistage depressed collectors (MDCs). Simulated annealing is a strategy for solving highly nonlinear combinatorial optimization problems. Its major advantage over other methods is its ability to avoid becoming trapped in local minima. Simulated annealing is based on an analogy to statistical thermodynamics, specifically the physical process of annealing: heating a material to a temperature that permits many atomic rearrangements and then cooling it carefully and slowly, until it freezes into a strong, minimum-energy crystalline structure. This minimum-energy crystal corresponds to the optimal solution of a mathematical optimization problem. The TWT used as a baseline for optimization was the 32-GHz, 10-W, helical TWT developed for the Cassini mission to Saturn. The method of collector analysis and design used was a 2-1/2-dimensional computational procedure that employs two types of codes, a large signal analysis code and an electron trajectory code. The large signal analysis code produces the spatial, energetic, and temporal distributions of the spent beam entering the MDC. An electron trajectory code uses the resultant data to perform the actual collector analysis. The MDC was optimized for maximum MDC efficiency and minimum final kinetic energy of all collected electrons (to reduce heat transfer). The optimized collector achieved an efficiency of 93.8 percent. The results show an improvement in collector efficiency from 89.7 to 93.8 percent, resulting in an increase of three points in overall efficiency. In addition, the time to design a highly efficient MDC was reduced from a month to a few days. All work was done in-house at Glenn for the High Rate Data Delivery Program. Future plans include optimizing the MDC and TWT interaction circuit in tandem to further improve overall TWT efficiency.
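The optimizer described uses simulated annealing to escape local minima when tuning collector-stage settings. A generic simulated-annealing sketch on a made-up multimodal objective standing in for the collector-efficiency evaluation (which in practice comes from the electron-trajectory code); the cooling schedule, move size and objective are illustrative only.

```python
import math
import random

random.seed(1)

def objective(v):
    """Stand-in for the trajectory-code evaluation: a multimodal toy function."""
    return sum((x - 0.7) ** 2 - 0.3 * math.cos(8 * math.pi * x) for x in v)

def simulated_annealing(x0, t0=1.0, t_min=1e-4, cooling=0.95, moves_per_t=50):
    x, fx = list(x0), objective(x0)
    best, fbest = list(x), fx
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            cand = [xi + random.gauss(0.0, 0.1) for xi in x]
            fc = objective(cand)
            # accept improvements always, worse moves with Boltzmann probability
            if fc < fx or random.random() < math.exp(-(fc - fx) / t):
                x, fx = cand, fc
                if fx < fbest:
                    best, fbest = list(x), fx
        t *= cooling   # slow cooling, by analogy with careful annealing of a crystal
    return best, fbest

stage_settings = [0.5, 0.5, 0.5, 0.5]   # normalized 4-stage collector (illustrative)
best, fbest = simulated_annealing(stage_settings)
print("best stage settings:", [round(v, 3) for v in best], "objective:", round(fbest, 4))
```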
FBCOT: a fast block coding option for JPEG 2000
NASA Astrophysics Data System (ADS)
Taubman, David; Naman, Aous; Mathew, Reji
2017-09-01
Based on the EBCOT algorithm, JPEG 2000 finds application in many fields, including high performance scientific, geospatial and video coding applications. Beyond digital cinema, JPEG 2000 is also attractive for low-latency video communications. The main obstacle for some of these applications is the relatively high computational complexity of the block coder, especially at high bit-rates. This paper proposes a drop-in replacement for the JPEG 2000 block coding algorithm, achieving much higher encoding and decoding throughputs, with only modest loss in coding efficiency (typically < 0.5dB). The algorithm provides only limited quality/SNR scalability, but offers truly reversible transcoding to/from any standard JPEG 2000 block bit-stream. The proposed FAST block coder can be used with EBCOT's post-compression RD-optimization methodology, allowing a target compressed bit-rate to be achieved even at low latencies, leading to the name FBCOT (Fast Block Coding with Optimized Truncation).
Dai, Wenrui; Xiong, Hongkai; Jiang, Xiaoqian; Chen, Chang Wen
2014-01-01
This paper proposes a novel model on intra coding for High Efficiency Video Coding (HEVC), which simultaneously predicts blocks of pixels with optimal rate distortion. It utilizes the spatial statistical correlation for the optimal prediction based on 2-D contexts, in addition to formulating the data-driven structural interdependences to make the prediction error coherent with the probability distribution, which is desirable for successful transform and coding. The structured set prediction model incorporates a max-margin Markov network (M3N) to regulate and optimize multiple block predictions. The model parameters are learned by discriminating the actual pixel value from other possible estimates to maximize the margin (i.e., decision boundary bandwidth). Compared to existing methods that focus on minimizing prediction error, the M3N-based model adaptively maintains the coherence for a set of predictions. Specifically, the proposed model concurrently optimizes a set of predictions by associating the loss for individual blocks to the joint distribution of succeeding discrete cosine transform coefficients. When the sample size grows, the prediction error is asymptotically upper bounded by the training error under the decomposable loss function. As an internal step, we optimize the underlying Markov network structure to find states that achieve the maximal energy using expectation propagation. For validation, we integrate the proposed model into HEVC for optimal mode selection on rate-distortion optimization. The proposed prediction model obtains up to 2.85% bit rate reduction and achieves better visual quality in comparison to the HEVC intra coding. PMID:25505829
Bilayer Protograph Codes for Half-Duplex Relay Channels
NASA Technical Reports Server (NTRS)
Divsalar, Dariush; VanNguyen, Thuy; Nosratinia, Aria
2013-01-01
Direct to Earth return links are limited by the size and power of lander devices. A standard alternative is provided by a two-hop return link: a proximity link (from lander to orbiter relay) and a deep-space link (from orbiter relay to Earth). Using an additional link and a proposed coding for relay channels, one can obtain a more reliable signal. Although significant progress has been made in the relay coding problem, existing codes must be painstakingly optimized to match a single set of channel conditions, many of them do not offer easy encoding, and most of them do not have structured design. A high-performing LDPC (low-density parity-check) code for the relay channel simultaneously addresses two important issues: a code structure that allows low encoding complexity, and a flexible rate-compatible code that allows matching to various channel conditions. Most of the previous high-performance LDPC codes for the relay channel are tightly optimized for a given channel quality, and are not easily adapted without extensive re-optimization for various channel conditions. This code for the relay channel combines structured design and easy encoding with rate compatibility to allow adaptation to the three links involved in the relay channel, and furthermore offers very good performance. The proposed code is constructed by synthesizing a bilayer structure with a protograph. In addition to the contribution to relay encoding, an improved family of protograph codes was produced for the point-to-point AWGN (additive white Gaussian noise) channel whose high-rate members enjoy thresholds that are within 0.07 dB of capacity. These LDPC relay codes address three important issues in an integrative manner: low encoding complexity, modular structure allowing for easy design, and rate compatibility so that the code can be easily matched to a variety of channel conditions without extensive re-optimization. The main problem of half-duplex relay coding can be reduced to the simultaneous design of two codes at two rates and two SNRs (signal-to-noise ratios), such that one is a subset of the other. This problem can be addressed by forceful optimization, but a clever method of addressing this problem is via the bilayer lengthened (BL) LDPC structure. This method uses a bilayer Tanner graph to make the two codes while using a concept of "parity forwarding" with subsequent successive decoding that removes the need to directly address the issue of uneven SNRs among the symbols of a given codeword. This method is attractive in that it addresses some of the main issues in the design of relay codes, but it does not by itself give rise to highly structured codes with simple encoding, nor does it give rate-compatible codes. The main contribution of this work is to construct a class of codes that simultaneously possess a bilayer parity-forwarding mechanism, while also benefiting from the properties of protograph codes: easy encoding, a modular design, and rate compatibility.
NASA Technical Reports Server (NTRS)
Hribar, Michelle R.; Frumkin, Michael; Jin, Haoqiang; Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)
1998-01-01
Over the past decade, high performance computing has evolved rapidly; systems based on commodity microprocessors have been introduced in quick succession from at least seven vendors/families. Porting codes to every new architecture is a difficult problem; in particular, here at NASA, there are many large CFD applications that are very costly to port to new machines by hand. The LCM ("Legacy Code Modernization") Project is the development of an integrated parallelization environment (IPE) which performs the automated mapping of legacy CFD (Fortran) applications to state-of-the-art high performance computers. While most projects to port codes focus on the parallelization of the code, we consider porting to be an iterative process consisting of several steps: 1) code cleanup, 2) serial optimization,3) parallelization, 4) performance monitoring and visualization, 5) intelligent tools for automated tuning using performance prediction and 6) machine specific optimization. The approach for building this parallelization environment is to build the components for each of the steps simultaneously and then integrate them together. The demonstration will exhibit our latest research in building this environment: 1. Parallelizing tools and compiler evaluation. 2. Code cleanup and serial optimization using automated scripts 3. Development of a code generator for performance prediction 4. Automated partitioning 5. Automated insertion of directives. These demonstrations will exhibit the effectiveness of an automated approach for all the steps involved with porting and tuning a legacy code application for a new architecture.
NASA Technical Reports Server (NTRS)
Walsh, J. L.; Weston, R. P.; Samareh, J. A.; Mason, B. H.; Green, L. L.; Biedron, R. T.
2000-01-01
An objective of the High Performance Computing and Communication Program at the NASA Langley Research Center is to demonstrate multidisciplinary shape and sizing optimization of a complete aerospace vehicle configuration by using high-fidelity finite-element structural analysis and computational fluid dynamics aerodynamic analysis in a distributed, heterogeneous computing environment that includes high performance parallel computing. A software system has been designed and implemented to integrate a set of existing discipline analysis codes, some of them computationally intensive, into a distributed computational environment for the design of a high-speed civil transport configuration. The paper describes both the preliminary results from implementing and validating the multidisciplinary analysis and the results from an aerodynamic optimization. The discipline codes are integrated by using the Java programming language and a Common Object Request Broker Architecture compliant software product. A companion paper describes the formulation of the multidisciplinary analysis and optimization system.
An integrated optimum design approach for high speed prop rotors
NASA Technical Reports Server (NTRS)
Chattopadhyay, Aditi; Mccarthy, Thomas R.
1995-01-01
The objective is to develop an optimization procedure for high-speed and civil tilt-rotors by coupling all of the necessary disciplines within a closed-loop optimization procedure. Both simplified and comprehensive analysis codes are used for the aerodynamic analyses. The structural properties are calculated using in-house developed algorithms for both isotropic and composite box beam sections. There are four major objectives of this study. (1) Aerodynamic optimization: The effects of blade aerodynamic characteristics on cruise and hover performance of prop-rotor aircraft are investigated using the classical blade element momentum approach with corrections for the high lift capability of rotors/propellers. (2) Coupled aerodynamic/structures optimization: A multilevel hybrid optimization technique is developed for the design of prop-rotor aircraft. The design problem is decomposed into a level for improved aerodynamics with continuous design variables and a level with discrete variables to investigate composite tailoring. The aerodynamic analysis is based on that developed in objective 1 and the structural analysis is performed using an in-house code which models a composite box beam. The results are compared to both a reference rotor and the optimum rotor found in the purely aerodynamic formulation. (3) Multipoint optimization: The multilevel optimization procedure of objective 2 is extended to a multipoint design problem. Hover, cruise, and take-off are the three flight conditions simultaneously maximized. (4) Coupled rotor/wing optimization: Using the comprehensive rotary wing code CAMRAD, an optimization procedure is developed for the coupled rotor/wing performance in high speed tilt-rotor aircraft. The developed procedure contains design variables which define the rotor and wing planforms.
Helium: lifting high-performance stencil kernels from stripped x86 binaries to halide DSL code
Mendis, Charith; Bosboom, Jeffrey; Wu, Kevin; ...
2015-06-03
Highly optimized programs are prone to bit rot, where performance quickly becomes suboptimal in the face of new hardware and compiler techniques. In this paper we show how to automatically lift performance-critical stencil kernels from a stripped x86 binary and generate the corresponding code in the high-level domain-specific language Halide. Using Halide's state-of-the-art optimizations targeting current hardware, we show that new optimized versions of these kernels can replace the originals to rejuvenate the application for newer hardware. The original optimized code for kernels in stripped binaries is nearly impossible to analyze statically. Instead, we rely on dynamic traces to regenerate the kernels. We perform buffer structure reconstruction to identify input, intermediate and output buffer shapes. Here, we abstract from a forest of concrete dependency trees which contain absolute memory addresses to symbolic trees suitable for high-level code generation. This is done by canonicalizing trees, clustering them based on structure, inferring higher-dimensional buffer accesses and finally by solving a set of linear equations based on buffer accesses to lift them up to simple, high-level expressions. Helium can handle highly optimized, complex stencil kernels with input-dependent conditionals. We lift seven kernels from Adobe Photoshop giving a 75% performance improvement, four kernels from IrfanView, leading to a 4.97x speedup, and one stencil from the miniGMG multigrid benchmark netting a 4.25x improvement in performance. We manually rejuvenated Photoshop by replacing eleven of Photoshop's filters with our lifted implementations, giving a 1.12x speedup without affecting the user experience.
Gschwind, Michael K
2013-07-23
Mechanisms for aggressively optimizing computer code are provided. With these mechanisms, a compiler determines an optimization to apply to a portion of source code and determines if the optimization as applied to the portion of source code will result in unsafe optimized code that introduces a new source of exceptions being generated by the optimized code. In response to a determination that the optimization is an unsafe optimization, the compiler generates an aggressively compiled code version, in which the unsafe optimization is applied, and a conservatively compiled code version in which the unsafe optimization is not applied. The compiler stores both versions and provides them for execution. Mechanisms are provided for switching between these versions during execution in the event of a failure of the aggressively compiled code version. Moreover, predictive mechanisms are provided for predicting whether such a failure is likely.
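The mechanism keeps both an aggressively optimized and a conservative version of the same code and switches to the conservative one when the aggressive one fails. The patent concerns compiler-generated machine code; the sketch below only illustrates the dispatch-and-fallback idea in Python, with stand-in function bodies.

```python
def process_fast(data):
    """Aggressively 'optimized' variant: assumes data is non-empty and numeric."""
    return sum(data) / len(data)          # may raise if the assumption is violated

def process_safe(data):
    """Conservative variant: handles the cases the fast path does not."""
    clean = [x for x in data if isinstance(x, (int, float))]
    return sum(clean) / len(clean) if clean else 0.0

def dispatch(data, state={"use_fast": True}):
    """Run the aggressive version; on failure, fall back and stop using it.

    The mutable default argument is used deliberately as persistent state;
    a predictive policy could also flip the flag before a failure occurs.
    """
    if state["use_fast"]:
        try:
            return process_fast(data)
        except Exception:
            state["use_fast"] = False
    return process_safe(data)

print(dispatch([1, 2, 3]))      # fast path succeeds
print(dispatch([]))             # fast path fails once, falls back
print(dispatch([4, "x", 6]))    # subsequent calls keep using the safe version
```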
Optimizing zonal advection of the Advanced Research WRF (ARW) dynamics for Intel MIC
NASA Astrophysics Data System (ADS)
Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.
2014-10-01
The Weather Research and Forecast (WRF) model is the most widely used community weather forecast and research model in the world. There are two distinct varieties of WRF. The Advanced Research WRF (ARW) is an experimental, advanced research version featuring very high resolution. The WRF Nonhydrostatic Mesoscale Model (WRF-NMM) has been designed for forecasting operations. WRF consists of dynamics code and several physics modules. The WRF-ARW core is based on an Eulerian solver for the fully compressible nonhydrostatic equations. In this paper, we use the Intel Many Integrated Core (MIC) architecture to substantially increase the performance of a zonal advection subroutine, one of the most time-consuming routines in the ARW dynamics core. Advection advances the explicit perturbation horizontal momentum equations by adding in the large-timestep tendency along with the small-timestep pressure gradient tendency. We describe the challenges we met during the development of a high-speed dynamics code subroutine for the MIC architecture. Furthermore, lessons learned from the code optimization process are discussed. The results show that the optimizations improved performance of the original code on the Xeon Phi 5110P by a factor of 2.4x.
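The optimization work targets vectorizing the zonal (east-west) advection tendency. A NumPy sketch of the idea: the same centered advective tendency computed with an explicit point loop and with whole-array slices, the latter standing in for the vectorized inner loop on the coprocessor. The stencil is a generic 2nd-order centered one chosen for brevity, not WRF's actual advection scheme.

```python
import numpy as np

def zonal_tendency_loop(u, q, dx):
    """Scalar loop over grid points: -u * dq/dx with a centered difference."""
    nt = np.zeros_like(q)
    for i in range(1, q.size - 1):
        nt[i] = -u[i] * (q[i + 1] - q[i - 1]) / (2.0 * dx)
    return nt

def zonal_tendency_vec(u, q, dx):
    """Same tendency written over whole-array slices (vectorizable form)."""
    nt = np.zeros_like(q)
    nt[1:-1] = -u[1:-1] * (q[2:] - q[:-2]) / (2.0 * dx)
    return nt

nx, dx = 200_000, 1.0e3
x = np.arange(nx) * dx
q = np.sin(2 * np.pi * x / (nx * dx))
u = np.full(nx, 10.0)
assert np.allclose(zonal_tendency_loop(u, q, dx), zonal_tendency_vec(u, q, dx))
print("tendencies agree; the sliced form maps directly onto wide vector units")
```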
NASA Astrophysics Data System (ADS)
Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.-L.
2015-05-01
The most widely used community weather forecast and research model in the world is the Weather Research and Forecast (WRF) model. Two distinct varieties of WRF exist. The one we are interested in is the Advanced Research WRF (ARW), an experimental, advanced research version featuring very high resolution. The WRF Nonhydrostatic Mesoscale Model (WRF-NMM) has been designed for forecasting operations. WRF consists of dynamics code and several physics modules. The WRF-ARW core is based on an Eulerian solver for the fully compressible nonhydrostatic equations. In this paper, we optimize a meridional (north-south direction) advection subroutine for the Intel Xeon Phi coprocessor. Advection is one of the most time-consuming routines in the ARW dynamics core. It advances the explicit perturbation horizontal momentum equations by adding in the large-timestep tendency along with the small-timestep pressure gradient tendency. We describe the challenges we met during the development of a high-speed dynamics code subroutine for the MIC architecture. Furthermore, lessons learned from the code optimization process are discussed. The results show that the optimizations improved performance of the original code on the Xeon Phi 7120P by a factor of 1.2x.
How unrealistic optimism is maintained in the face of reality.
Sharot, Tali; Korn, Christoph W; Dolan, Raymond J
2011-10-09
Unrealistic optimism is a pervasive human trait that influences domains ranging from personal relationships to politics and finance. How people maintain unrealistic optimism, despite frequently encountering information that challenges those biased beliefs, is unknown. We examined this question and found a marked asymmetry in belief updating. Participants updated their beliefs more in response to information that was better than expected than to information that was worse. This selectivity was mediated by a relative failure to code for errors that should reduce optimism. Distinct regions of the prefrontal cortex tracked estimation errors when those called for positive update, both in individuals who scored high and low on trait optimism. However, highly optimistic individuals exhibited reduced tracking of estimation errors that called for negative update in right inferior prefrontal gyrus. These findings indicate that optimism is tied to a selective update failure and diminished neural coding of undesirable information regarding the future.
Large-Signal Code TESLA: Current Status and Recent Development
2008-04-01
We report recent advances in the development of the large-signal code TESLA, mainly used for the modeling of high-power single-beam and multiple-beam klystron amplifiers. Keywords: large-signal code; multiple-beam klystrons; serial and parallel versions.
Vector processing efficiency of plasma MHD codes by use of the FACOM 230-75 APU
NASA Astrophysics Data System (ADS)
Matsuura, T.; Tanaka, Y.; Naraoka, K.; Takizuka, T.; Tsunematsu, T.; Tokuda, S.; Azumi, M.; Kurita, G.; Takeda, T.
1982-06-01
In the framework of pipelined vector architecture, the efficiency of vector processing is assessed with respect to plasma MHD codes in nuclear fusion research. By using a vector processor, the FACOM 230-75 APU, the limit of the enhancement factor due to parallelism of current vector machines is examined for three numerical codes based on a fluid model. Reasonable speed-up factors of approximately 6, 6, and 4 times faster than the highly optimized scalar version are obtained for ERATO (linear stability code), AEOLUS-R1 (nonlinear stability code) and APOLLO (1-1/2D transport code), respectively. Problems of the pipelined vector processors are discussed from the viewpoint of restructuring, optimization and choice of algorithms. In conclusion, the important concept of "concurrency within pipelined parallelism" is emphasized.
Throughput Optimization Via Adaptive MIMO Communications
2006-05-30
Highlights include an end-to-end MATLAB packet simulation platform, low-density parity-check (LDPC) coding, and field trials with the Silvus DSP MIMO testbed under high mobility. The design incorporates advanced LDPC codes; recognizing that the power of LDPC codes comes at the price of decoder complexity, it also supports a binary convolutional code. Configurable parameters: channel coding (binary convolutional code or LDPC), packet length (0 to 2^16-1 bytes), coding rate (1/2, 2/3, 3/4, 5/6), and MIMO channel training length (0 to 4 symbols).
Multi-point optimization of recirculation flow type casing treatment in centrifugal compressors
NASA Astrophysics Data System (ADS)
Tun, Min Thaw; Sakaguchi, Daisaku
2016-06-01
A high pressure ratio and a wide operating range are both strongly required of turbochargers in diesel engines. A recirculation flow type casing treatment is effective for flow range enhancement of centrifugal compressors. Two ring grooves on a suction pipe and a shroud casing wall are connected by means of an annular passage, and a stable recirculation flow is formed at small flow rates from the downstream groove toward the upstream groove through the annular bypass. The shape of the baseline recirculation flow type casing is modified and optimized by using a multi-point optimization code with a metamodel-assisted evolutionary algorithm embedding the commercial CFD code CFX from ANSYS. The numerical optimization results give an optimized casing design with improved adiabatic efficiency over a wide operating flow rate range. A sensitivity analysis of the design parameters as a function of efficiency has been performed. It is found that the optimized casing design provides an optimized recirculation flow rate, in which the increment of entropy rise is minimized at the grooves and passages of the rotating impeller.
A grid generation system for multi-disciplinary design optimization
NASA Technical Reports Server (NTRS)
Jones, William T.; Samareh-Abolhassani, Jamshid
1995-01-01
A general multi-block three-dimensional volume grid generator is presented which is suitable for Multi-Disciplinary Design Optimization. The code is timely, robust, highly automated, and written in ANSI 'C' for platform independence. Algebraic techniques are used to generate and/or modify block face and volume grids to reflect geometric changes resulting from design optimization. Volume grids are generated/modified in a batch environment and controlled via an ASCII user input deck. This allows the code to be incorporated directly into the design loop. Generated volume grids are presented for a High Speed Civil Transport (HSCT) Wing/Body geometry as well as a complex HSCT configuration including horizontal and vertical tails, engine nacelles and pylons, and canard surfaces.
Efficient Network Coding-Based Loss Recovery for Reliable Multicast in Wireless Networks
NASA Astrophysics Data System (ADS)
Chi, Kaikai; Jiang, Xiaohong; Ye, Baoliu; Horiguchi, Susumu
Recently, network coding has been applied to the loss recovery of reliable multicast in wireless networks [19], where multiple lost packets are XOR-ed together as one packet and forwarded via single retransmission, resulting in a significant reduction of bandwidth consumption. In this paper, we first prove that maximizing the number of lost packets for XOR-ing, which is the key part of the available network coding-based reliable multicast schemes, is actually a complex NP-complete problem. To address this limitation, we then propose an efficient heuristic algorithm for finding an approximately optimal solution of this optimization problem. Furthermore, we show that the packet coding principle of maximizing the number of lost packets for XOR-ing sometimes cannot fully exploit the potential coding opportunities, and we then further propose new heuristic-based schemes with a new coding principle. Simulation results demonstrate that the heuristic-based schemes have very low computational complexity and can achieve almost the same transmission efficiency as the current coding-based high-complexity schemes. Furthermore, the heuristic-based schemes with the new coding principle not only have very low complexity, but also slightly outperform the current high-complexity ones.
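The schemes described XOR several lost packets into one retransmission such that every receiver is missing at most one of the combined packets and can recover it. A small greedy sketch of that coding step (one heuristic among many, not necessarily the paper's algorithm): group lost packets so that no receiver has lost two packets in the same group, then XOR each group; the loss map and payloads are made up.

```python
from functools import reduce

def greedy_xor_groups(loss_map):
    """Greedily group lost packets so no receiver lost two packets in a group.

    loss_map: dict packet_id -> set of receivers that lost that packet.
    Returns a list of packet-id groups; each group is sent as one XOR-ed packet.
    """
    groups = []
    for pkt, losers in sorted(loss_map.items(), key=lambda kv: -len(kv[1])):
        for group in groups:
            if all(loss_map[other].isdisjoint(losers) for other in group):
                group.append(pkt)     # every receiver still misses <= 1 packet here
                break
        else:
            groups.append([pkt])
    return groups

def encode(group, payloads):
    """XOR the payloads of one group into a single retransmission."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)),
                  (payloads[p] for p in group))

loss_map = {1: {"A"}, 2: {"B"}, 3: {"A", "C"}, 4: {"B", "D"}, 5: {"C"}}
payloads = {p: bytes([p] * 8) for p in loss_map}
groups = greedy_xor_groups(loss_map)
print("retransmissions:", len(groups), "instead of", len(loss_map))
for g in groups:
    print(g, encode(g, payloads).hex())
```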
High-Speed, Low-Cost Workstation for Computation-Intensive Statistics. Phase 1
1990-06-20
Low-level statistics and linear algebra routines (BSAS and BLAS) were prototyped in this study. For each routine, the C code (Turbo C) and two compiled versions, coded in an attempt to obtain an optimized compiled version, were compared for implementation and performance.
Improving soft FEC performance for higher-order modulations via optimized bit channel mappings.
Häger, Christian; Amat, Alexandre Graell I; Brännström, Fredrik; Alvarado, Alex; Agrell, Erik
2014-06-16
Soft forward error correction with higher-order modulations is often implemented in practice via the pragmatic bit-interleaved coded modulation paradigm, where a single binary code is mapped to a nonbinary modulation. In this paper, we study the optimization of the mapping of the coded bits to the modulation bits for a polarization-multiplexed fiber-optical system without optical inline dispersion compensation. Our focus is on protograph-based low-density parity-check (LDPC) codes which allow for an efficient hardware implementation, suitable for high-speed optical communications. The optimization is applied to the AR4JA protograph family, and further extended to protograph-based spatially coupled LDPC codes assuming a windowed decoder. Full field simulations via the split-step Fourier method are used to verify the analysis. The results show performance gains of up to 0.25 dB, which translate into a possible extension of the transmission reach by roughly up to 8%, without significantly increasing the system complexity.
Klann, Jeffrey G; Phillips, Lori C; Turchin, Alexander; Weiler, Sarah; Mandl, Kenneth D; Murphy, Shawn N
2015-12-11
Interoperable phenotyping algorithms, needed to identify patient cohorts meeting eligibility criteria for observational studies or clinical trials, require medical data in a consistent structured, coded format. Data heterogeneity limits such algorithms' applicability. Existing approaches are often: not widely interoperable; or, have low sensitivity due to reliance on the lowest common denominator (ICD-9 diagnoses). In the Scalable Collaborative Infrastructure for a Learning Healthcare System (SCILHS) we endeavor to use the widely-available Current Procedural Terminology (CPT) procedure codes with ICD-9. Unfortunately, CPT changes drastically year-to-year - codes are retired/replaced. Longitudinal analysis requires grouping retired and current codes. BioPortal provides a navigable CPT hierarchy, which we imported into the Informatics for Integrating Biology and the Bedside (i2b2) data warehouse and analytics platform. However, this hierarchy does not include retired codes. We compared BioPortal's 2014AA CPT hierarchy with Partners Healthcare's SCILHS datamart, comprising three-million patients' data over 15 years. 573 CPT codes were not present in 2014AA (6.5 million occurrences). No existing terminology provided hierarchical linkages for these missing codes, so we developed a method that automatically places missing codes in the most specific "grouper" category, using the numerical similarity of CPT codes. Two informaticians reviewed the results. We incorporated the final table into our i2b2 SCILHS/PCORnet ontology, deployed it at seven sites, and performed a gap analysis and an evaluation against several phenotyping algorithms. The reviewers found the method placed the code correctly with 97 % precision when considering only miscategorizations ("correctness precision") and 52 % precision using a gold-standard of optimal placement ("optimality precision"). High correctness precision meant that codes were placed in a reasonable hierarchal position that a reviewer can quickly validate. Lower optimality precision meant that codes were not often placed in the optimal hierarchical subfolder. The seven sites encountered few occurrences of codes outside our ontology, 93 % of which comprised just four codes. Our hierarchical approach correctly grouped retired and non-retired codes in most cases and extended the temporal reach of several important phenotyping algorithms. We developed a simple, easily-validated, automated method to place retired CPT codes into the BioPortal CPT hierarchy. This complements existing hierarchical terminologies, which do not include retired codes. The approach's utility is confirmed by the high correctness precision and successful grouping of retired with non-retired codes.
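The method drops each retired CPT code into the most specific existing "grouper" category by numerical closeness of the codes. A toy sketch of that placement rule with a hand-made hierarchy; the paths, ranges and codes below are fabricated examples for illustration, not BioPortal or SCILHS content.

```python
# illustrative grouper hierarchy: (path, low code, high code); narrower range = more specific
GROUPERS = [
    ("Surgery",                        10000, 69999),
    ("Surgery/Musculoskeletal",        20000, 29999),
    ("Surgery/Musculoskeletal/Spine",  22010, 22899),
    ("Radiology",                      70000, 79999),
]

def place_retired_code(code):
    """Return the most specific grouper whose numeric range contains the code.

    'Most specific' is taken as the narrowest matching range, so a retired code
    lands in a folder a reviewer can quickly validate.
    """
    matches = [(hi - lo, path) for path, lo, hi in GROUPERS if lo <= int(code) <= hi]
    if not matches:
        return None
    return min(matches)[1]

for retired in ("22855", "21010", "76005"):
    print(retired, "->", place_retired_code(retired))
```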
MILC Code Performance on High End CPU and GPU Supercomputer Clusters
NASA Astrophysics Data System (ADS)
DeTar, Carleton; Gottlieb, Steven; Li, Ruizi; Toussaint, Doug
2018-03-01
With recent developments in parallel supercomputing architecture, many core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors starting with NVIDIA GPUs, and more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.
Optimal Near-Hitless Network Failure Recovery Using Diversity Coding
ERIC Educational Resources Information Center
Avci, Serhat Nazim
2013-01-01
Link failures in wide area networks are common and cause significant data losses. Mesh-based protection schemes offer high capacity efficiency, but they are slow, require complex signaling, and are unstable. Diversity coding is a proactive coding-based recovery technique which offers near-hitless (sub-ms) restoration with a competitive spare capacity…
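Diversity coding sends a parity stream over a spare, disjoint path so that a single link failure is repaired immediately from the surviving streams rather than after signaling and rerouting. A tiny sketch with XOR parity over a few data links; link contents are illustrative only.

```python
from functools import reduce

def xor_bytes(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

# three data links plus one protection link carrying their XOR parity
data = [b"linkA-01", b"linkB-01", b"linkC-01"]
parity = xor_bytes(data)

# suppose link 1 fails: rebuild its payload from the survivors plus parity,
# with no signaling and no rerouting delay (near-hitless recovery)
failed = 1
survivors = [blk for i, blk in enumerate(data) if i != failed]
recovered = xor_bytes(survivors + [parity])
assert recovered == data[failed]
print("recovered payload from link", failed, ":", recovered)
```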
Fusion PIC code performance analysis on the Cori KNL system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Koskela, Tuomas S.; Deslippe, Jack; Friesen, Brian
We study the attainable performance of Particle-In-Cell codes on the Cori KNL system by analyzing a miniature particle push application based on the fusion PIC code XGC1. We start from the most basic building blocks of a PIC code and build up the complexity to identify the kernels that cost the most in performance and focus optimization efforts there. Particle push kernels operate at high arithmetic intensity (AI) and are not likely to be memory bandwidth or even cache bandwidth bound on KNL. Therefore, we see only minor benefits from the high bandwidth memory available on KNL, and achieving good vectorization is shown to be the most beneficial optimization path, with a theoretical yield of up to 8x speedup on KNL. In practice we are able to obtain up to a 4x gain from vectorization due to limitations set by the data layout and memory latency.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deslippe, Jack; da Jornada, Felipe H.; Vigil-Fowler, Derek
2016-10-06
We profile and optimize calculations performed with the BerkeleyGW code on the Xeon-Phi architecture. BerkeleyGW depends both on hand-tuned critical kernels as well as on BLAS and FFT libraries. We describe the optimization process and performance improvements achieved. We discuss a layered parallelization strategy to take advantage of vector, thread and node-level parallelism. We discuss locality changes (including the consequence of the lack of L3 cache) and effective use of the on-package high-bandwidth memory. We show preliminary results on Knights-Landing including a roofline study of code performance before and after a number of optimizations. We find that the GW method is particularly well-suited for many-core architectures due to the ability to exploit a large amount of parallelism over plane-wave components, band-pairs, and frequencies.
Computer code for the optimization of performance parameters of mixed explosive formulations.
Muthurajan, H; Sivabalan, R; Talawar, M B; Venugopalan, S; Gandhe, B R
2006-08-25
LOTUSES is a novel computer code developed for the prediction of various thermodynamic properties such as heat of formation, heat of explosion, volume of explosion gaseous products and other related performance parameters. In this paper, we report the LOTUSES (Version 1.4) code, which has been utilized for the optimization of various high explosives in different combinations to obtain the maximum possible velocity of detonation. The LOTUSES (Version 1.4) code varies the composition of mixed explosives automatically in the range of 1-100% and computes the oxygen balance as well as the velocity of detonation for various compositions in preset steps. Further, the code suggests the compositions for which the least oxygen balance and the highest velocity of detonation could be achieved. Presently, the code can be applied to two-component explosive compositions. The code has been validated with well-known explosives like TNT, HNS, HNF, TATB, RDX, HMX, AN, DNA, CL-20 and TNAZ in different combinations. The new algorithm incorporated in LOTUSES (Version 1.4) enhances the efficiency and makes it a more powerful tool for the scientists/researchers working in the field of high energy materials/hazardous materials.
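The code sweeps two-component compositions in fixed steps and reports oxygen balance before ranking detonation performance. The sketch below only illustrates the composition sweep using the standard oxygen-balance formula OB% = 1600*(O - 2C - H/2)/MW for a CxHyNwOz molecule, weighted by mass fraction; it is not the LOTUSES implementation, the component data are textbook values, and the velocity-of-detonation step (which would need a detonation model such as Kamlet-Jacobs) is omitted.

```python
# (carbon, hydrogen, oxygen, molecular weight) per component; textbook values
COMPONENTS = {
    "TNT": (7, 5, 6, 227.13),    # C7H5N3O6, OB about -74%
    "AN":  (0, 4, 3, 80.04),     # NH4NO3 (ammonium nitrate), OB about +20%
}

def oxygen_balance(c, h, o, mw):
    """OB% = 1600 * (O - 2C - H/2) / MW for a CxHyNwOz molecule."""
    return 1600.0 * (o - 2 * c - h / 2.0) / mw

def mixture_ob(frac_a, comp_a, comp_b):
    """Mass-fraction-weighted oxygen balance of a two-component mixture."""
    ob_a = oxygen_balance(*COMPONENTS[comp_a])
    ob_b = oxygen_balance(*COMPONENTS[comp_b])
    return frac_a * ob_a + (1.0 - frac_a) * ob_b

# sweep the first component from 1% to 100% in 1% steps, as the code does
best_p = min(range(1, 101), key=lambda p: abs(mixture_ob(p / 100.0, "AN", "TNT")))
print(f"closest-to-zero oxygen balance at {best_p}% AN / {100 - best_p}% TNT: "
      f"{mixture_ob(best_p / 100.0, 'AN', 'TNT'):+.2f}%")
```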
NASA Astrophysics Data System (ADS)
Imtiaz, Waqas A.; Ilyas, M.; Khan, Yousaf
2016-11-01
This paper proposes a new code to optimize the performance of spectral amplitude coding-optical code division multiple access (SAC-OCDMA) systems. The unique two-matrix structure of the proposed enhanced multi-diagonal (EMD) code and its effective correlation properties between intended and interfering subscribers significantly elevate the performance of the SAC-OCDMA system by negating multiple access interference (MAI) and the associated phase induced intensity noise (PIIN). The performance of the SAC-OCDMA system based on the proposed code is thoroughly analyzed for two detection techniques through analytic and simulation analysis, with reference to bit error rate (BER), signal-to-noise ratio (SNR) and eye patterns at the receiving end. It is shown that the EMD code with the SDD technique provides high transmission capacity, reduces receiver complexity, and provides better performance than the complementary subtraction detection (CSD) technique. Furthermore, the analysis shows that, for a minimum acceptable BER of 10^-9, the proposed system supports 64 subscribers at data rates of up to 2 Gbps for both uplink and downlink transmission.
NASA Astrophysics Data System (ADS)
Wang, Jianhua; Cheng, Lianglun; Wang, Tao; Peng, Xiaodong
2016-03-01
Table look-up plays a very important role in the decoding process of context-based adaptive variable length decoding (CAVLD) in H.264/advanced video coding (AVC). However, frequent table look-up results in many table memory accesses, which in turn leads to high table power consumption. To reduce the large number of table memory accesses incurred by current methods, and thereby the high power consumption, a memory-efficient optimized table look-up algorithm is presented for CAVLD. The contribution of this paper is the introduction of index search technology to reduce memory accesses for table look-up, and hence the table power consumption. Specifically, in our scheme, index search reduces memory accesses by reducing the search and match operations for code_word, taking advantage of the internal relationship among the length of the zero run in code_prefix, the value of code_suffix, and code_length, thus saving table look-up power. The experimental results show that our proposed index-search-based table look-up algorithm can reduce memory accesses by about 60% compared with a sequential-search table look-up scheme, and thus save considerable power for CAVLD in H.264/AVC.
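Rather than scanning a codeword table entry by entry, the idea is to turn properties of the prefix (here, its leading-zero run) into a direct index. A toy sketch contrasting sequential matching with index-based lookup on a fabricated unary-style VLC table; these are not the real H.264/AVC CAVLC tables.

```python
# toy VLC table: codeword string -> decoded symbol (NOT the real H.264 tables)
VLC = {"1": 0, "01": 1, "001": 2, "0001": 3, "00001": 4}

def decode_sequential(bits):
    """Match codewords one by one: many compare operations per symbol."""
    for word, symbol in VLC.items():
        if bits.startswith(word):
            return symbol, len(word)
    raise ValueError("no matching codeword")

def decode_indexed(bits):
    """For this prefix code the leading-zero run *is* the symbol, so one count
    replaces the whole search-and-compare loop (the 'index search' idea)."""
    zeros = len(bits) - len(bits.lstrip("0"))   # length of the zero run
    return zeros, zeros + 1                     # (symbol, consumed bits)

stream = "0001" + "1" + "001"
pos = 0
while pos < len(stream):
    sym, used = decode_indexed(stream[pos:])
    assert (sym, used) == decode_sequential(stream[pos:])
    print("symbol", sym)
    pos += used
```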
NASA Astrophysics Data System (ADS)
Pandremmenou, Katerina; Kondi, Lisimachos P.; Parsopoulos, Konstantinos E.
2012-01-01
Surveillance applications usually require high levels of video quality, resulting in high power consumption. The existence of a well-behaved scheme to balance video quality and power consumption is crucial for the system's performance. In the present work, we adopt the game-theoretic approach of Kalai-Smorodinsky Bargaining Solution (KSBS) to deal with the problem of optimal resource allocation in a multi-node wireless visual sensor network (VSN). In our setting, the Direct Sequence Code Division Multiple Access (DS-CDMA) method is used for channel access, while a cross-layer optimization design, which employs a central processing server, accounts for the overall system efficacy through all network layers. The task assigned to the central server is the communication with the nodes and the joint determination of their transmission parameters. The KSBS is applied to non-convex utility spaces, efficiently distributing the source coding rate, channel coding rate and transmission powers among the nodes. In the underlying model, the transmission powers assume continuous values, whereas the source and channel coding rates can take only discrete values. Experimental results are reported and discussed to demonstrate the merits of KSBS over competing policies.
Santos, José; Monteagudo, Ángel
2017-03-27
The canonical code, although prevailing in complex genomes, is not universal. The canonical genetic code has been shown to be more robust than random codes, but how it evolved towards its current form has not been clearly determined. The error minimization theory considers the minimization of the adverse effects of point mutations as the main selection factor in the evolution of the code. We have used simulated evolution in a computer to search for optimized codes, which helps to obtain information about the optimization level of the canonical code in its evolution. A genetic algorithm searches for efficient codes in a fitness landscape that corresponds with the adaptability of possible hypothetical genetic codes. The lower the effects of errors or mutations in the codon bases of a hypothetical code, the more efficient or optimal is that code. The inclusion of the fitness sharing technique in the evolutionary algorithm allows the extent to which the canonical genetic code is in an area corresponding to a deep local minimum to be easily determined, even in the high dimensional spaces considered. The analyses show that the canonical code is not in a deep local minimum and that the fitness landscape is not a multimodal fitness landscape with deep and separated peaks. Moreover, the canonical code is clearly far away from the areas of higher fitness in the landscape. Given the absence of deep local minima in the landscape, although the code could evolve and different forces could shape its structure, the fitness landscape nature considered in the error minimization theory does not explain why the canonical code ended its evolution in a location which is not an area of a localized deep minimum of the huge fitness landscape.
Multidisciplinary Modeling Software for Analysis, Design, and Optimization of HRRLS Vehicles
NASA Technical Reports Server (NTRS)
Spradley, Lawrence W.; Lohner, Rainald; Hunt, James L.
2011-01-01
The concept for Highly Reliable Reusable Launch Systems (HRRLS) under the NASA Hypersonics project is a two-stage-to-orbit, horizontal-take-off / horizontal-landing, (HTHL) architecture with an air-breathing first stage. The first stage vehicle is a slender body with an air-breathing propulsion system that is highly integrated with the airframe. The light weight slender body will deflect significantly during flight. This global deflection affects the flow over the vehicle and into the engine and thus the loads and moments on the vehicle. High-fidelity multi-disciplinary analyses that accounts for these fluid-structures-thermal interactions are required to accurately predict the vehicle loads and resultant response. These predictions of vehicle response to multi physics loads, calculated with fluid-structural-thermal interaction, are required in order to optimize the vehicle design over its full operating range. This contract with ResearchSouth addresses one of the primary objectives of the Vehicle Technology Integration (VTI) discipline: the development of high-fidelity multi-disciplinary analysis and optimization methods and tools for HRRLS vehicles. The primary goal of this effort is the development of an integrated software system that can be used for full-vehicle optimization. This goal was accomplished by: 1) integrating the master code, FEMAP, into the multidiscipline software network to direct the coupling to assure accurate fluid-structure-thermal interaction solutions; 2) loosely-coupling the Euler flow solver FEFLO to the available and proven aeroelasticity and large deformation (FEAP) code; 3) providing a coupled Euler-boundary layer capability for rapid viscous flow simulation; 4) developing and implementing improved Euler/RANS algorithms into the FEFLO CFD code to provide accurate shock capturing, skin friction, and heat-transfer predictions for HRRLS vehicles in hypersonic flow, 5) performing a Reynolds-averaged Navier-Stokes computation on an HRRLS configuration; 6) integrating the RANS solver with the FEAP code for coupled fluid-structure-thermal capability; and 7) integrating the existing NASA SRGULL propulsion flow path prediction software with the FEFLO software for quasi-3D propulsion flow path predictions, 8) improving and integrating into the network, an existing adjoint-based design optimization code.
NASA Astrophysics Data System (ADS)
He, Lirong; Cui, Guangmang; Feng, Huajun; Xu, Zhihai; Li, Qi; Chen, Yueting
2015-03-01
Coded exposure photography makes motion de-blurring a well-posed problem. The integration pattern of light is modulated by opening and closing the shutter within the exposure time, changing the traditional shutter's frequency spectrum into a wider frequency band in order to preserve more image information in the frequency domain. The search for an optimal code is central to coded exposure. In this paper, an improved criterion for optimal code searching is proposed by analyzing the relationship between the code length and the number of ones in the code, and by considering the effect of noise on code selection using the affine noise model. The optimal code is then obtained with a genetic search algorithm based on the proposed selection criterion. Experimental results show that the time needed to search for the optimal code decreases with the presented method, and the restored image has better subjective quality and superior objective evaluation values.
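As a rough illustration of the kind of selection criterion such a genetic search can optimize (the weighting and the random search below are assumptions for illustration, not the paper's exact criterion), a binary shutter code can be scored by its DFT magnitude so that the blur kernel stays invertible and its spectrum stays broad:

```python
import numpy as np

def code_score(code):
    """Score a binary shutter code by its DFT magnitude: a good coded-exposure
    sequence keeps the minimum magnitude high (invertible blur) and the spectrum
    flat (low variance). The 0.1 weighting is illustrative only."""
    mag = np.abs(np.fft.fft(code.astype(float)))
    return mag.min() - 0.1 * mag.var()

def random_search(length=32, ones=16, iters=2000, rng=np.random.default_rng(0)):
    """Stand-in for the genetic search: sample codes with a fixed number of ones
    and keep the best-scoring one."""
    best, best_s = None, -np.inf
    for _ in range(iters):
        code = np.zeros(length, dtype=int)
        code[rng.choice(length, ones, replace=False)] = 1
        s = code_score(code)
        if s > best_s:
            best, best_s = code, s
    return best, best_s

print(random_search())
```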
Optimizing Irregular Applications for Energy and Performance on the Tilera Many-core Architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chavarría-Miranda, Daniel; Panyala, Ajay R.; Halappanavar, Mahantesh
Optimizing applications simultaneously for energy and performance is a complex problem. High performance, parallel, irregular applications are notoriously hard to optimize due to their data-dependent memory accesses, lack of structured locality, and complex data structures and code patterns. Irregular kernels are growing in importance in applications such as machine learning, graph analytics and combinatorial scientific computing. Performance- and energy-efficient implementation of these kernels on modern, energy-efficient, multicore and many-core platforms is therefore an important and challenging problem. We present results from optimizing two irregular applications, the Louvain method for community detection (Grappolo) and high-performance conjugate gradient (HPCCG), on the Tilera many-core system. We have significantly extended MIT's OpenTuner auto-tuning framework to conduct a detailed study of platform-independent and platform-specific optimizations to improve performance as well as reduce total energy consumption. We explore the optimization design space along three dimensions: memory layout schemes, compiler-based code transformations, and optimization of parallel loop schedules. Using auto-tuning, we demonstrate whole-node energy savings of up to 41% relative to a baseline instantiation, and up to 31% relative to manually optimized variants.
Improved Speech Coding Based on Open-Loop Parameter Estimation
NASA Technical Reports Server (NTRS)
Juang, Jer-Nan; Chen, Ya-Chin; Longman, Richard W.
2000-01-01
A nonlinear optimization algorithm for linear predictive speech coding was developed previously that not only optimizes the linear model coefficients for the open-loop predictor, but performs the optimization including the effects of quantization of the transmitted residual. It also simultaneously optimizes the quantization levels used for each speech segment. In this paper, we present an improved method for initialization of this nonlinear algorithm, and demonstrate substantial improvements in performance. In addition, the new procedure produces monotonically improving speech quality with increasing numbers of bits used in the transmitted error residual. Examples of speech encoding and decoding are given for 8 speech segments, and signal-to-noise ratios as high as 47 dB are produced. As in typical linear predictive coding, the optimization is done on the open-loop speech analysis model. Here we demonstrate that minimizing the error of the closed-loop speech reconstruction, instead of the simpler open-loop optimization, is likely to produce negligible improvement in speech quality. The examples suggest that the algorithm here is close to giving the best performance obtainable from a linear model, for the chosen order with the chosen number of bits for the codebook.
Plessow, Philipp N
2018-02-13
This work explores how constrained linear combinations of bond lengths can be used to optimize transition states in periodic structures. Scanning of constrained coordinates is a standard approach for molecular codes with localized basis functions, where a full set of internal coordinates is used for optimization. Common plane-wave codes for periodic boundary conditions rely almost exclusively on Cartesian coordinates. An implementation of constrained linear combinations of bond lengths with Cartesian coordinates is described. Along with an optimization of the value of the constrained coordinate toward the transition state, this allows transition state optimization within a single calculation. The approach is suitable for transition states that can be well described in terms of broken and formed bonds. In particular, the implementation is shown to be effective and efficient in the optimization of transition states in zeolite-catalyzed reactions, which have high relevance in industrial processes.
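A small sketch, under assumed atom indices and weights, of the quantities such an implementation works with: the value of a constrained linear combination of bond lengths and its Cartesian gradient, which is what a Cartesian-coordinate optimizer needs in order to impose the constraint or drive the coordinate toward the transition state:

```python
import numpy as np

def bond_length(x, i, j):
    return np.linalg.norm(x[i] - x[j])

def lincomb_coordinate(x, bonds, weights):
    """Value of the constrained coordinate c = sum_k w_k * r_k (e.g. r_broken - r_formed)."""
    return sum(w * bond_length(x, i, j) for (i, j), w in zip(bonds, weights))

def lincomb_gradient(x, bonds, weights):
    """Cartesian gradient of c, used to project the constraint direction out of the
    optimization step (or to displace the geometry along the coordinate)."""
    g = np.zeros_like(x)
    for (i, j), w in zip(bonds, weights):
        u = (x[i] - x[j]) / bond_length(x, i, j)   # unit vector along the bond
        g[i] += w * u
        g[j] -= w * u
    return g

# hypothetical 3-atom geometry, constraint c = r(0,1) - r(1,2)
x = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [2.4, 0.0, 0.0]])
bonds, weights = [(0, 1), (1, 2)], [1.0, -1.0]
print(lincomb_coordinate(x, bonds, weights))
print(lincomb_gradient(x, bonds, weights))
```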
Targeting multiple heterogeneous hardware platforms with OpenCL
NASA Astrophysics Data System (ADS)
Fox, Paul A.; Kozacik, Stephen T.; Humphrey, John R.; Paolini, Aaron; Kuller, Aryeh; Kelmelis, Eric J.
2014-06-01
The OpenCL API allows for the abstract expression of parallel, heterogeneous computing, but hardware implementations have substantial implementation differences. The abstractions provided by the OpenCL API are often insufficiently high-level to conceal differences in hardware architecture. Additionally, implementations often do not take advantage of potential performance gains from certain features due to hardware limitations and other factors. These factors make it challenging to produce code that is portable in practice, resulting in much OpenCL code being duplicated for each hardware platform being targeted. This duplication of effort offsets the principal advantage of OpenCL: portability. The use of certain coding practices can mitigate this problem, allowing a common code base to be adapted to perform well across a wide range of hardware platforms. To this end, we explore some general practices for producing performant code that are effective across platforms. Additionally, we explore some ways of modularizing code to enable optional optimizations that take advantage of hardware-specific characteristics. The minimum requirement for portability implies avoiding the use of OpenCL features that are optional, not widely implemented, poorly implemented, or missing in major implementations. Exposing multiple levels of parallelism allows hardware to take advantage of the types of parallelism it supports, from the task level down to explicit vector operations. Static optimizations and branch elimination in device code help the platform compiler to effectively optimize programs. Modularization of some code is important to allow operations to be chosen for performance on target hardware. Optional subroutines exploiting explicit memory locality allow for different memory hierarchies to be exploited for maximum performance. The C preprocessor and JIT compilation using the OpenCL runtime can be used to enable some of these techniques, as well as to factor in hardware-specific optimizations as necessary.
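A minimal sketch of the preprocessor-plus-JIT pattern described above, assuming the pyopencl bindings; the USE_MAD toggle and the kernel itself are illustrative placeholders for real hardware-specific variants:

```python
import pyopencl as cl

KERNEL_SRC = """
__kernel void scale(__global float *x, const float a) {
    int i = get_global_id(0);
#ifdef USE_MAD
    x[i] = mad(a, x[i], 0.0f);   // variant enabled only where it helps
#else
    x[i] = a * x[i];
#endif
}
"""

# Build-time specialization: hardware-specific choices become -D flags, so one
# kernel source adapts to each device without duplicating code.
ctx = cl.create_some_context()
dev = ctx.devices[0]
opts = ["-DUSE_MAD"] if dev.type & cl.device_type.GPU else []
prg = cl.Program(ctx, KERNEL_SRC).build(options=opts)
print(dev.name, "built with", opts)
```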
Ripesi, P; Biferale, L; Schifano, S F; Tripiccione, R
2014-04-01
We study the turbulent evolution originating from a system subjected to a Rayleigh-Taylor instability with a double density interface, at high resolution in a two-dimensional geometry, using a highly optimized thermal lattice-Boltzmann code for GPUs. The initial condition of our investigation, given by the superposition of three layers with three different densities, leads to the development of two Rayleigh-Taylor fronts that expand upward and downward and collide in the middle of the cell. Using high-resolution numerical data, we highlight the effects induced by the collision of the two turbulent fronts in the long-time asymptotic regime. We also provide details on the optimized lattice-Boltzmann code that we have run on a cluster of GPUs.
Domain Wall Fermion Inverter on Pentium 4
NASA Astrophysics Data System (ADS)
Pochinsky, Andrew
2005-03-01
A highly optimized domain wall fermion inverter has been developed as part of the SciDAC lattice initiative. By designing the code to minimize memory bus traffic, it achieves high cache reuse and performance in excess of 2 GFlops for out-of-L2-cache problem sizes on a GigE cluster with 2.66 GHz Xeon processors. The code uses the SciDAC QMP communication library.
Revisiting Molecular Dynamics on a CPU/GPU system: Water Kernel and SHAKE Parallelization.
Ruymgaart, A Peter; Elber, Ron
2012-11-13
We report Graphics Processing Unit (GPU) and Open-MP parallel implementations of water-specific force calculations and of bond constraints for use in Molecular Dynamics simulations. We focus on a typical laboratory computing environment in which a CPU with a few cores is attached to a GPU. We discuss the design of the code in detail and illustrate performance comparable to highly optimized codes such as GROMACS. Besides speed, our code shows excellent energy conservation. Utilization of water-specific lists allows the efficient calculation of non-bonded interactions that include water molecules and results in a speed-up factor of more than 40 on the GPU compared to code optimized on a single CPU core for systems larger than 20,000 atoms. This is up four-fold from a factor of 10 reported in our initial GPU implementation that did not include a water-specific code. Another optimization is the implementation of constrained dynamics entirely on the GPU. The routine, which enforces constraints of all bonds, runs in parallel on multiple Open-MP cores or entirely on the GPU. It is based on a Conjugate Gradient solution of the Lagrange multipliers (CG SHAKE). The GPU implementation is partially in double precision and requires no communication with the CPU during the execution of the SHAKE algorithm. The (parallel) implementation of SHAKE allows an increase of the time step to 2.0 fs while maintaining excellent energy conservation. Interestingly, CG SHAKE is faster than the usual bond relaxation algorithm even on a single core if high accuracy is expected. The significant speedup of the optimized components transfers the computational bottleneck of the MD calculation to the reciprocal part of the Particle Mesh Ewald (PME).
Optimal power allocation and joint source-channel coding for wireless DS-CDMA visual sensor networks
NASA Astrophysics Data System (ADS)
Pandremmenou, Katerina; Kondi, Lisimachos P.; Parsopoulos, Konstantinos E.
2011-01-01
In this paper, we propose a scheme for the optimal allocation of power, source coding rate, and channel coding rate for each of the nodes of a wireless Direct Sequence Code Division Multiple Access (DS-CDMA) visual sensor network. The optimization is quality-driven, i.e., the received quality of the video that is transmitted by the nodes is optimized. The scheme takes into account the fact that the sensor nodes may be imaging scenes with varying levels of motion. Nodes that image low-motion scenes will require a lower source coding rate, so they will be able to allocate a greater portion of the total available bit rate to channel coding. Stronger channel coding will mean that such nodes will be able to transmit at lower power. This will both increase battery life and reduce interference to other nodes. Two optimization criteria are considered: one minimizes the average video distortion of the nodes, and the other minimizes the maximum distortion among the nodes. The transmission powers are allowed to take continuous values, whereas the source and channel coding rates can assume only discrete values. Thus, the resulting optimization problem lies in the field of mixed-integer optimization tasks and is solved using Particle Swarm Optimization. Our experimental results show the importance of considering the characteristics of the video sequences when determining the transmission power, source coding rate and channel coding rate for the nodes of the visual sensor network.
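A compact sketch of how a particle swarm can handle the mixed continuous/discrete search described above: the powers stay continuous while the coding-rate components are snapped to a discrete set when the objective is evaluated. The objective, parameter ranges, and PSO constants below are placeholders, not the paper's distortion model:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4                                      # number of sensor nodes (placeholder)
RATES = np.array([0.25, 0.5, 0.75, 1.0])   # allowed discrete coding rates (placeholder)

def decode(x):
    """First N entries: continuous powers clipped to [0, 1]; last N: indices into RATES."""
    powers = np.clip(x[:N], 0.0, 1.0)
    idx = np.clip(np.rint(x[N:]), 0, len(RATES) - 1).astype(int)
    return powers, RATES[idx]

def objective(x):
    """Placeholder distortion; a real system maps powers/rates to received video quality."""
    p, r = decode(x)
    return np.sum((p - 0.6) ** 2) + np.sum((r - 0.5) ** 2)

# standard global-best PSO over the mixed parameter vector
dim, swarm = 2 * N, 30
pos = rng.uniform(0, len(RATES) - 1, (swarm, dim))
vel = np.zeros_like(pos)
pbest, pbest_f = pos.copy(), np.array([objective(p) for p in pos])
gbest = pbest[pbest_f.argmin()].copy()

for _ in range(200):
    r1, r2 = rng.random((2, swarm, dim))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos += vel
    f = np.array([objective(p) for p in pos])
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[pbest_f.argmin()].copy()

print(decode(gbest), pbest_f.min())
```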
Oelerich, Jan Oliver; Duschek, Lennart; Belz, Jürgen; Beyer, Andreas; Baranovskii, Sergei D; Volz, Kerstin
2017-06-01
We present a new multislice code for the computer simulation of scanning transmission electron microscope (STEM) images based on the frozen lattice approximation. Unlike existing software packages, the code is optimized to perform well on highly parallelized computing clusters, combining distributed and shared memory architectures. This enables efficient calculation of large lateral scanning areas of the specimen within the frozen lattice approximation and fine-grained sweeps of parameter space.
A novel neutron energy spectrum unfolding code using particle swarm optimization
NASA Astrophysics Data System (ADS)
Shahabinejad, H.; Sohrabpour, M.
2017-07-01
A novel neutron Spectrum Deconvolution using Particle Swarm Optimization (SDPSO) code has been developed to unfold the neutron spectrum from a pulse height distribution and a response matrix. Particle Swarm Optimization (PSO) imitates the social behavior of bird flocks to solve complex optimization problems. The results of the SDPSO code have been compared with standard spectra and with the recently published Two-steps Genetic Algorithm Spectrum Unfolding (TGASU) code. The TGASU code had previously been compared with other codes such as MAXED, GRAVEL, FERDOR and GAMCD and shown to be more accurate. The results of the SDPSO code match well with those of the TGASU code for both under-determined and over-determined problems. In addition, the SDPSO code is nearly two times faster than the TGASU code.
Observations Regarding Use of Advanced CFD Analysis, Sensitivity Analysis, and Design Codes in MDO
NASA Technical Reports Server (NTRS)
Newman, Perry A.; Hou, Gene J. W.; Taylor, Arthur C., III
1996-01-01
Observations regarding the use of advanced computational fluid dynamics (CFD) analysis, sensitivity analysis (SA), and design codes in gradient-based multidisciplinary design optimization (MDO) reflect our perception of the interactions required of CFD and our experience in recent aerodynamic design optimization studies using CFD. Sample results from these latter studies are summarized for conventional optimization (analysis - SA codes) and simultaneous analysis and design optimization (design code) using both Euler and Navier-Stokes flow approximations. The amount of computational resources required for aerodynamic design using CFD via analysis - SA codes is greater than that required for design codes. Thus, an MDO formulation that utilizes the more efficient design codes where possible is desired. However, in the aerovehicle MDO problem, the various disciplines that are involved have different design points in the flight envelope; therefore, CFD analysis - SA codes are required at the aerodynamic 'off design' points. The suggested MDO formulation is a hybrid multilevel optimization procedure that consists of both multipoint CFD analysis - SA codes and multipoint CFD design codes that perform suboptimizations.
Study of information transfer optimization for communication satellites
NASA Technical Reports Server (NTRS)
Odenwalder, J. P.; Viterbi, A. J.; Jacobs, I. M.; Heller, J. A.
1973-01-01
The results are presented of a study of source coding, modulation/channel coding, and systems techniques for application to teleconferencing over high data rate digital communication satellite links. Simultaneous transmission of video, voice, data, and/or graphics is possible in various teleconferencing modes and one-way, two-way, and broadcast modes are considered. A satellite channel model including filters, limiter, a TWT, detectors, and an optimized equalizer is treated in detail. A complete analysis is presented for one set of system assumptions which exclude nonlinear gain and phase distortion in the TWT. Modulation, demodulation, and channel coding are considered, based on an additive white Gaussian noise channel model which is an idealization of an equalized channel. Source coding with emphasis on video data compression is reviewed, and the experimental facility utilized to test promising techniques is fully described.
Performance comparison of leading image codecs: H.264/AVC Intra, JPEG2000, and Microsoft HD Photo
NASA Astrophysics Data System (ADS)
Tran, Trac D.; Liu, Lijie; Topiwala, Pankaj
2007-09-01
This paper provides a detailed rate-distortion performance comparison between JPEG2000, Microsoft HD Photo, and H.264/AVC High Profile 4:4:4 I-frame coding for high-resolution still images and high-definition (HD) 1080p video sequences. This work extends our comparative studies published at earlier SPIE conferences [1, 2]. Here we further optimize all three codecs for compression performance. Coding simulations are performed on a set of large-format color images captured from mainstream digital cameras and 1080p HD video sequences commonly used for H.264/AVC standardization work. Overall, our experimental results show that all three codecs offer very similar coding performance at the high-quality, high-resolution setting. Differences tend to be data-dependent: JPEG2000, with its wavelet technology, tends to be the best performer on smooth spatial data; H.264/AVC High Profile, with advanced spatial prediction modes, tends to cope best with more complex visual content; Microsoft HD Photo tends to be the most consistent across the board. For the still-image data sets, JPEG2000 offers the best R-D performance gains (around 0.2 to 1 dB in peak signal-to-noise ratio) over H.264/AVC High Profile intra coding and Microsoft HD Photo. For the 1080p video data set, all three codecs offer very similar coding performance. As in [1, 2], we consider neither scalability nor complexity in this study (JPEG2000 operates in non-scalable, but optimal-performance, mode).
Joint Machine Learning and Game Theory for Rate Control in High Efficiency Video Coding.
Gao, Wei; Kwong, Sam; Jia, Yuheng
2017-08-25
In this paper, a joint machine learning and game theory modeling (MLGT) framework is proposed for inter-frame coding tree unit (CTU) level bit allocation and rate control (RC) optimization in High Efficiency Video Coding (HEVC). First, a support vector machine (SVM) based multi-classification scheme is proposed to improve the prediction accuracy of the CTU-level rate-distortion (R-D) model; this learning-based R-D model is proposed as a way to overcome the legacy "chicken-and-egg" dilemma in video coding. Second, a mixed R-D model based cooperative bargaining game theory is proposed for bit allocation optimization, where the convexity of the mixed R-D model based utility function is proved, and the Nash bargaining solution (NBS) is achieved by the proposed iterative solution search method. The minimum utility is adjusted by the reference coding distortion and the frame-level quantization parameter (QP) change. Lastly, the intra-frame QP and the inter-frame adaptive bit ratios are adjusted so that inter frames receive more bit resources, maintaining smooth quality and bit consumption in the bargaining game optimization. Experimental results demonstrate that the proposed MLGT based RC method can achieve much better R-D performance, quality smoothness, bit rate accuracy, buffer control results and subjective visual quality than other state-of-the-art one-pass RC methods, and the achieved R-D performance is very close to the performance limits from the FixedQP method.
Resource allocation for error resilient video coding over AWGN using optimization approach.
An, Cheolhong; Nguyen, Truong Q
2008-12-01
The number of slices for error-resilient video coding is jointly optimized with 802.11a-like media access control and physical layers, which use automatic repeat request and rate-compatible punctured convolutional codes over an additive white Gaussian noise channel, as well as with the channel time allocation for time division multiple access. For error-resilient video coding, the relation between the number of slices and coding efficiency is analyzed and formulated as a mathematical model. This model is applied to the joint optimization problem, and the problem is solved by a convex optimization method such as the primal-dual decomposition method. We compare the performance of a video communication system that uses the optimal number of slices with one that codes a picture as one slice. Numerical examples show that the end-to-end distortion of the utility functions can be significantly reduced with the optimal slicing of a picture, especially at low signal-to-noise ratio.
Code Optimization and Parallelization on the Origins: Looking from Users' Perspective
NASA Technical Reports Server (NTRS)
Chang, Yan-Tyng Sherry; Thigpen, William W. (Technical Monitor)
2002-01-01
Parallel machines are becoming the main compute engines for high performance computing. Despite their increasing popularity, it is still a challenge for most users to learn the basic techniques to optimize/parallelize their codes on such platforms. In this paper, we present some experiences on learning these techniques for the Origin systems at the NASA Advanced Supercomputing Division. Emphasis of this paper will be on a few essential issues (with examples) that general users should master when they work with the Origins as well as other parallel systems.
Optimization technique of wavefront coding system based on ZEMAX externally compiled programs
NASA Astrophysics Data System (ADS)
Han, Libo; Dong, Liquan; Liu, Ming; Zhao, Yuejin; Liu, Xiaohua
2016-10-01
Wavefront coding is a means of athermalization for infrared imaging systems, and the design of the phase plate is the key to system performance. This paper applies the externally compiled programs of ZEMAX to the optimization of the phase mask within the normal optical design process, namely by defining an evaluation function for the wavefront coding system based on the consistency of the modulation transfer function (MTF) and by improving the speed of optimization through the introduction of mathematical software. The user writes an external program that computes the evaluation function, exploiting the powerful computing features of the mathematical software to find the optimal parameters of the phase mask and accelerating convergence through a genetic algorithm (GA); a dynamic data exchange (DDE) interface between ZEMAX and the mathematical software is then used to realize high-speed data exchange. The optimization of a rotationally symmetric phase mask and of a cubic phase mask has been completed by this method: the depth of focus increases nearly 3 times with the rotationally symmetric phase mask and up to 10 times with the cubic phase mask, the MTF consistency improves obviously, and the operating temperature range of the optimized systems extends from -40° to 60°. The results show that this optimization method makes it more convenient to define unconventional optimization goals and faster to optimize optical systems with special properties, thanks to its externally compiled functions and DDE; it should be of particular significance for the optimization of unconventional optical systems.
Size principle and information theory.
Senn, W; Wyler, K; Clamann, H P; Kleinle, J; Lüscher, H R; Müller, L
1997-01-01
The motor units of a skeletal muscle may be recruited according to different strategies. From all possible recruitment strategies nature selected the simplest one: in most actions of vertebrate skeletal muscles the recruitment of its motor units is by increasing size. This so-called size principle permits a high precision in muscle force generation since small muscle forces are produced exclusively by small motor units. Larger motor units are activated only if the total muscle force has already reached certain critical levels. We show that this recruitment by size is not only optimal in precision but also optimal in an information theoretical sense. We consider the motoneuron pool as an encoder generating a parallel binary code from a common input to that pool. The generated motoneuron code is sent down through the motoneuron axons to the muscle. We establish that an optimization of this motoneuron code with respect to its information content is equivalent to the recruitment of motor units by size. Moreover, maximal information content of the motoneuron code is equivalent to a minimal expected error in muscle force generation.
Reference View Selection in DIBR-Based Multiview Coding.
Maugey, Thomas; Petrazzuoli, Giovanni; Frossard, Pascal; Cagnazzo, Marco; Pesquet-Popescu, Beatrice
2016-04-01
Augmented reality, interactive navigation in 3D scenes, multiview video, and other emerging multimedia applications require large sets of images, hence larger data volumes and increased resources compared with traditional video services. The significant increase in the number of images in multiview systems leads to new challenging problems in data representation and data transmission to provide high quality of experience on resource-constrained environments. In order to reduce the size of the data, different multiview video compression strategies have been proposed recently. Most of them use the concept of reference or key views that are used to estimate other images when there is high correlation in the data set. In such coding schemes, the two following questions become fundamental: 1) how many reference views have to be chosen for keeping a good reconstruction quality under coding cost constraints? And 2) where to place these key views in the multiview data set? As these questions are largely overlooked in the literature, we study the reference view selection problem and propose an algorithm for the optimal selection of reference views in multiview coding systems. Based on a novel metric that measures the similarity between the views, we formulate an optimization problem for the positioning of the reference views, such that both the distortion of the view reconstruction and the coding rate cost are minimized. We solve this new problem with a shortest path algorithm that determines both the optimal number of reference views and their positions in the image set. We experimentally validate our solution in a practical multiview distributed coding system and in the standardized 3D-HEVC multiview coding scheme. We show that considering the 3D scene geometry in the reference view positioning problem brings significant rate-distortion improvements and outperforms the traditional coding strategy that simply selects key frames based on the distance between cameras.
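A toy sketch of the shortest-path formulation described above (the cost model and the toy similarity values are assumptions, not the paper's metric): views are nodes, an edge (i, j) carries the rate of adding reference j plus the estimated distortion of the views between i and j, and a dynamic-programming shortest path returns both how many references to use and where to place them:

```python
import numpy as np

def edge_cost(i, j, similarity, lam=1.0, ref_rate=1.0):
    """Cost of making views i and j consecutive references: the rate spent on
    reference j plus the estimated distortion of the in-between views, which grows
    as those views become less similar to their neighbors. Placeholder model."""
    d = sum(1.0 - similarity[k] for k in range(i + 1, j))
    return lam * ref_rate + d

def select_references(similarity):
    """best[j] is the minimum cost of covering views 0..j with j as the last reference."""
    n = len(similarity)
    best, prev = [np.inf] * n, [-1] * n
    best[0] = edge_cost(-1, 0, similarity)        # the first view is a reference here
    for j in range(1, n):
        for i in range(j):
            c = best[i] + edge_cost(i, j, similarity)
            if c < best[j]:
                best[j], prev[j] = c, i
    refs, k = [], n - 1
    while k >= 0:
        refs.append(k)
        k = prev[k]
    return sorted(refs), best[n - 1]

sim = np.clip(np.random.default_rng(2).random(8), 0.3, 1.0)   # toy inter-view similarities
print(select_references(sim))
```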
Program optimizations: The interplay between power, performance, and energy
Leon, Edgar A.; Karlin, Ian; Grant, Ryan E.; ...
2016-05-16
Practical considerations for future supercomputer designs will impose limits on both instantaneous power consumption and total energy consumption. Working within these constraints while providing the maximum possible performance, application developers will need to optimize their code for speed alongside power and energy concerns. This paper analyzes the effectiveness of several code optimizations including loop fusion, data structure transformations, and global allocations. A per-component measurement and analysis of different architectures is performed, enabling the examination of code optimizations on different compute subsystems. Using an explicit hydrodynamics proxy application from the U.S. Department of Energy, LULESH, we show how code optimizations impact different computational phases of the simulation. This provides insight for simulation developers into the best optimizations to use during particular simulation compute phases when optimizing code for future supercomputing platforms. Here, we examine and contrast both x86 and Blue Gene architectures with respect to these optimizations.
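A small illustration of one transformation studied above, loop fusion: two sweeps over the same data are merged into one, removing an intermediate array and the extra pass over memory (which is where much of the energy goes). The arrays and arithmetic are toy stand-ins:

```python
import numpy as np

x = np.random.default_rng(3).random(100_000)

def unfused(x):
    """Two separate loops: materializes a temporary array and sweeps memory twice."""
    t = np.empty_like(x)
    for i in range(x.size):
        t[i] = 2.0 * x[i] + 1.0
    s = 0.0
    for i in range(x.size):
        s += t[i] * t[i]
    return s

def fused(x):
    """One loop: the temporary stays in a register and memory is swept once."""
    s = 0.0
    for i in range(x.size):
        t = 2.0 * x[i] + 1.0
        s += t * t
    return s

assert abs(unfused(x) - fused(x)) < 1e-6
```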
MHD Code Optimizations and Jets in Dense Gaseous Halos
NASA Astrophysics Data System (ADS)
Gaibler, Volker; Vigelius, Matthias; Krause, Martin; Camenzind, Max
We have further optimized and extended the 3D-MHD code NIRVANA. The magnetized part runs in parallel, reaching 19 Gflops per SX-6 node, and has a passively advected particle population. In addition, the code is now MPI-parallel on top of the shared memory parallelization. On a 512^3 grid, we reach 561 Gflops with 32 nodes on the SX-8. We have also successfully used FLASH on the Opteron cluster. Scientific results are preliminary so far. We report one computation of highly resolved cocoon turbulence. While we find some similarities to earlier 2D work by us and others, we note a strange reluctance of cold material to enter the low-density cocoon, which has to be investigated further.
Phenotypic Graphs and Evolution Unfold the Standard Genetic Code as the Optimal
NASA Astrophysics Data System (ADS)
Zamudio, Gabriel S.; José, Marco V.
2018-03-01
In this work, we explicitly consider the evolution of the Standard Genetic Code (SGC) by assuming two evolutionary stages, to wit, the primeval RNY code and two intermediate codes in between. We used network theory and graph theory to measure the connectivity of each phenotypic graph. The connectivity values are compared to the values of the codes under different randomization scenarios. An error-correcting optimal code is one in which the algebraic connectivity is minimized. We show that the SGC is optimal in regard to its robustness and error-tolerance when compared to all random codes under different assumptions.
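A quick way to compute the algebraic connectivity used as the optimality measure above: it is the second-smallest eigenvalue (Fiedler value) of the graph Laplacian of the phenotypic graph. The toy adjacency matrix below is an assumption for illustration:

```python
import numpy as np

def algebraic_connectivity(adj):
    """Second-smallest eigenvalue (Fiedler value) of the graph Laplacian L = D - A."""
    deg = np.diag(adj.sum(axis=1))
    lap = deg - adj
    eig = np.sort(np.linalg.eigvalsh(lap))
    return eig[1]

# toy phenotypic graph: 4 phenotypes, edges weighted by how often a single-point
# mutation converts one into the other (illustrative values only)
A = np.array([[0, 2, 1, 0],
              [2, 0, 0, 1],
              [1, 0, 0, 2],
              [0, 1, 2, 0]], dtype=float)
print(algebraic_connectivity(A))
```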
NASA Astrophysics Data System (ADS)
Okamoto, Kazuhisa; Nonaka, Chiho
2017-06-01
We construct a new relativistic viscous hydrodynamics code optimized in the Milne coordinates. We split the conservation equations into an ideal part and a viscous part using the Strang splitting method. In the code, a Riemann solver based on the two-shock approximation is utilized for the ideal part and the Piecewise Exact Solution (PES) method is applied for the viscous part. We check the validity of our numerical calculations by comparing with analytical solutions: the viscous Bjorken flow and the Israel-Stewart theory in the Gubser flow regime. Using the code, we discuss possible development of the Kelvin-Helmholtz instability in high-energy heavy-ion collisions.
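A schematic of the Strang splitting pattern mentioned above, with placeholder update operators: a half step of the viscous part brackets a full step of the ideal part, which keeps the scheme second-order accurate in the time step. The scalar toy problem is only for illustration:

```python
def strang_step(state, dt, ideal_step, viscous_step):
    """One Strang-split update: viscous half step, ideal full step, viscous half step.
    ideal_step/viscous_step stand in for, e.g., a Riemann-solver update and a
    relaxation update; both are placeholders here."""
    state = viscous_step(state, 0.5 * dt)
    state = ideal_step(state, dt)
    state = viscous_step(state, 0.5 * dt)
    return state

# toy scalar example: du/dt = a*u (ideal part) + b*u (viscous part)
a, b = -1.0, -0.2
ideal = lambda u, dt: u * (1.0 + a * dt)
visc  = lambda u, dt: u * (1.0 + b * dt)

u, dt = 1.0, 0.01
for _ in range(100):
    u = strang_step(u, dt, ideal, visc)
print(u)   # compare with exp((a + b) * 1.0) ~ 0.301
```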
DOE Office of Scientific and Technical Information (OSTI.GOV)
Janjusic, Tommy; Kartsaklis, Christos
Memory scalability is an enduring problem and bottleneck that plagues many parallel codes. Parallel codes designed for High Performance Systems are typically developed over the span of several, and in some instances 10+, years. As a result, optimization practices which were appropriate for earlier systems may no longer be valid and thus require careful reconsideration. Specifically, parallel codes whose memory footprint is a function of their scalability must be carefully considered for future exa-scale systems. In this paper we present a methodology and tool to study the memory scalability of parallel codes. Using our methodology, we evaluate an application's memory footprint as a function of scalability, which we have coined memory efficiency, and describe our results. In particular, using our in-house tools we can pinpoint the specific application components which contribute to the application's overall memory footprint (application data structures, libraries, etc.).
A Fast Optimization Method for General Binary Code Learning.
Shen, Fumin; Zhou, Xiang; Yang, Yang; Song, Jingkuan; Shen, Heng; Tao, Dacheng
2016-09-22
Hashing or binary code learning has been recognized to accomplish efficient near-neighbor search, and has thus attracted broad interest in recent retrieval, vision and learning studies. One main challenge of learning to hash arises from the involvement of discrete variables in binary code optimization. While the widely used continuous relaxation may achieve high learning efficiency, the pursued codes are typically less effective due to accumulated quantization error. In this work, we propose a novel binary code optimization method, dubbed Discrete Proximal Linearized Minimization (DPLM), which directly handles the discrete constraints during the learning process. Specifically, the discrete (thus nonsmooth, nonconvex) problem is reformulated as minimizing the sum of a smooth loss term and a nonsmooth indicator function. The obtained problem is then efficiently solved by an iterative procedure with each iteration admitting an analytical discrete solution, which is thus shown to converge very fast. In addition, the proposed method supports a large family of empirical loss functions, which is instantiated in this work by both a supervised and an unsupervised hashing loss, together with the bit uncorrelation and balance constraints. In particular, the proposed DPLM with a supervised ℓ2 loss encodes the whole NUS-WIDE database into 64-bit binary codes within 10 seconds on a standard desktop computer. The proposed approach is extensively evaluated on several large-scale datasets and the generated binary codes are shown to achieve very promising results on both retrieval and classification tasks.
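An illustrative sketch (with a stand-in quadratic loss, not the paper's supervised or unsupervised losses) of the kind of discrete proximal-linearized update such a method performs: each iteration linearizes the smooth loss at the current codes, and the binary subproblem then has a closed-form sign solution:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 16))        # data features
W = rng.standard_normal((16, 8))          # a fixed projection (stand-in for a learned one)

def loss_grad(B):
    """Gradient of the stand-in smooth loss f(B) = 0.5 * ||B - X W||_F^2 w.r.t. B."""
    return B - X @ W

def proximal_linearized_codes(iters=50, rho=1.0):
    """Iterate over binary codes B in {-1, +1}: each step linearizes f at the current
    codes and the binary subproblem has the closed-form solution sign(B - grad/rho)."""
    B = np.sign(rng.standard_normal((100, 8)))
    for _ in range(iters):
        B_new = np.sign(B - loss_grad(B) / rho)
        B_new[B_new == 0] = 1              # break ties deterministically
        if np.array_equal(B_new, B):
            break
        B = B_new
    return B

print(proximal_linearized_codes().shape)
```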
Salko, Robert K.; Schmidt, Rodney C.; Avramova, Maria N.
2014-11-23
This study describes major improvements to the computational infrastructure of the CTF subchannel code so that full-core, pincell-resolved (i.e., one computational subchannel per real bundle flow channel) simulations can now be performed in much shorter run-times, either in stand-alone mode or as part of coupled-code multi-physics calculations. These improvements support the goals of the Department Of Energy Consortium for Advanced Simulation of Light Water Reactors (CASL) Energy Innovation Hub to develop high fidelity multi-physics simulation tools for nuclear energy design and analysis.
Toward Optimal Manifold Hashing via Discrete Locally Linear Embedding.
Rongrong Ji; Hong Liu; Liujuan Cao; Di Liu; Yongjian Wu; Feiyue Huang
2017-11-01
Binary code learning, also known as hashing, has received increasing attention in large-scale visual search. By transforming high-dimensional features into binary codes, the original Euclidean distance is approximated via the Hamming distance. More recently, it has been advocated that it is the manifold distance, rather than the Euclidean distance, that should be preserved in the Hamming space. However, directly preserving the manifold structure by hashing remains an open problem. In particular, one first needs to build the locally linear embedding in the original feature space and then quantize such an embedding to binary codes. Such two-step coding is problematic and less than optimal. Besides, the off-line learning is extremely time and memory consuming, since it needs to calculate the similarity matrix of the original data. In this paper, we propose a novel hashing algorithm, termed discrete locally linear embedding hashing (DLLH), which well addresses the above challenges. DLLH directly reconstructs the manifold structure in the Hamming space, learning optimal hash codes that maintain the local linear relationship of data points. To learn discrete locally linear embedding codes, we further propose a discrete optimization algorithm with an iterative parameter-updating scheme. Moreover, an anchor-based acceleration scheme, termed Anchor-DLLH, is introduced, which approximates the large similarity matrix by the product of two low-rank matrices. Experimental results on three widely used benchmark data sets, i.e., CIFAR10, NUS-WIDE, and YouTube Face, show the superior performance of the proposed DLLH over state-of-the-art approaches.
Channel modeling, signal processing and coding for perpendicular magnetic recording
NASA Astrophysics Data System (ADS)
Wu, Zheng
With the increasing areal density in magnetic recording systems, perpendicular recording has replaced longitudinal recording to overcome the superparamagnetic limit. Studies on perpendicular recording channels including aspects of channel modeling, signal processing and coding techniques are presented in this dissertation. To optimize a high density perpendicular magnetic recording system, one needs to know the tradeoffs between various components of the system including the read/write transducers, the magnetic medium, and the read channel. We extend the work by Chaichanavong on the parameter optimization for systems via design curves. Different signal processing and coding techniques are studied. Information-theoretic tools are utilized to determine the acceptable region for the channel parameters when optimal detection and linear coding techniques are used. Our results show that a considerable gain can be achieved by the optimal detection and coding techniques. The read-write process in perpendicular magnetic recording channels includes a number of nonlinear effects. Nonlinear transition shift (NLTS) is one of them. The signal distortion induced by NLTS can be reduced by write precompensation during data recording. We numerically evaluate the effect of NLTS on the read-back signal and examine the effectiveness of several write precompensation schemes in combating NLTS in a channel characterized by both transition jitter noise and additive white Gaussian electronics noise. We also present an analytical method to estimate the bit-error-rate and use it to help determine the optimal write precompensation values in multi-level precompensation schemes. We propose a mean-adjusted pattern-dependent noise predictive (PDNP) detection algorithm for use on the channel with NLTS. We show that this detector can offer significant improvements in bit-error-rate (BER) compared to conventional Viterbi and PDNP detectors. Moreover, the system performance can be further improved by combining the new detector with a simple write precompensation scheme. Soft-decision decoding for algebraic codes can improve performance for magnetic recording systems. In this dissertation, we propose two soft-decision decoding methods for tensor-product parity codes. We also present a list decoding algorithm for generalized error locating codes.
Novel AroA from Pseudomonas putida Confers Tobacco Plant with High Tolerance to Glyphosate
Yan, Hai-Qin; Chang, Su-Hua; Tian, Zhe-Xian; Zhang, Le; Sun, Yi-Cheng; Li, Yan; Wang, Jing; Wang, Yi-Ping
2011-01-01
Glyphosate is a non-selective broad-spectrum herbicide that inhibits 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS, also designated as AroA), a key enzyme in the aromatic amino acid biosynthesis pathway in microorganisms and plants. Previously, we reported that a novel AroA (PpAroA1) from Pseudomonas putida had high tolerance to glyphosate, with little homology to class I or class II glyphosate-tolerant AroA. In this study, the coding sequence of PpAroA1 was optimized for tobacco. For maturation of the enzyme in the chloroplast, a chloroplast transit peptide coding sequence was fused in frame with the optimized aroA gene (PparoA1optimized) at the 5′ end. The PparoA1optimized gene was introduced into the tobacco (Nicotiana tabacum L. cv. W38) genome via Agrobacterium-mediated transformation. The transformed explants were first screened in shoot induction medium containing kanamycin. Glyphosate tolerance was then assayed in putative transgenic plants and their T1 progeny. Our results show that PpAroA1 from Pseudomonas putida can efficiently confer high glyphosate tolerance to tobacco plants. Transgenic tobacco overexpressing the PparoA1optimized gene exhibits high tolerance to glyphosate, which suggests that the novel PpAroA1 is a good candidate for engineering glyphosate tolerance in transgenic crops in the future. PMID:21611121
Optimizations of a Hardware Decoder for Deep-Space Optical Communications
NASA Technical Reports Server (NTRS)
Cheng, Michael K.; Nakashima, Michael A.; Moision, Bruce E.; Hamkins, Jon
2007-01-01
The National Aeronautics and Space Administration has developed a capacity approaching modulation and coding scheme that comprises a serial concatenation of an inner accumulate pulse-position modulation (PPM) and an outer convolutional code [or serially concatenated PPM (SCPPM)] for deep-space optical communications. Decoding of this code uses the turbo principle. However, due to the nonbinary property of SCPPM, a straightforward application of classical turbo decoding is very inefficient. Here, we present various optimizations applicable in hardware implementation of the SCPPM decoder. More specifically, we feature a Super Gamma computation to efficiently handle parallel trellis edges, a pipeline-friendly 'maxstar top-2' circuit that reduces the max-only approximation penalty, a low-latency cyclic redundancy check circuit for window-based decoders, and a high-speed algorithmic polynomial interleaver that leads to memory savings. Using the featured optimizations, we implement a 6.72 megabits-per-second (Mbps) SCPPM decoder on a single field-programmable gate array (FPGA). Compared to the current data rate of 256 kilobits per second from Mars, the SCPPM coded scheme represents a throughput increase of more than twenty-six fold. Extension to a 50-Mbps decoder on a board with multiple FPGAs follows naturally. We show through hardware simulations that the SCPPM coded system can operate within 1 dB of the Shannon capacity at nominal operating conditions.
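For context on the 'maxstar top-2' optimization mentioned above, the max-star (Jacobian logarithm) operation it accelerates is shown below in plain software; the hardware circuit itself, and its top-2 approximation of the correction term, are not reproduced here:

```python
import math

def maxstar(a, b):
    """Exact Jacobian logarithm: ln(e^a + e^b) = max(a, b) + ln(1 + e^{-|a - b|})."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def max_only(values):
    """Max-only approximation: drops the correction term entirely, at a BER penalty."""
    return max(values)

def maxstar_reduce(values):
    """Fold maxstar over a list, as a turbo/trellis decoder does over branch metrics."""
    acc = values[0]
    for v in values[1:]:
        acc = maxstar(acc, v)
    return acc

vals = [0.1, 0.9, 1.0, -2.0]
print(max_only(vals), maxstar_reduce(vals))   # the gap is what the correction term recovers
```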
Liu, Tao; Djordjevic, Ivan B
2014-12-29
In this paper, we first describe an optimal signal constellation design algorithm suitable for coherent optical channels dominated by linear phase noise. We then modify this algorithm to be suitable for channels dominated by nonlinear phase noise. In the optimization procedure, the proposed algorithm uses the cumulative log-likelihood function instead of the Euclidean distance. Further, an LDPC coded modulation scheme is proposed to be used in combination with the signal constellations obtained by the proposed algorithm. Monte Carlo simulations indicate that the LDPC-coded modulation schemes employing the new constellation sets, obtained by our new signal constellation design algorithm, significantly outperform the corresponding QAM constellations in terms of transmission distance and have better nonlinearity tolerance.
Final Report A Multi-Language Environment For Programmable Code Optimization and Empirical Tuning
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yi, Qing; Whaley, Richard Clint; Qasem, Apan
This report summarizes our effort and results in building an integrated optimization environment to effectively combine the programmable control and the empirical tuning of source-to-source compiler optimizations within the framework of multiple existing languages, specifically C, C++, and Fortran. The environment contains two main components: the ROSE analysis engine, which is based on the ROSE C/C++/Fortran2003 source-to-source compiler developed by Co-PI Dr. Quinlan et al. at DOE/LLNL, and the POET transformation engine, which is based on an interpreted program transformation language developed by Dr. Yi at the University of Texas at San Antonio (UTSA). The ROSE analysis engine performs advanced compiler analysis, identifies profitable code transformations, and then produces output in POET, a language designed to provide programmable control of compiler optimizations to application developers and to support the parameterization of architecture-sensitive optimizations so that their configurations can be empirically tuned later. This POET output can then be ported to different machines together with the user application, where a POET-based search engine empirically reconfigures the parameterized optimizations until satisfactory performance is found. Computational specialists can write POET scripts to directly control the optimization of their code. Application developers can interact with ROSE to obtain optimization feedback as well as provide domain-specific knowledge and high-level optimization strategies. The optimization environment is expected to support different levels of automation and programmer intervention, from fully automated tuning to semi-automated development and to manual programmable control.
An optimization program based on the method of feasible directions: Theory and users guide
NASA Technical Reports Server (NTRS)
Belegundu, Ashok D.; Berke, Laszlo; Patnaik, Surya N.
1994-01-01
The theory and user instructions for an optimization code based on the method of feasible directions are presented. The code was written for wide distribution and ease of attachment to other simulation software. Although the theory of the method of feasible directions was developed in the 1960's, many considerations are involved in its actual implementation as a computer code. Included in the code are a number of features to improve robustness in optimization. The search direction is obtained by solving a quadratic program using an interior method based on Karmarkar's algorithm. The theory is discussed with a focus on the important and often overlooked role played by the various parameters guiding the iterations within the program. Also discussed is a robust approach for handling infeasible starting points. The code was validated by solving a variety of structural optimization test problems that have known solutions obtained by other optimization codes. It has been observed that this code is robust: it has solved a variety of problems from different starting points. However, the code is inefficient in that it takes considerable CPU time compared with certain other available codes. Further work is required to improve its efficiency while retaining its robustness.
The effect of code expanding optimizations on instruction cache design
NASA Technical Reports Server (NTRS)
Chen, William Y.; Chang, Pohua P.; Conte, Thomas M.; Hwu, Wen-Mei W.
1991-01-01
It is shown that code expanding optimizations have strong and non-intuitive implications for instruction cache design. Three types of code expanding optimizations are studied: instruction placement, function inline expansion, and superscalar optimizations. Overall, instruction placement reduces the miss ratio of small caches. Function inline expansion improves the performance for small cache sizes, but degrades the performance of medium caches. Superscalar optimizations increase the cache size required for a given miss ratio. On the other hand, they also increase the sequentiality of instruction access, so that a simple load-forward scheme effectively cancels the negative effects. Overall, it is shown that with load forwarding, the three types of code expanding optimizations jointly improve the performance of small caches and have little effect on large caches.
FY17 Status Report on NEAMS Neutronics Activities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, C. H.; Jung, Y. S.; Smith, M. A.
2017-09-30
Under the U.S. DOE NEAMS program, the high-fidelity neutronics code system has been developed to support the multiphysics modeling and simulation capability named SHARP. The neutronics code system includes the high-fidelity neutronics code PROTEUS, the cross section library and preprocessing tools, the multigroup cross section generation code MC2-3, the in-house mesh generation tool, the perturbation and sensitivity analysis code PERSENT, and post-processing tools. The main objectives of the NEAMS neutronics activities in FY17 are to continue development of an advanced nodal solver in PROTEUS for use in nuclear reactor design and analysis projects, implement a simplified subchannel-based thermal-hydraulic (T/H) capability in PROTEUS to efficiently compute the thermal feedback, improve the performance of PROTEUS-MOCEX using numerical acceleration and code optimization, improve the cross section generation tools including MC2-3, and continue to perform verification and validation tests for PROTEUS.
Are V1 Simple Cells Optimized for Visual Occlusions? A Comparative Study
Bornschein, Jörg; Henniges, Marc; Lücke, Jörg
2013-01-01
Simple cells in primary visual cortex were famously found to respond to low-level image components such as edges. Sparse coding and independent component analysis (ICA) emerged as the standard computational models for simple cell coding because they linked receptive fields to the statistics of visual stimuli. However, a salient feature of image statistics, the occlusion of image components, is not considered by these models. Here we ask if occlusions have an effect on the predicted shapes of simple cell receptive fields. We use a comparative approach to answer this question and investigate two models for simple cells: a standard linear model and an occlusive model. For both models we simultaneously estimate optimal receptive fields, sparsity and stimulus noise. The two models are identical except for their component superposition assumption. We find that the image encoding and receptive fields predicted by the models differ significantly. While both models predict many Gabor-like fields, the occlusive model predicts a much sparser encoding and high percentages of ‘globular’ receptive fields. This relatively new center-surround type of simple cell response has been observed since reverse correlation came into use in experimental studies. While high percentages of ‘globular’ fields can be obtained using specific choices of sparsity and overcompleteness in linear sparse coding, no or only low proportions are reported in the vast majority of studies on linear models (including all ICA models). Likewise, for the linear model investigated here with optimal sparsity, only low proportions of ‘globular’ fields are observed. In comparison, the occlusive model robustly infers high proportions and can match the experimentally observed high proportions of ‘globular’ fields well. Our computational study, therefore, suggests that ‘globular’ fields may be evidence for an optimal encoding of visual occlusions in primary visual cortex. PMID:23754938
High Order Entropy-Constrained Residual VQ for Lossless Compression of Images
NASA Technical Reports Server (NTRS)
Kossentini, Faouzi; Smith, Mark J. T.; Scales, Allen
1995-01-01
High order entropy coding is a powerful technique for exploiting high order statistical dependencies. However, the exponentially high complexity associated with such a method often discourages its use. In this paper, an entropy-constrained residual vector quantization method is proposed for lossless compression of images. The method consists of first quantizing the input image using a high order entropy-constrained residual vector quantizer and then coding the residual image using a first order entropy coder. The distortion measure used in the entropy-constrained optimization is essentially the first order entropy of the residual image. Experimental results show very competitive performance.
HERCULES: A Pattern Driven Code Transformation System
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kartsaklis, Christos; Hernandez, Oscar R; Hsu, Chung-Hsing
2012-01-01
New parallel computers are emerging, but developing efficient scientific code for them remains difficult. A scientist must manage not only the science-domain complexity but also the performance-optimization complexity. HERCULES is a code transformation system designed to help the scientist separate the two concerns, which improves code maintenance and facilitates performance optimization. The system combines three technologies, code patterns, transformation scripts and compiler plugins, to provide the scientist with an environment to quickly implement code transformations that suit his needs. Unlike existing code optimization tools, HERCULES is unique in its focus on user-level accessibility. In this paper we discuss the design, implementation and an initial evaluation of HERCULES.
Han, Dahai; Gu, Yanjie; Zhang, Min
2017-08-10
An optimized scheme of pulse symmetrical position-orthogonal space-time block codes (PSP-OSTBC) is proposed and applied with m-pulse position modulation (m-PPM), without the use of a complex decoding algorithm, in an optical multi-input multi-output (MIMO) ultraviolet (UV) communication system. The proposed scheme breaks through the limitation of the traditional Alamouti code and is suitable for high-order m-PPM in a UV scattering channel, as verified by both simulation experiments and field tests with specific parameters. The performances of 1×1, 2×1, and 2×2 PSP-OSTBC systems with 4-PPM are compared experimentally as the optimal tradeoff between modification and coding in practical application. Meanwhile, the feasibility of the proposed scheme for 8-PPM is examined by a simulation experiment as well. The results suggest that the proposed scheme makes the system insensitive to the influence of path loss, with a larger channel capacity, and that a higher diversity gain and coding gain with a simple decoding algorithm can be achieved by employing the orthogonality of m-PPM in an optical-MIMO-based ultraviolet scattering channel.
Okamoto, Kazuhisa; Nonaka, Chiho
2017-06-09
Here, we construct a new relativistic viscous hydrodynamics code optimized in the Milne coordinates. We split the conservation equations into an ideal part and a viscous part using the Strang splitting method. In the code, a Riemann solver based on the two-shock approximation is utilized for the ideal part and the Piecewise Exact Solution (PES) method is applied for the viscous part. Furthermore, we check the validity of our numerical calculations by comparing with analytical solutions: the viscous Bjorken flow and the Israel-Stewart theory in the Gubser flow regime. Using the code, we discuss possible development of the Kelvin-Helmholtz instability in high-energy heavy-ion collisions.
Fundamental Limits of Delay and Security in Device-to-Device Communication
2013-01-01
… systematic MDS (maximum distance separable) codes and random binning strategies that achieve a Pareto optimal delay-reconstruction tradeoff. A coding scheme based on erasure compression and Slepian-Wolf binning is presented and shown to provide a Pareto optimal delay-reconstruction tradeoff. The erasure MD setup is then used to propose a …
A new software for deformation source optimization, the Bayesian Earthquake Analysis Tool (BEAT)
NASA Astrophysics Data System (ADS)
Vasyura-Bathke, H.; Dutta, R.; Jonsson, S.; Mai, P. M.
2017-12-01
Modern studies of crustal deformation and the related source estimation, including magmatic and tectonic sources, increasingly use non-linear optimization strategies to estimate geometric and/or kinematic source parameters, and often consider geodetic and seismic data jointly. Bayesian inference is increasingly being used for estimating posterior distributions of deformation source model parameters, given measured/estimated/assumed data and model uncertainties. For instance, some studies consider uncertainties of a layered medium and propagate these into source parameter uncertainties, while others use informative priors to reduce the model parameter space. In addition, innovative sampling algorithms have been developed to efficiently explore the high-dimensional parameter spaces. Compared to earlier studies, these improvements have resulted in overall more robust source model parameter estimates that include uncertainties. However, the computational burden of these methods is high and estimation codes are rarely made available along with the published results. Even if the codes are accessible, it is usually challenging to assemble them into a single optimization framework as they are typically coded in different programming languages. Therefore, further progress and future applications of these methods/codes are hampered, while reproducibility and validation of results has become essentially impossible. In the spirit of providing open-access and modular codes to facilitate progress and reproducible research in deformation source estimation, we undertook the effort of developing BEAT, a Python package that comprises all the above-mentioned features in one single programming environment. The package builds on the pyrocko seismological toolbox (www.pyrocko.org), and uses the pymc3 module for Bayesian statistical model fitting. BEAT is an open-source package (https://github.com/hvasbath/beat), and we encourage and solicit contributions to the project. Here, we present our strategy for developing BEAT and show application examples, especially the effect of including the model prediction uncertainty of the velocity model in subsequent source optimizations: full moment tensor, Mogi source, and a moderate strike-slip earthquake.
Rate-compatible punctured convolutional codes (RCPC codes) and their applications
NASA Astrophysics Data System (ADS)
Hagenauer, Joachim
1988-04-01
The concept of punctured convolutional codes is extended by puncturing a low-rate 1/N code periodically with period P to obtain a family of codes with rate P/(P + l), where l can be varied between 1 and (N - 1)P. A rate-compatibility restriction on the puncturing tables ensures that all code bits of high-rate codes are used by the lower-rate codes. This allows transmission of incremental redundancy in ARQ/FEC (automatic repeat request/forward error correction) schemes and continuous rate variation to change from low to high error protection within a data frame. Families of RCPC codes with rates between 8/9 and 1/4 are given for memories M from 3 to 6 (8 to 64 trellis states) together with the relevant distance spectra. These codes are almost as good as the best known general convolutional codes of the respective rates. It is shown that the same Viterbi decoder can be used for all RCPC codes of the same M. The application of RCPC codes to hybrid ARQ/FEC schemes is discussed for Gaussian and Rayleigh fading channels using channel-state information to optimize throughput.
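To make the rate relationship concrete, here is a small illustrative sketch (the puncturing table and parameters are invented for illustration, not taken from Hagenauer's tables) showing how a periodic puncturing table applied to a rate-1/N mother code yields a code of rate P/(P + l):

    # Illustrative RCPC puncturing sketch: a rate-1/N mother code emits N coded
    # bits per information bit; a P-column puncturing table keeps P + l of the
    # N*P coded bits in each period, giving rate P/(P + l).
    N = 2              # mother code rate 1/2 (assumed)
    P = 8              # puncturing period (assumed)

    # Example puncturing table (N rows x P columns); 1 = transmit, 0 = puncture.
    table = [
        [1, 1, 1, 1, 1, 1, 1, 1],
        [1, 0, 0, 0, 1, 0, 0, 0],
    ]

    l = sum(sum(row) for row in table) - P
    print("code rate = %d/%d" % (P, P + l))   # -> 8/10, i.e. rate 4/5

    def puncture(coded_bits):
        """Drop coded bits where the table has a 0 (coded_bits ordered bit-by-bit,
        N mother-code outputs per trellis step)."""
        out = []
        for i, b in enumerate(coded_bits):
            step, branch = (i // N) % P, i % N
            if table[branch][step]:
                out.append(b)
        return out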
The application of nonlinear programming and collocation to optimal aeroassisted orbital transfers
NASA Astrophysics Data System (ADS)
Shi, Y. Y.; Nelson, R. L.; Young, D. H.; Gill, P. E.; Murray, W.; Saunders, M. A.
1992-01-01
Sequential quadratic programming (SQP) and collocation of the differential equations of motion were applied to optimal aeroassisted orbital transfers. The Optimal Trajectory by Implicit Simulation (OTIS) computer program codes with updated nonlinear programming code (NZSOL) were used as a testbed for the SQP nonlinear programming (NLP) algorithms. The state-of-the-art sparse SQP method is considered to be effective for solving large problems with a sparse matrix. Sparse optimizers are characterized in terms of memory requirements and computational efficiency. For the OTIS problems, less than 10 percent of the Jacobian matrix elements are nonzero. The SQP method encompasses two phases: finding an initial feasible point by minimizing the sum of infeasibilities and minimizing the quadratic objective function within the feasible region. The orbital transfer problem under consideration involves the transfer from a high energy orbit to a low energy orbit.
A Degree Distribution Optimization Algorithm for Image Transmission
NASA Astrophysics Data System (ADS)
Jiang, Wei; Yang, Junjie
2016-09-01
The Luby Transform (LT) code is the first practical implementation of a digital fountain code. The coding behavior of an LT code is mainly decided by the degree distribution, which determines the relationship between source data and codewords. Two degree distributions are suggested by Luby. They work well in typical situations but not optimally in the case of finite encoding symbols. In this work, a degree distribution optimization algorithm is proposed to explore the potential of LT codes. First, a selection scheme of sparse degrees for LT codes is introduced. Then the probability distribution is optimized according to the selected degrees. In image transmission, the bit stream is sensitive to channel noise, and even a single bit error may cause the loss of synchronization between the encoder and the decoder. Therefore the proposed algorithm is designed for the image transmission situation. Moreover, optimal class partition is studied for image transmission with unequal error protection. The experimental results are quite promising. Compared with the LT code with robust soliton distribution, the proposed algorithm noticeably improves the final quality of recovered images with the same overhead.
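For reference, the robust soliton degree distribution that the proposed algorithm is compared against can be computed as in the following sketch (standard formulas; the parameters k, c, and delta below are illustrative choices):

    import math

    def robust_soliton(k, c=0.1, delta=0.5):
        """Return the robust soliton degree distribution mu[1..k] for k input symbols."""
        S = c * math.log(k / delta) * math.sqrt(k)
        rho = [0.0] + [1.0 / k] + [1.0 / (d * (d - 1)) for d in range(2, k + 1)]
        tau = [0.0] * (k + 1)
        pivot = int(round(k / S))
        for d in range(1, pivot):
            tau[d] = S / (k * d)
        if 1 <= pivot <= k:
            tau[pivot] = S * math.log(S / delta) / k
        Z = sum(rho) + sum(tau)
        return [(rho[d] + tau[d]) / Z for d in range(k + 1)]  # index d = degree

    mu = robust_soliton(1000)
    print(sum(mu))                                     # ~1.0
    print(max(range(1, 1001), key=lambda d: mu[d]))    # most probable degree (typically 2)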
2015-06-01
…cient parallel code for applying the operator. Our method constructs a polynomial preconditioner using a nonlinear least squares (NLLS) algorithm. Such a preconditioner can be very attractive in scenarios where one must repeatedly solve a large system of linear equations and has an extremely fast, highly efficient parallel code for applying the underlying fixed linear operator.
TU-AB-BRC-12: Optimized Parallel MonteCarlo Dose Calculations for Secondary MU Checks
DOE Office of Scientific and Technical Information (OSTI.GOV)
French, S; Nazareth, D; Bellor, M
Purpose: Secondary MU checks are an important tool used during a physics review of a treatment plan. Commercial software packages offer varying degrees of theoretical dose calculation accuracy, depending on the modality involved. Dose calculations of VMAT plans are especially prone to error due to the large approximations involved. Monte Carlo (MC) methods are not commonly used due to their long run times. We investigated two methods to increase the computational efficiency of MC dose simulations with the BEAMnrc code. Distributed computing resources, along with optimized code compilation, will allow for accurate and efficient VMAT dose calculations. Methods: The BEAMnrc package was installed on a high performance computing cluster accessible to our clinic. MATLAB and PYTHON scripts were developed to convert a clinical VMAT DICOM plan into BEAMnrc input files. The BEAMnrc installation was optimized by running the VMAT simulations through profiling tools which indicated the behavior of the constituent routines in the code, e.g. the bremsstrahlung splitting routine and the specified random number generator. This information aided in determining the most efficient parallel compilation configuration for the specific CPUs available on our cluster, resulting in the fastest VMAT simulation times. Our method was evaluated with calculations involving 10^8–10^9 particle histories, which are sufficient to verify patient dose using VMAT. Results: Parallelization allowed the calculation of patient dose on the order of 10–15 hours with 100 parallel jobs. Due to the compiler optimization process, further speed increases of 23% were achieved when compared with the open-source compiler BEAMnrc packages. Conclusion: Analysis of the BEAMnrc code allowed us to optimize the compiler configuration for VMAT dose calculations. In future work, the optimized MC code, in conjunction with the parallel processing capabilities of BEAMnrc, will be applied to provide accurate and efficient secondary MU checks.
A novel concatenated code based on the improved SCG-LDPC code for optical transmission systems
NASA Astrophysics Data System (ADS)
Yuan, Jian-guo; Xie, Ya; Wang, Lin; Huang, Sheng; Wang, Yong
2013-01-01
Based on the optimization and improvement of the construction method of the systematically constructed Gallager (SCG) (4, k) code, a novel SCG low-density parity-check (SCG-LDPC) (3969, 3720) code suitable for optical transmission systems is constructed. The novel SCG-LDPC (6561, 6240) code with a code rate of 95.1% is constructed by increasing the length of the SCG-LDPC (3969, 3720) code, so that the code rate of LDPC codes can better meet the high requirements of optical transmission systems. A novel concatenated code is then constructed by concatenating the SCG-LDPC(6561,6240) code and the BCH(127,120) code, with an overall code rate of 94.5%. The simulation results and analyses show that the net coding gain (NCG) of the BCH(127,120)+SCG-LDPC(6561,6240) concatenated code is 2.28 dB and 0.48 dB more than those of the classic RS(255,239) code and the SCG-LDPC(6561,6240) code, respectively, at a bit error rate (BER) of 10^-7.
Heat-transfer optimization of a high-spin thermal battery
NASA Astrophysics Data System (ADS)
Krieger, Frank C.
Recent advancements in thermal battery technology have produced batteries incorporating a fusible-material heat reservoir for operating temperature control that operate reliably under the high spin rates often encountered in ordnance applications. Attention is presently given to the heat-transfer optimization of a high-spin thermal battery employing a nonfusible steel heat reservoir, on the basis of a computer code that simulated the effect of an actual fusible-material heat reservoir on battery performance. Both heat-paper and heat-pellet thermal battery configurations were considered.
NASA Astrophysics Data System (ADS)
Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.-L.
2015-05-01
Intel Many Integrated Core (MIC) ushers in a new era of supercomputing speed, performance, and compatibility. It allows developers to run code at trillions of calculations per second using a familiar programming model. In this paper, we present our results of optimizing the updated Goddard shortwave radiation Weather Research and Forecasting (WRF) scheme on Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi coprocessor is the first product based on the Intel MIC architecture, and it consists of up to 61 cores connected by a high performance on-die bidirectional interconnect. The coprocessor supports all important Intel development tools, so the development environment is a familiar one to a vast number of CPU developers. However, getting maximum performance out of the Xeon Phi requires some novel optimization techniques, which are discussed in this paper. The results show that the optimizations improved performance of the original code on the Xeon Phi 7120P by a factor of 1.3x.
A survey of compiler optimization techniques
NASA Technical Reports Server (NTRS)
Schneck, P. B.
1972-01-01
Major optimization techniques of compilers are described and grouped into three categories: machine dependent, architecture dependent, and architecture independent. Machine-dependent optimizations tend to be local and are performed upon short spans of generated code by using particular properties of an instruction set to reduce the time or space required by a program. Architecture-dependent optimizations are global and are performed while generating code. These optimizations consider the structure of a computer, but not its detailed instruction set. Architecture-independent optimizations are also global but are based on analysis of the program flow graph and the dependencies among statements of the source program. A conceptual review of a universal optimizer that performs architecture-independent optimizations at the source-code level is also presented.
On the reduced-complexity of LDPC decoders for ultra-high-speed optical transmission.
Djordjevic, Ivan B; Xu, Lei; Wang, Ting
2010-10-25
We propose two reduced-complexity (RC) LDPC decoders, which can be used in combination with large-girth LDPC codes to enable ultra-high-speed serial optical transmission. We show that the optimally attenuated RC min-sum algorithm performs only 0.46 dB (at a BER of 10^-9) worse than the conventional sum-product algorithm, while having lower storage memory requirements and much lower latency. We further study the use of RC LDPC decoding algorithms in multilevel coded modulation with coherent detection and show that with RC decoding algorithms we can achieve a net coding gain larger than 11 dB at BERs below 10^-9.
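A minimal sketch of the attenuated (normalized) min-sum check-node update that underlies reduced-complexity decoders of this kind is shown below; the attenuation factor and the example message values are illustrative, not the paper's optimized settings.

    import numpy as np

    def attenuated_min_sum_check_update(msgs, alpha=0.8):
        """One check-node update: msgs are incoming variable-to-check LLRs;
        returns the outgoing (extrinsic) check-to-variable LLRs, scaled by alpha."""
        msgs = np.asarray(msgs, dtype=float)
        signs = np.sign(msgs)
        signs[signs == 0] = 1.0
        total_sign = np.prod(signs)
        mags = np.abs(msgs)
        order = np.argsort(mags)
        min1, min2 = mags[order[0]], mags[order[1]]
        out = np.empty_like(msgs)
        for i in range(len(msgs)):
            # Extrinsic message: exclude the i-th incoming message.
            mag = min2 if i == order[0] else min1
            out[i] = alpha * (total_sign * signs[i]) * mag
        return out

    print(attenuated_min_sum_check_update([1.2, -0.4, 3.0, 0.7]))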
Multi-Constraint Multi-Variable Optimization of Source-Driven Nuclear Systems
NASA Astrophysics Data System (ADS)
Watkins, Edward Francis
1995-01-01
A novel approach to the search for optimal designs of source-driven nuclear systems is investigated. Such systems include radiation shields, fusion reactor blankets and various neutron spectrum-shaping assemblies. The novel approach involves the replacement of the steepest-descent optimization algorithm incorporated in the code SWAN by a significantly more general and efficient sequential quadratic programming optimization algorithm provided by the code NPSOL. The resulting SWAN/NPSOL code system can be applied to more general, multi-variable, multi-constraint shield optimization problems. The constraints it accounts for may include simple bounds on variables, linear constraints, and smooth nonlinear constraints. It may also be applied to unconstrained, bound-constrained and linearly constrained optimization. The shield optimization capabilities of the SWAN/NPSOL code system are tested and verified in a variety of optimization problems: dose minimization at constant cost, cost minimization at constant dose, and multiple-nonlinear-constraint optimization. The replacement of the optimization part of SWAN with NPSOL is found to be feasible and leads to a very substantial increase in the complexity of optimization problems that can be efficiently handled.
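As a generic, hedged illustration of the kind of bound- and constraint-limited nonlinear problem an SQP optimizer handles (using SciPy's SLSQP here rather than NPSOL itself), consider minimizing a dose-like objective at fixed cost; the attenuation coefficients, costs, and budget are invented stand-ins:

    import numpy as np
    from scipy.optimize import minimize

    # Illustrative stand-ins: x are shield-layer thicknesses (cm);
    # "dose" decays with total attenuation, "cost" is linear in thickness.
    atten = np.array([0.7, 1.1, 0.4])      # assumed attenuation coefficients
    unit_cost = np.array([2.0, 5.0, 1.0])  # assumed cost per cm
    budget = 30.0

    dose = lambda x: np.exp(-atten @ x)                                   # objective: minimize dose
    cost_con = {"type": "ineq", "fun": lambda x: budget - unit_cost @ x}  # cost <= budget

    res = minimize(dose, x0=np.ones(3), method="SLSQP",
                   bounds=[(0.0, 20.0)] * 3, constraints=[cost_con])
    print(res.x, dose(res.x))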
FPGA Implementation of Optimal 3D-Integer DCT Structure for Video Compression
2015-01-01
A novel optimal structure for implementing the 3D integer discrete cosine transform (DCT) is presented by analyzing various integer approximation methods. Integer sets with reduced mean squared error (MSE) and high coding efficiency are considered for implementation in FPGA. The proposed method proves that the least resources are utilized for the integer set that has shorter bit values. The optimal 3D integer DCT structure is determined by analyzing the MSE, power dissipation, coding efficiency, and hardware complexity of different integer sets. The experimental results reveal that the direct method of computing the 3D integer DCT using the integer set [10, 9, 6, 2, 3, 1, 1] performs better than other integer sets in terms of resource utilization and power dissipation. PMID:26601120
NASA Astrophysics Data System (ADS)
Li, Kai; Deng, Haixiao
2018-07-01
The Shanghai Coherent Light Facility (SCLF) is a quasi-continuous wave hard X-ray free electron laser facility, which is currently under construction. Due to the high repetition rate and high-quality electron beams, it is straightforward to consider X-ray free electron laser oscillator (XFELO) operation for the SCLF. In this paper, the main processes for XFELO design, and parameter optimization of the undulator, X-ray cavity, and electron beam are described. A three-dimensional X-ray crystal Bragg diffraction code, named BRIGHT, was introduced for the first time, which can be combined with the GENESIS and OPC codes for the numerical simulations of the XFELO. The performance of the XFELO of the SCLF is investigated and optimized by theoretical analysis and numerical simulation.
Optimized nonorthogonal transforms for image compression.
Guleryuz, O G; Orchard, M T
1997-01-01
The transform coding of images is analyzed from a common standpoint in order to generate a framework for the design of optimal transforms. It is argued that all transform coders are alike in the way they manipulate the data structure formed by transform coefficients. A general energy compaction measure is proposed to generate optimized transforms with desirable characteristics particularly suited to the simple transform coding operation of scalar quantization and entropy coding. It is shown that the optimal linear decoder (inverse transform) must be an optimal linear estimator, independent of the structure of the transform generating the coefficients. A formulation that sequentially optimizes the transforms is presented, and design equations and algorithms for its computation are provided. The properties of the resulting transform systems are investigated. In particular, it is shown that the resulting bases are nonorthogonal and complete, producing energy-compaction-optimized, decorrelated transform coefficients. Quantization issues related to nonorthogonal expansion coefficients are addressed with a simple, efficient algorithm. Two implementations are discussed, and image coding examples are given. It is shown that the proposed design framework results in systems with superior energy compaction properties and excellent coding results.
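The claim that the optimal linear decoder is an optimal linear estimator can be illustrated with a small numerical sketch: given training signals and their (possibly nonorthogonal, quantized) analysis coefficients, the best linear synthesis matrix in the mean-squared sense is the usual LMMSE solution rather than the algebraic inverse of the analysis transform. The source statistics, transform, and quantizer below are illustrative, not the paper's design.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 8
    # Illustrative correlated source (AR(1)-like covariance) and a nonorthogonal analysis transform A.
    Rxx = np.array([[0.9 ** abs(i - j) for j in range(n)] for i in range(n)])
    X = rng.multivariate_normal(np.zeros(n), Rxx, size=5000).T    # columns are signal vectors
    A = rng.normal(size=(n, n)) + np.eye(n)                       # nonorthogonal analysis transform
    Y = np.round(A @ X)                                           # coarsely quantized coefficients

    # Optimal linear decoder (inverse transform) in the MSE sense: W = Rxy @ Ryy^{-1}
    Rxy = X @ Y.T / X.shape[1]
    Ryy = Y @ Y.T / X.shape[1]
    W = Rxy @ np.linalg.inv(Ryy)

    mse_lmmse = np.mean((X - W @ Y) ** 2)
    mse_naive = np.mean((X - np.linalg.inv(A) @ Y) ** 2)   # simply inverting the analysis transform
    print(mse_lmmse, mse_naive)                            # the LMMSE decoder gives the lower error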
High-density digital recording
NASA Technical Reports Server (NTRS)
Kalil, F. (Editor); Buschman, A. (Editor)
1985-01-01
The problems associated with high-density digital recording (HDDR) are discussed. Five independent users of HDDR systems describe their problems, solutions, and insights as guidance for other users of HDDR systems. Various pulse code modulation coding techniques are reviewed. Introductions to error detection and correction, head optimization theory, and perpendicular recording are provided. Competing tape recorder manufacturers apply all of the above theories and techniques and present their offerings. The methodology used by the HDDR Users Subcommittee of THIC to evaluate parallel HDDR systems is presented.
NASA Technical Reports Server (NTRS)
Hanebutte, Ulf R.; Joslin, Ronald D.; Zubair, Mohammad
1994-01-01
The implementation and the performance of a parallel spatial direct numerical simulation (PSDNS) code are reported for the IBM SP1 supercomputer. The spatially evolving disturbances that are associated with laminar-to-turbulent transition in three-dimensional boundary-layer flows are computed with the PSDNS code. By remapping the distributed data structure during the course of the calculation, optimized serial library routines can be utilized that substantially increase the computational performance. Although the remapping incurs a high communication penalty, the parallel efficiency of the code remains above 40% for all performed calculations. By using appropriate compile options and optimized library routines, the serial code achieves 52-56 Mflops on a single node of the SP1 (45% of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a 'real world' simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP for the same simulation. The scalability information provides estimated computational costs that match the actual costs relative to changes in the number of grid points.
Performance optimization of Qbox and WEST on Intel Knights Landing
NASA Astrophysics Data System (ADS)
Zheng, Huihuo; Knight, Christopher; Galli, Giulia; Govoni, Marco; Gygi, Francois
We present the optimization of the electronic structure codes Qbox and WEST targeting the Intel® Xeon Phi™ processor, codenamed Knights Landing (KNL). Qbox is an ab-initio molecular dynamics code based on plane wave density functional theory (DFT) and WEST is a post-DFT code for excited state calculations within many-body perturbation theory. Both Qbox and WEST employ highly scalable algorithms which enable accurate large-scale electronic structure calculations on leadership class supercomputer platforms beyond 100,000 cores, such as Mira and Theta at the Argonne Leadership Computing Facility. In this work, features of the KNL architecture (e.g. hierarchical memory) are explored to achieve higher performance in key algorithms of the Qbox and WEST codes and to develop a road-map for further development targeting next-generation computing architectures. In particular, the optimizations of the Qbox and WEST codes on the KNL platform will target efficient large-scale electronic structure calculations of nanostructured materials exhibiting complex structures and prediction of their electronic and thermal properties for use in solar and thermal energy conversion devices. This work was supported by MICCoM, as part of the Comp. Mats. Sci. Program funded by the U.S. DOE, Office of Sci., BES, MSE Division. This research used resources of the ALCF, which is a DOE Office of Sci. User Facility under Contract DE-AC02-06CH11357.
Robust information propagation through noisy neural circuits
Pouget, Alexandre
2017-01-01
Sensory neurons give highly variable responses to stimulation, which can limit the amount of stimulus information available to downstream circuits. Much work has investigated the factors that affect the amount of information encoded in these population responses, leading to insights about the role of covariability among neurons, tuning curve shape, etc. However, the informativeness of neural responses is not the only relevant feature of population codes; of potentially equal importance is how robustly that information propagates to downstream structures. For instance, to quantify the retina’s performance, one must consider not only the informativeness of the optic nerve responses, but also the amount of information that survives the spike-generating nonlinearity and noise corruption in the next stage of processing, the lateral geniculate nucleus. Our study identifies the set of covariance structures for the upstream cells that optimize the ability of information to propagate through noisy, nonlinear circuits. Within this optimal family are covariances with “differential correlations”, which are known to reduce the information encoded in neural population activities. Thus, covariance structures that maximize information in neural population codes, and those that maximize the ability of this information to propagate, can be very different. Moreover, redundancy is neither necessary nor sufficient to make population codes robust against corruption by noise: redundant codes can be very fragile, and synergistic codes can—in some cases—optimize robustness against noise. PMID:28419098
Optimization of beam shaping assembly based on D-T neutron generator and dose evaluation for BNCT
NASA Astrophysics Data System (ADS)
Naeem, Hamza; Chen, Chaobin; Zheng, Huaqing; Song, Jing
2017-04-01
The feasibility of developing an epithermal neutron beam for a boron neutron capture therapy (BNCT) facility based on a high intensity D-T fusion neutron generator (HINEG), using the Monte Carlo code SuperMC (Super Monte Carlo simulation program for nuclear and radiation processes), is examined in this study. The Monte Carlo code SuperMC is used to determine and optimize the final configuration of the beam shaping assembly (BSA). The optimal BSA design is a cylindrical geometry which consists of a natural uranium sphere (14 cm) as a neutron multiplier, AlF3 and TiF3 as moderators (20 cm each), Cd (1 mm) as a thermal neutron filter, Bi (5 cm) as a gamma shield, and Pb as a reflector and collimator to guide neutrons towards the exit window. The epithermal neutron beam flux of the proposed model is 5.73 × 10^9 n/cm^2·s, and the other dosimetric parameters for BNCT reported by IAEA-TECDOC-1223 have been verified. The phantom dose analysis shows that the designed BSA is accurate, efficient and suitable for BNCT applications. Thus, the Monte Carlo code SuperMC is concluded to be capable of simulating the BSA and the dose calculation for BNCT, and a high epithermal flux can be achieved using the proposed BSA.
Codon Optimizing for Increased Membrane Protein Production: A Minimalist Approach.
Mirzadeh, Kiavash; Toddo, Stephen; Nørholm, Morten H H; Daley, Daniel O
2016-01-01
Reengineering a gene with synonymous codons is a popular approach for increasing production levels of recombinant proteins. Here we present a minimalist alternative to this method, which samples synonymous codons only at the second and third positions rather than the entire coding sequence. As demonstrated with two membrane-embedded transporters in Escherichia coli, the method was more effective than optimizing the entire coding sequence. The method we present is PCR based and requires three simple steps: (1) the design of two PCR primers, one of which is degenerate; (2) the amplification of a mini-library by PCR; and (3) screening for high-expressing clones.
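The minimalist library described above can be sketched roughly as follows: only the second and third codons of the ORF are varied over synonymous choices, which is the set a degenerate forward primer would need to cover. The protein prefix, the codon-table excerpt, and the enumeration below are purely illustrative and not the authors' protocol.

    import itertools

    # Synonymous codon choices for a few amino acids (standard genetic code excerpt).
    SYNONYMS = {
        "M": ["ATG"],
        "K": ["AAA", "AAG"],
        "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
        "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    }

    def second_third_codon_variants(protein_prefix):
        """Enumerate ORF 5' ends in which only codons 2 and 3 are varied
        (codon 1 stays ATG); a degenerate PCR primer would cover this set."""
        aa1, aa2, aa3 = protein_prefix[:3]
        first = SYNONYMS[aa1][0]
        for c2, c3 in itertools.product(SYNONYMS[aa2], SYNONYMS[aa3]):
            yield first + c2 + c3

    variants = list(second_third_codon_variants("MKL"))
    print(len(variants), variants[:4])   # 2 x 6 = 12 mini-library members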
Low-density parity-check codes for volume holographic memory systems.
Pishro-Nik, Hossein; Rahnavard, Nazanin; Ha, Jeongseok; Fekri, Faramarz; Adibi, Ali
2003-02-10
We investigate the application of low-density parity-check (LDPC) codes in volume holographic memory (VHM) systems. We show that a carefully designed irregular LDPC code has very good performance in VHM systems. We optimize high-rate LDPC codes for the nonuniform error pattern in holographic memories to substantially reduce the bit error rate. The prior knowledge of the noise distribution is used for designing as well as decoding the LDPC codes. We show that these codes have superior performance to that of Reed-Solomon (RS) codes and regular LDPC counterparts. Our simulations show that we can increase the maximum storage capacity of holographic memories by more than 50 percent if we use irregular LDPC codes with soft-decision decoding instead of the conventionally employed RS codes with hard-decision decoding. The performance of these LDPC codes is close to the information-theoretic capacity.
High Performance Fortran for Aerospace Applications
NASA Technical Reports Server (NTRS)
Mehrotra, Piyush; Zima, Hans; Bushnell, Dennis M. (Technical Monitor)
2000-01-01
This paper focuses on the use of High Performance Fortran (HPF) for important classes of algorithms employed in aerospace applications. HPF is a set of Fortran extensions designed to provide users with a high-level interface for programming data parallel scientific applications, while delegating to the compiler/runtime system the task of generating explicitly parallel message-passing programs. We begin by providing a short overview of the HPF language. This is followed by a detailed discussion of the efficient use of HPF for applications involving multiple structured grids such as multiblock and adaptive mesh refinement (AMR) codes as well as unstructured grid codes. We focus on the data structures and computational structures used in these codes and on the high-level strategies that can be expressed in HPF to optimally exploit the parallelism in these algorithms.
NASA Technical Reports Server (NTRS)
Huck, Friedrich O.; Fales, Carl L.
1990-01-01
Researchers are concerned with the end-to-end performance of image gathering, coding, and processing. The applications range from high-resolution television to vision-based robotics, wherever the resolution, efficiency and robustness of visual information acquisition and processing are critical. For the presentation at this workshop, it is convenient to divide research activities into the following two overlapping areas: The first is the development of focal-plane processing techniques and technology to effectively combine image gathering with coding, with an emphasis on low-level vision processing akin to the retinal processing in human vision. The approach includes the familiar Laplacian pyramid, the new intensity-dependent spatial summation, and parallel sensing/processing networks. Three-dimensional image gathering is attained by combining laser ranging with sensor-array imaging. The second is the rigorous extension of information theory and optimal filtering to visual information acquisition and processing. The goal is to provide a comprehensive methodology for quantitatively assessing the end-to-end performance of image gathering, coding, and processing.
An engineering code to analyze hypersonic thermal management systems
NASA Technical Reports Server (NTRS)
Vangriethuysen, Valerie J.; Wallace, Clark E.
1993-01-01
Thermal loads on current and future aircraft are increasing and as a result are stressing the energy collection, control, and dissipation capabilities of current thermal management systems and technology. The thermal loads for hypersonic vehicles will be no exception. In fact, with their projected high heat loads and fluxes, hypersonic vehicles are a prime example of systems that will require thermal management systems (TMS) that have been optimized and integrated with the entire vehicle to the maximum extent possible during the initial design stages. This will be necessary not only to meet operational requirements, but also to fulfill weight and performance constraints in order for the vehicle to take off and complete its mission successfully. To meet this challenge, the TMS can no longer be two or more entirely independent systems, nor can thermal management be an afterthought in the design process, the typical pervasive approach in the past. Instead, a TMS that is integrated throughout the entire vehicle and subsequently optimized will be required. To accomplish this, a method that iteratively optimizes the TMS throughout the vehicle will not only be highly desirable, but advantageous in order to reduce the man-hours normally required to conduct the necessary tradeoff studies and comparisons. A thermal management engineering computer code that is under development and being managed at Wright Laboratory, Wright-Patterson AFB, is discussed. The primary goal of the code is to aid in the development of a hypersonic vehicle TMS that has been optimized and integrated on a total vehicle basis.
User's manual for the BNW-I optimization code for dry-cooled power plants. Volume III. [PLCIRI
DOE Office of Scientific and Technical Information (OSTI.GOV)
Braun, D.J.; Daniel, D.J.; De Mier, W.V.
1977-01-01
This appendix to User's Manual for the BNW-1 Optimization Code for Dry-Cooled Power Plants provides a listing of the BNW-I optimization code for determining, for a particular size power plant, the optimum dry cooling tower design using a plastic tube cooling surface and circular tower arrangement of the tube bundles. (LCL)
Numerical optimization of perturbative coils for tokamaks
NASA Astrophysics Data System (ADS)
Lazerson, Samuel; Park, Jong-Kyu; Logan, Nikolas; Boozer, Allen; NSTX-U Research Team
2014-10-01
Numerical optimization of coils which apply three dimensional (3D) perturbative fields to tokamaks is presented. The application of perturbative 3D magnetic fields in tokamaks is now commonplace for control of error fields, resistive wall modes, resonant field drive, and neoclassical toroidal viscosity (NTV) torques. The design of such systems has focused on control of the toroidal mode number, with coil shapes based on simple window-pane designs. In this work, a numerical optimization suite based on the STELLOPT 3D equilibrium optimization code is presented. The new code, IPECOPT, replaces the VMEC equilibrium code with the IPEC perturbed equilibrium code, and targets NTV torque by coupling to the PENT code. Fixed boundary optimizations of the 3D fields for the NSTX-U experiment are underway. Initial results suggest NTV torques can be driven by normal field spectra which are not pitch-resonant with the magnetic field lines. Work has focused on driving core torque with n = 1 and edge torques with n = 3 fields. Optimizations of the coil currents for the planned NSTX-U NCC coils highlight the code's free boundary capability. This manuscript has been authored by Princeton University under Contract Number DE-AC02-09CH11466 with the U.S. Department of Energy.
Ultrafast treatment plan optimization for volumetric modulated arc therapy (VMAT)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Men Chunhua; Romeijn, H. Edwin; Jia Xun
2010-11-15
Purpose: To develop a novel aperture-based algorithm for volumetric modulated arc therapy (VMAT) treatment plan optimization with high quality and high efficiency. Methods: The VMAT optimization problem is formulated as a large-scale convex programming problem solved by a column generation approach. The authors consider a cost function consisting of two terms, the first enforcing a desired dose distribution and the second guaranteeing a smooth dose rate variation between successive gantry angles. A gantry rotation is discretized into 180 beam angles and for each beam angle, only one MLC aperture is allowed. The apertures are generated one by one in a sequential way. At each iteration of the column generation method, a deliverable MLC aperture is generated for one of the unoccupied beam angles by solving a subproblem with consideration of MLC mechanical constraints. A subsequent master problem is then solved to determine the dose rate at all currently generated apertures by minimizing the cost function. When all 180 beam angles are occupied, the optimization completes, yielding a set of deliverable apertures and associated dose rates that produce a high quality plan. Results: The algorithm was preliminarily tested on five prostate and five head-and-neck clinical cases, each with one full gantry rotation without any couch/collimator rotations. High quality VMAT plans have been generated for all ten cases with extremely high efficiency. It takes only 5-8 min on CPU (MATLAB code on an Intel Xeon 2.27 GHz CPU) and 18-31 s on GPU (CUDA code on an NVIDIA Tesla C1060 GPU card) to generate such plans. Conclusions: The authors have developed an aperture-based VMAT optimization algorithm which can generate clinically deliverable high quality treatment plans at very high efficiency.
Ultrafast treatment plan optimization for volumetric modulated arc therapy (VMAT).
Men, Chunhua; Romeijn, H Edwin; Jia, Xun; Jiang, Steve B
2010-11-01
To develop a novel aperture-based algorithm for volumetric modulated arc therapy (VMAT) treatment plan optimization with high quality and high efficiency. The VMAT optimization problem is formulated as a large-scale convex programming problem solved by a column generation approach. The authors consider a cost function consisting of two terms, the first enforcing a desired dose distribution and the second guaranteeing a smooth dose rate variation between successive gantry angles. A gantry rotation is discretized into 180 beam angles and for each beam angle, only one MLC aperture is allowed. The apertures are generated one by one in a sequential way. At each iteration of the column generation method, a deliverable MLC aperture is generated for one of the unoccupied beam angles by solving a subproblem with consideration of MLC mechanical constraints. A subsequent master problem is then solved to determine the dose rate at all currently generated apertures by minimizing the cost function. When all 180 beam angles are occupied, the optimization completes, yielding a set of deliverable apertures and associated dose rates that produce a high quality plan. The algorithm was preliminarily tested on five prostate and five head-and-neck clinical cases, each with one full gantry rotation without any couch/collimator rotations. High quality VMAT plans have been generated for all ten cases with extremely high efficiency. It takes only 5-8 min on CPU (MATLAB code on an Intel Xeon 2.27 GHz CPU) and 18-31 s on GPU (CUDA code on an NVIDIA Tesla C1060 GPU card) to generate such plans. The authors have developed an aperture-based VMAT optimization algorithm which can generate clinically deliverable high quality treatment plans at very high efficiency.
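A rough sketch of the two-term cost described in these abstracts (dose fidelity plus smoothness of dose rate across successive gantry angles) might look like the following; the quadratic penalty forms, the smoothness weight, and the toy problem sizes are assumptions for illustration, not the authors' exact formulation.

    import numpy as np

    def vmat_cost(dose_rates, D, d_target, smooth_weight=0.1):
        """dose_rates: per-angle MU rates (length 180); D: dose-deposition matrix
        (voxels x angles) for the currently generated apertures; d_target: prescribed dose."""
        dose = D @ dose_rates
        fidelity = np.sum((dose - d_target) ** 2)         # term 1: enforce desired dose
        smoothness = np.sum(np.diff(dose_rates) ** 2)      # term 2: smooth rate variation
        return fidelity + smooth_weight * smoothness

    # Tiny illustrative problem.
    rng = np.random.default_rng(1)
    D = rng.random((50, 180))
    d_target = np.full(50, 60.0)
    print(vmat_cost(rng.random(180), D, d_target))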
Rocketdyne/Westinghouse nuclear thermal rocket engine modeling
NASA Technical Reports Server (NTRS)
Glass, James F.
1993-01-01
The topics are presented in viewgraph form and include the following: systems approach needed for nuclear thermal rocket (NTR) design optimization; generic NTR engine power balance codes; rocketdyne nuclear thermal system code; software capabilities; steady state model; NTR engine optimizer code-logic; reactor power calculation logic; sample multi-component configuration; NTR design code output; generic NTR code at Rocketdyne; Rocketdyne NTR model; and nuclear thermal rocket modeling directions.
Making extreme computations possible with virtual machines
NASA Astrophysics Data System (ADS)
Reuter, J.; Chokoufe Nejad, B.; Ohl, T.
2016-10-01
State-of-the-art algorithms generate scattering amplitudes for high-energy physics at leading order for high-multiplicity processes as compiled code (in Fortran, C or C++). For complicated processes the size of these libraries can become tremendous (many GiB). We show that amplitudes can be translated to byte-code instructions, which even reduce the size by one order of magnitude. The byte-code is interpreted by a Virtual Machine with runtimes comparable to compiled code and a better scaling with additional legs. We study the properties of this algorithm, as an extension of the Optimizing Matrix Element Generator (O'Mega). The bytecode matrix elements are available as alternative input for the event generator WHIZARD. The bytecode interpreter can be implemented very compactly, which will help with a future implementation on massively parallel GPUs.
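The idea of replacing huge compiled amplitude libraries with byte-code run by a small interpreter can be illustrated with a toy stack-based virtual machine; the instruction set and the example expression below are invented for illustration and are unrelated to O'Mega's actual byte-code format.

    def run(bytecode, inputs):
        """Toy stack VM: each instruction is (op, arg). Supports loading inputs,
        pushing constants, and arithmetic, enough to evaluate expression DAGs."""
        stack = []
        for op, arg in bytecode:
            if op == "LOAD":
                stack.append(inputs[arg])
            elif op == "CONST":
                stack.append(arg)
            elif op == "ADD":
                b, a = stack.pop(), stack.pop(); stack.append(a + b)
            elif op == "MUL":
                b, a = stack.pop(), stack.pop(); stack.append(a * b)
            else:
                raise ValueError("unknown opcode %r" % op)
        return stack.pop()

    # Byte-code for (x + 2) * y, evaluated without compiling any new code.
    program = [("LOAD", 0), ("CONST", 2.0), ("ADD", None), ("LOAD", 1), ("MUL", None)]
    print(run(program, [3.0, 4.0]))   # -> 20.0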
NASA Technical Reports Server (NTRS)
Carlson, Harry W.; Darden, Christine M.
1988-01-01
Extensive correlations of computer code results with experimental data are employed to illustrate the use of linearized theory attached flow methods for the estimation and optimization of the aerodynamic performance of simple hinged flap systems. Use of attached flow methods is based on the premise that high levels of aerodynamic efficiency require a flow that is as nearly attached as circumstances permit. A variety of swept wing configurations are considered ranging from fighters to supersonic transports, all with leading- and trailing-edge flaps for enhancement of subsonic aerodynamic efficiency. The results indicate that linearized theory attached flow computer code methods provide a rational basis for the estimation and optimization of flap system aerodynamic performance at subsonic speeds. The analysis also indicates that vortex flap design is not an opposing approach but is closely related to attached flow design concepts. The successful vortex flap design actually suppresses the formation of detached vortices to produce a small vortex which is restricted almost entirely to the leading edge flap itself.
NASA Technical Reports Server (NTRS)
Carlson, Harry W.; Darden, Christine M.; Mann, Michael J.
1990-01-01
Extensive correlations of computer code results with experimental data are employed to illustrate the use of a linearized theory, attached flow method for the estimation and optimization of the longitudinal aerodynamic performance of wing-canard and wing-horizontal tail configurations which may employ simple hinged flap systems. Use of an attached flow method is based on the premise that high levels of aerodynamic efficiency require a flow that is as nearly attached as circumstances permit. The results indicate that linearized theory, attached flow, computer code methods (modified to include estimated attainable leading-edge thrust and an approximate representation of vortex forces) provide a rational basis for the estimation and optimization of aerodynamic performance at subsonic speeds below the drag rise Mach number. Generally, good prediction of aerodynamic performance, as measured by the suction parameter, can be expected for near optimum combinations of canard or horizontal tail incidence and leading- and trailing-edge flap deflections at a given lift coefficient (conditions which tend to produce a predominantly attached flow).
NASA Astrophysics Data System (ADS)
Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.
2014-10-01
The Purdue-Lin scheme is a relatively sophisticated microphysics scheme in the Weather Research and Forecasting (WRF) model. The scheme includes six classes of hydrometeors: water vapor, cloud water, rain, cloud ice, snow, and graupel. The scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. In this paper, we accelerate the Purdue-Lin scheme using Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi is a high performance coprocessor consisting of up to 61 cores. The Xeon Phi is connected to a CPU via the PCI Express (PCIe) bus. In this paper, we discuss in detail the code optimization issues encountered while tuning the Purdue-Lin microphysics Fortran code for the Xeon Phi. In particular, getting good performance required utilizing multiple cores, wide vector operations, and efficient use of memory. The results show that the optimizations improved performance of the original code on the Xeon Phi 5110P by a factor of 4.2x. Furthermore, the same optimizations improved performance on an Intel Xeon E5-2603 CPU by a factor of 1.2x compared to the original code.
High-speed architecture for the decoding of trellis-coded modulation
NASA Technical Reports Server (NTRS)
Osborne, William P.
1992-01-01
Since 1971, when the Viterbi Algorithm was introduced as the optimal method of decoding convolutional codes, improvements in circuit technology, especially VLSI, have steadily increased its speed and practicality. Trellis-Coded Modulation (TCM) combines convolutional coding with higher level modulation (non-binary source alphabet) to provide forward error correction and spectral efficiency. For binary codes, the current state-of-the-art is a 64-state Viterbi decoder on a single CMOS chip, operating at a data rate of 25 Mbps. Recently, there has been an interest in increasing the speed of the Viterbi Algorithm by improving the decoder architecture, or by reducing the algorithm itself. Designs employing new architectural techniques are now in existence; however, these techniques are currently applied to simpler binary codes, not to TCM. The purpose of this report is to discuss TCM architectural considerations in general, and to present the design, at the logic gate level, of a specific TCM decoder which applies these considerations to achieve high-speed decoding.
NASA Astrophysics Data System (ADS)
Nakamura, Yusuke; Hoshizawa, Taku
2016-09-01
Two methods for increasing the data capacity of a holographic data storage system (HDSS) were developed. The first method is called “run-length-limited (RLL) high-density recording”. An RLL modulation has the same effect as enlarging the pixel pitch; namely, it optically reduces the hologram size. Accordingly, the method doubles the raw-data recording density. The second method is called “RLL turbo signal processing”. The RLL turbo code consists of RLL(1,∞) trellis modulation and an optimized convolutional code. The remarkable point of the developed turbo code is that it employs the RLL modulator and demodulator as parts of the error-correction process. The turbo code improves the capability of error correction more than a conventional LDPC code, even though interpixel interference is generated. These two methods will increase the data density 1.78-fold. Moreover, by simulation and experiment, a data density of 2.4 Tbit/in.² is confirmed.
NASA Astrophysics Data System (ADS)
El-Wardany, Tahany; Lynch, Mathew; Gu, Wenjiong; Hsu, Arthur; Klecka, Michael; Nardi, Aaron; Viens, Daniel
This paper proposes an optimization framework enabling the integration of multi-scale / multi-physics simulation codes to perform structural optimization design for additively manufactured components. Cold spray was selected as the additive manufacturing (AM) process and its constraints were identified and included in the optimization scheme. The developed framework first utilizes topology optimization to maximize stiffness for conceptual design. The subsequent step applies shape optimization to refine the design for stress-life fatigue. The component weight was reduced by 20% while stresses were reduced by 75% and the rigidity was improved by 37%. The framework and analysis codes were implemented using Altair software as well as an in-house loading code. The optimized design was subsequently produced by the cold spray process.
Optimized atom position and coefficient coding for matching pursuit-based image compression.
Shoa, Alireza; Shirani, Shahram
2009-12-01
In this paper, we propose a new encoding algorithm for matching pursuit image coding. We show that coding performance is improved when correlations between atom positions and atom coefficients are both used in encoding. We find the optimum tradeoff between efficient atom position coding and efficient atom coefficient coding and optimize the encoder parameters. Our proposed algorithm outperforms the existing coding algorithms designed for matching pursuit image coding. Additionally, we show that our algorithm results in better rate distortion performance than JPEG 2000 at low bit rates.
NASA Technical Reports Server (NTRS)
Rutishauser, David
2006-01-01
The motivation for this work comes from an observation that amidst the push for Massively Parallel (MP) solutions to high-end computing problems such as numerical physical simulations, large amounts of legacy code exist that are highly optimized for vector supercomputers. Because re-hosting legacy code often requires a complete re-write of the original code, which can be a very long and expensive effort, this work examines the potential to exploit reconfigurable computing machines in place of a vector supercomputer to implement an essentially unmodified legacy source code. Custom and reconfigurable computing resources could be used to emulate an original application's target platform to the extent required to achieve high performance. To arrive at an architecture that delivers the desired performance subject to limited resources involves solving a multi-variable optimization problem with constraints. Prior research in the area of reconfigurable computing has demonstrated that designing an optimum hardware implementation of a given application under hardware resource constraints is an NP-complete problem. The premise of the approach is that the general issue of applying reconfigurable computing resources to the implementation of an application, maximizing the performance of the computation subject to physical resource constraints, can be made a tractable problem by assuming a computational paradigm, such as vector processing. This research contributes a formulation of the problem and a methodology to design a reconfigurable vector processing implementation of a given application that satisfies a performance metric. A generic, parametric, architectural framework for vector processing implemented in reconfigurable logic is developed as a target for a scheduling/mapping algorithm that maps an input computation to a given instance of the architecture. This algorithm is integrated with an optimization framework to arrive at a specification of the architecture parameters that attempts to minimize execution time, while staying within resource constraints. The flexibility of using a custom reconfigurable implementation is exploited in a unique manner to leverage the lessons learned in vector supercomputer development. The vector processing framework is tailored to the application, with variable parameters that are fixed in traditional vector processing. Benchmark data that demonstrates the functionality and utility of the approach is presented. The benchmark data includes an identified bottleneck in a real case study example vector code, the NASA Langley Terminal Area Simulation System (TASS) application.
NASA Astrophysics Data System (ADS)
Mielikainen, Jarno; Huang, Bormin; Huang, Allen
2015-10-01
The Thompson cloud microphysics scheme is a sophisticated cloud microphysics scheme in the Weather Research and Forecasting (WRF) model. The scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. Compared to the earlier microphysics schemes, the Thompson scheme incorporates a large number of improvements. Thus, we have optimized the speed of this important part of WRF. Intel Many Integrated Core (MIC) ushers in a new era of supercomputing speed, performance, and compatibility. It allows developers to run code at trillions of calculations per second using a familiar programming model. In this paper, we present our results of optimizing the Thompson microphysics scheme on Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi coprocessor is the first product based on the Intel MIC architecture, and it consists of up to 61 cores connected by a high performance on-die bidirectional interconnect. The coprocessor supports all important Intel development tools, so the development environment is a familiar one to a vast number of CPU developers. However, getting maximum performance out of MICs requires some novel optimization techniques. New optimizations for an updated Thompson scheme are discussed in this paper. The optimizations improved the performance of the original Thompson code on the Xeon Phi 7120P by a factor of 1.8x. Furthermore, the same optimizations improved the performance of the Thompson scheme on a dual socket configuration of eight-core Intel Xeon E5-2670 CPUs by a factor of 1.8x compared to the original Thompson code.
Optimizing fusion PIC code performance at scale on Cori Phase 2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Koskela, T. S.; Deslippe, J.
In this paper we present the results of optimizing the performance of the gyrokinetic full-f fusion PIC code XGC1 on the Cori Phase Two Knights Landing system. The code has undergone substantial development to enable the use of vector instructions in its most expensive kernels within the NERSC Exascale Science Applications Program. We study the single-node performance of the code on an absolute scale using the roofline methodology to guide optimization efforts. We have obtained 2x speedups in single node performance due to enabling vectorization and performing memory layout optimizations. On multiple nodes, the code is shown to scale well up to 4000 nodes, near half the size of the machine. We discuss some communication bottlenecks that were identified and resolved during the work.
Joint-layer encoder optimization for HEVC scalable extensions
NASA Astrophysics Data System (ADS)
Tsai, Chia-Ming; He, Yuwen; Dong, Jie; Ye, Yan; Xiu, Xiaoyu; He, Yong
2014-09-01
Scalable video coding provides an efficient solution to support video playback on heterogeneous devices with various channel conditions in heterogeneous networks. SHVC is the latest scalable video coding standard based on the HEVC standard. To improve enhancement layer coding efficiency, inter-layer prediction including texture and motion information generated from the base layer is used for enhancement layer coding. However, the overall performance of the SHVC reference encoder is not fully optimized because rate-distortion optimization (RDO) processes in the base and enhancement layers are independently considered. It is difficult to directly extend the existing joint-layer optimization methods to SHVC due to the complicated coding tree block splitting decisions and in-loop filtering process (e.g., deblocking and sample adaptive offset (SAO) filtering) in HEVC. To solve those problems, a joint-layer optimization method is proposed by adjusting the quantization parameter (QP) to optimally allocate the bit resource between layers. Furthermore, to make more proper resource allocation, the proposed method also considers the viewing probability of base and enhancement layers according to packet loss rate. Based on the viewing probability, a novel joint-layer RD cost function is proposed for joint-layer RDO encoding. The QP values of those coding tree units (CTUs) belonging to lower layers referenced by higher layers are decreased accordingly, and the QP values of those remaining CTUs are increased to keep total bits unchanged. Finally the QP values with minimal joint-layer RD cost are selected to match the viewing probability. The proposed method was applied to the third temporal level (TL-3) pictures in the Random Access configuration. Simulation results demonstrate that the proposed joint-layer optimization method can improve coding performance by 1.3% for these TL-3 pictures compared to the SHVC reference encoder without joint-layer optimization.
Optimization of Particle-in-Cell Codes on RISC Processors
NASA Technical Reports Server (NTRS)
Decyk, Viktor K.; Karmesin, Steve Roy; Boer, Aeint de; Liewer, Paulette C.
1996-01-01
General strategies are developed to optimize particle-in-cell codes written in Fortran for the RISC processors which are commonly used on massively parallel computers. These strategies include data reorganization to improve cache utilization and code reorganization to improve the efficiency of arithmetic pipelines.
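One of the data-reorganization strategies mentioned (improving cache utilization) can be sketched in a few lines: periodically reorder the particle arrays by the grid cell they occupy, so that particles depositing to nearby grid points are contiguous in memory. The 1D array layout below is an illustrative stand-in for a real PIC code's data structures, not the paper's Fortran implementation.

    import numpy as np

    def sort_particles_by_cell(pos, vel, cell_size, nx):
        """Reorder particle positions/velocities by 1D cell index so that the
        deposit/gather loops touch grid memory in a cache-friendly order."""
        cell = np.clip(np.floor(pos / cell_size).astype(int), 0, nx - 1)
        order = np.argsort(cell, kind="stable")
        return pos[order], vel[order]

    rng = np.random.default_rng(2)
    pos = rng.uniform(0.0, 64.0, size=100_000)
    vel = rng.normal(size=100_000)
    pos, vel = sort_particles_by_cell(pos, vel, cell_size=1.0, nx=64)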
Strong scaling of general-purpose molecular dynamics simulations on GPUs
NASA Astrophysics Data System (ADS)
Glaser, Jens; Nguyen, Trung Dac; Anderson, Joshua A.; Lui, Pak; Spiga, Filippo; Millan, Jaime A.; Morse, David C.; Glotzer, Sharon C.
2015-07-01
We describe a highly optimized implementation of MPI domain decomposition in a GPU-enabled, general-purpose molecular dynamics code, HOOMD-blue (Anderson and Glotzer, 2013). Our approach is inspired by a traditional CPU-based code, LAMMPS (Plimpton, 1995), but is implemented within a code that was designed for execution on GPUs from the start (Anderson et al., 2008). The software supports short-ranged pair force and bond force fields and achieves optimal GPU performance using an autotuning algorithm. We are able to demonstrate equivalent or superior scaling on up to 3375 GPUs in Lennard-Jones and dissipative particle dynamics (DPD) simulations of up to 108 million particles. GPUDirect RDMA capabilities in recent GPU generations provide better performance in full double precision calculations. For a representative polymer physics application, HOOMD-blue 1.0 provides an effective GPU vs. CPU node speed-up of 12.5 ×.
Yukinawa, Naoto; Oba, Shigeyuki; Kato, Kikuya; Ishii, Shin
2009-01-01
Multiclass classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies by gene expression profiling. There have been many studies of aggregating binary classifiers to construct a multiclass classifier based on one-versus-the-rest (1R), one-versus-one (11), or other coding strategies, as well as some comparison studies between them. However, the studies found that the best coding depends on each situation. Therefore, a new problem, which we call the "optimal coding problem," has arisen: how can we determine which coding is the optimal one in each situation? To approach this optimal coding problem, we propose a novel framework for constructing a multiclass classifier, in which each binary classifier to be aggregated has a weight value to be optimally tuned based on the observed data. Although there is no a priori answer to the optimal coding problem, our weight tuning method can be a consistent answer to the problem. We apply this method to various classification problems including a synthesized data set and some cancer diagnosis data sets from gene expression profiling. The results demonstrate that, in most situations, our method can improve classification accuracy over simple voting heuristics and is better than or comparable to state-of-the-art multiclass predictors.
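A rough sketch of the kind of weighted aggregation described above, using one-versus-the-rest base classifiers whose per-class scores are combined with tunable weights, is shown below; the dataset, base learner, and the fixed unit weights are illustrative stand-ins for the paper's data-driven weight tuning.

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)
    classes = np.unique(y)

    # One-versus-the-rest binary classifiers, one per class.
    binary_clfs = [LogisticRegression(max_iter=1000).fit(X, (y == c).astype(int)) for c in classes]

    # One weight per binary classifier; the paper tunes these from observed data,
    # here they are simply set to 1 (plain 1R voting) for illustration.
    weights = np.ones(len(classes))

    scores = np.column_stack([w * clf.decision_function(X) for w, clf in zip(weights, binary_clfs)])
    pred = classes[np.argmax(scores, axis=1)]
    print((pred == y).mean())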
Using concatenated quantum codes for universal fault-tolerant quantum gates.
Jochym-O'Connor, Tomas; Laflamme, Raymond
2014-01-10
We propose a method for universal fault-tolerant quantum computation using concatenated quantum error correcting codes. The concatenation scheme exploits the transversal properties of two different codes, combining them to provide a means to protect against low-weight arbitrary errors. We give the required properties of the error correcting codes to ensure universal fault tolerance and discuss a particular example using the 7-qubit Steane and 15-qubit Reed-Muller codes. Namely, other than computational basis state preparation as required by the DiVincenzo criteria, our scheme requires no special ancillary state preparation to achieve universality, as opposed to schemes such as magic state distillation. We believe that optimizing the codes used in such a scheme could provide a useful alternative to state distillation schemes that exhibit high overhead costs.
Optimal patch code design via device characterization
NASA Astrophysics Data System (ADS)
Wu, Wencheng; Dalal, Edul N.
2012-01-01
In many color measurement applications, such as those for color calibration and profiling, "patch code" has been used successfully for job identification and automation to reduce operator errors. A patch code is similar to a barcode, but is intended primarily for use in measurement devices that cannot read barcodes due to limited spatial resolution, such as spectrophotometers. There is an inherent tradeoff between decoding robustness and the number of code levels available for encoding. Previous methods have attempted to address this tradeoff, but those solutions have been sub-optimal. In this paper, we propose a method to design optimal patch codes via device characterization. The tradeoff between decoding robustness and the number of available code levels is optimized in terms of printing and measurement efforts, and decoding robustness against noises from the printing and measurement devices. Effort is drastically reduced relative to previous methods because print-and-measure is minimized through modeling and the use of existing printer profiles. Decoding robustness is improved by distributing the code levels in CIE Lab space rather than in CMYK space.
NASA Astrophysics Data System (ADS)
Jubran, Mohammad K.; Bansal, Manu; Kondi, Lisimachos P.
2006-01-01
In this paper, we consider the problem of optimal bit allocation for wireless video transmission over fading channels. We use a newly developed hybrid scalable/multiple-description codec that combines the functionality of both scalable and multiple-description codecs. It produces a base layer and multiple-description enhancement layers. Any of the enhancement layers can be decoded (in a non-hierarchical manner) with the base layer to improve the reconstructed video quality. Two different channel coding schemes (Rate-Compatible Punctured Convolutional (RCPC)/Cyclic Redundancy Check (CRC) coding and, product code Reed Solomon (RS)+RCPC/CRC coding) are used for unequal error protection of the layered bitstream. Optimal allocation of the bitrate between source and channel coding is performed for discrete sets of source coding rates and channel coding rates. Experimental results are presented for a wide range of channel conditions. Also, comparisons with classical scalable coding show the effectiveness of using hybrid scalable/multiple-description coding for wireless transmission.
NASA Technical Reports Server (NTRS)
Nguyen, Duc T.
1990-01-01
Practical engineering applications can often be formulated as constrained optimization problems. There are several solution algorithms for solving a constrained optimization problem. One approach is to convert a constrained problem into a series of unconstrained problems; unconstrained solution algorithms can then be used as part of the constrained solution algorithms. Structural optimization is an iterative process: one starts with an initial design, and a finite element structural analysis is then performed to calculate the response of the system (displacements, stresses, eigenvalues, etc.). Based upon the sensitivity information on the objective and constraint functions, an optimizer such as ADS or IDESIGN can be used to find the new, improved design. For the structural analysis phase, the equation solver for the system of simultaneous linear equations plays a key role, since it is needed for static, eigenvalue, and dynamic analysis. For practical, large-scale structural analysis-synthesis applications, computational time can be excessively large. Thus, it is necessary to have a new structural analysis-synthesis code which employs new solution algorithms to exploit both the parallel and vector capabilities offered by modern, high-performance computers such as the Convex, Cray-2, and Cray-YMP. The objective of this research project is, therefore, to incorporate the latest developments in the parallel-vector equation solver PVSOLVE into a widely used finite-element production code, SAP-4. Furthermore, several nonlinear unconstrained optimization subroutines have also been developed and tested in a parallel computing environment. The unconstrained optimization subroutines are not only useful in their own right, but they can also be incorporated into a more popular constrained optimization code, such as ADS.
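The conversion of a constrained problem into a series of unconstrained problems can be pictured with a quadratic penalty: each pass minimizes the penalized objective without constraints, and the penalty weight is raised between passes. The sketch below is a generic illustration with a toy objective and constraint, not the ADS or IDESIGN implementation.

```python
# Minimal quadratic-penalty sketch of solving a constrained problem as a
# series of unconstrained problems (generic illustration, not ADS/IDESIGN).
import numpy as np
from scipy.optimize import minimize

def objective(x):                 # toy surrogate for structural weight
    return x[0] + 2.0 * x[1]

def constraint(x):                # g(x) <= 0 required (toy stress limit)
    return 1.0 - x[0] * x[1]

def solve_with_penalties(x0, penalties=(1.0, 10.0, 100.0, 1000.0)):
    x = np.asarray(x0, dtype=float)
    for r in penalties:
        # Each pass minimizes the unconstrained penalized function.
        pen = lambda x: objective(x) + r * max(0.0, constraint(x)) ** 2
        x = minimize(pen, x, method="Nelder-Mead").x
    return x

print(solve_with_penalties([2.0, 2.0]))
```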
DSP code optimization based on cache
NASA Astrophysics Data System (ADS)
Xu, Chengfa; Li, Chengcheng; Tang, Bin
2013-03-01
A DSP program often runs less efficiently on the target board than in software simulation during program development, mainly because of improper use and incomplete understanding of the cache-based memory. This paper takes the TI TMS320C6455 DSP as an example, analyzes its two-level internal cache, and summarizes methods of code optimization. The processor can achieve its best performance when these code optimization methods are used. Finally, a specific algorithm application in radar signal processing is presented. Experimental results show that these optimizations are effective.
Demonstration of Automatically-Generated Adjoint Code for Use in Aerodynamic Shape Optimization
NASA Technical Reports Server (NTRS)
Green, Lawrence; Carle, Alan; Fagan, Mike
1999-01-01
Gradient-based optimization requires accurate derivatives of the objective function and constraints. These gradients may have previously been obtained by manual differentiation of analysis codes, symbolic manipulators, finite-difference approximations, or existing automatic differentiation (AD) tools such as ADIFOR (Automatic Differentiation in FORTRAN). Each of these methods has certain deficiencies, particularly when applied to complex, coupled analyses with many design variables. Recently, a new AD tool called ADJIFOR (Automatic Adjoint Generation in FORTRAN), based upon ADIFOR, was developed and demonstrated. Whereas ADIFOR implements forward-mode (direct) differentiation throughout an analysis program to obtain exact derivatives via the chain rule of calculus, ADJIFOR implements the reverse-mode counterpart of the chain rule to obtain exact adjoint form derivatives from FORTRAN code. Automatically-generated adjoint versions of the widely-used CFL3D computational fluid dynamics (CFD) code and an algebraic wing grid generation code were obtained with just a few hours processing time using the ADJIFOR tool. The codes were verified for accuracy and were shown to compute the exact gradient of the wing lift-to-drag ratio, with respect to any number of shape parameters, in about the time required for 7 to 20 function evaluations. The codes have now been executed on various computers with typical memory and disk space for problems with up to 129 x 65 x 33 grid points, and for hundreds to thousands of independent variables. These adjoint codes are now used in a gradient-based aerodynamic shape optimization problem for a swept, tapered wing. For each design iteration, the optimization package constructs an approximate, linear optimization problem, based upon the current objective function, constraints, and gradient values. The optimizer subroutines are called within a design loop employing the approximate linear problem until an optimum shape is found, the design loop limit is reached, or no further design improvement is possible due to active design variable bounds and/or constraints. The resulting shape parameters are then used by the grid generation code to define a new wing surface and computational grid. The lift-to-drag ratio and its gradient are computed for the new design by the automatically-generated adjoint codes. Several optimization iterations may be required to find an optimum wing shape. Results from two sample cases will be discussed. The reader should note that this work primarily represents a demonstration of the use of automatically-generated adjoint code within an aerodynamic shape optimization. As such, little significance is placed upon the actual optimization results, relative to the method for obtaining the results.
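The forward/reverse distinction the abstract relies on can be shown with a tiny hand-coded example (a generic illustration of the two chain-rule orderings, not ADIFOR or ADJIFOR output): reverse mode delivers the derivatives with respect to all inputs in a single backward sweep, which is why adjoint codes cost only a few function evaluations regardless of the number of shape parameters.

```python
# Hand-coded illustration of forward- vs reverse-mode differentiation of
# y = sin(x1*x2) + x1. Reverse mode yields all dy/dxi in one backward sweep.
import math

def forward_mode(x1, x2, dx1, dx2):
    """Propagate one directional derivative alongside the values."""
    v1, dv1 = x1 * x2, dx1 * x2 + x1 * dx2
    v2, dv2 = math.sin(v1), math.cos(v1) * dv1
    y, dy = v2 + x1, dv2 + dx1
    return y, dy

def reverse_mode(x1, x2):
    """One forward sweep for values, one backward sweep for all adjoints."""
    v1 = x1 * x2
    v2 = math.sin(v1)
    y = v2 + x1
    # backward sweep, seeded with dy/dy = 1
    v2bar = 1.0
    x1bar = 1.0                      # direct dependence of y on x1
    v1bar = v2bar * math.cos(v1)
    x1bar += v1bar * x2
    x2bar = v1bar * x1
    return y, (x1bar, x2bar)

print(forward_mode(1.0, 2.0, 1.0, 0.0))   # dy/dx1 via forward mode
print(reverse_mode(1.0, 2.0))             # both gradients in one sweep
```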
Lattice Boltzmann Simulation Optimization on Leading Multicore Platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, Samuel; Carter, Jonathan; Oliker, Leonid
2008-02-01
We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of search-based performance optimizations, popular in linear algebra and FFT libraries, to application-specific computational kernels. Our work applies this strategy to a lattice Boltzmann application (LBMHD) that historically has made poor use of scalar microprocessors due to its complex data structures and memory access patterns. We explore one of the broadest sets of multicore architectures in the HPC literature, including the Intel Clovertown, AMD Opteron X2, Sun Niagara2, STI Cell, as well as the single-core Intel Itanium2. Rather than hand-tuning LBMHD for each system, we develop a code generator that allows us to identify a highly optimized version for each platform, while amortizing the human programming effort. Results show that our auto-tuned LBMHD application achieves up to a 14x improvement compared with the original code. Additionally, we present detailed analysis of each optimization, which reveals surprising hardware bottlenecks and software challenges for future multicore systems and applications.
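The essence of the auto-tuning strategy is a driver that generates candidate kernel variants, times each one on the target machine, and keeps the fastest. The sketch below is a generic, hypothetical driver (a blocked array sum standing in for an LBMHD kernel), not the authors' code generator.

```python
# Generic auto-tuning driver sketch (hypothetical tuning parameter, not the
# actual LBMHD code generator): benchmark each variant, keep the fastest.
import timeit
import numpy as np

def make_variant(block):
    """'Generate' a kernel variant; here just a blocked array sum."""
    def kernel(a):
        total = 0.0
        for i in range(0, a.shape[0], block):
            total += a[i:i + block].sum()
        return total
    return kernel

def autotune(a, blocks=(64, 256, 1024, 4096)):
    best, best_t = None, float("inf")
    for block in blocks:
        k = make_variant(block)
        t = min(timeit.repeat(lambda: k(a), number=5, repeat=3))
        if t < best_t:
            best, best_t = block, t
    return best, best_t

a = np.random.rand(1_000_000)
print(autotune(a))
```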
Employment and residential characteristics in relation to automated external defibrillator locations
Griffis, Heather M.; Band, Roger A; Ruther, Matthew; Harhay, Michael; Asch, David A.; Hershey, John C.; Hill, Shawndra; Nadkarni, Lindsay; Kilaru, Austin; Branas, Charles C.; Shofer, Frances; Nichol, Graham; Becker, Lance B.; Merchant, Raina M.
2015-01-01
Background Survival from out-of-hospital cardiac arrest (OHCA) is generally poor and varies by geography. Variability in automated external defibrillator (AED) locations may be a contributing factor. To inform optimal placement of AEDs, we investigated AED access in a major US city relative to demographic and employment characteristics. Methods and Results This was a retrospective analysis of a Philadelphia AED registry (2,559 total AEDs). The 2010 US Census and the Local Employment Dynamics (LED) database by ZIP code were used. AED access was calculated as the weighted areal percentage of each ZIP code covered by a 400 meter radius around each AED. Of 47 ZIP codes, only 9% (4) were high AED service areas. In 26% (12) of ZIP codes, less than 35% of the area was covered by AED service areas. Higher AED access ZIP codes were more likely to have a moderately populated residential area (p=0.032), higher median household income (p=0.006), and higher paying jobs (p=0.008). Conclusions The locations of AEDs vary across specific ZIP codes; select residential and employment characteristics explain some variation. Further work on evaluating OHCA locations, AED use and availability, and OHCA outcomes could inform AED placement policies. Optimizing the placement of AEDs through this work may help to increase survival. PMID:26856232
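The coverage metric, the portion of a ZIP code within 400 m of any AED, can be computed with standard geometry operations. Below is a rough sketch using Shapely with a plain (unweighted) areal fraction and a toy square ZIP polygon; the study itself uses a weighted areal percentage and real, projected boundary data.

```python
# Rough sketch of the coverage idea (simple areal fraction only; coordinates
# must be in a metric projection for the 400 m buffer to be meaningful).
from shapely.geometry import Point, Polygon
from shapely.ops import unary_union

def aed_coverage_fraction(zip_polygon, aed_points, radius_m=400.0):
    """Fraction of a ZIP-code polygon within `radius_m` of any AED."""
    buffers = unary_union([Point(x, y).buffer(radius_m) for x, y in aed_points])
    return zip_polygon.intersection(buffers).area / zip_polygon.area

# Toy example: a 1 km x 1 km ZIP area with two AEDs.
zip_poly = Polygon([(0, 0), (1000, 0), (1000, 1000), (0, 1000)])
print(aed_coverage_fraction(zip_poly, [(250, 250), (750, 750)]))
```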
Three-Dimensional Nacelle Aeroacoustics Code With Application to Impedance Eduction
NASA Technical Reports Server (NTRS)
Watson, Willie R.
2000-01-01
A three-dimensional nacelle acoustics code that accounts for uniform mean flow and variable surface impedance liners is developed. The code is linked to a commercial version of the NASA-developed General Purpose Solver (for solution of linear systems of equations) in order to obtain the capability to study high-frequency waves that may require millions of grid points for resolution. Detailed, single-processor statistics for the performance of the solver in rigid and soft-wall ducts are presented. Over the range of frequencies of current interest in nacelle liner research, noise attenuation levels predicted from the code were in excellent agreement with those predicted from mode theory. The equation solver is memory efficient, requiring only a small fraction of the memory available on modern computers. As an application, the code is combined with an optimization algorithm and used to educe the impedance spectrum of a ceramic liner. The primary problem with using the code to perform optimization studies at frequencies above 1 kHz is the excessive CPU time (a major portion of which is matrix assembly). It is recommended that research be directed toward development of a rapid sparse assembler and exploitation of the multiprocessor capability of the solver to further reduce CPU time.
The path toward HEP High Performance Computing
NASA Astrophysics Data System (ADS)
Apostolakis, John; Brun, René; Carminati, Federico; Gheata, Andrei; Wenzel, Sandro
2014-06-01
High Energy Physics code has been known for making poor use of high performance computing architectures. Efforts in optimising HEP code on vector and RISC architectures have yielded limited results, and recent studies have shown that, on modern architectures, it achieves a performance between 10% and 50% of peak. Although several successful attempts have been made to port selected codes to GPUs, no major HEP code suite has a "High Performance" implementation. With the LHC undergoing a major upgrade and a number of challenging experiments on the drawing board, HEP can no longer neglect the less-than-optimal performance of its code and has to try to make the best use of the hardware. This activity is one of the foci of the SFT group at CERN, which hosts, among others, the Root and Geant4 projects. The activity of the experiments is shared and coordinated via a Concurrency Forum, where the experience in optimising HEP code is presented and discussed. Another activity is the Geant-V project, centred on the development of a high-performance prototype for particle transport. Achieving a good concurrency level on the emerging parallel architectures without a complete redesign of the framework can only be done by parallelizing at event level, or with a much larger effort at track level. Apart from the shareable data structures, this typically implies a multiplication factor in terms of memory consumption compared to the single-threaded version, together with sub-optimal handling of event processing tails. Besides this, the low-level instruction pipelining of modern processors cannot be used efficiently to speed up the program. We have implemented a framework that allows scheduling vectors of particles to an arbitrary number of computing resources in a fine-grain parallel approach. The talk will review the current optimisation activities within the SFT group with a particular emphasis on the development perspectives towards a simulation framework able to profit best from the recent technology evolution in computing.
User's manual for the BNW-II optimization code for dry/wet-cooled power plants
DOE Office of Scientific and Technical Information (OSTI.GOV)
Braun, D.J.; Bamberger, J.A.; Braun, D.J.
1978-05-01
This volume provides a listing of the BNW-II dry/wet ammonia heat rejection optimization code and is an appendix to Volume I which gives a narrative description of the code's algorithms as well as logic, input and output information.
On the optimum signal constellation design for high-speed optical transport networks.
Liu, Tao; Djordjevic, Ivan B
2012-08-27
In this paper, we first describe an optimum signal constellation design algorithm, which is optimum in the MMSE sense, called MMSE-OSCD, for a channel-capacity-achieving source distribution. Secondly, we introduce a feedback channel capacity inspired optimum signal constellation design (FCC-OSCD) to further improve the performance of MMSE-OSCD, inspired by the fact that feedback channel capacity is higher than that of systems without feedback. The constellations obtained by FCC-OSCD are, however, OSNR dependent. The optimization is jointly performed together with regular quasi-cyclic low-density parity-check (LDPC) code design. The coded-modulation scheme so obtained, in combination with polarization-multiplexing, is suitable as both a 400 Gb/s and a multi-Tb/s optical transport enabling technology. Using a large-girth LDPC code, we demonstrate by Monte Carlo simulations that a 32-ary signal constellation obtained by FCC-OSCD outperforms the previously proposed optimized 32-ary CIPQ signal constellation by 0.8 dB at a BER of 10^-7. On the other hand, the LDPC-coded 16-ary FCC-OSCD outperforms 16-QAM by 1.15 dB at the same BER.
Lee, Chaewoo
2014-01-01
The advancement of wideband wireless networks supports real-time services such as IPTV and live video streaming. However, because of the shared nature of the wireless medium, efficient resource allocation has been studied to achieve a high level of acceptability and proliferation of wireless multimedia. Scalable video coding (SVC) with adaptive modulation and coding (AMC) provides an excellent solution for wireless video streaming. By assigning different modulation and coding schemes (MCSs) to video layers, SVC can provide good video quality to users in good channel conditions and also basic video quality to users in bad channel conditions. For optimal resource allocation, a key issue in applying SVC in the wireless multicast service is how to assign MCSs and the time resources to each SVC layer in the heterogeneous channel condition. We formulate this problem with integer linear programming (ILP) and provide numerical results to show the performance in an 802.16m environment. The results show that our methodology enhances the overall system throughput compared to an existing algorithm. PMID:25276862
The Use of a Code-generating System for the Derivation of the Equations for Wind Turbine Dynamics
NASA Astrophysics Data System (ADS)
Ganander, Hans
2003-10-01
For many reasons the size of wind turbines on the rapidly growing wind energy market is increasing. Relations between the aeroelastic properties of these new large turbines change. Modifications of turbine designs and control concepts are also influenced by growing size. All these trends require the development of computer codes for design and certification. Moreover, there is a strong desire for design optimization procedures, which require fast codes. General codes, e.g. finite element codes, normally allow such modifications and improvements of existing wind turbine models relatively easily. However, the calculation times of such codes are unfavourably long, certainly for optimization use. The use of an automatic code-generating system is an alternative that addresses both key issues, the code and the design optimization. This technique can be used for rapid generation of codes for particular wind turbine simulation models. These ideas have been followed in the development of new versions of the wind turbine simulation code VIDYN. The equations of the simulation model were derived according to the Lagrange equation and using Mathematica®, which was directed to output the results in Fortran code format. In this way the simulation code is automatically adapted to an actual turbine model, in terms of subroutines containing the equations of motion, definitions of parameters and degrees of freedom. Since the start in 1997, these methods, constituting a systematic way of working, have been used to develop specific, efficient calculation codes. The experience with this technique has been very encouraging, inspiring the continued development of new versions of the simulation code as the need has arisen, and the interest in design optimization is growing.
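The derive-symbolically-then-emit-Fortran workflow can be sketched with SymPy in place of Mathematica (a swapped-in tool, with a pendulum standing in for a turbine degree of freedom): the equation of motion is obtained from Lagrange's equation and printed as Fortran source.

```python
# Sketch of symbolic derivation followed by Fortran code generation,
# using SymPy instead of Mathematica (illustration only).
import sympy as sp

t = sp.symbols("t")
theta = sp.Function("theta")(t)
m, g, l = sp.symbols("m g l", positive=True)

# Lagrangian of a simple pendulum as a stand-in degree of freedom.
L = sp.Rational(1, 2) * m * (l * theta.diff(t)) ** 2 + m * g * l * sp.cos(theta)

# Lagrange's equation: d/dt(dL/dqdot) - dL/dq = 0
eom = sp.diff(L.diff(theta.diff(t)), t) - L.diff(theta)

# Solve for the angular acceleration and emit Fortran source for it.
theta_ddot, q = sp.symbols("theta_ddot q")
sol = sp.solve(eom.subs(theta.diff(t, 2), theta_ddot), theta_ddot)[0]
print(sp.fcode(sp.simplify(sol.subs(theta, q)), assign_to="qddot", standard=95))
```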
Shared prefetching to reduce execution skew in multi-threaded systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eichenberger, Alexandre E; Gunnels, John A
Mechanisms are provided for optimizing code to perform prefetching of data into a shared memory of a computing device that is shared by a plurality of threads that execute on the computing device. A memory stream of a portion of code that is shared by the plurality of threads is identified. A set of prefetch instructions is distributed across the plurality of threads. Prefetch instructions are inserted into the instruction sequences of the plurality of threads such that each instruction sequence has a separate sub-portion of the set of prefetch instructions, thereby generating optimized code. Executable code is generated based on the optimized code and stored in a storage device. The executable code, when executed, performs the prefetches associated with the distributed set of prefetch instructions in a shared manner across the plurality of threads.
Aeroelastic Tailoring Study of N+2 Low-Boom Supersonic Commercial Transport Aircraft
NASA Technical Reports Server (NTRS)
Pak, Chan-gi
2015-01-01
The Lockheed Martin N+2 Low-Boom Supersonic Commercial Transport (LSCT) aircraft is optimized in this study through the use of a multidisciplinary design optimization tool developed at the NASA Armstrong Flight Research Center. A total of 111 design variables are used in the first optimization run. Total structural weight is the objective function in this optimization run. Design requirements for strength, buckling, and flutter are selected as constraint functions during the first optimization run. The MSC Nastran code is used to obtain the modal, strength, and buckling characteristics. Flutter and trim analyses are based on the ZAERO code, and landing and ground control loads are computed using an in-house code.
NASA Technical Reports Server (NTRS)
Lahti, G. P.
1972-01-01
A two- or three-constraint, two-dimensional radiation shield weight optimization procedure and a computer program, DOPEX, are described. The DOPEX code uses the steepest descent method to alter a set of initial (input) thicknesses for a shield configuration to achieve a minimum weight while simultaneously satisfying dose constraints. The code assumes an exponential dose-shield thickness relation with parameters specified by the user. The code also assumes that dose rates in each principal direction are dependent only on thicknesses in that direction. Code input instructions, a FORTRAN 4 listing, and a sample problem are given. Typical computer time required to optimize a seven-layer shield is about 0.1 minute on an IBM 7094-2.
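The stated approach, steepest descent on layer thicknesses under an exponential dose-thickness relation, can be sketched as below. All material parameters, the penalty treatment of the dose constraint, and the backtracking line search are illustrative assumptions, not DOPEX itself.

```python
# Hedged sketch of the stated approach (made-up parameters, not DOPEX):
# steepest descent on layer thicknesses with an exponential dose-thickness
# law and a quadratic penalty enforcing the dose constraint.
import numpy as np

rho   = np.array([11.3, 1.0, 7.8])   # layer densities (g/cm^3), assumed
mu    = np.array([0.8, 0.15, 0.5])   # attenuation coefficients (1/cm), assumed
dose0 = 1.0e4                        # unshielded dose rate, assumed units
dose_limit = 1.0

def weight(t):
    return float(np.dot(rho, t))     # proxy for shield weight

def dose(t):
    return dose0 * np.exp(-float(np.dot(mu, t)))   # exponential dose-thickness law

def penalized(t, r=1.0e2):
    return weight(t) + r * max(0.0, dose(t) - dose_limit) ** 2

def grad(f, t, h=1.0e-6):            # finite-difference gradient
    g = np.zeros_like(t)
    for i in range(t.size):
        tp = t.copy(); tp[i] += h
        g[i] = (f(tp) - f(t)) / h
    return g

def steepest_descent(f, t, iters=300):
    for _ in range(iters):
        g = grad(f, t)
        step = 1.0
        while step > 1e-12 and f(np.maximum(t - step * g, 0.0)) >= f(t):
            step *= 0.5              # backtracking line search
        t = np.maximum(t - step * g, 0.0)
    return t

t_opt = steepest_descent(penalized, np.full(3, 5.0))
print(t_opt, weight(t_opt), dose(t_opt))
```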
NASA Technical Reports Server (NTRS)
Vanderplaats, G. N.; Chen, Xiang; Zhang, Ning-Tian
1988-01-01
The use of formal numerical optimization methods for the design of gears is investigated. To achieve this, computer codes were developed for the analysis of spur gears and spiral bevel gears. These codes calculate the life, dynamic load, bending strength, surface durability, gear weight and size, and various geometric parameters. It is necessary to calculate all such important responses because they all represent competing requirements in the design process. The codes developed here were written in subroutine form and coupled to the COPES/ADS general purpose optimization program. This code allows the user to define the optimization problem at the time of program execution. Typical design variables include face width, number of teeth and diametral pitch. The user is free to choose any calculated response as the design objective to minimize or maximize and may impose lower and upper bounds on any calculated responses. Typical examples include life maximization with limits on dynamic load, stress, weight, etc. or minimization of weight subject to limits on life, dynamic load, etc. The research codes were written in modular form for easy expansion and so that they could be combined to create a multiple reduction optimization capability in future.
Dynamic state estimation based on Poisson spike trains—towards a theory of optimal encoding
NASA Astrophysics Data System (ADS)
Susemihl, Alex; Meir, Ron; Opper, Manfred
2013-03-01
Neurons in the nervous system convey information to higher brain regions by the generation of spike trains. An important question in the field of computational neuroscience is how these sensory neurons encode environmental information in a way which may be simply analyzed by subsequent systems. Many aspects of the form and function of the nervous system have been understood using the concepts of optimal population coding. Most studies, however, have neglected the aspect of temporal coding. Here we address this shortcoming through a filtering theory of inhomogeneous Poisson processes. We derive exact relations for the minimal mean squared error of the optimal Bayesian filter and, by optimizing the encoder, obtain optimal codes for populations of neurons. We also show that a class of non-Markovian, smooth stimuli are amenable to the same treatment, and provide results for the filtering and prediction error which hold for a general class of stochastic processes. This sets a sound mathematical framework for a population coding theory that takes temporal aspects into account. It also formalizes a number of studies which discussed temporal aspects of coding using time-window paradigms, by stating them in terms of correlation times and firing rates. We propose that this kind of analysis allows for a systematic study of temporal coding and will bring further insights into the nature of the neural code.
NASA Astrophysics Data System (ADS)
Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.
2014-10-01
The Goddard cloud microphysics scheme is a sophisticated cloud microphysics scheme in the Weather Research and Forecasting (WRF) model. The WRF is a widely used weather prediction system around the world, and its development is a collaborative effort across the globe. The Goddard microphysics scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. Compared to the earlier microphysics schemes, the Goddard scheme incorporates a large number of improvements. Thus, we have optimized the code of this important part of WRF. In this paper, we present our results of optimizing the Goddard microphysics scheme on Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi coprocessor is the first product based on the Intel MIC architecture, and it consists of up to 61 cores connected by a high-performance on-die bidirectional interconnect. The Intel MIC is capable of executing a full operating system and entire programs rather than just kernels as GPUs do. The MIC coprocessor supports all important Intel development tools, so the development environment is a familiar one to a vast number of CPU developers, although getting maximum performance out of MICs requires some novel optimization techniques, which are discussed in this paper. The results show that the optimizations improved the performance of the original code on a Xeon Phi 7120P by a factor of 4.7x. Furthermore, the same optimizations improved performance on a dual-socket Intel Xeon E5-2670 system by a factor of 2.8x compared to the original code.
On the Efficacy of Source Code Optimizations for Cache-Based Systems
NASA Technical Reports Server (NTRS)
VanderWijngaart, Rob F.; Saphir, William C.
1998-01-01
Obtaining high performance without machine-specific tuning is an important goal of scientific application programmers. Since most scientific processing is done on commodity microprocessors with hierarchical memory systems, this goal of "portable performance" can be achieved if a common set of optimization principles is effective for all such systems. It is widely believed, or at least hoped, that portable performance can be realized. The rule of thumb for optimization on hierarchical memory systems is to maximize temporal and spatial locality of memory references by reusing data and minimizing memory access stride. We investigate the effects of a number of optimizations on the performance of three related kernels taken from a computational fluid dynamics application. Timing the kernels on a range of processors, we observe an inconsistent and often counterintuitive impact of the optimizations on performance. In particular, code variations that have a positive impact on one architecture can have a negative impact on another, and variations expected to be unimportant can produce large effects. Moreover, we find that cache miss rates - as reported by a cache simulation tool, and confirmed by hardware counters - only partially explain the results. By contrast, the compiler-generated assembly code provides more insight by revealing the importance of processor-specific instructions and of compiler maturity, both of which strongly, and sometimes unexpectedly, influence performance. We conclude that it is difficult to obtain performance portability on modern cache-based computers, and comment on the implications of this result.
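The "minimize memory access stride" rule of thumb the abstract refers to can be demonstrated with a toy timing experiment (a generic illustration, not the paper's CFD kernels): traversing a row-major array along its contiguous rows is typically much faster than traversing it down strided columns.

```python
# Toy demonstration of the stride rule of thumb: traversing a C-ordered
# (row-major) array along contiguous rows vs. strided columns.
import timeit
import numpy as np

a = np.random.rand(4000, 4000)        # C (row-major) layout

def row_major_sum(a):                 # unit-stride inner access
    return sum(a[i, :].sum() for i in range(a.shape[0]))

def col_major_sum(a):                 # stride of one full row per access
    return sum(a[:, j].sum() for j in range(a.shape[1]))

print("rows   :", timeit.timeit(lambda: row_major_sum(a), number=3))
print("columns:", timeit.timeit(lambda: col_major_sum(a), number=3))
```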
Electromagnetic code for naval applications
NASA Astrophysics Data System (ADS)
Crescimbeni, F.; Bessi, F.; Chiti, S.
1988-12-01
The use of an increasing number of electronic apparatus has become vital to meet the high performance required for military naval applications, and the number of antennas to be mounted on shipboard has therefore greatly increased. As a consequence of the high antenna density, of the complexity of the shipboard environment, and of the powers used for communication and radar systems, the EMC (Electro-Magnetic Compatibility) problem is playing a leading role in the design of the topside of a ship. The Italian Navy has acquired a numerical code for antenna siting and design. This code, together with experimental data measured at the Italian Navy test range facility, allows for the evaluation of optimal sitings for antenna systems on shipboard, and the prediction of their performance in the actual environment. The structure of this code, named Programma Elettromagnetico per Applicazioni Navali (Electromagnetic Code for Naval Applications), is discussed, together with its capabilities and applications. The results obtained in some examples are also presented and compared with measurements.
Novel Scalable 3-D MT Inverse Solver
NASA Astrophysics Data System (ADS)
Kuvshinov, A. V.; Kruglyakov, M.; Geraskin, A.
2016-12-01
We present a new, robust and fast, three-dimensional (3-D) magnetotelluric (MT) inverse solver. As a forward modelling engine, the highly-scalable solver extrEMe [1] is used. The (regularized) inversion is based on an iterative gradient-type optimization (quasi-Newton method) and exploits the adjoint-sources approach for fast calculation of the gradient of the misfit. The inverse solver is able to deal with highly detailed and contrasting models, allows for working (separately or jointly) with any type of MT (single-site and/or inter-site) responses, and supports massive parallelization. Different parallelization strategies implemented in the code allow for optimal usage of available computational resources for a given problem set-up. To parameterize the inverse domain, a mask approach is implemented, which means that one can merge any subset of forward modelling cells in order to account for the (usually) irregular distribution of observation sites. We report results of 3-D numerical experiments aimed at analysing the robustness, performance and scalability of the code. In particular, our computational experiments, carried out on different platforms ranging from modern laptops to high-performance clusters, demonstrate practically linear scalability of the code up to thousands of nodes. 1. Kruglyakov, M., A. Geraskin, A. Kuvshinov, 2016. Novel accurate and scalable 3-D MT forward solver based on a contracting integral equation method, Computers and Geosciences, in press.
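The inversion loop described here, gradient-type (quasi-Newton) minimization of a regularized misfit with a gradient supplied cheaply by the adjoint method, can be sketched generically as follows. The toy linear operator stands in for the 3-D MT forward solver and is purely an assumption for illustration.

```python
# Generic sketch of regularized quasi-Newton inversion; a toy linear forward
# operator stands in for the MT solver and G.T plays the adjoint's role.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
G = rng.normal(size=(40, 20))          # toy forward operator
m_true = rng.normal(size=20)
d_obs = G @ m_true + 0.01 * rng.normal(size=40)
lam = 0.1                              # regularization weight

def misfit_and_grad(m):
    r = G @ m - d_obs
    phi = 0.5 * r @ r + 0.5 * lam * m @ m
    grad = G.T @ r + lam * m           # adjoint of G applied to the residual
    return phi, grad

res = minimize(misfit_and_grad, np.zeros(20), jac=True, method="L-BFGS-B")
print(res.fun, np.linalg.norm(res.x - m_true))
```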
Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers
Wang, Bei; Ethier, Stephane; Tang, William; ...
2017-06-29
The Gyrokinetic Toroidal Code at Princeton (GTC-P) is a highly scalable and portable particle-in-cell (PIC) code. It solves the 5D Vlasov-Poisson equation featuring efficient utilization of modern parallel computer architectures at the petascale and beyond. Motivated by the goal of developing a modern code capable of dealing with the physics challenge of increasing problem size with sufficient resolution, new thread-level optimizations have been introduced as well as a key additional domain decomposition. GTC-P's multiple levels of parallelism, including inter-node 2D domain decomposition and particle decomposition, as well as intra-node shared memory partition and vectorization have enabled pushing the scalability of the PIC method to extreme computational scales. In this paper, we describe the methods developed to build a highly parallelized PIC code across a broad range of supercomputer designs. This particularly includes implementations on heterogeneous systems using NVIDIA GPU accelerators and Intel Xeon Phi (MIC) co-processors and performance comparisons with state-of-the-art homogeneous HPC systems such as Blue Gene/Q. New discovery science capabilities in the magnetic fusion energy application domain are enabled, including investigations of Ion-Temperature-Gradient (ITG) driven turbulence simulations with unprecedented spatial resolution and long temporal duration. Performance studies with realistic fusion experimental parameters are carried out on multiple supercomputing systems spanning a wide range of cache capacities, cache-sharing configurations, memory bandwidth, interconnects and network topologies. These performance comparisons using a realistic discovery-science-capable domain application code provide valuable insights on optimization techniques across one of the broadest sets of current high-end computing platforms worldwide.
Numerical optimization of the ramp-down phase with the RAPTOR code
NASA Astrophysics Data System (ADS)
Teplukhina, Anna; Sauter, Olivier; Felici, Federico; The Tcv Team; The ASDEX-Upgrade Team; The Eurofusion Mst1 Team
2017-10-01
The ramp-down optimization goal in this work is defined as the fastest possible decrease of the plasma current while avoiding any disruptions caused by reaching physical or technical limits. Numerical simulations and preliminary experiments on TCV and AUG have shown that a fast decrease of plasma elongation and an adequate timing of the H-L transition during the current ramp-down can help to avoid reaching high values of the plasma internal inductance. The RAPTOR code (F. Felici et al., 2012 PPCF 54; F. Felici, 2011 EPFL PhD thesis), developed for real-time plasma control, has been used to solve the optimization problem. Recently the transport model has been extended to include the ion temperature and electron density transport equations in addition to the electron temperature and current density transport equations, increasing the physical applications of the code. Gradient-based models for the transport coefficients (O. Sauter et al., 2014 PPCF 21; D. Kim et al., 2016 PPCF 58) have been implemented in RAPTOR and tested during this work. Simulations of entire AUG and TCV plasma discharges will be presented. See the author list of S. Coda et al., Nucl. Fusion 57 2017 102011.
Scaling Optimization of the SIESTA MHD Code
NASA Astrophysics Data System (ADS)
Seal, Sudip; Hirshman, Steven; Perumalla, Kalyan
2013-10-01
SIESTA is a parallel three-dimensional plasma equilibrium code capable of resolving magnetic islands at high spatial resolutions for toroidal plasmas. Originally designed to exploit small-scale parallelism, SIESTA has now been scaled to execute efficiently over several thousands of processors P. This scaling improvement was accomplished with minimal intrusion to the execution flow of the original version. First, the efficiency of the iterative solutions was improved by integrating the parallel tridiagonal block solver code BCYCLIC. Krylov-space generation in GMRES was then accelerated using a customized parallel matrix-vector multiplication algorithm. Novel parallel Hessian generation algorithms were integrated and memory access latencies were dramatically reduced through loop nest optimizations and data layout rearrangement. These optimizations sped up equilibria calculations by factors of 30-50. It is possible to compute solutions with granularity N/P near unity on extremely fine radial meshes (N > 1024 points). Grid separation in SIESTA, which manifests itself primarily in the resonant components of the pressure far from rational surfaces, is strongly suppressed by finer meshes. Large problem sizes of up to 300 K simultaneous non-linear coupled equations have been solved on the NERSC supercomputers. Work supported by U.S. DOE under Contract DE-AC05-00OR22725 with UT-Battelle, LLC.
NASA Astrophysics Data System (ADS)
Gel, Aytekin; Hu, Jonathan; Ould-Ahmed-Vall, ElMoustapha; Kalinkin, Alexander A.
2017-02-01
Legacy codes remain a crucial element of today's simulation-based engineering ecosystem due to the extensive validation process and investment in such software. The rapid evolution of high-performance computing architectures necessitates the modernization of these codes. One approach to modernization is a complete overhaul of the code. However, this could require extensive investment, such as rewriting in modern languages, new data constructs, etc., which would necessitate systematic verification and validation to re-establish the credibility of the computational models. The current study advocates a more incremental approach and is the culmination of several modernization efforts on the legacy code MFIX, an open-source computational fluid dynamics code that has evolved over several decades, is widely used in multiphase flows, and is still being developed by the National Energy Technology Laboratory. Two different modernization approaches, 'bottom-up' and 'top-down', are illustrated. Preliminary results show that up to an 8.5x improvement at the selected kernel level was achieved with the first approach, and up to a 50% improvement in total simulated time with the latter, for the demonstration cases and target HPC systems employed.
Medical Ultrasound Video Coding with H.265/HEVC Based on ROI Extraction
Wu, Yueying; Liu, Pengyu; Gao, Yuan; Jia, Kebin
2016-01-01
High-efficiency video compression technology is of primary importance to the storage and transmission of digital medical video in modern medical communication systems. To further improve the compression performance of medical ultrasound video, two innovative technologies based on diagnostic region-of-interest (ROI) extraction using the high efficiency video coding (H.265/HEVC) standard are presented in this paper. First, an effective ROI extraction algorithm based on image textural features is proposed to strengthen the applicability of ROI detection results in the H.265/HEVC quad-tree coding structure. Second, a hierarchical coding method based on transform coefficient adjustment and a quantization parameter (QP) selection process is designed to implement the otherness encoding for ROIs and non-ROIs. Experimental results demonstrate that the proposed optimization strategy significantly improves the coding performance by achieving a BD-BR reduction of 13.52% and a BD-PSNR gain of 1.16 dB on average compared to H.265/HEVC (HM15.0). The proposed medical video coding algorithm is expected to satisfy low bit-rate compression requirements for modern medical communication systems. PMID:27814367
Błażej, Paweł; Wnȩtrzak, Małgorzata; Mackiewicz, Paweł
2016-12-01
One of the theories explaining the present structure of the canonical genetic code assumes that it was optimized to minimize harmful effects of amino acid replacements resulting from nucleotide substitutions and translational errors. A way to test this concept is to find the optimal code under given criteria and compare it with the canonical genetic code. Unfortunately, the huge number of possible alternatives makes it impossible to find the optimal code using exhaustive methods in a sensible time. Therefore, heuristic methods should be applied to search the space of possible solutions. Evolutionary algorithms (EA) are one such promising approach. This class of methods is founded on both mutation and crossover operators, which are responsible for creating and maintaining the diversity of candidate solutions. These operators possess dissimilar characteristics and consequently play different roles in the process of finding the best solutions under given criteria. Therefore, the effective search for potential solutions can be improved by applying both of them, especially when these operators are devised specifically for a given problem. To study this subject, we analyze the effectiveness of algorithms for various combinations of mutation and crossover probabilities under three models of the genetic code assuming different restrictions on its structure. To achieve that, we adapt the position-based crossover operator for the most restricted model and develop a new type of crossover operator for the more general models. The applied fitness function describes costs of amino acid replacement regarding their polarity. Our results indicate that the usage of crossover operators can significantly improve the quality of the solutions. Moreover, the simulations with the crossover operator optimize the fitness function in a smaller number of generations than simulations without this operator. The optimal genetic codes without restrictions on their structure minimize the costs about 2.7 times better than the canonical genetic code. Interestingly, the optimal codes are dominated by amino acids characterized by polarity close to its average value for all amino acids. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
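The two operators the study varies, mutation and (position-based) crossover, can be shown on a permutation-coded genome. In the sketch below the fitness is a toy stand-in (a polarity-difference cost on a ring of blocks rather than the real codon graph), so only the operator mechanics, not the biological model, follow the paper.

```python
# Compact GA sketch showing swap mutation and position-based crossover on a
# permutation-coded genome; the fitness is a toy polarity cost, not the
# authors' codon-level model.
import random

POLARITY = [random.uniform(0.0, 13.0) for _ in range(21)]  # toy amino-acid polarity

def fitness(perm):
    # cost of replacements between neighbouring blocks (ring topology)
    return sum((POLARITY[perm[i]] - POLARITY[perm[(i + 1) % len(perm)]]) ** 2
               for i in range(len(perm)))

def swap_mutation(perm, p):
    perm = perm[:]
    if random.random() < p:
        i, j = random.sample(range(len(perm)), 2)
        perm[i], perm[j] = perm[j], perm[i]
    return perm

def position_based_crossover(p1, p2, p):
    if random.random() >= p:
        return p1[:]
    positions = random.sample(range(len(p1)), len(p1) // 2)
    child = [None] * len(p1)
    for i in positions:                       # keep parent-1 genes at chosen positions
        child[i] = p1[i]
    fill = [g for g in p2 if g not in child]  # remaining genes in parent-2 order
    for i in range(len(child)):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child

def evolve(pop_size=60, generations=200, p_mut=0.2, p_cx=0.8):
    pop = [random.sample(range(21), 21) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[:pop_size // 2]          # truncation selection
        children = [swap_mutation(position_based_crossover(
                        random.choice(parents), random.choice(parents), p_cx), p_mut)
                    for _ in range(pop_size)]
        pop = children + parents[:1]           # keep the best (elitism)
    return min(pop, key=fitness)

print(fitness(evolve()))
```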
Computer optimization of reactor-thermoelectric space power systems
NASA Technical Reports Server (NTRS)
Maag, W. L.; Finnegan, P. M.; Fishbach, L. H.
1973-01-01
A computer simulation and optimization code that has been developed for nuclear space power systems is described. The results of using this code to analyze two reactor-thermoelectric systems are presented.
Adams, Bradley J; Aschheim, Kenneth W
2016-01-01
Comparison of antemortem and postmortem dental records is a leading method of victim identification, especially for incidents involving a large number of decedents. This process may be expedited with computer software that provides a ranked list of best possible matches. This study compares the conventional coding and sorting algorithms most commonly used in the United States (WinID3) with a simplified coding format that utilizes an optimized sorting algorithm. The simplified system consists of seven basic codes and utilizes an optimized algorithm based largely on the percentage of matches. To perform this research, a large reference database of approximately 50,000 antemortem and postmortem records was created. For most disaster scenarios, the proposed simplified codes, paired with the optimized algorithm, performed better than WinID3, which uses more complex codes. The detailed coding system does show better performance with extremely large numbers of records and/or significant body fragmentation. © 2015 American Academy of Forensic Sciences.
Flow range enhancement by secondary flow effect in low solidity circular cascade diffusers
NASA Astrophysics Data System (ADS)
Sakaguchi, Daisaku; Tun, Min Thaw; Mizokoshi, Kanata; Kishikawa, Daiki
2014-08-01
A high pressure ratio and a wide operating range are strongly required for compressors and blowers. The technical challenge in the design is to suppress flow separation at small flow rates without deteriorating the efficiency at the design flow rate. Numerical simulation is very effective in the design procedure; however, its cost is generally high in the practical design process, and it is difficult to confirm the optimal design when many parameters are combined. Multi-objective optimization is an approach that has been proposed for solving this problem in the practical design process. In this study, a Low Solidity circular cascade Diffuser (LSD) in a centrifugal blower is successfully designed by means of a multi-objective optimization technique. An optimization code with a meta-model assisted evolutionary algorithm is used with the commercial CFD code ANSYS-CFX. The optimization aims at improving the static pressure coefficient at the design point and at the low flow rate condition while constraining the slope of the lift coefficient curve. Moreover, a small tip clearance of the LSD blade was applied in order to activate and stabilize the secondary flow effect at the small flow rate condition. The optimized LSD blade has an extended operating range of 114% toward smaller flow rates as compared to the baseline design, without deteriorating the diffuser pressure recovery at the design point. The diffuser pressure rise and operating flow range of the optimized LSD blade are experimentally verified by an overall performance test. The detailed flow in the diffuser is also confirmed by means of a Particle Image Velocimeter. The secondary flow is clearly captured by PIV and spreads over the whole LSD blade pitch. It is found that the optimized LSD blade shows good improvement of the blade loading over the whole operating range, while at small flow rates the flow separation on the LSD blade is successfully suppressed by the secondary flow effect.
Product code optimization for determinate state LDPC decoding in robust image transmission.
Thomos, Nikolaos; Boulgouris, Nikolaos V; Strintzis, Michael G
2006-08-01
We propose a novel scheme for error-resilient image transmission. The proposed scheme employs a product coder consisting of low-density parity check (LDPC) codes and Reed-Solomon codes in order to deal effectively with bit errors. The efficiency of the proposed scheme is based on the exploitation of determinate symbols in Tanner graph decoding of LDPC codes and a novel product code optimization technique based on error estimation. Experimental evaluation demonstrates the superiority of the proposed system in comparison to recent state-of-the-art techniques for image transmission.
2-Step scalar deadzone quantization for bitplane image coding.
Auli-Llinas, Francesc
2013-12-01
Modern lossy image coding systems generate a quality progressive codestream that, truncated at increasing rates, produces an image with decreasing distortion. Quality progressivity is commonly provided by an embedded quantizer that employs uniform scalar deadzone quantization (USDQ) together with a bitplane coding strategy. This paper introduces a 2-step scalar deadzone quantization (2SDQ) scheme that achieves same coding performance as that of USDQ while reducing the coding passes and the emitted symbols of the bitplane coding engine. This serves to reduce the computational costs of the codec and/or to code high dynamic range images. The main insights behind 2SDQ are the use of two quantization step sizes that approximate wavelet coefficients with more or less precision depending on their density, and a rate-distortion optimization technique that adjusts the distortion decreases produced when coding 2SDQ indexes. The integration of 2SDQ in current codecs is straightforward. The applicability and efficiency of 2SDQ are demonstrated within the framework of JPEG2000.
NASA Astrophysics Data System (ADS)
Wang, H.; Chen, H.; Chen, X.; Wu, Q.; Wang, Z.
2016-12-01
The Global Nested Air Quality Prediction Modeling System for Hg (GNAQPMS-Hg) is a global chemical transport model coupled with an Hg transport module to investigate mercury pollution. In this study, we present our work of porting the GNAQPMS model to the Intel Xeon Phi processor, Knights Landing (KNL), to accelerate the model. KNL is the second-generation product adopting the Many Integrated Core (MIC) architecture. Compared with the first-generation Knights Corner (KNC), KNL has more new hardware features and can be used as a standalone processor as well as a coprocessor with other CPUs. Using the Vtune tool, the high-overhead modules in the GNAQPMS model were identified, including the CBMZ gas chemistry, the advection and convection module, and the wet deposition module. These high-overhead modules were accelerated by optimizing the code and using new features of KNL. The following optimization measures were taken: 1) changing the pure MPI parallel mode to a hybrid parallel mode with MPI and OpenMP; 2) vectorizing the code to use the 512-bit wide vector computation unit; 3) reducing unnecessary memory access and calculation; 4) reducing Thread Local Storage (TLS) for common variables with each OpenMP thread in CBMZ; 5) changing the way of global communication from file writing and reading to MPI functions. After optimization, the performance of GNAQPMS is greatly increased on both the CPU and KNL platforms: the single-node test showed that the optimized version has a 2.6x speedup on a two-socket CPU platform and a 3.3x speedup on a single-socket KNL platform compared with the baseline version of the code, which means the KNL has a 1.29x speedup when compared with the two-socket CPU platform.
Zhang, Guangzhi; Cai, Shaobin; Xiong, Naixue
2018-01-01
One of the remarkable challenges in Wireless Sensor Networks (WSN) is how to transfer the collected data efficiently due to the energy limitation of sensor nodes. Network coding increases the network throughput of WSN dramatically due to the broadcast nature of WSN. However, network coding usually propagates a single original error over the whole network. Due to this special property of error propagation in network coding, most error correction methods cannot correct more than C/2 corrupted errors, where C is the max-flow min-cut of the network. To maximize the effectiveness of network coding applied in WSN, a new error-correcting mechanism to confront the propagated error is urgently needed. Based on the social network characteristic inherent in WSN and L1 optimization, we propose a novel scheme which successfully corrects more than C/2 corrupted errors. What is more, even if the error occurs on all the links of the network, our scheme can still correct errors successfully. By introducing a secret channel and a specially designed matrix which can trap some errors, we improve John and Yi's model so that it can correct the propagated errors in network coding, which usually pollute exactly 100% of the received messages. Taking advantage of the social characteristic inherent in WSN, we propose a new distributed approach that establishes reputation-based trust among sensor nodes in order to identify the informative upstream sensor nodes. Using social network theory, the informative relay nodes are selected and marked with a high trust value. The two methods of L1 optimization and utilizing the social characteristic coordinate with each other, and can correct propagated errors whose fraction is even exactly 100% in a WSN where network coding is performed. The effectiveness of the error correction scheme is validated through simulation experiments. PMID:29401668
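The L1-optimization ingredient can be illustrated with the classic decoding-by-linear-programming formulation: if a few links inject gross errors into y = Ax + e, minimizing the L1 norm of the residual recovers the message. The sketch below shows only this generic formulation (with an assumed coding matrix and error pattern), not the paper's secret-channel or trust-based scheme.

```python
# Generic L1-decoding sketch: recover x from y = A x + e, where e corrupts a
# few links, by minimizing the L1 norm of the residual with a linear program.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
m, n = 60, 20                                  # received symbols, message length
A = rng.normal(size=(m, n))                    # coding matrix (assumed known)
x_true = rng.normal(size=n)
e = np.zeros(m)
e[rng.choice(m, 6, replace=False)] = 10.0 * rng.normal(size=6)
y = A @ x_true + e                             # a few links carry corrupted data

# minimize sum(t) subject to |y - A x| <= t (elementwise); variables are [x, t]
c = np.concatenate([np.zeros(n), np.ones(m)])
A_ub = np.block([[A, -np.eye(m)], [-A, -np.eye(m)]])
b_ub = np.concatenate([y, -y])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + m))
x_hat = res.x[:n]
print("recovery error:", np.linalg.norm(x_hat - x_true))
```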
Optimization of a Lattice Boltzmann Computation on State-of-the-Art Multicore Platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, Samuel; Carter, Jonathan; Oliker, Leonid
2009-04-10
We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of search-based performance optimizations, popular in linear algebra and FFT libraries, to application-specific computational kernels. Our work applies this strategy to a lattice Boltzmann application (LBMHD) that historically has made poor use of scalar microprocessors due to its complex data structures and memory access patterns. We explore one of the broadest sets of multicore architectures in the HPC literature, including the Intel Xeon E5345 (Clovertown), AMD Opteron 2214 (Santa Rosa), AMD Opteron 2356 (Barcelona), Sun T5140 T2+ (Victoria Falls), as well as a QS20 IBM Cell Blade. Rather than hand-tuning LBMHD for each system, we develop a code generator that allows us to identify a highly optimized version for each platform, while amortizing the human programming effort. Results show that our auto-tuned LBMHD application achieves up to a 15x improvement compared with the original code at a given concurrency. Additionally, we present detailed analysis of each optimization, which reveal surprising hardware bottlenecks and software challenges for future multicore systems and applications.
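A minimal sketch of the search-based tuning idea, assuming a toy streaming kernel and a hand-picked set of blocking factors rather than the LBMHD code generator; the candidate list and timing policy are illustrative assumptions.

import time
import numpy as np

def kernel_blocked(a, b, block):
    # toy streaming kernel evaluated block by block (stand-in for a real stencil sweep)
    out = np.empty_like(a)
    for i in range(0, a.size, block):
        out[i:i + block] = a[i:i + block] * b[i:i + block] + 1.0
    return out

def best_time(fn, repeats=5):
    # keep the fastest of several runs to reduce timing noise
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return min(times)

a = np.random.rand(1 << 22)
b = np.random.rand(1 << 22)
candidates = {f"block={bs}": (lambda bs=bs: kernel_blocked(a, b, bs))
              for bs in (1 << 12, 1 << 14, 1 << 16, 1 << 18)}
results = {name: best_time(fn) for name, fn in candidates.items()}
winner = min(results, key=results.get)
print("fastest variant:", winner, "time:", results[winner])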
New technologies for advanced three-dimensional optimum shape design in aeronautics
NASA Astrophysics Data System (ADS)
Dervieux, Alain; Lanteri, Stéphane; Malé, Jean-Michel; Marco, Nathalie; Rostaing-Schmidt, Nicole; Stoufflet, Bruno
1999-05-01
The analysis of complex flows around realistic aircraft geometries is becoming more and more predictive. In order to obtain this result, the complexity of flow analysis codes has been constantly increasing, involving more refined fluid models and sophisticated numerical methods. These codes can only run on top computers, exhausting their memory and CPU capabilities. It is, therefore, difficult to introduce the best analysis codes into a shape optimization loop: most previous works in the optimum shape design field used only simplified analysis codes. Moreover, as the most popular optimization methods are the gradient-based ones, the more complex the flow solver, the more difficult it is to compute the sensitivity code. However, emerging technologies are contributing to make such an ambitious project, of including a state-of-the-art flow analysis code into an optimization loop, feasible. Among those technologies, there are three important issues that this paper wishes to address: shape parametrization, automated differentiation and parallel computing. Shape parametrization allows faster optimization by reducing the number of design variables; in this work, it relies on a hierarchical multilevel approach. The sensitivity code can be obtained using automated differentiation. The automated approach is based on software manipulation tools, which allow the differentiation to be quick and the resulting differentiated code to be rather fast and reliable. In addition, the parallel algorithms implemented in this work allow the resulting optimization software to run on increasingly larger geometries.
NASA Astrophysics Data System (ADS)
Phan, Duoc T.; Lim, James B. P.; Sha, Wei; Siew, Calvin Y. M.; Tanyimboh, Tiku T.; Issa, Honar K.; Mohammad, Fouad A.
2013-04-01
Cold-formed steel portal frames are a popular form of construction for low-rise commercial, light industrial and agricultural buildings with spans of up to 20 m. In this article, a real-coded genetic algorithm is described that is used to minimize the cost of the main frame of such buildings. The key decision variables considered in this proposed algorithm consist of both the spacing and pitch of the frame as continuous variables, as well as the discrete section sizes. A routine that performs the structural analysis and frame design for cold-formed steel sections is embedded into the genetic algorithm. The results show that the real-coded genetic algorithm effectively handles the mixture of design variables, with high robustness and consistency in achieving the optimum solution. All wind load combinations according to the Australian code are considered in this research. Results for frames with knee braces are also included, for which the optimization achieved even larger savings in cost.
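A minimal real-coded genetic algorithm sketch in the spirit of the abstract, not the authors' tool: the cost function is a toy surrogate (a real run would call the embedded structural analysis routine), and the section-size table, bounds, and GA parameters are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
SECTIONS = np.array([1.0, 1.4, 2.0, 2.8])        # hypothetical discrete section sizes

def cost(ind):
    spacing, pitch, sec_idx = ind
    section = SECTIONS[int(sec_idx)]
    # toy surrogate for frame cost; penalizes small spacing and heavy sections
    return section * (20.0 / spacing) + 0.5 * abs(pitch - 10.0) + 3.0 * section

def random_individual():
    return np.array([rng.uniform(3, 8), rng.uniform(5, 15), rng.integers(len(SECTIONS))], float)

def crossover(p1, p2):
    w = rng.random(3)
    child = w * p1 + (1 - w) * p2                # blend the continuous genes
    child[2] = rng.choice([p1[2], p2[2]])        # inherit the discrete gene from a parent
    return child

def mutate(ind):
    ind = ind.copy()
    ind[0] = np.clip(ind[0] + rng.normal(0, 0.3), 3, 8)
    ind[1] = np.clip(ind[1] + rng.normal(0, 0.5), 5, 15)
    if rng.random() < 0.1:
        ind[2] = rng.integers(len(SECTIONS))
    return ind

pop = [random_individual() for _ in range(30)]
for gen in range(100):
    pop.sort(key=cost)
    parents = pop[:10]                           # simple truncation selection
    children = []
    while len(children) < 20:
        i, j = rng.choice(len(parents), 2, replace=False)
        children.append(mutate(crossover(parents[i], parents[j])))
    pop = parents + children
best = min(pop, key=cost)
print("best [spacing, pitch, section index]:", best, "cost:", cost(best))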
Trellis coding techniques for mobile communications
NASA Technical Reports Server (NTRS)
Divsalar, D.; Simon, M. K.; Jedrey, T.
1988-01-01
A criterion for designing optimum trellis codes to be used over fading channels is given. A technique is shown for reducing certain multiple trellis codes, optimally designed for the fading channel, to conventional (i.e., multiplicity one) trellis codes. The computational cutoff rate R0 is evaluated for MPSK transmitted over fading channels. Examples of trellis codes optimally designed for the Rayleigh fading channel are given and compared with respect to R0. Two types of modulation/demodulation techniques are considered, namely coherent (using pilot tone-aided carrier recovery) and differentially coherent with Doppler frequency correction. Simulation results are given for end-to-end performance of two trellis-coded systems.
Load management strategy for Particle-In-Cell simulations in high energy particle acceleration
NASA Astrophysics Data System (ADS)
Beck, A.; Frederiksen, J. T.; Dérouillat, J.
2016-09-01
In the wake of the intense effort made for the experimental CILEX project, numerical simulation campaigns have been carried out in order to finalize the design of the facility and to identify optimal laser and plasma parameters. These simulations bring, of course, important insight into the fundamental physics at play. As a by-product, they also characterize the quality of our theoretical and numerical models. In this paper, we compare the results given by different codes and point out algorithmic limitations both in terms of physical accuracy and computational performances. These limitations are illustrated in the context of electron laser wakefield acceleration (LWFA). The main limitation we identify in state-of-the-art Particle-In-Cell (PIC) codes is computational load imbalance. We propose an innovative algorithm to deal with this specific issue as well as milestones towards a modern, accurate high-performance PIC code for high energy particle acceleration.
Tool Support for Software Lookup Table Optimization
Wilcox, Chris; Strout, Michelle Mills; Bieman, James M.
2011-01-01
A number of scientific applications are performance-limited by expressions that repeatedly call costly elementary functions. Lookup table (LUT) optimization accelerates the evaluation of such functions by reusing previously computed results. LUT methods can speed up applications that tolerate an approximation of function results, thereby achieving a high level of fuzzy reuse. One problem with LUT optimization is the difficulty of controlling the tradeoff between performance and accuracy. The current practice of manual LUT optimization adds programming effort by requiring extensive experimentation to make this tradeoff, and such hand tuning can obfuscate algorithms. In this paper we describe a methodology and tool implementation to improve the application of software LUT optimization. Our Mesa tool implements source-to-source transformations for C or C++ code to automate the tedious and error-prone aspects of LUT generation such as domain profiling, error analysis, and code generation. We evaluate Mesa with five scientific applications. Our results show a performance improvement of 3.0× and 6.9× for two molecular biology algorithms, 1.4× for a molecular dynamics program, 2.1× to 2.8× for a neural network application, and 4.6× for a hydrology calculation. We find that Mesa enables LUT optimization with more control over accuracy and less effort than manual approaches.
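A minimal sketch of the lookup-table idea, not the Mesa tool: a fixed-size table with linear interpolation replaces a costly expression, and the error check stands in for the automated error analysis. The example function, domain, and table size are illustrative assumptions.

import numpy as np

def build_lut(fn, lo, hi, n):
    # precompute fn on a uniform grid; queries use linear interpolation between table entries
    xs = np.linspace(lo, hi, n)
    ys = fn(xs)
    return lambda x: np.interp(x, xs, ys)

expensive = lambda x: np.exp(-x) * np.sin(3.0 * x)    # stand-in for a costly elementary expression
approx = build_lut(expensive, 0.0, 10.0, 4096)

x = np.random.uniform(0.0, 10.0, 1_000_000)
err = np.max(np.abs(approx(x) - expensive(x)))        # check the accuracy side of the tradeoff
print("max absolute error with 4096 entries:", err)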
Optimization of algorithm of coding of genetic information of Chlamydia
NASA Astrophysics Data System (ADS)
Feodorova, Valentina A.; Ulyanov, Sergey S.; Zaytsev, Sergey S.; Saltykov, Yury V.; Ulianova, Onega V.
2018-04-01
A new method of coding genetic information using coherent optical fields is developed. A universal technique for transforming the nucleotide sequences of a bacterial gene into a laser speckle pattern is suggested. Reference speckle patterns of the nucleotide sequences of the omp1 gene of typical wild strains of Chlamydia trachomatis genovars D, E, F, G, J and K, as well as of Chlamydia psittaci serovar I, are generated. The algorithm for coding gene information into a speckle pattern is optimized. Fully developed speckles with Gaussian statistics have been used as the optimization criterion for the gene-based speckle patterns.
Code Optimization for the Choi-Williams Distribution for ELINT Applications
2009-12-01
Many of the optimizations developed can be applied to the computation of the Wigner-Ville distribution as well. The improvements made can also be used to increase the speed at which the Wigner-Ville distribution (another signal processing algorithm) can be computed.
DUKSUP: A Computer Program for High Thrust Launch Vehicle Trajectory Design and Optimization
NASA Technical Reports Server (NTRS)
Williams, C. H.; Spurlock, O. F.
2014-01-01
From the late 1960's through 1997, the leadership of NASA's Intermediate and Large class unmanned expendable launch vehicle projects resided at the NASA Lewis (now Glenn) Research Center (LeRC). One of LeRC's primary responsibilities --- trajectory design and performance analysis --- was accomplished by an internally-developed analytic three dimensional computer program called DUKSUP. Because of its Calculus of Variations-based optimization routine, this code was generally more capable of finding optimal solutions than its contemporaries. A derivation of optimal control using the Calculus of Variations is summarized including transversality, intermediate, and final conditions. The two point boundary value problem is explained. A brief summary of the code's operation is provided, including iteration via the Newton-Raphson scheme and integration of variational and motion equations via a 4th order Runge-Kutta scheme. Main subroutines are discussed. The history of the LeRC trajectory design efforts in the early 1960's is explained within the context of supporting the Centaur upper stage program. How the code was constructed based on the operation of the Atlas/Centaur launch vehicle, the limits of the computers of that era, the limits of the computer programming languages, and the missions it supported are discussed. The vehicles DUKSUP supported (Atlas/Centaur, Titan/Centaur, and Shuttle/Centaur) are briefly described. The types of missions, including Earth orbital and interplanetary, are described. The roles of flight constraints and their impact on launch operations are detailed (such as jettisoning hardware on heating, Range Safety, ground station tracking, and elliptical parking orbits). The computer main frames on which the code was hosted are described. The applications of the code are detailed, including independent check of contractor analysis, benchmarking, leading edge analysis, and vehicle performance improvement assessments. Several of DUKSUP's many major impacts on launches are discussed including Intelsat, Voyager, Pioneer Venus, HEAO, Galileo, and Cassini.
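A minimal sketch of the classical 4th-order Runge-Kutta integration mentioned above, not DUKSUP itself: it propagates simple two-body point-mass equations of motion and omits thrust, the variational equations, and the Newton-Raphson boundary-value iteration. The initial state is an assumed near-circular low Earth orbit.

import numpy as np

MU = 3.986004418e14          # Earth's gravitational parameter, m^3/s^2

def dynamics(t, y):
    # y = [rx, ry, rz, vx, vy, vz]; two-body motion with no thrust
    r = y[:3]
    a = -MU * r / np.linalg.norm(r) ** 3
    return np.concatenate([y[3:], a])

def rk4_step(f, t, y, h):
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

y = np.array([6678e3, 0.0, 0.0, 0.0, 7725.8, 0.0])   # roughly a 300 km circular orbit
t, h = 0.0, 10.0
for _ in range(540):                                  # propagate about 90 minutes
    y = rk4_step(dynamics, t, y, h)
    t += h
print("radius after ~one orbit (km):", np.linalg.norm(y[:3]) / 1e3)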
DUKSUP: A Computer Program for High Thrust Launch Vehicle Trajectory Design and Optimization
NASA Technical Reports Server (NTRS)
Spurlock, O. Frank; Williams, Craig H.
2015-01-01
From the late 1960s through 1997, the leadership of NASA's Intermediate and Large class unmanned expendable launch vehicle projects resided at the NASA Lewis (now Glenn) Research Center (LeRC). One of LeRC's primary responsibilities --- trajectory design and performance analysis --- was accomplished by an internally-developed analytic three dimensional computer program called DUKSUP. Because of its Calculus of Variations-based optimization routine, this code was generally more capable of finding optimal solutions than its contemporaries. A derivation of optimal control using the Calculus of Variations is summarized including transversality, intermediate, and final conditions. The two point boundary value problem is explained. A brief summary of the code's operation is provided, including iteration via the Newton-Raphson scheme and integration of variational and motion equations via a 4th order Runge-Kutta scheme. Main subroutines are discussed. The history of the LeRC trajectory design efforts in the early 1960s is explained within the context of supporting the Centaur upper stage program. How the code was constructed based on the operation of the Atlas/Centaur launch vehicle, the limits of the computers of that era, the limits of the computer programming languages, and the missions it supported are discussed. The vehicles DUKSUP supported (Atlas/Centaur, Titan/Centaur, and Shuttle/Centaur) are briefly described. The types of missions, including Earth orbital and interplanetary, are described. The roles of flight constraints and their impact on launch operations are detailed (such as jettisoning hardware on heating, Range Safety, ground station tracking, and elliptical parking orbits). The computer main frames on which the code was hosted are described. The applications of the code are detailed, including independent check of contractor analysis, benchmarking, leading edge analysis, and vehicle performance improvement assessments. Several of DUKSUP's many major impacts on launches are discussed including Intelsat, Voyager, Pioneer Venus, HEAO, Galileo, and Cassini.
NASA Technical Reports Server (NTRS)
Stahara, S. S.
1984-01-01
An investigation was carried out to complete the preliminary development of a combined perturbation/optimization procedure and associated computational code for designing optimized blade-to-blade profiles of turbomachinery blades. The overall purpose of the procedures developed is to provide demonstration of a rapid nonlinear perturbation method for minimizing the computational requirements associated with parametric design studies of turbomachinery flows. The method combines the multiple parameter nonlinear perturbation method, successfully developed in previous phases of this study, with the NASA TSONIC blade-to-blade turbomachinery flow solver, and the COPES-CONMIN optimization procedure into a user's code for designing optimized blade-to-blade surface profiles of turbomachinery blades. Results of several design applications and a documented version of the code together with a user's manual are provided.
The use of interleaving for reducing radio loss in convolutionally coded systems
NASA Technical Reports Server (NTRS)
Divsalar, D.; Simon, M. K.; Yuen, J. H.
1989-01-01
The use of interleaving after convolutional coding and deinterleaving before Viterbi decoding is proposed. This effectively reduces radio loss at low-loop Signal to Noise Ratios (SNRs) by several decibels and at high-loop SNRs by a few tenths of a decibel. Performance of the coded system can further be enhanced if the modulation index is optimized for this system. This will correspond to a reduction of bit SNR at a certain bit error rate for the overall system. The introduction of interleaving/deinterleaving into communication systems designed for future deep space missions does not substantially complicate their hardware design or increase their system cost.
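A minimal sketch of a row-column block interleaver, showing how a channel burst error is dispersed before deinterleaving and Viterbi decoding; the dimensions and burst length are illustrative assumptions, and no convolutional encoder or decoder is included.

import numpy as np

def interleave(bits, rows, cols):
    # write row-wise, read column-wise
    return bits.reshape(rows, cols).T.reshape(-1)

def deinterleave(bits, rows, cols):
    return bits.reshape(cols, rows).T.reshape(-1)

rng = np.random.default_rng(0)
rows, cols = 8, 16
coded = rng.integers(0, 2, rows * cols)              # stand-in for convolutionally coded bits
tx = interleave(coded, rows, cols)

rx = tx.copy()
rx[40:48] ^= 1                                        # an 8-bit burst error on the channel
recovered = deinterleave(rx, rows, cols)

error_positions = np.flatnonzero(recovered != coded)
print("burst spread over positions:", error_positions)  # errors now isolated, easier for the decoder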
Welter, David E.; Doherty, John E.; Hunt, Randall J.; Muffels, Christopher T.; Tonkin, Matthew J.; Schreuder, Willem A.
2012-01-01
An object-oriented parameter estimation code was developed to incorporate benefits of object-oriented programming techniques for solving large parameter estimation modeling problems. The code is written in C++ and is a formulation and expansion of the algorithms included in PEST, a widely used parameter estimation code written in Fortran. The new code is called PEST++ and is designed to lower the barriers of entry for users and developers while providing efficient algorithms that can accommodate large, highly parameterized problems. This effort has focused on (1) implementing the most popular features of PEST in a fashion that is easy for novice or experienced modelers to use and (2) creating a software design that is easy to extend; that is, this effort provides a documented object-oriented framework designed from the ground up to be modular and extensible. In addition, all PEST++ source code and its associated libraries, as well as the general run manager source code, have been integrated in the Microsoft Visual Studio® 2010 integrated development environment. The PEST++ code is designed to provide a foundation for an open-source development environment capable of producing robust and efficient parameter estimation tools for the environmental modeling community into the future.
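A minimal sketch of the kind of nonlinear parameter-estimation problem such tools address, not PEST++ itself: a hypothetical two-parameter model is fit to synthetic observations with SciPy's Levenberg-Marquardt solver. The model form, noise level, and starting values are illustrative assumptions.

import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 50)

def model(params, t):
    # hypothetical two-parameter exponential-approach response
    amp, rate = params
    return amp * (1.0 - np.exp(-rate * t))

true_params = np.array([3.5, 0.4])
obs = model(true_params, t) + rng.normal(0.0, 0.05, t.size)   # synthetic observations

def residuals(params):
    return model(params, t) - obs

fit = least_squares(residuals, x0=[1.0, 1.0], method="lm")    # Levenberg-Marquardt
print("estimated parameters:", fit.x)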
Design and implementation of online automatic judging system
NASA Astrophysics Data System (ADS)
Liang, Haohui; Chen, Chaojie; Zhong, Xiuyu; Chen, Yuefeng
2017-06-01
To address the low efficiency and poor reliability of manual judging in programming training and competitions, an Online Automatic Judging (OAJ) system is designed. The OAJ system, consisting of a sandbox judging side and a Web side, automatically compiles and runs the submitted code and generates evaluation scores and corresponding reports. To prevent malicious code from damaging the system, the OAJ system runs submissions in a sandbox, ensuring system safety. The OAJ system uses thread pools to run tests in parallel and adopts database optimization mechanisms, such as horizontal table splitting, to improve system performance and resource utilization. The test results show that the system has high performance, high reliability, high stability and excellent extensibility.
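A minimal sketch of the judging loop only, not the OAJ implementation: it compiles nothing (the submission is a Python script), runs each test case in a subprocess with a timeout, and uses a thread pool for parallel tests. It assumes a python3 interpreter on the PATH; a production judge would add OS-level sandboxing and memory/CPU limits.

import subprocess
import tempfile
import textwrap
from concurrent.futures import ThreadPoolExecutor

SUBMISSION = textwrap.dedent("""\
    a, b = map(int, input().split())
    print(a + b)
""")

TESTS = [("1 2", "3"), ("10 20", "30"), ("0 0", "0")]

def judge_case(source_path, stdin_text, expected):
    try:
        proc = subprocess.run(["python3", source_path], input=stdin_text,
                              capture_output=True, text=True, timeout=2)
    except subprocess.TimeoutExpired:
        return "Time Limit Exceeded"
    if proc.returncode != 0:
        return "Runtime Error"
    return "Accepted" if proc.stdout.strip() == expected else "Wrong Answer"

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(SUBMISSION)
    path = f.name

with ThreadPoolExecutor(max_workers=4) as pool:          # thread pool runs test cases in parallel
    verdicts = list(pool.map(lambda case: judge_case(path, *case), TESTS))
print(verdicts)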
TRO-2D - A code for rational transonic aerodynamic optimization
NASA Technical Reports Server (NTRS)
Davis, W. H., Jr.
1985-01-01
Features and sample applications of the transonic rational optimization (TRO-2D) code are outlined. TRO-2D includes the airfoil analysis code FLO-36, the CONMIN optimization code and a rational approach to defining aero-function shapes for geometry modification. The program is part of an effort to develop an aerodynamically smart optimizer that will simplify and shorten the design process. The user has a selection of drag minimization and associated minimum lift, moment, and the pressure distribution, a choice among 14 resident aero-function shapes, and options on aerodynamic and geometric constraints. Design variables such as the angle of attack, leading edge radius and camber, shock strength and movement, supersonic pressure plateau control, etc., are discussed. The results of calculations of a reduced leading edge camber transonic airfoil and an airfoil with a natural laminar flow are provided, showing that only four design variables need be specified to obtain satisfactory results.
Nuclear fuel management optimization using genetic algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
DeChaine, M.D.; Feltus, M.A.
1995-07-01
The code-independent genetic algorithm reactor optimization (CIGARO) system has been developed to optimize nuclear reactor loading patterns. It uses genetic algorithms (GAs) and a code-independent interface, so any reactor physics code (e.g., CASMO-3/SIMULATE-3) can be used to evaluate the loading patterns. The system is compared to other GA-based loading pattern optimizers. Tests were carried out to maximize the beginning-of-cycle k_eff for a pressurized water reactor core loading with a penalty function to limit power peaking. The CIGARO system performed well, increasing the k_eff after lowering the peak power. Tests of a prototype parallel evaluation method showed the potential for a significant speedup.
1981-12-01
[Extraction fragment from an Ada Optimizing Compiler document (Texas Instruments): the code generator produces a Symbol Map (SYMAP), Statement Map (SMAP), and Type Map (TMAP) for each library unit; the excerpt also references the PUNIT command and an example compiler command stream for the code generator.]
Multi-optimization Criteria-based Robot Behavioral Adaptability and Motion Planning
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pin, Francois G.
2002-06-01
Robotic tasks are typically defined in Task Space (e.g., the 3-D World), whereas robots are controlled in Joint Space (motors). The transformation from Task Space to Joint Space must consider the task objectives (e.g., high precision, strength optimization, torque optimization), the task constraints (e.g., obstacles, joint limits, non-holonomic constraints, contact or tool task constraints), and the robot kinematics configuration (e.g., tools, type of joints, mobile platform, manipulator, modular additions, locked joints). Commercially available robots are optimized for a specific set of tasks, objectives and constraints and, therefore, their control codes are extremely specific to a particular set of conditions. Thus, there exists a multiplicity of codes, each handling a particular set of conditions, but none suitable for use on robots with widely varying tasks, objectives, constraints, or environments. On the other hand, most DOE missions and tasks are typically "batches of one". Attempting to use commercial codes for such work requires significant personnel and schedule costs for re-programming or adding code to the robots whenever a change in task objective, robot configuration, number and type of constraint, etc. occurs. The objective of our project is to develop a "generic code" to implement this Task-Space to Joint-Space transformation that would allow robot behavior adaptation, in real time (at loop rate), to changes in task objectives, number and type of constraints, modes of controls, kinematics configuration (e.g., new tools, added module). Our specific goal is to develop a single code for the general solution of under-specified systems of algebraic equations that is suitable for solving the inverse kinematics of robots, is useable for all types of robots (mobile robots, manipulators, mobile manipulators, etc.) with no limitation on the number of joints and the number of controlled Task-Space variables, can adapt to real time changes in number and type of constraints and in task objectives, and can adapt to changes in kinematics configurations (change of module, change of tool, joint failure adaptation, etc.).
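A minimal sketch of one standard way to resolve an under-specified Task-Space to Joint-Space mapping, not the code described above: a damped least-squares (resolved-rate) inverse-kinematics step for an assumed planar 3-link arm, with illustrative link lengths, gains and damping.

import numpy as np

def fk(q, lengths):
    # planar serial arm: end-effector position from cumulative joint angles
    angles = np.cumsum(q)
    return np.array([np.sum(lengths * np.cos(angles)), np.sum(lengths * np.sin(angles))])

def jacobian(q, lengths):
    angles = np.cumsum(q)
    J = np.zeros((2, len(q)))
    for j in range(len(q)):
        J[0, j] = -np.sum(lengths[j:] * np.sin(angles[j:]))
        J[1, j] = np.sum(lengths[j:] * np.cos(angles[j:]))
    return J

def dls_step(q, lengths, x_target, damping=0.05, gain=0.5):
    # damped least-squares update: dq = J^T (J J^T + lambda^2 I)^-1 dx
    err = x_target - fk(q, lengths)
    J = jacobian(q, lengths)
    dq = J.T @ np.linalg.solve(J @ J.T + damping ** 2 * np.eye(2), gain * err)
    return q + dq

lengths = np.array([1.0, 0.8, 0.5])
q = np.array([0.3, 0.2, 0.1])
target = np.array([1.2, 1.0])
for _ in range(100):
    q = dls_step(q, lengths, target)
print("end-effector:", fk(q, lengths), "target:", target)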
Medical Image Compression Based on Vector Quantization with Variable Block Sizes in Wavelet Domain
Jiang, Huiyan; Ma, Zhiyuan; Hu, Yang; Yang, Benqiang; Zhang, Libo
2012-01-01
An optimized medical image compression algorithm based on wavelet transform and improved vector quantization is introduced. The goal of the proposed method is to maintain the diagnostic-related information of the medical image at a high compression ratio. Wavelet transformation was first applied to the image. For the lowest-frequency subband of wavelet coefficients, a lossless compression method was exploited; for each of the high-frequency subbands, an optimized vector quantization with variable block size was implemented. In the novel vector quantization method, local fractal dimension (LFD) was used to analyze the local complexity of each wavelet coefficients' subband. Then an optimal quadtree method was employed to partition each wavelet coefficients' subband into several sizes of sub-blocks. After that, a modified K-means approach which is based on energy function was used in the codebook training phase. At last, vector quantization coding was implemented in different types of sub-blocks. In order to verify the effectiveness of the proposed algorithm, JPEG, JPEG2000, and fractal coding approach were chosen as contrast algorithms. Experimental results show that the proposed method can improve the compression performance and can achieve a balance between the compression ratio and the image visual quality. PMID:23049544
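A minimal vector-quantization sketch, not the paper's method: it uses fixed 4x4 blocks and plain k-means from SciPy, whereas the paper uses LFD-driven quadtree partitioning with variable block sizes and an energy-based modified K-means. The subband data, block size, and codebook size are illustrative assumptions.

import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
subband = rng.normal(0.0, 1.0, (64, 64))              # stand-in for one high-frequency wavelet subband
bs = 4                                                 # fixed 4x4 blocks

# gather blocks as training vectors of length bs*bs
blocks = (subband.reshape(64 // bs, bs, 64 // bs, bs)
                 .swapaxes(1, 2)
                 .reshape(-1, bs * bs))

codebook, labels = kmeans2(blocks, 32, minit="++")     # train a 32-entry codebook
quantized = codebook[labels]                           # encode: each block is replaced by its codeword

mse = np.mean((quantized - blocks) ** 2)
print("codebook size 32, per-sample MSE:", mse)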
NASA Astrophysics Data System (ADS)
Athaudage, Chandranath R. N.; Bradley, Alan B.; Lech, Margaret
2003-12-01
A dynamic programming-based optimization strategy for a temporal decomposition (TD) model of speech and its application to low-rate speech coding in storage and broadcasting is presented. In previous work with the spectral stability-based event localizing (SBEL) TD algorithm, the event localization was performed based on a spectral stability criterion. Although this approach gave reasonably good results, there was no assurance on the optimality of the event locations. In the present work, we have optimized the event localizing task using a dynamic programming-based optimization strategy. Simulation results show that an improved TD model accuracy can be achieved. A methodology of incorporating the optimized TD algorithm within the standard MELP speech coder for the efficient compression of speech spectral information is also presented. The performance evaluation results revealed that the proposed speech coding scheme achieves 50%-60% compression of speech spectral information with negligible degradation in the decoded speech quality.
Heuristic rules embedded genetic algorithm for in-core fuel management optimization
NASA Astrophysics Data System (ADS)
Alim, Fatih
The objective of this study was to develop a unique methodology and a practical tool for designing loading pattern (LP) and burnable poison (BP) pattern for a given Pressurized Water Reactor (PWR) core. Because of the large number of possible combinations for the fuel assembly (FA) loading in the core, the design of the core configuration is a complex optimization problem. It requires finding an optimal FA arrangement and BP placement in order to achieve maximum cycle length while satisfying the safety constraints. Genetic Algorithms (GA) have already been used to solve this problem for LP optimization for both PWR and Boiling Water Reactor (BWR). The GA, which is a stochastic method, works with a group of solutions and uses random variables to make decisions. Based on the theory of evolution, the GA involves natural selection and reproduction of the individuals in the population for the next generation. The GA works by creating an initial population, evaluating it, and then improving the population by using evolutionary operators. To solve this optimization problem, an LP optimization package, the GARCO (Genetic Algorithm Reactor Code Optimization) code, is developed in the framework of this thesis. This code is applicable for all types of PWR cores having different geometries and structures with an unlimited number of FA types in the inventory. To reach this goal, an innovative GA is developed by modifying the classical representation of the genotype. To obtain the best result in a shorter time, not only the representation is changed but also the algorithm is changed to use in-core fuel management heuristic rules. The improved GA code was tested to demonstrate and verify the advantages of the new enhancements. The developed methodology is explained in this thesis and preliminary results are shown for the VVER-1000 reactor hexagonal geometry core and the TMI-1 PWR. The improved GA code was tested to verify the advantages of new enhancements. The core physics code used for VVER in this research is Moby-Dick, which was developed to analyze the VVER by SKODA Inc. The SIMULATE-3 code, which is an advanced two-group nodal code, is used to analyze the TMI-1.
Optimized scalar promotion with load and splat SIMD instructions
Eichenberger, Alexander E; Gschwind, Michael K; Gunnels, John A
2013-10-29
Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.
Optimized scalar promotion with load and splat SIMD instructions
Eichenberger, Alexandre E [Chappaqua, NY; Gschwind, Michael K [Chappaqua, NY; Gunnels, John A [Yorktown Heights, NY
2012-08-28
Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.
Py4CAtS - Python tools for line-by-line modelling of infrared atmospheric radiative transfer
NASA Astrophysics Data System (ADS)
Schreier, Franz; García, Sebastián Gimeno
2013-05-01
Py4CAtS — Python scripts for Computational ATmospheric Spectroscopy is a Python re-implementation of the Fortran infrared radiative transfer code GARLIC, where compute-intensive code sections utilize the Numeric/Scientific Python modules for highly optimized array-processing. The individual steps of an infrared or microwave radiative transfer computation are implemented in separate scripts to extract lines of relevant molecules in the spectral range of interest, to compute line-by-line cross sections for given pressure(s) and temperature(s), to combine cross sections to absorption coefficients and optical depths, and to integrate along the line-of-sight to transmission and radiance/intensity. The basic design of the package, numerical and computational aspects relevant for optimization, and a sketch of the typical workflow are presented.
Applications of Derandomization Theory in Coding
NASA Astrophysics Data System (ADS)
Cheraghchi, Mahdi
2011-07-01
Randomized techniques play a fundamental role in theoretical computer science and discrete mathematics, in particular for the design of efficient algorithms and construction of combinatorial objects. The basic goal in derandomization theory is to eliminate or reduce the need for randomness in such randomized constructions. In this thesis, we explore some applications of the fundamental notions in derandomization theory to problems outside the core of theoretical computer science, and in particular, certain problems related to coding theory. First, we consider the wiretap channel problem which involves a communication system in which an intruder can eavesdrop a limited portion of the transmissions, and construct efficient and information-theoretically optimal communication protocols for this model. Then we consider the combinatorial group testing problem. In this classical problem, one aims to determine a set of defective items within a large population by asking a number of queries, where each query reveals whether a defective item is present within a specified group of items. We use randomness condensers to explicitly construct optimal, or nearly optimal, group testing schemes for a setting where the query outcomes can be highly unreliable, as well as the threshold model where a query returns positive if the number of defectives pass a certain threshold. Finally, we design ensembles of error-correcting codes that achieve the information-theoretic capacity of a large class of communication channels, and then use the obtained ensembles for construction of explicit capacity achieving codes. [This is a shortened version of the actual abstract in the thesis.]
Trajectories for High Specific Impulse High Specific Power Deep Space Exploration
NASA Technical Reports Server (NTRS)
Polsgrove, T.; Adams, R. B.; Brady, Hugh J. (Technical Monitor)
2002-01-01
Preliminary results are presented for two methods to approximate the mission performance of high specific impulse high specific power vehicles. The first method is based on an analytical approximation derived by Williams and Shepherd and can be used to approximate mission performance to outer planets and interstellar space. The second method is based on a parametric analysis of trajectories created using the well known trajectory optimization code, VARITOP. This parametric analysis allows the reader to approximate payload ratios and optimal power requirements for both one-way and round-trip missions. While this second method only addresses missions to and from Jupiter, future work will encompass all of the outer planet destinations and some interstellar precursor missions.
Spiral: Automated Computing for Linear Transforms
NASA Astrophysics Data System (ADS)
Püschel, Markus
2010-09-01
Writing fast software has become extraordinarily difficult. For optimal performance, programs and their underlying algorithms have to be adapted to take full advantage of the platform's parallelism, memory hierarchy, and available instruction set. To make things worse, the best implementations are often platform-dependent and platforms are constantly evolving, which quickly renders libraries obsolete. We present Spiral, a domain-specific program generation system for important functionality used in signal processing and communication including linear transforms, filters, and other functions. Spiral completely replaces the human programmer. For a desired function, Spiral generates alternative algorithms, optimizes them, compiles them into programs, and intelligently searches for the best match to the computing platform. The main idea behind Spiral is a mathematical, declarative, domain-specific framework to represent algorithms and the use of rewriting systems to generate and optimize algorithms at a high level of abstraction. Experimental results show that the code generated by Spiral competes with, and sometimes outperforms, the best available human-written code.
Optimal block cosine transform image coding for noisy channels
NASA Technical Reports Server (NTRS)
Vaishampayan, V.; Farvardin, N.
1986-01-01
The two dimensional block transform coding scheme based on the discrete cosine transform was studied extensively for image coding applications. While this scheme has proven to be efficient in the absence of channel errors, its performance degrades rapidly over noisy channels. A method is presented for the joint source channel coding optimization of a scheme based on the 2-D block cosine transform when the output of the encoder is to be transmitted over a noisy memoryless channel; the method addresses the design of the quantizers used for encoding the transform coefficients. This algorithm produces a set of locally optimum quantizers and the corresponding binary code assignment for the assumed transform coefficient statistics. To determine the optimum bit assignment among the transform coefficients, an algorithm was used based on the steepest descent method, which under certain convexity conditions on the performance of the channel optimized quantizers, yields the optimal bit allocation. Comprehensive simulation results for the performance of this locally optimum system over noisy channels were obtained and appropriate comparisons against a reference system designed for no channel error were rendered.
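A minimal sketch of the block-DCT transform-and-quantize skeleton only, not the joint source-channel optimization: 8x8 blocks are transformed, coarsely quantized with an assumed uniform step, and reconstructed. The synthetic image tile and quantizer step are illustrative assumptions.

import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(0)
image = rng.normal(128.0, 30.0, (64, 64))             # stand-in for an image tile
bs, step = 8, 16.0                                     # 8x8 blocks, uniform quantizer step

recon = np.empty_like(image)
for i in range(0, 64, bs):
    for j in range(0, 64, bs):
        block = image[i:i + bs, j:j + bs]
        coeff = dctn(block, norm="ortho")              # 2-D block cosine transform
        q = np.round(coeff / step)                     # scalar quantization of the coefficients
        recon[i:i + bs, j:j + bs] = idctn(q * step, norm="ortho")

mse = np.mean((recon - image) ** 2)
print("block-DCT quantization MSE:", mse)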
Numerical optimization of three-dimensional coils for NSTX-U
Lazerson, S. A.; Park, J. -K.; Logan, N.; ...
2015-09-03
A tool for the calculation of optimal three-dimensional (3D) perturbative magnetic fields in tokamaks has been developed. The IPECOPT code builds upon the stellarator optimization code STELLOPT to allow for optimization of linear ideal magnetohydrodynamic perturbed equilibrium (IPEC). This tool has been applied to NSTX-U equilibria, addressing which fields are the most effective at driving NTV torques. The NTV torque calculation is performed by the PENT code. Optimization of the normal field spectrum shows that fields with n = 1 character can drive a large core torque. It is also shown that fields with n = 3 features are capable of driving edge torque and some core torque. Coil current optimization (using the planned in-vessel and existing RWM coils) on NSTX-U suggests the planned coil set is adequate for core and edge torque control. In conclusion, comparison between error field correction experiments on DIII-D and the optimizer shows good agreement.
Optimal periodic binary codes of lengths 28 to 64
NASA Technical Reports Server (NTRS)
Tyler, S.; Keston, R.
1980-01-01
Results from computer searches performed to find repeated binary phase coded waveforms with optimal periodic autocorrelation functions are discussed. The best results for lengths 28 to 64 are given. The code features of major concern are where (1) the peak sidelobe in the autocorrelation function is small and (2) the sum of the squares of the sidelobes in the autocorrelation function is small.
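A minimal sketch of the quantity being searched for above: the periodic autocorrelation of a binary phase code and its peak sidelobe, found here by brute force over a short length. The exhaustive enumeration is only illustrative; the report's lengths 28 to 64 require far more efficient search strategies.

import numpy as np

def periodic_autocorrelation(code):
    # code is a +/-1 sequence; sidelobes are the correlations at nonzero cyclic shifts
    n = len(code)
    return np.array([np.dot(code, np.roll(code, s)) for s in range(n)])

best = None
n = 13
for bits in range(1 << n):
    code = np.array([1 if (bits >> i) & 1 else -1 for i in range(n)])
    acf = periodic_autocorrelation(code)
    peak_sidelobe = np.max(np.abs(acf[1:]))
    if best is None or peak_sidelobe < best[0]:
        best = (peak_sidelobe, code)
print("length", n, "best peak periodic sidelobe:", best[0])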
Two Classes of New Optimal Asymmetric Quantum Codes
NASA Astrophysics Data System (ADS)
Chen, Xiaojing; Zhu, Shixin; Kai, Xiaoshan
2018-03-01
Let q be an even prime power and ω be a primitive element of F_{q^2}. By analyzing the structure of cyclotomic cosets, we determine a sufficient condition for ω^{q-1}-constacyclic codes over F_{q^2} to be Hermitian dual-containing codes. By the CSS construction, two classes of new optimal AQECCs are obtained according to the Singleton bound for AQECCs.
A Subsonic Aircraft Design Optimization With Neural Network and Regression Approximators
NASA Technical Reports Server (NTRS)
Patnaik, Surya N.; Coroneos, Rula M.; Guptill, James D.; Hopkins, Dale A.; Haller, William J.
2004-01-01
The Flight-Optimization-System (FLOPS) code encountered difficulty in analyzing a subsonic aircraft. The limitation made the design optimization problematic. The deficiencies have been alleviated through use of neural network and regression approximations. The insight gained from using the approximators is discussed in this paper. The FLOPS code is reviewed. Analysis models are developed and validated for each approximator. The regression method appears to hug the data points, while the neural network approximation follows a mean path. For an analysis cycle, the approximate model required milliseconds of central processing unit (CPU) time versus seconds by the FLOPS code. Performance of the approximators was satisfactory for aircraft analysis. A design optimization capability has been created by coupling the derived analyzers to the optimization test bed CometBoards. The approximators were efficient reanalysis tools in the aircraft design optimization. Instability encountered in the FLOPS analyzer was eliminated. The convergence characteristics were improved for the design optimization. The CPU time required to calculate the optimum solution, measured in hours with the FLOPS code was reduced to minutes with the neural network approximation and to seconds with the regression method. Generation of the approximators required the manipulation of a very large quantity of data. Design sensitivity with respect to the bounds of aircraft constraints is easily generated.
NASA Technical Reports Server (NTRS)
Patnaik, Surya N.; Guptill, James D.; Hopkins, Dale A.; Lavelle, Thomas M.
2000-01-01
The NASA Engine Performance Program (NEPP) can configure and analyze almost any type of gas turbine engine that can be generated through the interconnection of a set of standard physical components. In addition, the code can optimize engine performance by changing adjustable variables under a set of constraints. However, for engine cycle problems at certain operating points, the NEPP code can encounter difficulties: nonconvergence in the currently implemented Powell's optimization algorithm and deficiencies in the Newton-Raphson solver during engine balancing. A project was undertaken to correct these deficiencies. Nonconvergence was avoided through a cascade optimization strategy, and deficiencies associated with engine balancing were eliminated through neural network and linear regression methods. An approximation-interspersed cascade strategy was used to optimize the engine's operation over its flight envelope. Replacement of Powell's algorithm by the cascade strategy improved the optimization segment of the NEPP code. The performance of the linear regression and neural network methods as alternative engine analyzers was found to be satisfactory. This report considers two examples-a supersonic mixed-flow turbofan engine and a subsonic waverotor-topped engine-to illustrate the results, and it discusses insights gained from the improved version of the NEPP code.
Energy efficient rateless codes for high speed data transfer over free space optical channels
NASA Astrophysics Data System (ADS)
Prakash, Geetha; Kulkarni, Muralidhar; Acharya, U. S.
2015-03-01
Terrestrial Free Space Optical (FSO) links transmit information by using the atmosphere (free space) as a medium. In this paper, we have investigated the use of Luby Transform (LT) codes as a means to mitigate the effects of data corruption induced by imperfect channel which usually takes the form of lost or corrupted packets. LT codes, which are a class of Fountain codes, can be used independent of the channel rate and as many code words as required can be generated to recover all the message bits irrespective of the channel performance. Achieving error free high data rates with limited energy resources is possible with FSO systems if error correction codes with minimal overheads on the power can be used. We also employ a combination of Binary Phase Shift Keying (BPSK) with provision for modification of threshold and optimized LT codes with belief propagation for decoding. These techniques provide additional protection even under strong turbulence regimes. Automatic Repeat Request (ARQ) is another method of improving link reliability. Performance of ARQ is limited by the number of retransmissions and the corresponding time delay. We prove through theoretical computations and simulations that LT codes consume less energy per bit. We validate the feasibility of using energy efficient LT codes over ARQ for FSO links to be used in optical wireless sensor networks within the eye safety limits.
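A minimal sketch of the fountain-code idea behind LT codes, not the optimized codes of the paper: coded symbols are XORs of randomly chosen source packets, and a peeling (belief-propagation style) decoder recovers the sources. The uniform degree distribution is an illustrative simplification; a real LT code uses a robust soliton distribution, and BPSK modulation and the channel model are omitted.

import numpy as np

rng = np.random.default_rng(0)
K = 8
source = [rng.integers(0, 256, 16, dtype=np.uint8) for _ in range(K)]   # K source packets

def lt_encode(n_coded):
    coded = []
    for _ in range(n_coded):
        d = int(rng.integers(1, K + 1))                # uniform degree (soliton distribution in practice)
        idx = rng.choice(K, d, replace=False).tolist()
        payload = np.zeros(16, dtype=np.uint8)
        for i in idx:
            payload ^= source[i]                       # coded symbol is the XOR of d source packets
        coded.append((idx, payload))
    return coded

def lt_decode(coded):
    known = {}
    pending = [(set(idx), payload.copy()) for idx, payload in coded]
    progress = True
    while progress and len(known) < K:
        progress = False
        for neighbors, payload in pending:
            for i in list(neighbors):                  # peel off already-recovered packets
                if i in known:
                    payload ^= known[i]
                    neighbors.discard(i)
            if len(neighbors) == 1:                    # degree-one symbol reveals a source packet
                i = neighbors.pop()
                if i not in known:
                    known[i] = payload.copy()
                    progress = True
    return known

coded = lt_encode(20)                                  # send more symbols than K; any large-enough subset works
recovered = lt_decode(coded)
print("recovered", len(recovered), "of", K, "packets; all correct:",
      all(np.array_equal(recovered[i], source[i]) for i in recovered))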
Subband Image Coding with Jointly Optimized Quantizers
NASA Technical Reports Server (NTRS)
Kossentini, Faouzi; Chung, Wilson C.; Smith, Mark J. T.
1995-01-01
An iterative design algorithm for the joint design of complexity- and entropy-constrained subband quantizers and associated entropy coders is proposed. Unlike conventional subband design algorithms, the proposed algorithm does not require the use of various bit allocation algorithms. Multistage residual quantizers are employed here because they provide greater control of the complexity-performance tradeoffs, and also because they allow efficient and effective high-order statistical modeling. The resulting subband coder exploits statistical dependencies within subbands, across subbands, and across stages, mainly through complexity-constrained high-order entropy coding. Experimental results demonstrate that the complexity-rate-distortion performance of the new subband coder is exceptional.
NASA Astrophysics Data System (ADS)
Vesselinov, V. V.; Harp, D.
2010-12-01
The process of decision making to protect groundwater resources requires a detailed estimation of uncertainties in model predictions. Various uncertainties associated with modeling a natural system, such as: (1) measurement and computational errors; (2) uncertainties in the conceptual model and model-parameter estimates; (3) simplifications in model setup and numerical representation of governing processes, contribute to the uncertainties in the model predictions. Due to this combination of factors, the sources of predictive uncertainties are generally difficult to quantify individually. Decision support related to optimal design of monitoring networks requires (1) detailed analyses of existing uncertainties related to model predictions of groundwater flow and contaminant transport, (2) optimization of the proposed monitoring network locations in terms of their efficiency to detect contaminants and provide early warning. We apply existing and newly-proposed methods to quantify predictive uncertainties and to optimize well locations. An important aspect of the analysis is the application of newly-developed optimization technique based on coupling of Particle Swarm and Levenberg-Marquardt optimization methods which proved to be robust and computationally efficient. These techniques and algorithms are bundled in a software package called MADS. MADS (Model Analyses for Decision Support) is an object-oriented code that is capable of performing various types of model analyses and supporting model-based decision making. The code can be executed under different computational modes, which include (1) sensitivity analyses (global and local), (2) Monte Carlo analysis, (3) model calibration, (4) parameter estimation, (5) uncertainty quantification, and (6) model selection. The code can be externally coupled with any existing model simulator through integrated modules that read/write input and output files using a set of template and instruction files (consistent with the PEST I/O protocol). MADS can also be internally coupled with a series of built-in analytical simulators. MADS provides functionality to work directly with existing control files developed for the code PEST (Doherty 2009). To perform the computational modes mentioned above, the code utilizes (1) advanced Latin-Hypercube sampling techniques (including Improved Distributed Sampling), (2) various gradient-based Levenberg-Marquardt optimization methods, (3) advanced global optimization methods (including Particle Swarm Optimization), and (4) a selection of alternative objective functions. The code has been successfully applied to perform various model analyses related to environmental management of real contamination sites. Examples include source identification problems, quantification of uncertainty, model calibration, and optimization of monitoring networks. The methodology and software codes are demonstrated using synthetic and real case studies where monitoring networks are optimized taking into account the uncertainty in model predictions of contaminant transport.
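A minimal sketch of the global optimization stage mentioned above, not MADS itself: a basic particle swarm minimizes a standard test function standing in for a model-calibration misfit; in a coupled scheme its best point could seed a Levenberg-Marquardt refinement. Swarm size, coefficients, and the objective are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def objective(x):
    # Rosenbrock function as a stand-in for a calibration misfit
    return (1 - x[..., 0]) ** 2 + 100 * (x[..., 1] - x[..., 0] ** 2) ** 2

n_particles, n_iter = 40, 300
pos = rng.uniform(-2, 2, (n_particles, 2))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_val = objective(pos)
gbest = pbest[np.argmin(pbest_val)].copy()

w, c1, c2 = 0.7, 1.5, 1.5                      # inertia and acceleration coefficients
for _ in range(n_iter):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    val = objective(pos)
    improved = val < pbest_val
    pbest[improved] = pos[improved]
    pbest_val[improved] = val[improved]
    gbest = pbest[np.argmin(pbest_val)].copy()

print("PSO estimate:", gbest, "objective:", objective(gbest))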
Wing Weight Optimization Under Aeroelastic Loads Subject to Stress Constraints
NASA Technical Reports Server (NTRS)
Kapania, Rakesh K.; Issac, J.; Macmurdy, D.; Guruswamy, Guru P.
1997-01-01
A minimum weight optimization of the wing under aeroelastic loads subject to stress constraints is carried out. The loads for the optimization are based on aeroelastic trim. The design variables are the thickness of the wing skins and planform variables. The composite plate structural model incorporates first-order shear deformation theory, the wing deflections are expressed using Chebyshev polynomials and a Rayleigh-Ritz procedure is adopted for the structural formulation. The aerodynamic pressures provided by the aerodynamic code at a discrete number of grid points is represented as a bilinear distribution on the composite plate code to solve for the deflections and stresses in the wing. The lifting-surface aerodynamic code FAST is presently being used to generate the pressure distribution over the wing. The envisioned ENSAERO/Plate is an aeroelastic analysis code which combines ENSAERO version 3.0 (for analysis of wing-body configurations) with the composite plate code.
Intercomparison of three microwave/infrared high resolution line-by-line radiative transfer codes
NASA Astrophysics Data System (ADS)
Schreier, Franz; Milz, Mathias; Buehler, Stefan A.; von Clarmann, Thomas
2018-05-01
An intercomparison of three line-by-line (lbl) codes developed independently for atmospheric radiative transfer and remote sensing - ARTS, GARLIC, and KOPRA - has been performed for a thermal infrared nadir sounding application assuming a HIRS-like (High resolution Infrared Radiation Sounder) setup. Radiances for the 19 HIRS infrared channels and a set of 42 atmospheric profiles from the "Garand dataset" have been computed. The mutual differences of the equivalent brightness temperatures are presented and possible causes of disagreement are discussed. In particular, the impact of path integration schemes and atmospheric layer discretization is assessed. When the continuum absorption contribution is ignored because of the different implementations, residuals are generally in the sub-Kelvin range and smaller than 0.1 K for some window channels (and all atmospheric models and lbl codes). None of the three codes turned out to be perfect for all channels and atmospheres. Remaining discrepancies are attributed to different lbl optimization techniques. Lbl codes seem to have reached a maturity in the implementation of radiative transfer that the choice of the underlying physical models (line shape models, continua etc) becomes increasingly relevant.
Multicore-based 3D-DWT video encoder
NASA Astrophysics Data System (ADS)
Galiano, Vicente; López-Granado, Otoniel; Malumbres, Manuel P.; Migallón, Hector
2013-12-01
Three-dimensional wavelet transform (3D-DWT) encoders are good candidates for applications like professional video editing, video surveillance, multi-spectral satellite imaging, etc. where a frame must be reconstructed as quickly as possible. In this paper, we present a new 3D-DWT video encoder based on a fast run-length coding engine. Furthermore, we present several multicore optimizations to speed-up the 3D-DWT computation. An exhaustive evaluation of the proposed encoder (3D-GOP-RL) has been performed, and we have compared the evaluation results with other video encoders in terms of rate/distortion (R/D), coding/decoding delay, and memory consumption. Results show that the proposed encoder obtains good R/D results for high-resolution video sequences with nearly in-place computation using only the memory needed to store a group of pictures. After applying the multicore optimization strategies over the 3D DWT, the proposed encoder is able to compress a full high-definition video sequence in real-time.
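A minimal run-length coding sketch, not the 3D-GOP-RL engine: quantized wavelet coefficients are mostly zero, so runs of zeros are collapsed into (symbol, count) pairs. The coefficient statistics are an illustrative assumption.

import numpy as np

def rle_encode(symbols):
    # encode runs of zeros as (0, run_length); nonzero symbols pass through as (symbol, 1)
    out, run = [], 0
    for s in symbols:
        if s == 0:
            run += 1
        else:
            if run:
                out.append((0, run))
                run = 0
            out.append((int(s), 1))
    if run:
        out.append((0, run))
    return out

def rle_decode(pairs):
    out = []
    for sym, count in pairs:
        out.extend([sym] * count)
    return np.array(out)

rng = np.random.default_rng(0)
coeffs = rng.integers(-3, 4, 64)
coeffs[rng.random(64) < 0.7] = 0                 # quantized coefficients are mostly zero
encoded = rle_encode(coeffs)
assert np.array_equal(rle_decode(encoded), coeffs)
print(len(coeffs), "coefficients ->", len(encoded), "run-length pairs")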
Optimized iterative decoding method for TPC coded CPM
NASA Astrophysics Data System (ADS)
Ma, Yanmin; Lai, Penghui; Wang, Shilian; Xie, Shunqin; Zhang, Wei
2018-05-01
The Turbo Product Code (TPC) coded Continuous Phase Modulation (CPM) system (TPC-CPM) has been widely used in aeronautical telemetry and satellite communication. This paper investigates the improvement and optimization of the TPC-CPM system. We first add an interleaver and deinterleaver to the TPC-CPM system, and then establish an iterative decoding scheme. However, the improved system has poor convergence. To overcome this issue, we use Extrinsic Information Transfer (EXIT) analysis to find the optimal factors for the system. The experiments show our method is effective in improving the convergence performance.
Gel, Aytekin; Hu, Jonathan; Ould-Ahmed-Vall, ElMoustapha; ...
2017-03-20
Legacy codes remain a crucial element of today's simulation-based engineering ecosystem due to the extensive validation process and investment in such software. The rapid evolution of high-performance computing architectures necessitates the modernization of these codes. One approach to modernization is a complete overhaul of the code. However, this could require extensive investments, such as rewriting in modern languages, new data constructs, etc., which will necessitate systematic verification and validation to re-establish the credibility of the computational models. The current study advocates using a more incremental approach and is a culmination of several modernization efforts of the legacy code MFIX, which is an open-source computational fluid dynamics code that has evolved over several decades, widely used in multiphase flows and still being developed by the National Energy Technology Laboratory. Two different modernization approaches, 'bottom-up' and 'top-down', are illustrated. Preliminary results show that up to an 8.5x improvement at the selected kernel level was achieved with the first approach, and up to a 50% improvement in total simulated time with the latter, for the demonstration cases and target HPC systems employed.
Using Intel Xeon Phi to accelerate the WRF TEMF planetary boundary layer scheme
NASA Astrophysics Data System (ADS)
Mielikainen, Jarno; Huang, Bormin; Huang, Allen
2014-05-01
The Weather Research and Forecasting (WRF) model is designed for numerical weather prediction and atmospheric research. The WRF software infrastructure consists of several components such as dynamic solvers and physics schemes. Numerical models are used to resolve the large-scale flow. However, subgrid-scale parameterizations are for an estimation of small-scale properties (e.g., boundary layer turbulence and convection, clouds, radiation). Those have a significant influence on the resolved scale due to the complex nonlinear nature of the atmosphere. For the cloudy planetary boundary layer (PBL), it is fundamental to parameterize vertical turbulent fluxes and subgrid-scale condensation in a realistic manner. A parameterization based on the Total Energy - Mass Flux (TEMF) that unifies turbulence and moist convection components produces better results than the other PBL schemes. For that reason, the TEMF scheme is chosen as the PBL scheme we optimized for Intel Many Integrated Core (MIC), which ushers in a new era of supercomputing speed, performance, and compatibility. It allows the developers to run code at trillions of calculations per second using the familiar programming model. In this paper, we present our optimization results for the TEMF planetary boundary layer scheme. The optimizations that were performed were quite generic in nature. Those optimizations included vectorization of the code to utilize vector units inside each CPU. Furthermore, memory access was improved by scalarizing some of the intermediate arrays. The results show that the optimization improved MIC performance by 14.8x. Furthermore, the optimizations increased CPU performance by 2.6x compared to the original multi-threaded code on a quad-core Intel Xeon E5-2603 running at 1.8 GHz. Compared to the optimized code running on a single CPU socket, the optimized MIC code is 6.2x faster.
Advanced helium purge seals for Liquid Oxygen (LOX) turbopumps
NASA Technical Reports Server (NTRS)
Shapiro, Wilbur; Lee, Chester C.
1989-01-01
Program objectives were to determine three advanced configurations of helium buffer seals capable of providing improved performance in a space shuttle main engine (SSME), high-pressure liquid oxygen (LOX) turbopump environment, and to provide NASA with the analytical tools to determine performance of a variety of seal configurations. The three seal designs included solid-ring fluid-film seals often referred to as floating ring seals, back-to-back fluid-film face seals, and a circumferential sectored seal that incorporated inherent clearance adjustment capabilities. Of the three seals designed, the sectored seal is favored because the self-adjusting clearance features accommodate the variations in clearance that will occur because of thermal and centrifugal distortions without compromising performance. Moreover, leakage can be contained well below the maximum target values; minimizing leakage is important on the SSME since helium is provided by an external tank. A reduction in tank size translates to an increase in payload that can be carried on board the shuttle. The computer codes supplied under this program included a code for analyzing a variety of gas-lubricated, floating ring, and sector seals; a code for analyzing gas-lubricated face seals; a code for optimizing and analyzing gas-lubricated spiral-groove face seals; and a code for determining fluid-film face seal response to runner excitations in as many as five degrees of freedom. These codes proved invaluable for optimizing designs and estimating final performance of the seals described.
Yu, Lianchun; Shen, Zhou; Wang, Chen; Yu, Yuguo
2018-01-01
Selective pressure may drive neural systems to process as much information as possible at the lowest energy cost. Recent experimental evidence revealed that the ratio between synaptic excitation and inhibition (E/I) in local cortex is generally maintained at a certain value, which may influence the efficiency of energy consumption and information transmission of neural networks. To understand this issue more deeply, we constructed a typical recurrent Hodgkin-Huxley network model and studied the general principles that govern the relationship among the E/I synaptic current ratio, the energy cost, and the total amount of information transmission. We observed that there exists an optimal E/I synaptic current ratio in such a network at which information transmission achieves its maximum with relatively low energy cost. The coding energy efficiency, defined as the mutual information divided by the energy cost, achieved its maximum with balanced synaptic current. Although background noise degrades information transmission and imposes an additional energy cost, we find an optimal noise intensity that yields the largest information transmission and energy efficiency at this optimal E/I synaptic transmission ratio. The maximization of energy efficiency also requires a certain part of the energy cost associated with spontaneous spiking and synaptic activities. We further proved this finding with an analytical solution based on the response function of bistable neurons, and demonstrated that optimal net synaptic currents are capable of maximizing both the mutual information and the energy efficiency. These results reveal that the development of E/I synaptic current balance could lead a cortical network to operate at a highly efficient information transmission rate with a relatively low energy cost. The generality of the neuronal models and the recurrent network configuration used here suggests that the existence of an optimal E/I cell ratio for energy-efficient information maximization is a potential principle for cortical circuit networks. Summary: We conducted numerical simulations and mathematical analysis to examine the energy efficiency of neural information transmission in a recurrent network as a function of the ratio of excitatory and inhibitory synaptic connections. We obtained a general solution showing that there exists an optimal E/I synaptic ratio in a recurrent network at which the information transmission as well as the energy efficiency of this network achieves a global maximum. These results reflect general mechanisms for sensory coding processes, which may give insight into the energy efficiency of neural communication and coding. PMID:29773979
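For readers who want the key definition in executable form, the following toy Python sketch (not the authors' Hodgkin-Huxley network; the mutual-information and energy-cost curves are invented placeholders) illustrates how a coding energy efficiency defined as mutual information divided by energy cost can be scanned over the E/I ratio to locate an interior optimum.

```python
# Toy sketch (not the paper's Hodgkin-Huxley network): illustrates the paper's
# definition of coding energy efficiency = mutual information / energy cost,
# scanned over the E/I synaptic current ratio. The MI and cost curves below are
# made-up smooth functions chosen only to produce a single interior optimum.
import numpy as np

ei_ratio = np.linspace(0.5, 8.0, 200)                  # excitation/inhibition ratio (arbitrary units)
mutual_info = ei_ratio / (1.0 + 0.15 * ei_ratio**2)    # hypothetical MI curve, saturating then declining
energy_cost = 0.2 + 0.5 * ei_ratio                     # hypothetical metabolic cost

efficiency = mutual_info / energy_cost                 # coding energy efficiency as defined in the abstract
best = np.argmax(efficiency)
print(f"optimal E/I ratio ~ {ei_ratio[best]:.2f}, "
      f"efficiency {efficiency[best]:.3f}, MI {mutual_info[best]:.3f}")
```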
NASA Technical Reports Server (NTRS)
Reichel, R. H.; Hague, D. S.; Jones, R. T.; Glatt, C. R.
1973-01-01
This computer program manual describes in two parts the automated combustor design optimization code AUTOCOM. The program code is written in the FORTRAN 4 language. The input data setup and the program outputs are described, and a sample engine case is discussed. The program structure and programming techniques are also described, along with AUTOCOM program analysis.
NASA Astrophysics Data System (ADS)
Chen, Gang; Yang, Bing; Zhang, Xiaoyun; Gao, Zhiyong
2017-07-01
The latest High Efficiency Video Coding (HEVC) standard significantly increases encoding complexity in exchange for improved coding efficiency. Due to the limited computational capability of handheld devices, complexity-constrained video coding has drawn great attention in recent years. A complexity control algorithm based on adaptive mode selection is proposed for interframe coding in HEVC. Given the direct proportionality between encoding time and computational complexity, computational complexity is measured in terms of encoding time. First, the complexity target is mapped to a budget expressed in terms of prediction modes. Then, an adaptive mode selection algorithm is proposed for the mode decision process. Specifically, an optimal mode combination scheme, chosen through offline statistics, is applied at low complexity. If the complexity budget has not been used up, an adaptive mode sorting method is employed to further improve coding efficiency. The experimental results show that the proposed algorithm achieves a very large complexity control range (as low as 10%) for the HEVC encoder while maintaining good rate-distortion performance. For the low-delay P condition, compared with the direct resource allocation method and the state-of-the-art method, average gains of 0.63 and 0.17 dB in BD-PSNR are observed for 18 sequences when the target complexity is around 40%.
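As a rough illustration of what a complexity controller of this kind does, the hedged Python sketch below assigns each coding tree unit a mode subset sized to the remaining encoding-time budget; the mode tables, thresholds, and encode_ctu stub are hypothetical and are not taken from the paper or the HEVC reference software.

```python
# Hedged sketch of an encoding-time complexity controller in the spirit of the
# abstract: each CTU is assigned a mode subset sized to the remaining time
# budget. The mode tables, timings, and encode_ctu() stub are hypothetical.
import time

MODE_SETS = {                      # cheapest to richest candidate-mode subsets
    "fast":   ["MERGE", "2Nx2N"],
    "medium": ["MERGE", "2Nx2N", "2NxN", "Nx2N"],
    "full":   ["MERGE", "2Nx2N", "2NxN", "Nx2N", "INTRA", "AMP"],
}

def encode_ctu(ctu, modes):
    """Stand-in for the real mode-decision loop; returns elapsed seconds."""
    t0 = time.perf_counter()
    _ = sum(hash((ctu, m)) % 1000 for m in modes)   # dummy work per tested mode
    return time.perf_counter() - t0

def encode_frame(num_ctus, time_budget):
    spent = 0.0
    for idx in range(num_ctus):
        remaining = time_budget - spent
        per_ctu = remaining / max(num_ctus - idx, 1)
        # pick the richest mode set we expect to afford (thresholds are illustrative)
        level = "full" if per_ctu > 2e-4 else "medium" if per_ctu > 1e-4 else "fast"
        spent += encode_ctu(idx, MODE_SETS[level])
    return spent

print(f"frame encoded in {encode_frame(256, 0.05):.4f} s of a 0.05 s budget")
```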
Self-adaptive multimethod optimization applied to a tailored heating forging process
NASA Astrophysics Data System (ADS)
Baldan, M.; Steinberg, T.; Baake, E.
2018-05-01
The presented paper describes an innovative self-adaptive multi-objective optimization code. The investigation aims to prove the superiority of this code over NSGA-II and to apply it to the design of an inductor for a “tailored” heating forging application. The choice of the frequency and the heating time is followed by the determination of the number of turns and their positions. Finally, a straightforward optimization is performed in order to minimize energy consumption using “optimal control”.
Evaluation of CFETR as a Fusion Nuclear Science Facility using multiple system codes
NASA Astrophysics Data System (ADS)
Chan, V. S.; Costley, A. E.; Wan, B. N.; Garofalo, A. M.; Leuer, J. A.
2015-02-01
This paper presents the results of a multi-system-code benchmarking study of the recently published China Fusion Engineering Test Reactor (CFETR) pre-conceptual design (Wan et al 2014 IEEE Trans. Plasma Sci. 42 495). Two system codes, the General Atomics System Code (GASC) and the Tokamak Energy System Code (TESC), using different methodologies to arrive at CFETR performance parameters under the same CFETR constraints, show that the correlation between the physics performance and the fusion performance is consistent, and the computed parameters are in good agreement. Optimization of the first wall surface for tritium breeding and the minimization of the machine size are highly compatible. Variations of the plasma currents and profiles lead to changes in the required normalized physics performance; however, they do not significantly affect the optimized size of the machine. GASC and TESC have also been used to explore a lower aspect ratio, larger volume plasma, taking advantage of the engineering flexibility in the CFETR design. Assuming the ITER steady-state scenario physics, the larger plasma together with moderately higher BT and Ip can result in a high-gain machine with Qfus ~ 12 and Pfus ~ 1 GW, approaching DEMO-like performance. It is concluded that the CFETR baseline mode can meet the minimum goal of the Fusion Nuclear Science Facility (FNSF) mission and that advanced physics will enable it to address comprehensively the outstanding critical technology gaps on the path to a demonstration reactor (DEMO). Before proceeding with CFETR construction, steady-state operation has to be demonstrated, further development is needed to solve the divertor heat load issue, and blankets have to be designed with a tritium breeding ratio (TBR) > 1 as a target.
Overview: Applications of numerical optimization methods to helicopter design problems
NASA Technical Reports Server (NTRS)
Miura, H.
1984-01-01
There are a number of helicopter design problems that are well suited to applications of numerical design optimization techniques, and adequate implementation of this technology will provide high pay-offs. There are a number of numerical optimization programs available, and there are many excellent response/performance analysis programs developed or being developed. But integration of these programs in a form that is usable in the design phase should be recognized as important. It is also necessary to attract the attention of engineers engaged in the development of analysis capabilities and to make them aware that analysis capabilities are much more powerful if integrated into design-oriented codes. Frequently, the shortcomings of analysis capabilities are revealed by coupling them with an optimization code. Most of the published work has addressed problems in preliminary system design, rotor system/blade design or airframe design. Very few published results were found in acoustics, aerodynamics and control system design. Currently major efforts are focused on vibration reduction, and aerodynamics/acoustics applications appear to be growing fast. The development of a computer program system that integrates the multiple disciplines required in helicopter design with numerical optimization techniques is needed. Activities in Britain, Germany and Poland are identified, but no published results from France, Italy, the USSR or Japan were found.
NASA Technical Reports Server (NTRS)
Nguyen, D. T.; Al-Nasra, M.; Zhang, Y.; Baddourah, M. A.; Agarwal, T. K.; Storaasli, O. O.; Carmona, E. A.
1991-01-01
Several parallel-vector computational improvements to the unconstrained optimization procedure are described which speed up the structural analysis-synthesis process. A fast parallel-vector Choleski-based equation solver, pvsolve, is incorporated into the well-known SAP-4 general-purpose finite-element code. The new code, denoted PV-SAP, is tested for static structural analysis. Initial results on a four processor CRAY 2 show that using pvsolve reduces the equation solution time by a factor of 14-16 over the original SAP-4 code. In addition, parallel-vector procedures for the Golden Block Search technique and the BFGS method are developed and tested for nonlinear unconstrained optimization. A parallel version of an iterative solver and the pvsolve direct solver are incorporated into the BFGS method. Preliminary results on nonlinear unconstrained optimization test problems, using pvsolve in the analysis, show excellent parallel-vector performance indicating that these parallel-vector algorithms can be used in a new generation of finite-element based structural design/analysis-synthesis codes.
Super-linear Precision in Simple Neural Population Codes
NASA Astrophysics Data System (ADS)
Schwab, David; Fiete, Ila
2015-03-01
A widely used tool for quantifying the precision with which a population of noisy sensory neurons encodes the value of an external stimulus is the Fisher Information (FI). Maximizing the FI is also a commonly used objective for constructing optimal neural codes. The primary utility and importance of the FI arises because it gives, through the Cramer-Rao bound, the smallest mean-squared error achievable by any unbiased stimulus estimator. However, it is well-known that when neural firing is sparse, optimizing the FI can result in codes that perform very poorly when considering the resulting mean-squared error, a measure with direct biological relevance. Here we construct optimal population codes by minimizing mean-squared error directly and study the scaling properties of the resulting network, focusing on the optimal tuning curve width. We then extend our results to continuous attractor networks that maintain short-term memory of external stimuli in their dynamics. Here we find similar scaling properties in the structure of the interactions that minimize diffusive information loss.
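The trade-off described above can be made concrete with a small numerical experiment. The Python sketch below is an illustration under assumed population size, firing rates, and stimulus range (it is not the authors' derivation): it estimates the mean-squared error of a maximum-likelihood decoder for a one-dimensional stimulus encoded by Poisson neurons with Gaussian tuning curves, scanned over the tuning-curve width.

```python
# Illustrative numerical sketch: MSE of a maximum-likelihood decoder for a 1-D
# stimulus encoded by N Poisson neurons with Gaussian tuning curves, scanned
# over the tuning-curve width. Population size, rates, and ranges are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
N, peak_rate, T = 20, 50.0, 0.1                 # neurons, peak rate (Hz), window (s)
centers = np.linspace(0.0, 1.0, N)
grid = np.linspace(0.0, 1.0, 201)               # decoding grid over the stimulus

def tuning(s, width):
    return peak_rate * np.exp(-0.5 * ((s - centers) / width) ** 2) + 1e-3

def mse_for_width(width, trials=400):
    errs = []
    rates_grid = np.array([tuning(g, width) for g in grid])   # shape (201, N)
    for _ in range(trials):
        s_true = rng.uniform(0.2, 0.8)
        counts = rng.poisson(tuning(s_true, width) * T)
        # Poisson log-likelihood over the grid; stimulus-independent terms dropped
        loglik = counts @ np.log(rates_grid.T * T) - rates_grid.sum(axis=1) * T
        errs.append((grid[np.argmax(loglik)] - s_true) ** 2)
    return np.mean(errs)

for w in (0.02, 0.05, 0.1, 0.2, 0.4):
    print(f"width {w:.2f}: MSE {mse_for_width(w):.5f}")
```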
Multidisciplinary Aerospace Systems Optimization: Computational AeroSciences (CAS) Project
NASA Technical Reports Server (NTRS)
Kodiyalam, S.; Sobieski, Jaroslaw S. (Technical Monitor)
2001-01-01
The report describes a method for optimizing a system whose analysis is so expensive that it is impractical to let the optimization code invoke it directly, because excessive computational cost and elapsed time might result. In such a situation it is imperative to let the user control the number of times the analysis is invoked. The reported method achieves that by two techniques in the Design of Experiments category: a uniform dispersal of the trial design points over an n-dimensional hypersphere combined with response surface fitting, and the technique of kriging. Analyses of all the trial designs, whose number may be set by the user, are performed before activation of the optimization code, and the results are stored as a database. The optimization code is then executed and refers to that database. Two applications, one to an airborne laser system and one to an aircraft optimization, illustrate the method.
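A minimal sketch of the two-step strategy follows, under assumed stand-in functions: the expensive analysis is evaluated only at a user-set number of trial points, a cheap radial-basis-function response surface is fitted to the stored results, and the optimizer then queries only the surrogate. The trial points are dispersed over a cube here rather than the report's n-dimensional hypersphere, and scipy's RBF interpolator stands in for the response-surface/kriging step.

```python
# Hedged sketch of the two-step strategy: (1) evaluate the expensive analysis
# only at a user-controlled set of trial points, (2) fit a cheap surrogate (an
# RBF response surface standing in for response surfaces / kriging) and let the
# optimizer query the surrogate. expensive_analysis() is illustrative.
import numpy as np
from scipy.interpolate import RBFInterpolator
from scipy.optimize import minimize

def expensive_analysis(x):                 # stand-in for a costly simulation
    return np.sum((x - 0.3) ** 2) + 0.1 * np.sin(10 * x[0])

rng = np.random.default_rng(1)
dim, n_trials = 3, 60                      # number of analyses fixed up front by the user

X = rng.uniform(-1.0, 1.0, size=(n_trials, dim))    # trial designs (cube, not hypersphere)
y = np.array([expensive_analysis(x) for x in X])    # the stored "database"

surrogate = RBFInterpolator(X, y, kernel="thin_plate_spline")
res = minimize(lambda x: surrogate(x[None, :])[0], x0=np.zeros(dim))
print("surrogate optimum:", res.x, "true value there:", expensive_analysis(res.x))
```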
Constellation labeling optimization for bit-interleaved coded APSK
NASA Astrophysics Data System (ADS)
Xiang, Xingyu; Mo, Zijian; Wang, Zhonghai; Pham, Khanh; Blasch, Erik; Chen, Genshe
2016-05-01
This paper investigates constellation and mapping optimization for amplitude phase shift keying (APSK) modulation, which is deployed in the Digital Video Broadcasting - Satellite - Second Generation (DVB-S2) and Digital Video Broadcasting - Satellite services to Handhelds (DVB-SH) broadcasting standards due to its merits of power and spectral efficiency together with robustness against nonlinear distortion. The mapping optimization is performed for 32-APSK according to combined cost functions related to Euclidean distance and mutual information. A binary switching algorithm and its modified version are used to minimize the cost function and the estimated error between the original and received data. The optimized constellation mapping is tested by combining it with DVB-S2 standard Low-Density Parity-Check (LDPC) codes in both Bit-Interleaved Coded Modulation (BICM) and BICM with iterative decoding (BICM-ID) systems. The simulation results validate the proposed constellation labeling optimization scheme, which yields better performance than the conventional 32-APSK constellation defined in the DVB-S2 standard.
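The binary-switching idea can be sketched on a small constellation. The Python example below is a simplified stand-in (8-PSK instead of 32-APSK, and an illustrative distance-weighted Hamming cost rather than the paper's combined Euclidean-distance/mutual-information cost): it repeatedly swaps the pair of symbol labels that most reduces the cost until no swap helps.

```python
# Simplified binary-switching style label search (not the DVB-S2 optimization
# itself): swap the labels of the symbol pair that most reduces a
# distance-weighted Hamming cost, until no swap improves it.
import numpy as np
from itertools import combinations

M = 8                                             # 8-PSK for brevity (the paper uses 32-APSK)
points = np.exp(2j * np.pi * np.arange(M) / M)    # constellation symbols
labels = list(range(M))                           # labels[i] = bit pattern of symbol i

def hamming(a, b):
    return bin(a ^ b).count("1")

def cost(lab):
    # close symbol pairs should receive labels with small Hamming distance
    return sum(hamming(lab[i], lab[j]) * np.exp(-abs(points[i] - points[j]) ** 2)
               for i, j in combinations(range(M), 2))

improved = True
while improved:
    improved = False
    for i, j in combinations(range(M), 2):
        trial = labels[:]
        trial[i], trial[j] = trial[j], trial[i]
        if cost(trial) < cost(labels):
            labels, improved = trial, True
print("optimized labeling:", labels, "cost:", round(cost(labels), 4))
```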
A new Bayesian Earthquake Analysis Tool (BEAT)
NASA Astrophysics Data System (ADS)
Vasyura-Bathke, Hannes; Dutta, Rishabh; Jónsson, Sigurjón; Mai, Martin
2017-04-01
Modern earthquake source estimation studies increasingly use non-linear optimization strategies to estimate kinematic rupture parameters, often considering geodetic and seismic data jointly. However, the optimization process is complex and consists of several steps that need to be followed in the earthquake parameter estimation procedure. These include pre-describing or modeling the fault geometry, calculating the Green's functions (often assuming a layered elastic half-space), and estimating the distributed final slip and possibly other kinematic source parameters. Recently, Bayesian inference has become popular for estimating posterior distributions of earthquake source model parameters given measured/estimated/assumed data and model uncertainties. For instance, some research groups consider uncertainties of the layered medium and propagate these to the source parameter uncertainties. Other groups make use of informative priors to reduce the model parameter space. In addition, innovative sampling algorithms have been developed that efficiently explore the often high-dimensional parameter spaces. Compared to earlier studies, these improvements have resulted in overall more robust source model parameter estimates that include uncertainties. However, the computational demands of these methods are high and estimation codes are rarely distributed along with the published results. Even if codes are made available, it is often difficult to assemble them into a single optimization framework as they are typically coded in different programming languages. Therefore, further progress and future applications of these methods/codes are hampered, while reproducibility and validation of results have become essentially impossible. In the spirit of providing open-access and modular codes to facilitate progress and reproducible research in earthquake source estimation, we undertook the effort of producing BEAT, a Python package that comprises all the above-mentioned features in one single programming environment. The package is built on top of the pyrocko seismological toolbox (www.pyrocko.org) and makes use of the pymc3 module for Bayesian statistical model fitting. BEAT is an open-source package (https://github.com/hvasbath/beat) and we encourage and solicit contributions to the project. In this contribution, we present our strategy for developing BEAT, show application examples, and discuss future developments.
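Since BEAT builds on pymc3 for Bayesian model fitting, a minimal generic pymc3 example may help readers unfamiliar with that workflow. Note that this is not BEAT's earthquake source model: the linear forward model, priors, and sampler settings below are placeholders.

```python
# Minimal generic PyMC3 usage of the kind BEAT builds on (this is NOT BEAT's
# source model; the "forward model" and priors are placeholders).
import numpy as np
import pymc3 as pm

x = np.linspace(0.0, 1.0, 50)
y_obs = 2.0 * x + 0.1 * np.random.default_rng(0).normal(size=50)   # synthetic data

with pm.Model():
    slope = pm.Normal("slope", mu=0.0, sigma=10.0)      # prior on the model parameter
    noise = pm.HalfNormal("noise", sigma=1.0)           # prior on data uncertainty
    pm.Normal("y", mu=slope * x, sigma=noise, observed=y_obs)
    trace = pm.sample(1000, tune=1000, chains=2, cores=1,
                      return_inferencedata=False)

print("posterior mean slope:", trace["slope"].mean())
```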
Enhanced Multiobjective Optimization Technique for Comprehensive Aerospace Design. Part A
NASA Technical Reports Server (NTRS)
Chattopadhyay, Aditi; Rajadas, John N.
1997-01-01
A multidisciplinary design optimization procedure has been developed which couples formal multiobjective-based techniques and complex analysis procedures (such as computational fluid dynamics (CFD) codes). The procedure has been demonstrated on a specific high-speed flow application involving aerodynamics and acoustics (sonic boom minimization). In order to account for multiple design objectives arising from complex performance requirements, multiobjective formulation techniques are used to formulate the optimization problem. Techniques to enhance the existing Kreisselmeier-Steinhauser (K-S) function multiobjective formulation approach have been developed. The K-S function procedure used in the proposed work transforms a problem with multiple constrained objective functions into an unconstrained problem, which is then solved using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. Weight factors are applied to each objective function during the transformation process. This enhanced procedure provides the designer with the capability to emphasize specific design objectives during the optimization process. The demonstration of the procedure utilizes a computational fluid dynamics (CFD) code which solves the three-dimensional parabolized Navier-Stokes (PNS) equations for the flow field, along with an appropriate sonic boom evaluation procedure, thus introducing both aerodynamic performance and sonic boom as the design objectives to be optimized simultaneously. Sensitivity analysis is performed using a discrete differentiation approach. An approximation technique is used within the optimizer to improve the overall computational efficiency of the procedure in order to make it suitable for design applications in an industrial setting.
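The K-S aggregation step can be sketched compactly. In the hedged Python example below, assumed toy objectives and a constraint are combined with designer-chosen weights into a single smooth K-S function, which is then minimized with scipy's BFGS implementation (standing in for the paper's BFGS optimizer).

```python
# Sketch of Kreisselmeier-Steinhauser (K-S) aggregation with assumed weights and
# toy objective/constraint functions: weighted objectives and constraints are
# folded into one smooth unconstrained function, then minimized with BFGS.
import numpy as np
from scipy.optimize import minimize

rho = 50.0                                    # K-S draw-down factor
w = np.array([1.0, 0.5])                      # designer-chosen objective weights

def responses(x):
    f1 = (x[0] - 1.0) ** 2 + x[1] ** 2        # toy objective 1 (e.g., aerodynamic penalty)
    f2 = x[0] ** 2 + (x[1] + 0.5) ** 2        # toy objective 2 (e.g., sonic-boom penalty)
    g1 = 0.8 - x[0] - x[1]                    # toy constraint, g1 <= 0 desired
    return np.array([w[0] * f1, w[1] * f2, g1])

def ks(x):
    r = responses(x)
    rmax = r.max()                            # subtracting the max keeps exp() well scaled
    return rmax + np.log(np.sum(np.exp(rho * (r - rmax)))) / rho

res = minimize(ks, x0=np.array([0.0, 0.0]), method="BFGS")
print("K-S optimum:", res.x)
```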
Optimizing a liquid propellant rocket engine with an automated combustor design code (AUTOCOM)
NASA Technical Reports Server (NTRS)
Hague, D. S.; Reichel, R. H.; Jones, R. T.; Glatt, C. R.
1972-01-01
A procedure for automatically designing a liquid propellant rocket engine combustion chamber in an optimal fashion is outlined. The procedure is contained in a digital computer code, AUTOCOM. The code is applied to an existing engine, and design modifications are generated which provide a substantial potential payload improvement over the existing design. Computer time requirements for this payload improvement were small, approximately four minutes in the CDC 6600 computer.
Wind Farm Turbine Type and Placement Optimization
NASA Astrophysics Data System (ADS)
Graf, Peter; Dykes, Katherine; Scott, George; Fields, Jason; Lunacek, Monte; Quick, Julian; Rethore, Pierre-Elouan
2016-09-01
The layout of turbines in a wind farm is already a challenging nonlinear, nonconvex, nonlinearly constrained continuous global optimization problem. Here we begin to address the next generation of wind farm optimization problems by adding the complexity that there is more than one turbine type to choose from. The optimization becomes a nonlinear constrained mixed integer problem, which is a very difficult class of problems to solve. This document briefly summarizes the algorithm and code we have developed, the code validation steps we have performed, and the initial results for multi-turbine type and placement optimization (TTP_OPT) we have run.
Wind farm turbine type and placement optimization
Graf, Peter; Dykes, Katherine; Scott, George; ...
2016-10-03
The layout of turbines in a wind farm is already a challenging nonlinear, nonconvex, nonlinearly constrained continuous global optimization problem. Here we begin to address the next generation of wind farm optimization problems by adding the complexity that there is more than one turbine type to choose from. The optimization becomes a nonlinear constrained mixed integer problem, which is a very difficult class of problems to solve. Furthermore, this document briefly summarizes the algorithm and code we have developed, the code validation steps we have performed, and the initial results for multi-turbine type and placement optimization (TTP_OPT) we have run.
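To make the mixed-integer character of the problem concrete, the toy Python sketch below (not TTP_OPT; the turbine table, wake model, and net-benefit objective are invented) optimizes continuous positions along a line together with an integer turbine-type choice using plain random search.

```python
# Toy mixed-integer sketch (not TTP_OPT): each turbine gets a continuous x
# position and an integer type choice; a crude net-benefit objective with a
# spacing constraint is maximized by random search. All numbers are invented.
import numpy as np

rng = np.random.default_rng(2)
TYPES = {0: {"rated": 2.0, "cost": 1.0}, 1: {"rated": 3.3, "cost": 1.8}}   # MW, M$
N, FARM_LEN, MIN_SPACING = 5, 10.0, 1.0

def objective(xs, types):
    if np.any(np.diff(np.sort(xs)) < MIN_SPACING):         # spacing constraint
        return -np.inf
    order = np.argsort(xs)
    power, wake = 0.0, 1.0
    for i in order:                                         # crude cumulative wake deficit
        power += TYPES[int(types[i])]["rated"] * wake
        wake *= 0.9
    cost = sum(TYPES[int(t)]["cost"] for t in types)
    return power - 0.5 * cost                               # net benefit to maximize

best = (-np.inf, None, None)
for _ in range(5000):                                       # random search over the mixed space
    xs = np.sort(rng.uniform(0.0, FARM_LEN, N))
    types = rng.integers(0, len(TYPES), N)
    val = objective(xs, types)
    if val > best[0]:
        best = (val, xs, types)
print(f"best net benefit {best[0]:.2f} with types {best[2]} at x = {np.round(best[1], 2)}")
```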
Optimal wavelets for biomedical signal compression.
Nielsen, Mogens; Kamavuako, Ernest Nlandu; Andersen, Michael Midtgaard; Lucas, Marie-Françoise; Farina, Dario
2006-07-01
Signal compression is gaining importance in biomedical engineering due to the potential applications in telemedicine. In this work, we propose a novel scheme of signal compression based on signal-dependent wavelets. To adapt the mother wavelet to the signal for the purpose of compression, it is necessary to define (1) a family of wavelets that depend on a set of parameters and (2) a quality criterion for wavelet selection (i.e., wavelet parameter optimization). We propose the use of an unconstrained parameterization of the wavelet for wavelet optimization. A natural performance criterion for compression is the minimization of the signal distortion rate given the desired compression rate. For coding the wavelet coefficients, we adopted the embedded zerotree wavelet coding algorithm, although any coding scheme may be used with the proposed wavelet optimization. As a representative example of application, the coding/encoding scheme was applied to surface electromyographic signals recorded from ten subjects. The distortion rate strongly depended on the mother wavelet (for example, for 50% compression rate, optimal wavelet, mean ± SD, 5.46 ± 1.01%; worst wavelet, 12.76 ± 2.73%). Thus, optimization significantly improved performance with respect to previous approaches based on classic wavelets. The algorithm can be applied to any signal type since the optimal wavelet is selected on a signal-by-signal basis. Examples of application to ECG and EEG signals are also reported.
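A hedged sketch of the selection criterion follows. Instead of the paper's continuous unconstrained wavelet parameterization and embedded zerotree coder, it simply scores a discrete family of standard wavelets (via the PyWavelets package) by the distortion obtained when only the largest coefficients are kept at a fixed compression rate.

```python
# Hedged wavelet-selection sketch: score a discrete family of standard wavelets
# by distortion at a fixed compression rate (zeroing the smallest coefficients
# stands in for embedded zerotree coding). Requires PyWavelets and NumPy.
import numpy as np
import pywt

rng = np.random.default_rng(3)
t = np.linspace(0, 1, 1024)
signal = np.sin(2 * np.pi * 8 * t) * np.exp(-3 * t) + 0.05 * rng.normal(size=t.size)

def distortion_pct(sig, wavelet, keep_fraction=0.5):
    coeffs = pywt.wavedec(sig, wavelet, level=5)
    flat = np.concatenate([np.abs(c) for c in coeffs])
    thresh = np.quantile(flat, 1.0 - keep_fraction)         # keep the largest coefficients
    kept = [np.where(np.abs(c) >= thresh, c, 0.0) for c in coeffs]
    rec = pywt.waverec(kept, wavelet)[: sig.size]
    return 100.0 * np.sum((sig - rec) ** 2) / np.sum(sig ** 2)

for name in ["db2", "db4", "sym5", "coif3", "bior3.5"]:
    print(f"{name:8s} distortion at 50% kept coefficients: {distortion_pct(signal, name):.3f}%")
```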
NASA Astrophysics Data System (ADS)
Alameda, J. C.
2011-12-01
Development and optimization of computational science models, particularly on high performance computers and, with the advent of ubiquitous multicore processor systems, on practically every system, has long been accomplished with basic software tools: typically, command-line based compilers, debuggers, and performance tools that have not changed substantially from the days of serial and early vector computers. However, model complexity, including the complexity added by modern message passing libraries such as MPI, and the need for hybrid code models (such as OpenMP and MPI) to take full advantage of high performance computers with an increasing core count per shared memory node, has made development and optimization of such codes an increasingly arduous task. Additional architectural developments, such as many-core processors, only complicate the situation further. In this paper, we describe how our NSF-funded project, "SI2-SSI: A Productive and Accessible Development Workbench for HPC Applications Using the Eclipse Parallel Tools Platform" (WHPC), seeks to improve the Eclipse Parallel Tools Platform, an environment designed to support scientific code development targeted at a diverse set of high performance computing systems. Our WHPC project takes an application-centric view to improving PTP. We are using a set of scientific applications, each with a variety of challenges, both to drive further improvements to the scientific applications themselves and to understand shortcomings in Eclipse PTP from an application developer's perspective, which in turn drives the list of improvements we seek to make. We are also partnering with performance tool providers to drive higher quality performance tool integration. We have partnered with the Cactus group at Louisiana State University to improve Eclipse's ability to work with computational frameworks and extremely complex build systems, as well as to develop educational materials to incorporate into computational science and engineering codes. Finally, we are partnering with the lead PTP developers at IBM to ensure we are as effective as possible within the Eclipse community development. We are also conducting training and outreach to our user community, including conference BOF sessions, monthly user calls, and an annual user meeting, so that we can best inform the improvements we make to Eclipse PTP. With these activities we endeavor to encourage the use of modern software engineering practices, as enabled through the Eclipse IDE, with computational science and engineering applications. These practices include proper use of source code repositories; tracking and rectifying issues; measuring and monitoring code performance changes against both optimizations and ever-changing software stacks and configurations on HPC systems; and ultimately encouraging development and maintenance of testing suites -- things that have become commonplace in many software endeavors but have lagged in the development of science applications. We view that the increased complexity of both HPC systems and science applications demands the use of better software engineering methods, preferably enabled by modern tools such as Eclipse PTP, to help the computational science community thrive as we evolve the HPC landscape.
NASA Astrophysics Data System (ADS)
Bezan, Scott; Shirani, Shahram
2006-12-01
To reliably transmit video over error-prone channels, the data should be both source and channel coded. When multiple channels are available for transmission, the problem extends to that of partitioning the data across these channels. The condition of transmission channels, however, varies with time. Therefore, the error protection added to the data at one instant of time may not be optimal at the next. In this paper, we propose a method for adaptively adding error correction code in a rate-distortion (RD) optimized manner using rate-compatible punctured convolutional codes to an MJPEG2000 constant rate-coded frame of video. We perform an analysis on the rate-distortion tradeoff of each of the coding units (tiles and packets) in each frame and adapt the error correction code assigned to the unit taking into account the bandwidth and error characteristics of the channels. This method is applied to both single and multiple time-varying channel environments. We compare our method with a basic protection method in which data is either not transmitted, transmitted with no protection, or transmitted with a fixed amount of protection. Simulation results show promising performance for our proposed method.
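The allocation problem can be illustrated with a toy greedy scheme. The Python sketch below is not the paper's RCPC/MJPEG2000 implementation: the coding units, protection levels, rates, and residual loss probabilities are invented, and protection is upgraded one step at a time wherever it buys the most expected-distortion reduction per parity bit within the budget.

```python
# Toy unequal-error-protection allocator (illustrative numbers only): each
# coding unit picks a protection level so that expected distortion is reduced
# greedily per parity bit spent, within a total rate budget.
units = [  # importance = distortion incurred if the unit is lost
    {"name": "tile0-pkt0", "importance": 40.0},
    {"name": "tile0-pkt1", "importance": 25.0},
    {"name": "tile1-pkt0", "importance": 10.0},
]
# protection levels: (extra parity bits, residual loss probability on this channel)
LEVELS = [(0, 0.30), (200, 0.10), (500, 0.02)]
RATE_BUDGET = 900          # total parity bits available

choice = {u["name"]: 0 for u in units}          # start everything unprotected
spent = 0
while True:
    best_gain, best_u, best_lvl = 0.0, None, None
    for u in units:
        cur_bits, cur_p = LEVELS[choice[u["name"]]]
        for lvl, (bits, p) in enumerate(LEVELS):
            extra = bits - cur_bits
            if lvl <= choice[u["name"]] or spent + extra > RATE_BUDGET:
                continue
            gain = u["importance"] * (cur_p - p) / extra    # distortion saved per parity bit
            if gain > best_gain:
                best_gain, best_u, best_lvl = gain, u["name"], lvl
    if best_u is None:
        break
    spent += LEVELS[best_lvl][0] - LEVELS[choice[best_u]][0]
    choice[best_u] = best_lvl

print("protection assignment:", choice, "| parity bits used:", spent)
```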
NASA Astrophysics Data System (ADS)
Yahampath, Pradeepa
2017-12-01
Consider communicating a correlated Gaussian source over a Rayleigh fading channel with no knowledge of the channel signal-to-noise ratio (CSNR) at the transmitter. In this case, a digital system cannot be optimal for a range of CSNRs. Analog transmission however is optimal at all CSNRs, if the source and channel are memoryless and bandwidth matched. This paper presents new hybrid digital-analog (HDA) systems for sources with memory and channels with bandwidth expansion, which outperform both digital-only and analog-only systems over a wide range of CSNRs. The digital part is either a predictive quantizer or a transform code, used to achieve a coding gain. Analog part uses linear encoding to transmit the quantization error which improves the performance under CSNR variations. The hybrid encoder is optimized to achieve the minimum AMMSE (average minimum mean square error) over the CSNR distribution. To this end, analytical expressions are derived for the AMMSE of asymptotically optimal systems. It is shown that the outage CSNR of the channel code and the analog-digital power allocation must be jointly optimized to achieve the minimum AMMSE. In the case of HDA predictive quantization, a simple algorithm is presented to solve the optimization problem. Experimental results are presented for both Gauss-Markov sources and speech signals.
Hard decoding algorithm for optimizing thresholds under general Markovian noise
NASA Astrophysics Data System (ADS)
Chamberland, Christopher; Wallman, Joel; Beale, Stefanie; Laflamme, Raymond
2017-04-01
Quantum error correction is instrumental in protecting quantum systems from noise in quantum computing and communication settings. Pauli channels can be efficiently simulated and threshold values for Pauli error rates under a variety of error-correcting codes have been obtained. However, realistic quantum systems can undergo noise processes that differ significantly from Pauli noise. In this paper, we present an efficient hard decoding algorithm for optimizing thresholds and lowering failure rates of an error-correcting code under general completely positive and trace-preserving (i.e., Markovian) noise. We use our hard decoding algorithm to study the performance of several error-correcting codes under various non-Pauli noise models by computing threshold values and failure rates for these codes. We compare the performance of our hard decoding algorithm to decoders optimized for depolarizing noise and show improvements in thresholds and reductions in failure rates by several orders of magnitude. Our hard decoding algorithm can also be adapted to take advantage of a code's non-Pauli transversal gates to further suppress noise. For example, we show that using the transversal gates of the 5-qubit code allows arbitrary rotations around certain axes to be perfectly corrected. Furthermore, we show that Pauli twirling can increase or decrease the threshold depending upon the code properties. Lastly, we show that even if the physical noise model differs slightly from the hypothesized noise model used to determine an optimized decoder, failure rates can still be reduced by applying our hard decoding algorithm.
Fast Algorithms for Designing Unimodular Waveform(s) With Good Correlation Properties
NASA Astrophysics Data System (ADS)
Li, Yongzhe; Vorobyov, Sergiy A.
2018-03-01
In this paper, we develop new fast and efficient algorithms for designing single/multiple unimodular waveforms/codes with good auto- and cross-correlation or weighted correlation properties, which are highly desired in radar and communication systems. The waveform design is based on the minimization of the integrated sidelobe level (ISL) and weighted ISL (WISL) of waveforms. As the corresponding optimization problems can quickly grow to large scale as the code length and the number of waveforms increase, the main issue becomes the development of fast large-scale optimization techniques. A further difficulty is that the corresponding optimization problems are non-convex, while the required accuracy is high. Therefore, we formulate the ISL and WISL minimization problems as non-convex quartic optimization problems in the frequency domain, and then simplify them into quadratic problems by utilizing the majorization-minimization technique, which is one of the basic techniques for addressing large-scale and/or non-convex optimization problems. While designing our fast algorithms, we identify and use inherent algebraic structures in the objective functions to rewrite them into quartic forms and, in the case of WISL minimization, to derive an additional alternative quartic form which allows the quartic-quadratic transformation to be applied. Our algorithms are applicable to large-scale unimodular waveform design problems, as they are proved to have lower or comparable computational burden (analyzed theoretically) and faster convergence speed (confirmed by comprehensive simulations) than the state-of-the-art algorithms. In addition, the waveforms designed by our algorithms demonstrate better correlation properties compared to their counterparts.
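To make the ISL objective concrete, the sketch below evaluates the integrated sidelobe level of a unimodular code from its aperiodic autocorrelation and reduces it with a naive random phase-perturbation local search; this is only an illustration of the objective, not the paper's majorization-minimization algorithm, and the code length and step size are arbitrary.

```python
# Simple stand-in for ISL minimization (NOT the paper's MM algorithm): the
# integrated sidelobe level of a unimodular code is reduced by a random
# phase-perturbation local search, just to make the objective concrete.
import numpy as np

rng = np.random.default_rng(4)
N = 64
phases = rng.uniform(0, 2 * np.pi, N)

def isl(ph):
    x = np.exp(1j * ph)                               # unimodular waveform
    r = np.correlate(x, x, mode="full")               # aperiodic autocorrelation
    r[N - 1] = 0.0                                    # drop the zero-lag peak
    return np.sum(np.abs(r) ** 2)

cur = isl(phases)
for _ in range(20000):
    idx = rng.integers(N)
    trial = phases.copy()
    trial[idx] += rng.normal(scale=0.3)
    val = isl(trial)
    if val < cur:
        phases, cur = trial, val
print(f"ISL reduced to {cur:.2f} (a fresh random code gives about {isl(rng.uniform(0, 2*np.pi, N)):.2f})")
```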
High density arrays of micromirrors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Folta, J. M.; Decker, J. Y.; Kolman, J.
We established and achieved our goal to (1) fabricate and evaluate test structures based on the micromirror design optimized for maskless lithography applications, (2) perform system analysis and code development for the maskless lithography concept, and (3) identify specifications for micromirror arrays (MMAs) for LLNL's adaptive optics (AO) applications and conceptualize new devices.
USDA-ARS?s Scientific Manuscript database
We have previously identified the mycobacterial high G+C codon usage bias as a limiting factor in heterologous expression of MAP proteins from Lb.salivarius, and demonstrated that codon optimisation of a synthetic coding gene greatly enhances MAP protein production. Here, we effectively demonstrate ...
Solar Glazing Tips for School Construction
ERIC Educational Resources Information Center
Smith, Jonathan
2012-01-01
Glazing can be optimized to enhance passive solar heating and daylight harvesting by exceeding the prescriptive limits of the energy code. This savings can be garnered without the high cost of external overhangs or expensive glazing products. The majority of savings from solar glazing are attributable to the increase in solar heating and…
A Spherical Active Coded Aperture for 4π Gamma-ray Imaging
Hellfeld, Daniel; Barton, Paul; Gunter, Donald; ...
2017-09-22
Gamma-ray imaging facilitates the efficient detection, characterization, and localization of compact radioactive sources in cluttered environments. Fieldable detector systems employing active planar coded apertures have demonstrated broad energy sensitivity via both coded aperture and Compton imaging modalities. However, planar configurations suffer from a limited field-of-view, especially in the coded aperture mode. In order to improve upon this limitation, we introduce a novel design that rearranges the detectors into an active coded spherical configuration, resulting in a 4pi isotropic field-of-view for both coded aperture and Compton imaging. This work focuses on the low-energy coded aperture modality and the optimization techniques used to determine the optimal number and configuration of 1 cm3 CdZnTe coplanar grid detectors on a 14 cm diameter sphere with 192 available detector locations.
NASA Astrophysics Data System (ADS)
Sanan, P.; Tackley, P. J.; Gerya, T.; Kaus, B. J. P.; May, D.
2017-12-01
StagBL is an open-source parallel solver and discretization library for geodynamic simulation, encapsulating and optimizing operations essential to staggered-grid finite volume Stokes flow solvers. It provides a parallel staggered-grid abstraction with a high-level interface in C and Fortran. On top of this abstraction, tools are available to define boundary conditions and interact with particle systems. Tools and examples to efficiently solve Stokes systems defined on the grid are provided in small (direct solver), medium (simple preconditioners), and large (block factorization and multigrid) model regimes. By working directly with leading application codes (StagYY, I3ELVIS, and LaMEM) and providing an API and examples to integrate with others, StagBL aims to become a community tool supplying scalable, portable, reproducible performance toward novel science in regional- and planet-scale geodynamics and planetary science. By implementing kernels used by many research groups beneath a uniform abstraction layer, the library will enable optimization for modern hardware, thus reducing community barriers to large- or extreme-scale parallel simulation on modern architectures. In particular, the library will include CPU-, manycore-, and GPU-optimized variants of matrix-free operators and multigrid components. The common layer provides a framework upon which to introduce innovative new tools. StagBL will leverage p4est to provide distributed adaptive meshes, and incorporate a multigrid convergence analysis tool. These options, in addition to a wealth of solver options provided by an interface to PETSc, will make the most modern solution techniques available from a common interface. StagBL in turn provides a PETSc interface, DMStag, to its central staggered grid abstraction. We present public version 0.5 of StagBL, including preliminary integration with application codes and demonstrations with its own demonstration application, StagBLDemo. Central to StagBL is the notion of an uninterrupted pipeline from toy/teaching codes to high-performance, extreme-scale solves. StagBLDemo replicates the functionality of an advanced MATLAB-style regional geodynamics code, thus providing users with a concrete procedure to exceed the performance and scalability limitations of smaller-scale tools.
Lee, Bumshik; Kim, Munchurl
2016-08-01
In this paper, a low-complexity coding unit (CU)-level rate and distortion estimation scheme is proposed for hardware-friendly implementation of High Efficiency Video Coding (HEVC), where a Walsh-Hadamard transform (WHT)-based low-complexity integer discrete cosine transform (DCT) is employed for distortion estimation. Since HEVC adopts quadtree structures of coding blocks with hierarchical coding depths, it becomes more difficult to estimate accurate rate and distortion values without actually performing transform, quantization, inverse transform, de-quantization, and entropy coding. Furthermore, the DCT used for rate-distortion optimization (RDO) is computationally expensive, because it requires a large number of multiplication and addition operations for the various transform block sizes of order 4, 8, 16, and 32, and requires recursive computations to decide the optimal depths of the CU or transform unit. Therefore, full RDO-based encoding is highly complex, especially for low-power implementations of HEVC encoders. In this paper, a rate and distortion estimation scheme is proposed at the CU level based on a low-complexity integer DCT that can be computed in terms of the WHT, whose coefficients are produced in the prediction stages. For rate and distortion estimation at the CU level, two orthogonal 4×4 and 8×8 matrices applied to the WHT are newly designed in a butterfly structure using only addition and shift operations. By applying the WHT-based integer DCT and the newly designed transforms in each CU block, the texture rate can be precisely estimated after quantization using the number of non-zero quantized coefficients, and the distortion can also be precisely estimated in the transform domain without requiring de-quantization and inverse transform. In addition, a non-texture rate estimation is proposed that uses a pseudo-entropy code to obtain accurate total rate estimates. The proposed rate and distortion estimation scheme can effectively be used for hardware-friendly implementation of HEVC encoders, with a 9.8% loss relative to HEVC full RDO, which is much less than the 20.3% and 30.2% losses of a conventional approach and a Hadamard-only scheme, respectively.
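The estimation idea can be illustrated on a single 4×4 block. The Python sketch below uses a standard Walsh-Hadamard matrix (not the paper's newly designed butterfly transforms): the residual block is transformed, quantized, the texture rate is proxied by the count of non-zero quantized coefficients, and the distortion is accumulated directly in the transform domain.

```python
# Hedged illustration of the estimation idea (not the paper's exact matrices):
# transform a 4x4 residual block with a Walsh-Hadamard matrix, quantize, proxy
# the texture rate by the count of non-zero quantized coefficients, and
# accumulate distortion directly in the transform domain.
import numpy as np
from scipy.linalg import hadamard

H = hadamard(4).astype(float)          # 4x4 Walsh-Hadamard matrix (entries +/-1)
residual = np.array([[ 3, -1,  0,  2],
                     [ 1,  4, -2,  0],
                     [ 0, -1,  1,  0],
                     [ 2,  0,  0, -3]], dtype=float)
qstep = 8.0

coeff = H @ residual @ H.T / 4.0       # 2-D WHT; the 1/4 factor preserves the sum of squares
qidx = np.round(coeff / qstep)         # quantization indices
rate_proxy = np.count_nonzero(qidx)    # texture-rate estimate: non-zero quantized coefficients
dist_est = np.sum((coeff - qidx * qstep) ** 2)   # distortion estimated in transform domain

print(f"non-zero quantized coefficients (rate proxy): {rate_proxy}")
print(f"transform-domain distortion estimate: {dist_est:.2f}")
```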
Integration of a CAD System Into an MDO Framework
NASA Technical Reports Server (NTRS)
Townsend, J. C.; Samareh, J. A.; Weston, R. P.; Zorumski, W. E.
1998-01-01
NASA Langley has developed a heterogeneous distributed computing environment, called the Framework for Inter-disciplinary Design Optimization, or FIDO. Its purpose has been to demonstrate framework technical feasibility and usefulness for optimizing the preliminary design of complex systems and to provide a working environment for testing optimization schemes. Its initial implementation has been for a simplified model of preliminary design of a high-speed civil transport. Upgrades being considered for the FIDO system include a more complete geometry description, required by high-fidelity aerodynamics and structures codes and based on a commercial Computer Aided Design (CAD) system. This report presents the philosophy behind some of the decisions that have shaped the FIDO system and gives a brief case study of the problems and successes encountered in integrating a CAD system into the FIDO framework.
Power optimization of wireless media systems with space-time block codes.
Yousefi'zadeh, Homayoun; Jafarkhani, Hamid; Moshfeghi, Mehran
2004-07-01
We present analytical and numerical solutions to the problem of power control in wireless media systems with multiple antennas. We formulate a set of optimization problems aimed at minimizing total power consumption of wireless media systems subject to a given level of QoS and an available bit rate. Our formulation takes into consideration the power consumption related to source coding, channel coding, and transmission of multiple-transmit antennas. In our study, we consider Gauss-Markov and video source models, Rayleigh fading channels along with the Bernoulli/Gilbert-Elliott loss models, and space-time block codes.
ls1 mardyn: The Massively Parallel Molecular Dynamics Code for Large Systems.
Niethammer, Christoph; Becker, Stefan; Bernreuther, Martin; Buchholz, Martin; Eckhardt, Wolfgang; Heinecke, Alexander; Werth, Stephan; Bungartz, Hans-Joachim; Glass, Colin W; Hasse, Hans; Vrabec, Jadran; Horsch, Martin
2014-10-14
The molecular dynamics simulation code ls1 mardyn is presented. It is a highly scalable code, optimized for massively parallel execution on supercomputing architectures and currently holds the world record for the largest molecular simulation with over four trillion particles. It enables the application of pair potentials to length and time scales that were previously out of scope for molecular dynamics simulation. With an efficient dynamic load balancing scheme, it delivers high scalability even for challenging heterogeneous configurations. Presently, multicenter rigid potential models based on Lennard-Jones sites, point charges, and higher-order polarities are supported. Due to its modular design, ls1 mardyn can be extended to new physical models, methods, and algorithms, allowing future users to tailor it to suit their respective needs. Possible applications include scenarios with complex geometries, such as fluids at interfaces, as well as nonequilibrium molecular dynamics simulation of heat and mass transfer.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moss, Nicholas
The Kokkos Clang compiler is a version of the Clang C++ compiler that has been modified to perform targeted code generation for Kokkos constructs, with the goal of generating highly optimized code and providing semantic (domain) awareness of these constructs, such as parallel for and parallel reduce, throughout the compilation toolchain. This approach is taken to explore the possibilities of exposing the developer's intentions to the underlying compiler infrastructure (e.g., optimization and analysis passes within the middle stages of the compiler) instead of relying solely on the restricted capabilities of C++ template metaprogramming. To date our activities have focused on correct GPU code generation, and thus we have not yet focused on improving overall performance. The compiler is implemented by recognizing specific (syntactic) Kokkos constructs in order to bypass normal template expansion mechanisms and instead use the semantic knowledge of Kokkos to directly generate code in the compiler's intermediate representation (IR), which is then translated into an NVIDIA-centric GPU program and supporting runtime calls. In addition, capturing and maintaining the higher-level semantics of Kokkos directly within the lower levels of the compiler has the potential to significantly improve the ability of the compiler to communicate with the developer in the terms of their original programming model/semantics.
NASA Astrophysics Data System (ADS)
Roshanian, Jafar; Jodei, Jahangir; Mirshams, Mehran; Ebrahimi, Reza; Mirzaee, Masood
A new automated, multi-level-of-fidelity Multi-Disciplinary Design Optimization (MDO) methodology has been developed at the MDO Laboratory of K.N. Toosi University of Technology. This paper explains the new design approach through the formulation of the developed disciplinary modules. A conceptual design for a small, solid-propellant launch vehicle was considered with a two-level-of-fidelity structure. Low and medium level-of-fidelity (LoF) disciplinary codes were developed and linked. Appropriate design and analysis codes were defined according to their effect on the conceptual design process. Simultaneous optimization of the launch vehicle was performed at the discipline level and the system level, using propulsion, aerodynamics, structure and trajectory disciplinary codes. To reach the minimum launch weight, the low LoF code first searches the whole design space to achieve the mission requirements. The medium LoF code then receives the output of the low LoF code and gives a value near the optimum launch weight with more detail and higher fidelity.
On the optimality of a universal noiseless coder
NASA Technical Reports Server (NTRS)
Yeh, Pen-Shu; Rice, Robert F.; Miller, Warner H.
1993-01-01
Rice developed a universal noiseless coding structure that provides efficient performance over an extremely broad range of source entropies. This is accomplished by adaptively selecting the best of several easily implemented variable-length coding algorithms. Variations of such noiseless coders have been used in many NASA applications. Custom VLSI coder and decoder modules capable of processing over 50 million samples per second have been fabricated and tested. In this study, the first of the code options used in this module development is shown to be equivalent to a class of Huffman codes under the Humblet condition, for source symbol sets having a Laplacian distribution. Except for the default option, the other options are shown to be equivalent to the Huffman codes of a modified Laplacian symbol set, at specified symbol entropy values. Simulation results are obtained on actual aerial imagery over a wide entropy range, and they confirm the optimality of the scheme. Comparisons with other known techniques are performed on several widely used images, and the results further validate the coder's optimality.
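The option-selection principle behind such coders can be sketched briefly. In the hedged Python example below (an illustration, not the flight VLSI implementation), signed Laplacian-like residuals are zigzag-mapped to non-negative integers and the Golomb-Rice split parameter k minimizing the total code length for a block is selected.

```python
# Illustrative sketch of adaptive option selection for a Rice-style coder:
# zigzag-map signed residuals to non-negative integers, then pick the split
# parameter k that minimizes the block's total Golomb-Rice code length.
import numpy as np

rng = np.random.default_rng(5)
residuals = rng.laplace(scale=6.0, size=256).round().astype(int)   # Laplacian-like source

def zigzag(v):                     # map ..., -2, -1, 0, 1, 2, ... -> 3, 1, 0, 2, 4, ...
    return 2 * v if v >= 0 else -2 * v - 1

def rice_length(values, k):
    # each value costs (value >> k) + 1 bits of unary plus k bits of binary remainder
    return int(sum((v >> k) + 1 + k for v in values))

mapped = [zigzag(int(v)) for v in residuals]
lengths = {k: rice_length(mapped, k) for k in range(8)}
best_k = min(lengths, key=lengths.get)
print("bits per k:", lengths)
print(f"selected option k={best_k}: {lengths[best_k] / len(mapped):.2f} bits/sample")
```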
Optimization of Kink Stability in High-Beta Quasi-axisymmetric Stellarators
NASA Astrophysics Data System (ADS)
Fu, G. Y.; Ku, L.-P.; Manickam, J.; Cooper, W. A.
1998-11-01
A key issue for the design of quasi-axisymmetric stellarators (QAS) (A. Reiman et al., this conference) is the stability of external kink modes driven by pressure-induced bootstrap current. In this work, the 3D MHD stability code TERPSICHORE (W. A. Cooper, Phys. Plasmas 3, 275 (1996)) is used to calculate the stability of low-n external kink modes in a high-beta QAS. The kink stability is optimized by adjusting the plasma boundary shape (i.e., the external coil configuration) as well as the plasma pressure and current profiles. For this purpose, the TERPSICHORE code has been implemented successfully in an optimizer which maximizes kink stability as well as quasi-symmetry. A key factor for kink stability is the rotational transform profile. It is found that the edge magnetic shear is strongly stabilizing. The amount of shear needed for complete stabilization increases with edge transform. It is also found that the plasma boundary shape plays an important role in the kink stability, besides the transform profile. The physics mechanisms for the kink stability are being studied by examining the contributions of the individual terms in δW of the energy principle: the field-line bending term, the current-driven term, the pressure-driven term, and the vacuum term. Detailed results will be reported.
Methods and codes for neutronic calculations of the MARIA research reactor.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Andrzejewski, K.; Kulikowska, T.; Bretscher, M. M.
2002-02-18
The core of the MARIA high flux multipurpose research reactor is highly heterogeneous. It consists of beryllium blocks arranged in a 6 x 8 matrix, tubular fuel assemblies, control rods and irradiation channels. The reflector is also heterogeneous and consists of graphite blocks clad with aluminum. Its structure is perturbed by the experimental beam tubes. This paper presents the methods and codes used to calculate the MARIA reactor neutronics characteristics and the experience gained thus far at IAE and ANL. At ANL the methods of MARIA calculations were developed in connection with the RERTR program. At IAE a package of programs was developed to help its operator in the optimization of fuel utilization.
Synergia: an accelerator modeling tool with 3-D space charge
DOE Office of Scientific and Technical Information (OSTI.GOV)
Amundson, James F.; Spentzouris, P.; /Fermilab
2004-07-01
High precision modeling of space-charge effects, together with accurate treatment of single-particle dynamics, is essential for designing future accelerators as well as optimizing the performance of existing machines. We describe Synergia, a high-fidelity parallel beam dynamics simulation package with fully three dimensional space-charge capabilities and a higher order optics implementation. We describe the computational techniques, the advanced human interface, and the parallel performance obtained using large numbers of macroparticles. We also perform code benchmarks comparing with semi-analytic results and other codes. Finally, we present initial results on particle tune spread, beam halo creation, and emittance growth in the Fermilab booster accelerator.
Optimum Design of High-Speed Prop-Rotors
NASA Technical Reports Server (NTRS)
Chattopadhyay, Aditi; McCarthy, Thomas Robert
1993-01-01
An integrated multidisciplinary optimization procedure is developed for application to rotary wing aircraft design. The necessary disciplines such as dynamics, aerodynamics, aeroelasticity, and structures are coupled within a closed-loop optimization process. The procedure developed is applied to address two different problems. The first problem considers the optimization of a helicopter rotor blade and the second problem addresses the optimum design of a high-speed tilting proprotor. In the helicopter blade problem, the objective is to reduce the critical vibratory shear forces and moments at the blade root, without degrading rotor aerodynamic performance and aeroelastic stability. In the case of the high-speed proprotor, the goal is to maximize the propulsive efficiency in high-speed cruise without deteriorating the aeroelastic stability in cruise and the aerodynamic performance in hover. The problems studied involve multiple design objectives; therefore, the optimization problems are formulated using multiobjective design procedures. A comprehensive helicopter analysis code is used for the rotary wing aerodynamic, dynamic and aeroelastic stability analyses and an algorithm developed specifically for these purposes is used for the structural analysis. A nonlinear programming technique coupled with an approximate analysis procedure is used to perform the optimization. The optimum blade designs obtained in each case are compared to corresponding reference designs.
NASA Technical Reports Server (NTRS)
Lin, Shu; Fossorier, Marc
1998-01-01
For long linear block codes, maximum likelihood decoding based on full code trellises would be very hard to implement if not impossible. In this case, we may wish to trade error performance for the reduction in decoding complexity. Sub-optimum soft-decision decoding of a linear block code based on a low-weight sub-trellis can be devised to provide an effective trade-off between error performance and decoding complexity. This chapter presents such a suboptimal decoding algorithm for linear block codes. This decoding algorithm is iterative in nature and based on an optimality test. It has the following important features: (1) a simple method to generate a sequence of candidate code-words, one at a time, for test; (2) a sufficient condition for testing a candidate code-word for optimality; and (3) a low-weight sub-trellis search for finding the most likely (ML) code-word.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Graf, Peter; Dykes, Katherine; Scott, George
The layout of turbines in a wind farm is already a challenging nonlinear, nonconvex, nonlinearly constrained continuous global optimization problem. Here we begin to address the next generation of wind farm optimization problems by adding the complexity that there is more than one turbine type to choose from. The optimization becomes a nonlinear constrained mixed-integer problem, which is a very difficult class of problems to solve. This document briefly summarizes the algorithm and code we have developed, the code validation steps we have performed, and the initial results for multi-turbine type and placement optimization (TTP_OPT) we have run.
Manual of phosphoric acid fuel cell power plant optimization model and computer program
NASA Technical Reports Server (NTRS)
Lu, C. Y.; Alkasab, K. A.
1984-01-01
An optimized cost and performance model for a phosphoric acid fuel cell power plant system was derived and developed into a modular FORTRAN computer code. Cost, energy, mass, and electrochemical analyses were combined to develop a mathematical model for optimizing the steam-to-methane ratio in the reformer, hydrogen utilization in the PAFC, and plates per stack. The nonlinear programming code, COMPUTE, was used to solve this model, in which the method of mixed penalty function combined with Hooke and Jeeves pattern search was chosen to evaluate this specific optimization problem.
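To make the solution technique concrete, here is a minimal sketch of a Hooke and Jeeves pattern search driving an exterior (penalty-function) formulation. The cost and constraint functions below are hypothetical placeholders, not the fuel cell plant model or the COMPUTE code.

```python
import numpy as np

def penalized(x, cost, ineq_constraints, r):
    """Exterior quadratic penalty on violated constraints (feasible when g(x) <= 0)."""
    g = np.array([c(x) for c in ineq_constraints])
    return cost(x) + r * np.sum(np.maximum(g, 0.0) ** 2)

def hooke_jeeves(f, x0, step=0.1, shrink=0.5, tol=1e-6, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(max_iter):
        # Exploratory moves along each coordinate direction
        x_new, f_new = x.copy(), fx
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x_new.copy()
                trial[i] += delta
                ft = f(trial)
                if ft < f_new:
                    x_new, f_new = trial, ft
                    break
        if f_new < fx:
            # Pattern move in the improving direction
            x_pat = x_new + (x_new - x)
            f_pat = f(x_pat)
            x, fx = (x_pat, f_pat) if f_pat < f_new else (x_new, f_new)
        else:
            step *= shrink
            if step < tol:
                break
    return x, fx

# Hypothetical usage with placeholder cost/constraint functions:
cost = lambda x: (x[0] - 3.0) ** 2 + (x[1] - 2.0) ** 2
cons = [lambda x: 1.0 - x[0]]          # requires x[0] >= 1
xopt, fopt = hooke_jeeves(lambda x: penalized(x, cost, cons, r=100.0), [0.0, 0.0])
```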
DOE Office of Scientific and Technical Information (OSTI.GOV)
McMillan, Kyle; Marleau, Peter; Brubaker, Erik
In coded aperture imaging, one of the most important factors determining the quality of reconstructed images is the choice of mask/aperture pattern. In many applications, uniformly redundant arrays (URAs) are widely accepted as the optimal mask pattern. Under ideal conditions (thin, highly opaque masks), URA patterns are mathematically constructed to provide artifact-free reconstruction; however, the number of URAs for a chosen number of mask elements is limited, and when highly penetrating particles such as fast neutrons and high-energy gamma rays are being imaged, the optimum is seldom achieved. In this case more robust mask patterns that provide better reconstructed image quality may exist. Through the use of heuristic optimization methods and maximum likelihood expectation maximization (MLEM) image reconstruction, we show that for both point and extended neutron sources a random mask pattern can be optimized to provide better image quality than that of a URA.
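As a rough illustration of the reconstruction step used to score candidate masks, the following is a generic MLEM iteration for a linear forward model; the system matrix and measured counts are synthetic stand-ins, not the authors' detector model.

```python
import numpy as np

def mlem(A, counts, n_iter=50):
    """A: (n_detectors, n_pixels) forward model; counts: measured detector counts."""
    image = np.ones(A.shape[1])                 # flat initial estimate
    sensitivity = A.sum(axis=0)                 # A^T * 1
    for _ in range(n_iter):
        expected = A @ image                    # forward projection through the mask
        ratio = counts / np.maximum(expected, 1e-12)
        image *= (A.T @ ratio) / np.maximum(sensitivity, 1e-12)
    return image

# Synthetic example: a single point source recovered from Poisson counts.
rng = np.random.default_rng(0)
A = rng.random((128, 64))                       # toy mask/detector response
truth = np.zeros(64); truth[10] = 100.0
counts = rng.poisson(A @ truth)
recon = mlem(A, counts)
```

A heuristic mask search (e.g., simulated annealing over mask elements) could then score each candidate pattern by the quality of such a reconstruction.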
Design optimization studies using COSMIC NASTRAN
NASA Technical Reports Server (NTRS)
Pitrof, Stephen M.; Bharatram, G.; Venkayya, Vipperla B.
1993-01-01
The purpose of this study is to create, test and document a procedure to integrate mathematical optimization algorithms with COSMIC NASTRAN. This procedure is very important to structural design engineers who wish to capitalize on optimization methods to ensure that their design is optimized for its intended application. The OPTNAST computer program was created to link NASTRAN and design optimization codes into one package. This implementation was tested using two truss structure models and optimizing their designs for minimum weight, subject to multiple loading conditions and displacement and stress constraints. However, the process is generalized so that an engineer could design other types of elements by adding to or modifying some parts of the code.
NASA Technical Reports Server (NTRS)
Alexandrov, N. M.; Nielsen, E. J.; Lewis, R. M.; Anderson, W. K.
2000-01-01
First-order approximation and model management is a methodology for a systematic use of variable-fidelity models or approximations in optimization. The intent of model management is to attain convergence to high-fidelity solutions with minimal expense in high-fidelity computations. The savings in terms of computationally intensive evaluations depend on the ability of the available lower-fidelity model or a suite of models to predict the improvement trends for the high-fidelity problem. Variable-fidelity models can be represented by data-fitting approximations, variable-resolution models, variable-convergence models, or variable-physical-fidelity models. The present work considers the use of variable-fidelity physics models. We demonstrate the performance of model management on an aerodynamic optimization of a multi-element airfoil designed to operate in the transonic regime. Reynolds-averaged Navier-Stokes equations represent the high-fidelity model, while the Euler equations represent the low-fidelity model. An unstructured mesh-based analysis code FUN2D evaluates functions and sensitivity derivatives for both models. Model management for the present demonstration problem yields fivefold savings in terms of high-fidelity evaluations compared to optimization done with high-fidelity computations alone.
Spherical hashing: binary code embedding with hyperspheres.
Heo, Jae-Pil; Lee, Youngwoon; He, Junfeng; Chang, Shih-Fu; Yoon, Sung-Eui
2015-11-01
Many binary code embedding schemes have been actively studied recently, since they can provide efficient similarity search, and compact data representations suitable for handling large scale image databases. Existing binary code embedding techniques encode high-dimensional data by using hyperplane-based hashing functions. In this paper we propose a novel hypersphere-based hashing function, spherical hashing, to map more spatially coherent data points into a binary code compared to hyperplane-based hashing functions. We also propose a new binary code distance function, spherical Hamming distance, tailored for our hypersphere-based binary coding scheme, and design an efficient iterative optimization process to achieve both balanced partitioning for each hash function and independence between hashing functions. Furthermore, we generalize spherical hashing to support various similarity measures defined by kernel functions. Our extensive experiments show that our spherical hashing technique significantly outperforms state-of-the-art techniques based on hyperplanes across various benchmarks with sizes ranging from one million to 75 million GIST, BoW, and VLAD descriptors. The performance gains are consistent and large, up to 100 percent improvement over the second-best tested method. These results confirm the unique merits of using hyperspheres to encode proximity regions in high-dimensional spaces. Finally, our method is intuitive and easy to implement.
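A hedged sketch of the encoding idea follows: each bit records whether a point falls inside a learned hypersphere, and the distance between two codes is normalized by the number of spheres that contain both points (our reading of the proposed distance). The pivot/radius learning loop is omitted; random pivots and median radii are used purely for illustration.

```python
import numpy as np

def pairwise_dist(X, pivots):
    return np.linalg.norm(X[:, None, :] - pivots[None, :, :], axis=2)

def spherical_encode(X, pivots, radii):
    """X: (n, d) data; pivots: (c, d); radii: (c,). Returns (n, c) binary codes."""
    return (pairwise_dist(X, pivots) <= radii[None, :]).astype(np.uint8)

def spherical_hamming_distance(a, b):
    """XOR count normalized by the number of spheres containing both points."""
    diff = np.count_nonzero(a != b)
    common = np.count_nonzero(a & b)
    return diff / max(common, 1)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32))
pivots = X[rng.choice(len(X), size=16, replace=False)]     # illustrative pivots
radii = np.median(pairwise_dist(X, pivots), axis=0)        # ~balanced bits per sphere
codes = spherical_encode(X, pivots, radii)
d01 = spherical_hamming_distance(codes[0], codes[1])
```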
NASA Technical Reports Server (NTRS)
Meyn, Larry A.
2018-01-01
One of the goals of NASA's Revolutionary Vertical Lift Technology Project (RVLT) is to provide validated tools for multidisciplinary design, analysis and optimization (MDAO) of vertical lift vehicles. As part of this effort, the software package, RotorCraft Optimization Tools (RCOTOOLS), is being developed to facilitate incorporating key rotorcraft conceptual design codes into optimizations using the OpenMDAO multi-disciplinary optimization framework written in Python. RCOTOOLS, also written in Python, currently supports the incorporation of the NASA Design and Analysis of RotorCraft (NDARC) vehicle sizing tool and the Comprehensive Analytical Model of Rotorcraft Aerodynamics and Dynamics II (CAMRAD II) analysis tool into OpenMDAO-driven optimizations. Both of these tools use detailed, file-based inputs and outputs, so RCOTOOLS provides software wrappers to update input files with new design variable values, execute these codes and then extract specific response variable values from the file outputs. These wrappers are designed to be flexible and easy to use. RCOTOOLS also provides several utilities to aid in optimization model development, including Graphical User Interface (GUI) tools for browsing input and output files in order to locate the text strings that identify specific variables as optimization input and response variables. This paper provides an overview of RCOTOOLS and its use.
Particle-in-Cell laser-plasma simulation on Xeon Phi coprocessors
NASA Astrophysics Data System (ADS)
Surmin, I. A.; Bastrakov, S. I.; Efimenko, E. S.; Gonoskov, A. A.; Korzhimanov, A. V.; Meyerov, I. B.
2016-05-01
This paper concerns the development of a high-performance implementation of the Particle-in-Cell method for plasma simulation on Intel Xeon Phi coprocessors. We discuss the suitability of the method for Xeon Phi architecture and present our experience in the porting and optimization of the existing parallel Particle-in-Cell code PICADOR. Direct porting without code modification gives performance on Xeon Phi close to that of an 8-core CPU on a benchmark problem with 50 particles per cell. We demonstrate step-by-step optimization techniques, such as improving data locality, enhancing parallelization efficiency and vectorization leading to an overall 4.2 × speedup on CPU and 7.5 × on Xeon Phi compared to the baseline version. The optimized version achieves 16.9 ns per particle update on an Intel Xeon E5-2660 CPU and 9.3 ns per particle update on an Intel Xeon Phi 5110P. For a real problem of laser ion acceleration in targets with surface grating, where a large number of macroparticles per cell is required, the speedup of Xeon Phi compared to CPU is 1.6 ×.
Design of a robust baseband LPC coder for speech transmission over 9.6 kbit/s noisy channels
NASA Astrophysics Data System (ADS)
Viswanathan, V. R.; Russell, W. H.; Higgins, A. L.
1982-04-01
This paper describes the design of a baseband Linear Predictive Coder (LPC) which transmits speech over 9.6 kbit/sec synchronous channels with random bit errors of up to 1%. Presented are the results of our investigation of a number of aspects of the baseband LPC coder with the goal of maximizing the quality of the transmitted speech. Important among these aspects are: bandwidth of the baseband, coding of the baseband residual, high-frequency regeneration, and error protection of important transmission parameters. The paper discusses these and other issues, presents the results of speech-quality tests conducted during the various stages of optimization, and describes the details of the optimized speech coder. This optimized speech coding algorithm has been implemented as a real-time full-duplex system on an array processor. Informal listening tests of the real-time coder have shown that the coder produces good speech quality in the absence of channel bit errors and introduces only a slight degradation in quality for channel bit error rates of up to 1%.
PERI - Auto-tuning Memory Intensive Kernels for Multicore
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bailey, David H; Williams, Samuel; Datta, Kaushik
2008-06-24
We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of search-based performance optimizations, popular in linear algebra and FFT libraries, to application-specific computational kernels. Our work applies this strategy to Sparse Matrix Vector Multiplication (SpMV), the explicit heat equation PDE on a regular grid (Stencil), and a lattice Boltzmann application (LBMHD). We explore one of the broadest sets of multicore architectures in the HPC literature, including the Intel Xeon Clovertown, AMD Opteron Barcelona, Sun Victoria Falls, and the Sony-Toshiba-IBM (STI) Cell. Rather than hand-tuning each kernel for each system, we develop a code generator for each kernel that allows us to identify a highly optimized version for each platform, while amortizing the human programming effort. Results show that our auto-tuned kernel applications often achieve a better than 4X improvement compared with the original code. Additionally, we analyze a Roofline performance model for each platform to reveal hardware bottlenecks and software challenges for future multicore systems and applications.
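The search-based idea can be illustrated with a toy auto-tuner that times a handful of candidate variants (here, row-block sizes of a simple stencil sweep) and keeps the fastest; real auto-tuners of this kind generate specialized source code per variant rather than varying a single Python parameter.

```python
import time
import numpy as np

def stencil_blocked(u, block):
    """One Jacobi-style 5-point stencil sweep, processed in row blocks."""
    n = u.shape[0]
    out = u.copy()
    for i0 in range(1, n - 1, block):
        i1 = min(i0 + block, n - 1)
        out[i0:i1, 1:-1] = 0.25 * (u[i0-1:i1-1, 1:-1] + u[i0+1:i1+1, 1:-1] +
                                   u[i0:i1, :-2] + u[i0:i1, 2:])
    return out

def autotune(u, candidate_blocks, reps=3):
    """Benchmark each candidate on the target machine and keep the fastest."""
    best = None
    for block in candidate_blocks:
        t0 = time.perf_counter()
        for _ in range(reps):
            stencil_blocked(u, block)
        elapsed = (time.perf_counter() - t0) / reps
        if best is None or elapsed < best[1]:
            best = (block, elapsed)
    return best

u = np.random.rand(1024, 1024)
best_block, best_time = autotune(u, candidate_blocks=[16, 32, 64, 128, 256])
```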
PuReMD-GPU: A reactive molecular dynamics simulation package for GPUs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kylasa, S.B., E-mail: skylasa@purdue.edu; Aktulga, H.M., E-mail: hmaktulga@lbl.gov; Grama, A.Y., E-mail: ayg@cs.purdue.edu
2014-09-01
We present an efficient and highly accurate GP-GPU implementation of our community code, PuReMD, for reactive molecular dynamics simulations using the ReaxFF force field. PuReMD and its incorporation into LAMMPS (Reax/C) is used by a large number of research groups worldwide for simulating diverse systems ranging from biomembranes to explosives (RDX) at atomistic level of detail. The sub-femtosecond time-steps associated with ReaxFF strongly motivate significant improvements to per-timestep simulation time through effective use of GPUs. This paper presents, in detail, the design and implementation of PuReMD-GPU, which enables ReaxFF simulations on GPUs, as well as various performance optimization techniques we developed to obtain high performance on state-of-the-art hardware. Comprehensive experiments on model systems (bulk water and amorphous silica) are presented to quantify the performance improvements achieved by PuReMD-GPU and to verify its accuracy. In particular, our experiments show up to 16× improvement in runtime compared to our highly optimized CPU-only single-core ReaxFF implementation. PuReMD-GPU is a unique production code, and is currently available on request from the authors.
Coupled Aerodynamic and Structural Sensitivity Analysis of a High-Speed Civil Transport
NASA Technical Reports Server (NTRS)
Mason, B. H.; Walsh, J. L.
2001-01-01
An objective of the High Performance Computing and Communication Program at the NASA Langley Research Center is to demonstrate multidisciplinary shape and sizing optimization of a complete aerospace vehicle configuration by using high-fidelity, finite-element structural analysis and computational fluid dynamics aerodynamic analysis. In a previous study, a multi-disciplinary analysis system for a high-speed civil transport was formulated to integrate a set of existing discipline analysis codes, some of them computationally intensive. This paper is an extension of the previous study, in which the sensitivity analysis for the coupled aerodynamic and structural analysis problem is formulated and implemented. Uncoupled stress sensitivities computed with a constant load vector in a commercial finite element analysis code are compared to coupled aeroelastic sensitivities computed by finite differences. The computational expense of these sensitivity calculation methods is discussed.
Optimal Codes for the Burst Erasure Channel
NASA Technical Reports Server (NTRS)
Hamkins, Jon
2010-01-01
Deep space communications over noisy channels lead to certain packets that are not decodable. These packets leave gaps, or bursts of erasures, in the data stream. Burst erasure correcting codes overcome this problem. These are forward erasure correcting codes that allow one to recover the missing gaps of data. Much of the recent work on this topic concentrated on Low-Density Parity-Check (LDPC) codes. These are more complicated to encode and decode than Single Parity Check (SPC) codes or Reed-Solomon (RS) codes, and so far have not been able to achieve the theoretical limit for burst erasure protection. A block interleaved maximum distance separable (MDS) code (e.g., an SPC or RS code) offers near-optimal burst erasure protection, in the sense that no other scheme of equal total transmission length and code rate could improve the guaranteed correctible burst erasure length by more than one symbol. The optimality does not depend on the length of the code, i.e., a short MDS code block interleaved to a given length would perform as well as a longer MDS code interleaved to the same overall length. As a result, this approach offers lower decoding complexity with better burst erasure protection compared to other recent designs for the burst erasure channel (e.g., LDPC codes). A limitation of the design is its lack of robustness to channels that have impairments other than burst erasures (e.g., additive white Gaussian noise), making its application best suited for correcting data erasures in layers above the physical layer. The efficiency of a burst erasure code is the length of its burst erasure correction capability divided by the theoretical upper limit on this length. The inefficiency is one minus the efficiency. The illustration compares the inefficiency of interleaved RS codes to Quasi-Cyclic (QC) LDPC codes, Euclidean Geometry (EG) LDPC codes, extended Irregular Repeat Accumulate (eIRA) codes, array codes, and random LDPC codes previously proposed for burst erasure protection. As can be seen, the simple interleaved RS codes have substantially lower inefficiency over a wide range of transmission lengths.
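A minimal sketch of the block-interleaving idea, with a single-parity-check code standing in for the MDS code: codewords are written row-wise and transmitted column-wise, so a burst of erasures on the channel is spread across many codewords, each of which then sees only a few erasures.

```python
def spc_encode(data_symbols):
    """Toy MDS stand-in: append one parity symbol (byte-wise XOR)."""
    parity = 0
    for s in data_symbols:
        parity ^= s
    return list(data_symbols) + [parity]

def interleave(codewords):
    """Row-wise codewords -> column-wise transmission order."""
    return [cw[i] for i in range(len(codewords[0])) for cw in codewords]

def deinterleave(stream, n_codewords, codeword_len):
    cws = [[None] * codeword_len for _ in range(n_codewords)]
    for idx, sym in enumerate(stream):
        cws[idx % n_codewords][idx // n_codewords] = sym
    return cws

codewords = [spc_encode([1, 2, 3]), spc_encode([4, 5, 6]), spc_encode([7, 8, 9])]
tx = interleave(codewords)
# A burst erasure of length <= n_codewords erases at most one symbol per codeword,
# which even this distance-2 parity code can recover; an RS code tolerates longer bursts.
```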
Force field development with GOMC, a fast new Monte Carlo molecular simulation code
NASA Astrophysics Data System (ADS)
Mick, Jason Richard
In this work, GOMC (GPU Optimized Monte Carlo), a new fast, flexible, and free molecular Monte Carlo code for the simulation of atomistic chemical systems, is presented. The results of a large Lennard-Jonesium simulation in the Gibbs ensemble are presented. Force fields developed using the code are also presented. To fit the models a quantitative fitting process is outlined using a scoring function and heat maps. The presented n-6 force fields include force fields for noble gases and branched alkanes. These force fields are shown to be the most accurate LJ or n-6 force fields to date for these compounds, capable of reproducing pure fluid behavior and binary mixture behavior to a high degree of accuracy.
A finite element code for electric motor design
NASA Technical Reports Server (NTRS)
Campbell, C. Warren
1994-01-01
FEMOT is a finite element program for solving the nonlinear magnetostatic problem. This version uses a nonlinear Newton iteration with first-order elements. The code can be used for electric motor design and analysis. FEMOT can be embedded within an optimization code that will vary nodal coordinates to optimize the motor design. The output from FEMOT can be used to determine motor back EMF, torque, cogging, and magnet saturation. It will run on a PC and will be available to anyone who wants to use it.
Performance and Feasibility Analysis of a Wind Turbine Power System for Use on Mars
NASA Technical Reports Server (NTRS)
Lichter, Matthew D.; Viterna, Larry
1999-01-01
A wind turbine power system for future missions to the Martian surface was studied for performance and feasibility. A C++ program was developed from existing FORTRAN code to analyze the power capabilities of wind turbines under different environments and design philosophies. Power output, efficiency, torque, thrust, and other performance criteria could be computed given design geometries, atmospheric conditions, and airfoil behavior. After reviewing performance of such a wind turbine, a conceptual system design was modeled to evaluate feasibility. More analysis code was developed to study and optimize the overall structural design. Findings of this preliminary study show that turbine power output on Mars could be as high as several hundred kilowatts. The optimized conceptual design examined here would have a power output of 104 kW, total mass of 1910 kg, and specific power of 54.6 W/kg.
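For orientation, the quoted power levels can be sanity-checked with the standard rotor power relation P = ½ρACpv³; the Martian surface density, rotor radius, wind speed and power coefficient below are illustrative assumptions, not values from the study.

```python
import math

def rotor_power(rho, radius, wind_speed, cp):
    """Power extracted by a rotor: 0.5 * rho * A * Cp * v^3."""
    area = math.pi * radius ** 2
    return 0.5 * rho * area * cp * wind_speed ** 3

# Assumed inputs: thin Martian air (~0.020 kg/m^3), a 30 m rotor, 30 m/s wind, Cp = 0.4.
p_watts = rotor_power(rho=0.020, radius=30.0, wind_speed=30.0, cp=0.4)
print(f"{p_watts / 1e3:.0f} kW")   # on the order of a few hundred kW for these inputs
```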
NASA Astrophysics Data System (ADS)
Chang, Ching-Chun; Liu, Yanjun; Nguyen, Son T.
2015-03-01
Data hiding is a technique that embeds information into digital cover data. This technique has been concentrated on the spatial uncompressed domain, and it is considered more challenging to perform in the compressed domain, e.g., vector quantization, JPEG, and block truncation coding (BTC). In this paper, we propose a new data hiding scheme for BTC-compressed images. In the proposed scheme, a dynamic programming strategy is used to search for the optimal solution of the bijective mapping function for LSB substitution. Then, according to the optimal solution, each mean value embeds three secret bits to obtain high hiding capacity with low distortion. The experimental results indicated that the proposed scheme obtained both higher hiding capacity and hiding efficiency than the other four existing schemes, while ensuring good visual quality of the stego-image. In addition, the proposed scheme achieved a bit rate as low as that of the original BTC algorithm.
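As a simplified illustration of where the payload goes, the sketch below replaces the three least-significant bits of an 8-bit BTC quantization level; the paper's actual contribution, the dynamic-programming search for an optimal bijective LSB mapping, is not reproduced here.

```python
def embed_3bits(mean_value, secret_bits):
    """Replace the 3 least-significant bits of an 8-bit BTC level with the payload."""
    assert 0 <= mean_value <= 255 and 0 <= secret_bits <= 7
    return (mean_value & ~0b111) | secret_bits

def extract_3bits(stego_value):
    return stego_value & 0b111

stego = embed_3bits(mean_value=147, secret_bits=0b101)
assert extract_3bits(stego) == 0b101       # payload recovered; level changed by at most 7
```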
Noussa-Yao, Joseph; Heudes, Didier; Escudie, Jean-Baptiste; Degoulet, Patrice
2016-01-01
Short-stay MSO (Medicine, Surgery, Obstetrics) hospitalization activities in public and private hospitals providing public services are funded through charges for the services provided (T2A in French). Coding must be well matched to the severity of the patient's condition, to ensure that appropriate funding is provided to the hospital. We propose the use of an autocompletion process and a multidimensional matrix to help physicians improve the expression of information and optimize clinical coding. With this approach, physicians without knowledge of the encoding rules begin from a rough concept, which is gradually refined through semantic proximity and information on the associated codes drawn from optimized knowledge bases of diagnosis codes.
ERIC Educational Resources Information Center
Lee, Seong-Soo
1982-01-01
Tenth-grade students (n=144) received training on one of three processing methods: coding-mapping (simultaneous), coding only, or decision tree (sequential). The induced simultaneous processing strategy worked optimally under rule learning, while the sequential strategy was difficult to induce and/or not optimal for rule-learning operations.…
NASA Astrophysics Data System (ADS)
Kurosu, Keita; Das, Indra J.; Moskvin, Vadim P.
2016-01-01
Spot scanning, owing to its superior dose-shaping capability, provides unsurpassed dose conformity, in particular for complex targets. However, the robustness of the delivered dose distribution and prescription has to be verified. Monte Carlo (MC) simulation has the potential to generate significant advantages for high-precision particle therapy, especially for media containing inhomogeneities. However, the choice of computational parameters in the MC simulation codes GATE, PHITS, and FLUKA, previously examined for uniform scanning proton beams, needs to be evaluated. This means that the relationship between the effect of input parameters and the calculation results should be carefully scrutinized. The objective of this study was, therefore, to determine the optimal parameters for the spot scanning proton beam for both GATE and PHITS codes by using data from FLUKA simulation as a reference. The proton beam scanning system of the Indiana University Health Proton Therapy Center was modeled in FLUKA, and the geometry was subsequently and identically transferred to GATE and PHITS. Although the beam transport is managed by the spot scanning system, the spot location is always set at the center of a water phantom of 600 × 600 × 300 mm³, which is placed after the treatment nozzle. The percentage depth dose (PDD) is computed along the central axis using 0.5 × 0.5 × 0.5 mm³ voxels in the water phantom. The PDDs and the proton ranges obtained with several computational parameters are then compared to those of FLUKA, and optimal parameters are determined from the accuracy of the proton range, suppressed dose deviation, and computational time minimization. Our results indicate that the optimized parameters are different from those for uniform scanning, suggesting that a gold standard for setting computational parameters for any proton therapy application cannot be determined consistently, since the impact of the parameter settings depends on the proton irradiation technique. We therefore conclude that computational parameters must be customized with reference to the optimized parameters of the corresponding irradiation technique in order to achieve artifact-free MC simulation for use in computational experiments and clinical treatments.
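One simple metric used in such comparisons is the distal range of the depth-dose curve; the sketch below extracts R80 (the depth at which the dose falls to 80% of its maximum) so ranges from two codes can be compared. The curves are synthetic, not the published PDDs.

```python
import numpy as np

def r80(depths_mm, dose):
    """Distal depth at which the dose falls to 80% of maximum (linear interpolation)."""
    dose = np.asarray(dose, dtype=float) / np.max(dose)
    i_max = int(np.argmax(dose))
    for i in range(i_max, len(dose) - 1):
        if dose[i] >= 0.8 > dose[i + 1]:
            frac = (dose[i] - 0.8) / (dose[i] - dose[i + 1])
            return depths_mm[i] + frac * (depths_mm[i + 1] - depths_mm[i])
    return depths_mm[-1]

depths = np.arange(0.0, 300.0, 0.5)
pdd_a = np.exp(-((depths - 150.0) ** 2) / (2 * 10.0 ** 2))   # toy Bragg-peak-like curve
pdd_b = np.exp(-((depths - 151.0) ** 2) / (2 * 10.0 ** 2))   # e.g., a second code's result
range_difference_mm = abs(r80(depths, pdd_b) - r80(depths, pdd_a))
```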
NASA Astrophysics Data System (ADS)
De Geyter, G.; Baes, M.; Fritz, J.; Camps, P.
2013-02-01
We present FitSKIRT, a method to efficiently fit radiative transfer models to UV/optical images of dusty galaxies. These images have the advantage that they have better spatial resolution compared to FIR/submm data. FitSKIRT uses the GAlib genetic algorithm library to optimize the output of the SKIRT Monte Carlo radiative transfer code. Genetic algorithms prove to be a valuable tool in handling the multi-dimensional search space as well as the noise induced by the random nature of the Monte Carlo radiative transfer code. FitSKIRT is tested on artificial images of a simulated edge-on spiral galaxy, where we gradually increase the number of fitted parameters. We find that we can recover all model parameters, even if all 11 model parameters are left unconstrained. Finally, we apply the FitSKIRT code to a V-band image of the edge-on spiral galaxy NGC 4013. This galaxy has been modeled previously by other authors using different combinations of radiative transfer codes and optimization methods. Given the different models and techniques and the complexity and degeneracies in the parameter space, we find reasonable agreement between the different models. We conclude that the FitSKIRT method allows comparison between different models and geometries in a quantitative manner and minimizes the need for human intervention and biasing. The high level of automation makes it an ideal tool to use on larger sets of observed data.
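A minimal sketch of a genetic-algorithm fitting loop of this kind is shown below; the "radiative transfer model" is a placeholder function, not SKIRT, and the selection/crossover choices are illustrative rather than those of GAlib.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_image(params):                      # placeholder for the RT model
    x = np.linspace(-1, 1, 64)
    return params[0] * np.exp(-x[:, None] ** 2 / params[1]) * np.exp(-x[None, :] ** 2 / params[2])

def fitness(params, observed):
    return -np.sum((simulate_image(params) - observed) ** 2)

def genetic_fit(observed, bounds, pop_size=40, generations=100, mutation=0.1):
    lo, hi = np.array(bounds).T
    pop = rng.uniform(lo, hi, size=(pop_size, len(bounds)))
    for _ in range(generations):
        scores = np.array([fitness(ind, observed) for ind in pop])
        parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]   # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            child = np.where(rng.random(len(a)) < 0.5, a, b)       # uniform crossover
            child += mutation * (hi - lo) * rng.normal(size=len(a))
            children.append(np.clip(child, lo, hi))
        pop = np.vstack([parents, children])
    scores = np.array([fitness(ind, observed) for ind in pop])
    return pop[np.argmax(scores)]

observed = simulate_image([1.0, 0.3, 0.1]) + 0.01 * rng.normal(size=(64, 64))
best = genetic_fit(observed, bounds=[(0.1, 2.0), (0.05, 1.0), (0.05, 1.0)])
```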
Correlated variability modifies working memory fidelity in primate prefrontal neuronal ensembles
Leavitt, Matthew L.; Pieper, Florian; Sachs, Adam J.; Martinez-Trujillo, Julio C.
2017-01-01
Neurons in the primate lateral prefrontal cortex (LPFC) encode working memory (WM) representations via sustained firing, a phenomenon hypothesized to arise from recurrent dynamics within ensembles of interconnected neurons. Here, we tested this hypothesis by using microelectrode arrays to examine spike count correlations (rsc) in LPFC neuronal ensembles during a spatial WM task. We found a pattern of pairwise rsc during WM maintenance indicative of stronger coupling between similarly tuned neurons and increased inhibition between dissimilarly tuned neurons. We then used a linear decoder to quantify the effects of the high-dimensional rsc structure on information coding in the neuronal ensembles. We found that the rsc structure could facilitate or impair coding, depending on the size of the ensemble and tuning properties of its constituent neurons. A simple optimization procedure demonstrated that near-maximum decoding performance could be achieved using a relatively small number of neurons. These WM-optimized subensembles were more signal correlation (rsignal)-diverse and anatomically dispersed than predicted by the statistics of the full recorded population of neurons, and they often contained neurons that were poorly WM-selective, yet enhanced coding fidelity by shaping the ensemble’s rsc structure. We observed a pattern of rsc between LPFC neurons indicative of recurrent dynamics as a mechanism for WM-related activity and that the rsc structure can increase the fidelity of WM representations. Thus, WM coding in LPFC neuronal ensembles arises from a complex synergy between single neuron coding properties and multidimensional, ensemble-level phenomena. PMID:28275096
Memory-efficient decoding of LDPC codes
NASA Technical Reports Server (NTRS)
Kwok-San Lee, Jason; Thorpe, Jeremy; Hawkins, Jon
2005-01-01
We present a low-complexity quantization scheme for the implementation of regular (3,6) LDPC codes. The quantization parameters are optimized to maximize the mutual information between the source and the quantized messages. Using this non-uniform quantized belief propagation algorithm, simulations show that an optimized 3-bit quantizer operates with 0.2 dB implementation loss relative to a floating-point decoder, and an optimized 4-bit quantizer with less than 0.1 dB quantization loss.
Optimal UAV Path Planning for Tracking a Moving Ground Vehicle with a Gimbaled Camera
2014-03-27
Design of 9.271-pressure-ratio 5-stage core compressor and overall performance for first 3 stages
NASA Technical Reports Server (NTRS)
Steinke, Ronald J.
1986-01-01
Overall aerodynamic design information is given for all five stages of an axial flow core compressor (74A) having a 9.271 pressure ratio and 29.710 kg/sec flow. For the inlet stage group (first three stages), detailed blade element design information and experimental overall performance are given. At the rotor 1 inlet, the tip speed was 430.291 m/sec and the hub-to-tip radius ratio was 0.488. A low number of blades per row was achieved by the use of low-aspect-ratio blading of moderate solidity. The high reaction stages have about equal energy addition. Radial energy addition was varied to give constant total pressure at the rotor exit. The blade element profile and shock losses and the incidence and deviation angles were based on relevant experimental data. Blade shapes are mostly double circular arc. Analysis by a three-dimensional Euler code verified the experimentally measured high flow at design speed and IGV-stator setting angles. An optimization code gave an optimal IGV-stator reset schedule for higher measured efficiency at all speeds.
Shaping electromagnetic waves using software-automatically-designed metasurfaces.
Zhang, Qian; Wan, Xiang; Liu, Shuo; Yuan Yin, Jia; Zhang, Lei; Jun Cui, Tie
2017-06-15
We present a fully digital procedure for designing reflective coding metasurfaces to shape reflected electromagnetic waves. The design procedure is completely automatic, controlled by a personal computer. In detail, the macro coding units of the metasurface are automatically divided into several types (e.g. two types for 1-bit coding, four types for 2-bit coding, etc.), and each type of macro coding unit is formed by a discretely random arrangement of micro coding units. By combining an optimization algorithm and commercial electromagnetic software, the digital patterns of the macro coding units are optimized to possess constant phase difference for the reflected waves. The apertures of the designed reflective metasurfaces are formed by arranging the macro coding units with a certain coding sequence. To experimentally verify the performance, a coding metasurface is fabricated by automatically designing two digital 1-bit unit cells, which are arranged in an array to constitute a periodic coding metasurface that generates the required four-beam radiation with specific directions. Two complicated functional metasurfaces with circularly- and elliptically-shaped radiation beams are realized by automatically designing 4-bit macro coding units, showing the excellent performance of the automatic software design. The proposed method provides a smart tool to realize various functional devices and systems automatically.
On the optimality of code options for a universal noiseless coder
NASA Technical Reports Server (NTRS)
Yeh, Pen-Shu; Rice, Robert F.; Miller, Warner
1991-01-01
A universal noiseless coding structure was developed that provides efficient performance over an extremely broad range of source entropy. This is accomplished by adaptively selecting the best of several easily implemented variable length coding algorithms. Custom VLSI coder and decoder modules capable of processing over 20 million samples per second are currently under development. The first of the code options used in this module development is shown to be equivalent to a class of Huffman code under the Humblet condition; other options are shown to be equivalent to the Huffman codes of a modified Laplacian symbol set at specified symbol entropy values. Simulation results are obtained on actual aerial imagery, and they confirm the optimality of the scheme. On sources having Gaussian or Poisson distributions, coder performance is also projected through analysis and simulation.
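The adaptive-selection idea can be sketched with Golomb-Rice options: for each block, the encoder computes the coded length under several parameter choices and keeps the cheapest. The option set and identifier overhead of the actual coder are not reproduced here.

```python
def rice_encode_length(value, k):
    """Length in bits of a non-negative value under a Rice code with parameter k."""
    return (value >> k) + 1 + k          # unary quotient + terminator bit + k remainder bits

def best_option(block, k_options=(0, 1, 2, 3, 4)):
    """Select the code option (Rice parameter) that minimizes the block's coded length."""
    costs = {k: sum(rice_encode_length(v, k) for v in block) for k in k_options}
    k_best = min(costs, key=costs.get)
    return k_best, costs[k_best]

block = [0, 3, 1, 7, 2, 0, 5, 1]         # mapped, non-negative prediction residuals
k, bits = best_option(block)
```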
NASA Technical Reports Server (NTRS)
Watson, Andrew B.
1990-01-01
All vision systems, both human and machine, transform the spatial image into a coded representation. Particular codes may be optimized for efficiency or to extract useful image features. Researchers explored image codes based on primary visual cortex in man and other primates. Understanding these codes will advance the art in image coding, autonomous vision, and computational human factors. In cortex, imagery is coded by features that vary in size, orientation, and position. Researchers have devised a mathematical model of this transformation, called the Hexagonal oriented Orthogonal quadrature Pyramid (HOP). In a pyramid code, features are segregated by size into layers, with fewer features in the layers devoted to large features. Pyramid schemes provide scale invariance, and are useful for coarse-to-fine searching and for progressive transmission of images. The HOP Pyramid is novel in three respects: (1) it uses a hexagonal pixel lattice, (2) it uses oriented features, and (3) it accurately models most of the prominent aspects of primary visual cortex. The transform uses seven basic features (kernels), which may be regarded as three oriented edges, three oriented bars, and one non-oriented blob. Application of these kernels to non-overlapping seven-pixel neighborhoods yields six oriented, high-pass pyramid layers, and one low-pass (blob) layer.
Optimal lightpath placement on a metropolitan-area network linked with optical CDMA local nets
NASA Astrophysics Data System (ADS)
Wang, Yih-Fuh; Huang, Jen-Fa
2008-01-01
A flexible optical metropolitan-area network (OMAN) [J.F. Huang, Y.F. Wang, C.Y. Yeh, Optimal configuration of OCDMA-based MAN with multimedia services, in: 23rd Biennial Symposium on Communications, Queen's University, Kingston, Canada, May 29-June 2, 2006, pp. 144-148] structured with OCDMA linkage is proposed to support multimedia services with multi-rate or various qualities of service. To prioritize transmissions in OCDMA, the orthogonal variable spreading factor (OVSF) codes widely used in wireless CDMA are adopted. In addition, for feasible multiplexing, unipolar OCDMA modulation [L. Nguyen, B. Aazhang, J.F. Young, All-optical CDMA with bipolar codes, IEEE Electron. Lett. 31 (6) (1995) 469-470] is used to generate the code selector of multi-rate OMAN, and a flexible fiber-grating-based system is used for the equipment on OCDMA-OVSF code. These enable an OMAN to assign suitable OVSF codes when creating different-rate lightpaths. How to optimally configure a multi-rate OMAN is a challenge because of displaced lightpaths. In this paper, a genetically modified genetic algorithm (GMGA) [L.R. Chen, Flexible fiber Bragg grating encoder/decoder for hybrid wavelength-time optical CDMA, IEEE Photon. Technol. Lett. 13 (11) (2001) 1233-1235] is used to preplan lightpaths in order to optimally configure an OMAN. To evaluate the performance of the GMGA, we compared it with different preplanning optimization algorithms. Simulation results revealed that the GMGA very efficiently solved the problem.
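For reference, OVSF codes are generated recursively on a binary tree, each code spawning two mutually orthogonal children of twice the length; a minimal sketch follows (the code-assignment and lightpath-planning logic of the paper is not shown).

```python
def ovsf_children(code):
    """Each code c at spreading factor SF spawns [c, c] and [c, -c] at SF*2."""
    return [code + code, code + [-c for c in code]]

def ovsf_layer(spreading_factor):
    layer = [[1]]
    while len(layer[0]) < spreading_factor:
        layer = [child for code in layer for child in ovsf_children(code)]
    return layer

codes_sf8 = ovsf_layer(8)     # 8 mutually orthogonal codes of length 8
assert sum(a * b for a, b in zip(codes_sf8[0], codes_sf8[1])) == 0
```

Codes taken from different branches of the tree remain orthogonal, which is what allows users (or lightpaths) with different rates to share the medium.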
NASA Astrophysics Data System (ADS)
Hsueh, Yu-Li; Rogge, Matthew S.; Shaw, Wei-Tao; Kim, Jaedon; Yamamoto, Shu; Kazovsky, Leonid G.
2005-09-01
A simple and cost-effective upgrade of existing passive optical networks (PONs) is proposed, which realizes service overlay by novel spectral-shaping line codes. A hierarchical coding procedure allows processing simplicity and achieves desired long-term spectral properties. Different code rates are supported, and the spectral shape can be properly tailored to adapt to different systems. The computation can be simplified by quantization of trigonometric functions. DC balance is achieved by passing the dc residual between processing windows. The proposed line codes tend to introduce bit transitions to avoid long consecutive identical bits and facilitate receiver clock recovery. Experiments demonstrate and compare several different optimized line codes. For a specific tolerable interference level, the optimal line code can easily be determined, which maximizes the data throughput. The service overlay using the line-coding technique leaves existing services and field-deployed fibers untouched but fully functional, providing a very flexible and economic way to upgrade existing PONs.
Comparisons between MCNP, EGS4 and experiment for clinical electron beams.
Jeraj, R; Keall, P J; Ostwald, P M
1999-03-01
Understanding the limitations of Monte Carlo codes is essential in order to avoid systematic errors in simulations, and to suggest further improvement of the codes. MCNP and EGS4, Monte Carlo codes commonly used in medical physics, were compared and evaluated against electron depth dose data and experimental backscatter results obtained using clinical radiotherapy beams. Different physical models and algorithms used in the codes give significantly different depth dose curves and electron backscattering factors. The default version of MCNP calculates electron depth dose curves which are too penetrating. The MCNP results agree better with experiment if the ITS-style energy-indexing algorithm is used. EGS4 underpredicts electron backscattering for high-Z materials. The results slightly improve if optimal PRESTA-I parameters are used. MCNP simulates backscattering well even for high-Z materials. To conclude the comparison, a timing study was performed. EGS4 is generally faster than MCNP and use of a large number of scoring voxels dramatically slows down the MCNP calculation. However, use of a large number of geometry voxels in MCNP only slightly affects the speed of the calculation.
The novel high-performance 3-D MT inverse solver
NASA Astrophysics Data System (ADS)
Kruglyakov, Mikhail; Geraskin, Alexey; Kuvshinov, Alexey
2016-04-01
We present a novel, robust, scalable, and fast 3-D magnetotelluric (MT) inverse solver. The solver is written in a multi-language paradigm to make it as efficient, readable and maintainable as possible. Separation-of-concerns and single-responsibility principles guide the implementation of the solver. As the forward modelling engine, a modern scalable solver, extrEMe, based on a contracting integral equation approach, is used. An iterative gradient-type (quasi-Newton) optimization scheme is invoked to search for the (regularized) inverse problem solution, and an adjoint source approach is used to calculate the gradient of the misfit efficiently. The inverse solver is able to deal with highly detailed and contrasting models, allows for working (separately or jointly) with any type of MT response, and supports massive parallelization. Moreover, different parallelization strategies implemented in the code allow optimal usage of available computational resources for a given problem statement. To parameterize the inverse domain, so-called mask parameterization is implemented, which means that one can merge any subset of forward modelling cells in order to account for the (usually) irregular distribution of observation sites. We report results of 3-D numerical experiments aimed at analysing the robustness, performance and scalability of the code. In particular, our computational experiments, carried out on platforms ranging from modern laptops to the HPC system Piz Daint (the 6th-ranked supercomputer in the world), demonstrate practically linear scalability of the code up to thousands of nodes.
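A hedged sketch of the outer inversion loop: a quasi-Newton (L-BFGS) update driven by a user-supplied gradient, standing in for the adjoint-source gradient of the regularized misfit. The quadratic "forward problem" below is a placeholder, not the extrEMe engine.

```python
import numpy as np
from scipy.optimize import minimize

def misfit_and_gradient(model, data, reg=1e-2):
    # Placeholder quadratic misfit; a real adjoint computation returns the gradient
    # of the data misfit for roughly the cost of one additional forward solve.
    residual = model - data
    value = 0.5 * residual @ residual + 0.5 * reg * model @ model
    grad = residual + reg * model
    return value, grad

data = np.array([1.0, -2.0, 0.5])
result = minimize(misfit_and_gradient, x0=np.zeros(3), args=(data,),
                  jac=True, method="L-BFGS-B")   # jac=True: fun returns (value, gradient)
```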
Aeroelastic Tailoring Study of N+2 Low Boom Supersonic Commercial Transport Aircraft
NASA Technical Reports Server (NTRS)
Pak, Chan-Gi
2015-01-01
The Lockheed Martin N+2 Low-Boom Supersonic Commercial Transport (LSCT) aircraft was optimized in this study through the use of a multidisciplinary design optimization tool developed at the National Aeronautics and Space Administration Armstrong Flight Research Center. A total of 111 design variables were used in the first optimization run. Total structural weight was the objective function in this optimization run. Design requirements for strength, buckling, and flutter were selected as constraint functions during the first optimization run. The MSC Nastran code was used to obtain the modal, strength, and buckling characteristics. Flutter and trim analyses were based on the ZAERO code, and landing and ground control loads were computed using an in-house code. The weight penalty to satisfy all the design requirements during the first optimization run was 31,367 lb, a 9.4% increase from the baseline configuration. The second optimization run was prepared and based on the big-bang big-crunch algorithm. Six composite ply angles for the second and fourth composite layers were selected as discrete design variables for the second optimization run. Composite ply angle changes could not improve the weight of the N+2 LSCT aircraft configuration. However, this second optimization run creates more tolerance in the active and near-active strength constraint values for future weight optimization runs.
NASA Astrophysics Data System (ADS)
Takemiya, Tetsushi
In modern aerospace engineering, the physics-based computational design method is becoming more important, as it is more efficient than experiments and because it is more suitable in designing new types of aircraft (e.g., unmanned aerial vehicles or supersonic business jets) than the conventional design method, which heavily relies on historical data. To enhance the reliability of the physics-based computational design method, researchers have made tremendous efforts to improve the fidelity of models. However, high-fidelity models require longer computational time, so the advantage of efficiency is partially lost. This problem has been overcome with the development of variable fidelity optimization (VFO). In VFO, different fidelity models are simultaneously employed in order to improve the speed and the accuracy of convergence in an optimization process. Among the various types of VFO methods, one of the most promising methods is the approximation management framework (AMF). In the AMF, objective and constraint functions of a low-fidelity model are scaled at a design point so that the scaled functions, which are referred to as "surrogate functions," match those of a high-fidelity model. Since the scaling functions and the low-fidelity model constitute the surrogate functions, evaluating the surrogate functions is faster than evaluating the high-fidelity model. Therefore, in the optimization process, in which gradient-based optimization is implemented and thus many function calls are required, the surrogate functions are used instead of the high-fidelity model to obtain a new design point. The best feature of the AMF is that it may converge to a local optimum of the high-fidelity model in much less computational time than optimization using the high-fidelity model alone. However, through literature surveys and implementations of the AMF, the author found that (1) the AMF is very vulnerable when the computational analysis models have numerical noise, which is very common in high-fidelity models, and that (2) the AMF terminates optimization erroneously when the optimization problems have constraints. The first problem is due to inaccuracy in computing derivatives in the AMF, and the second problem is due to erroneous treatment of the trust region ratio, which sets the size of the domain for an optimization in the AMF. In order to solve the first problem of the AMF, the automatic differentiation (AD) technique, which reads the codes of analysis models and automatically generates new derivative codes based on some mathematical rules, is applied. If derivatives are computed with the generated derivative code, they are analytical, and the required computational time is independent of the number of design variables, which is very advantageous for realistic aerospace engineering problems. However, if analysis models implement iterative computations such as computational fluid dynamics (CFD), which solves systems of partial differential equations iteratively, computing derivatives through the AD requires a massive memory size. The author solved this deficiency by modifying the AD approach and developing a more efficient implementation with CFD, and successfully applied the AD to general CFD software. In order to solve the second problem of the AMF, the governing equation of the trust region ratio, which is very strict against the violation of constraints, is modified so that it can accept the violation of constraints within some tolerance.
By accepting violations of constraints during the optimization process, the AMF can continue optimization without terminating prematurely and eventually find the true optimum design point. With these modifications, the AMF is referred to as the "Robust AMF," and it is applied to airfoil and wing aerodynamic design problems using Euler CFD software. The former problem has 21 design variables, and the latter 64. In both problems, derivatives computed with the proposed AD method are first compared with those computed with the finite differentiation (FD) method, and then the Robust AMF is implemented along with the sequential quadratic programming (SQP) optimization method using only high-fidelity models. The proposed AD method computes derivatives more accurately and faster than the FD method, and the Robust AMF successfully optimizes shapes of the airfoil and the wing in a much shorter time than SQP with only high-fidelity models. These results clearly show the effectiveness of the Robust AMF. Finally, the feasibility of reducing the computational time for calculating derivatives and the necessity of an AMF with an optimum design point always in the feasible region are discussed as future work.
A Hybrid Optimization Framework with POD-based Order Reduction and Design-Space Evolution Scheme
NASA Astrophysics Data System (ADS)
Ghoman, Satyajit S.
The main objective of this research is to develop an innovative multi-fidelity multi-disciplinary design, analysis and optimization suite that integrates certain solution generation codes and newly developed innovative tools to improve the overall optimization process. The research performed herein is divided into two parts: (1) the development of an MDAO framework by integration of variable fidelity physics-based computational codes, and (2) enhancements to such a framework by incorporating innovative features extending its robustness. The first part of this dissertation describes the development of a conceptual Multi-Fidelity Multi-Strategy and Multi-Disciplinary Design Optimization Environment (M3DOE), in the context of aircraft wing optimization. M3DOE provides the user a capability to optimize configurations with a choice of (i) the level of fidelity desired, (ii) the use of a single-step or multi-step optimization strategy, and (iii) a combination of a series of structural and aerodynamic analyses. The modularity of M3DOE allows it to be a part of other inclusive optimization frameworks. The M3DOE is demonstrated within the context of shape and sizing optimization of the wing of a Generic Business Jet aircraft. Two different optimization objectives, viz. dry weight minimization and cruise range maximization, are studied by conducting one low-fidelity and two high-fidelity optimization runs to demonstrate the application scope of M3DOE. The second part of this dissertation describes the development of an innovative hybrid optimization framework that extends the robustness of M3DOE by employing a proper orthogonal decomposition-based design-space order reduction scheme combined with the evolutionary algorithm technique. The POD method of extracting dominant modes from an ensemble of candidate configurations is used for the design-space order reduction. The snapshot of the candidate population is updated iteratively using the evolutionary algorithm technique of fitness-driven retention. This strategy capitalizes on the advantages of evolutionary algorithms as well as POD-based reduced order modeling, while overcoming the shortcomings inherent in these techniques. When linked with M3DOE, this strategy offers a computationally efficient methodology for problems with a high level of complexity and a challenging design-space. This newly developed framework is demonstrated for its robustness on a nonconventional supersonic tailless air vehicle wing shape optimization problem.
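A minimal sketch of the POD step under synthetic data: dominant modes are extracted from the SVD of a mean-centred snapshot matrix, and candidate designs are then represented by a few modal coordinates instead of the full design vector.

```python
import numpy as np

rng = np.random.default_rng(2)
snapshots = rng.normal(size=(200, 30))          # 30 candidate designs, 200 variables each

mean = snapshots.mean(axis=1, keepdims=True)
U, s, _ = np.linalg.svd(snapshots - mean, full_matrices=False)
energy = np.cumsum(s ** 2) / np.sum(s ** 2)
r = int(np.searchsorted(energy, 0.99)) + 1      # modes retaining 99% of the snapshot energy
basis = U[:, :r]

def reduce(design):                              # full design vector -> modal coordinates
    return basis.T @ (design - mean.ravel())

def reconstruct(coords):                         # modal coordinates -> full design vector
    return mean.ravel() + basis @ coords
```

In the hybrid scheme described above, the snapshot ensemble would be refreshed with the fittest candidates each generation, so the reduced basis tracks the evolving design space.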
NASA Astrophysics Data System (ADS)
Zhang, Chenchu; Hu, Yanlei; Du, Wenqiang; Wu, Peichao; Rao, Shenglong; Cai, Ze; Lao, Zhaoxin; Xu, Bing; Ni, Jincheng; Li, Jiawen; Zhao, Gang; Wu, Dong; Chu, Jiaru; Sugioka, Koji
2016-09-01
Rapid integration of high-quality functional devices in microchannels is in high demand for miniature lab-on-a-chip applications. This paper demonstrates the embellishment of existing microfluidic devices with integrated micropatterns via femtosecond laser MRAF-based holographic patterning (MHP) microfabrication, which proves two-photon polymerization (TPP) based on a spatial light modulator (SLM) to be a rapid and powerful technology for chip functionalization. An optimized mixed region amplitude freedom (MRAF) algorithm has been used to generate a high-quality shaped focus field. Based on the optimized parameters, a single-exposure approach is developed to fabricate 200 × 200 μm microstructure arrays in less than 240 ms. Moreover, microtraps, QR codes and letters are integrated into a microdevice by the advanced method for particle capture and device identification. These results indicate that such holographic laser embellishment of microfluidic devices is simple, flexible and easy to access, which has great potential in lab-on-a-chip applications of biological culture, chemical analyses and optofluidic devices.
Session on High Speed Civil Transport Design Capability Using MDO and High Performance Computing
NASA Technical Reports Server (NTRS)
Rehder, Joe
2000-01-01
Since the inception of CAS in 1992, NASA Langley has been conducting research into applying multidisciplinary optimization (MDO) and high performance computing toward reducing aircraft design cycle time. The focus of this research has been the development of a series of computational frameworks and associated applications that increased in capability, complexity, and performance over time. The culmination of this effort is an automated high-fidelity analysis capability for a high speed civil transport (HSCT) vehicle installed on a network of heterogeneous computers with a computational framework built using Common Object Request Broker Architecture (CORBA) and Java. The main focus of the research in the early years was the development of the Framework for Interdisciplinary Design Optimization (FIDO) and associated HSCT applications. While the FIDO effort was eventually halted, work continued on HSCT applications of ever-increasing complexity. The current application, HSCT4.0, employs high fidelity CFD and FEM analysis codes. For each analysis cycle, the vehicle geometry and computational grids are updated using new values for design variables. Processes for aeroelastic trim, loads convergence, displacement transfer, stress and buckling, and performance have been developed. In all, a total of 70 processes are integrated in the analysis framework. Many of the key processes include automatic differentiation capabilities to provide sensitivity information that can be used in optimization. A software engineering process was developed to manage this large project. Defining the interactions among 70 processes turned out to be an enormous, but essential, task. A formal requirements document was prepared that defined data flow among processes and subprocesses. A design document was then developed that translated the requirements into actual software design. A validation program was defined and implemented to ensure that codes integrated into the framework produced the same results as their standalone counterparts. Finally, a Commercial Off the Shelf (COTS) configuration management system was used to organize the software development. A computational environment, CJOpt, based on the Common Object Request Broker Architecture (CORBA) and the Java programming language, has been developed as a framework for multidisciplinary analysis and optimization. The environment exploits the parallelism inherent in the application and distributes the constituent disciplines on machines best suited to their needs. In CJOpt, a discipline code is "wrapped" as an object. An interface to the object identifies the functionality (services) provided by the discipline, defined in Interface Definition Language (IDL) and implemented using Java. The results of using the HSCT4.0 capability are described. A summary of lessons learned is also presented. The use of some of the processes, codes, and techniques by industry is highlighted. The application of the methodology developed in this research to other aircraft is described. Finally, we show how the experience gained is being applied to entirely new vehicles, such as the Reusable Space Transportation System. Additional information is contained in the original.
Sankaran, Ramanan; Angel, Jordan; Brown, W. Michael
2015-04-08
The growth in size of networked high performance computers along with novel accelerator-based node architectures has further emphasized the importance of communication efficiency in high performance computing. The world's largest high performance computers are usually operated as shared user facilities due to the costs of acquisition and operation. Applications are scheduled for execution in a shared environment and are placed on nodes that are not necessarily contiguous on the interconnect. Furthermore, the placement of tasks on the nodes allocated by the scheduler is sub-optimal, leading to performance loss and variability. Here, we investigate the impact of task placement on the performance of two massively parallel application codes on the Titan supercomputer, a turbulent combustion flow solver (S3D) and a molecular dynamics code (LAMMPS). Benchmark studies show a significant deviation from ideal weak scaling and variability in performance. The inter-task communication distance was determined to be one of the significant contributors to the performance degradation and variability. A genetic algorithm-based parallel optimization technique was used to optimize the task ordering. This technique provides an improved placement of the tasks on the nodes, taking into account the application's communication topology and the system interconnect topology. As a result, application benchmarks after task reordering through the genetic algorithm show a significant improvement in performance and reduction in variability, therefore enabling the applications to achieve better time to solution and scalability on Titan during production.
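A hedged sketch of the placement search: a genetic algorithm over task orderings that minimizes a hop-weighted communication cost. The communication matrix and the one-dimensional node-distance model are synthetic, not the applications' topologies or Titan's interconnect.

```python
import numpy as np

rng = np.random.default_rng(3)
n_tasks = 32
comm = rng.integers(0, 10, size=(n_tasks, n_tasks))       # message volume between task pairs
comm = (comm + comm.T) // 2

def node_distance(i, j):                                   # assumed 1-D node layout
    return abs(i - j)

def cost(order):
    """Hop-weighted communication cost of placing task order[k] on node k."""
    node_of = np.empty(n_tasks, dtype=int)
    node_of[order] = np.arange(n_tasks)
    return sum(comm[a, b] * node_distance(node_of[a], node_of[b])
               for a in range(n_tasks) for b in range(a + 1, n_tasks))

def evolve(pop_size=30, generations=100):
    pop = [rng.permutation(n_tasks) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        survivors = pop[: pop_size // 2]
        children = []
        for parent in survivors:
            child = parent.copy()
            i, j = rng.integers(n_tasks, size=2)
            child[i], child[j] = child[j], child[i]        # swap mutation keeps a valid permutation
            children.append(child)
        pop = survivors + children
    return min(pop, key=cost)

best_order = evolve()
```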
Magnetic resonance image compression using scalar-vector quantization
NASA Astrophysics Data System (ADS)
Mohsenian, Nader; Shahri, Homayoun
1995-12-01
A new coding scheme based on the scalar-vector quantizer (SVQ) is developed for compression of medical images. SVQ is a fixed-rate encoder and its rate-distortion performance is close to that of optimal entropy-constrained scalar quantizers (ECSQs) for memoryless sources. The use of a fixed-rate quantizer is expected to eliminate some of the complexity issues of using variable-length scalar quantizers. When transmission of images over noisy channels is considered, our coding scheme does not suffer from error propagation which is typical of coding schemes which use variable-length codes. For a set of magnetic resonance (MR) images, coding results obtained from SVQ and ECSQ at low bit-rates are indistinguishable. Furthermore, our encoded images are perceptually indistinguishable from the original, when displayed on a monitor. This makes our SVQ based coder an attractive compression scheme for picture archiving and communication systems (PACS), currently under consideration for an all digital radiology environment in hospitals, where reliable transmission, storage, and high fidelity reconstruction of images are desired.
Optimizing the use of a sensor resource for opponent polarization coding
Heras, Francisco J.H.
2017-01-01
Flies use specialized photoreceptors R7 and R8 in the dorsal rim area (DRA) to detect skylight polarization. R7 and R8 form a tiered waveguide (central rhabdomere pair, CRP) with R7 on top, filtering light delivered to R8. We examine how the division of a given resource, CRP length, between R7 and R8 affects their ability to code polarization angle. We model optical absorption to show how the length fractions allotted to R7 and R8 determine the rates at which they transduce photons, and correct these rates for transduction unit saturation. The rates give the polarization signal and photon noise in R7 and in R8. Their signals are combined in an opponent unit, intrinsic noise is added, and the unit's output is analysed to extract two measures of coding ability, the number of discriminable polarization angles and mutual information. A very long R7 maximizes opponent signal amplitude, but codes inefficiently due to photon noise in the very short R8. Discriminability and mutual information are optimized by maximizing the signal to noise ratio, SNR. At lower light levels approximately equal lengths of R7 and R8 are optimal because photon noise dominates. At higher light levels intrinsic noise comes to dominate and a shorter R8 is optimum. The optimum R8 length fraction falls to one third. This intensity-dependent range of optimal length fractions corresponds to the range observed in different fly species and is not affected by transduction unit saturation. We conclude that a limited resource, rhabdom length, can be divided between two polarization sensors, R7 and R8, to optimize opponent coding. We also find that coding ability increases sub-linearly with total rhabdom length, according to the law of diminishing returns. Consequently, the specialized shorter central rhabdom in the DRA codes polarization twice as efficiently with respect to rhabdom length as the longer rhabdom used in the rest of the eye. PMID:28316880
Proceeding On : Parallelisation Of Critical Code Passages In PHOENIX/3D
NASA Astrophysics Data System (ADS)
Arkenberg, Mario; Wichert, Viktoria; Hauschildt, Peter H.
2016-10-01
Highly resolved state-of-the-art 3D atmosphere simulations will remain computationally extremely expensive for years to come. In addition to the need for more computing power, rethinking coding practices is necessary. We take a dual approach here, by introducing especially adapted, parallel numerical methods and correspondingly parallelising time-critical code passages. In the following, we present our work on PHOENIX/3D. While parallelisation is generally worthwhile, it requires revision of time-consuming subroutines with respect to separability of localised data and variables in order to determine the optimal approach. Of course, the same applies to the code structure. The importance of this ongoing work can be showcased by recently derived benchmark results, which were generated utilising MPI and OpenMP. Furthermore, the need for a careful and thorough choice of an adequate, machine-dependent setup is discussed.
System, methods and apparatus for program optimization for multi-threaded processor architectures
Bastoul, Cedric; Lethin, Richard A; Leung, Allen K; Meister, Benoit J; Szilagyi, Peter; Vasilache, Nicolas T; Wohlford, David E
2015-01-06
Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least two multi-stage execution units that allow for parallel execution of tasks. The first custom computing apparatus optimizes the code for parallelism, locality of operations and contiguity of memory accesses on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.
Optimization of lightweight structure and supporting bipod flexure for a space mirror.
Chen, Yi-Cheng; Huang, Bo-Kai; You, Zhen-Ting; Chan, Chia-Yen; Huang, Ting-Ming
2016-12-20
This article presents an optimization process for integrated optomechanical design. The proposed optimization process for integrated optomechanical design comprises computer-aided drafting, finite element analysis (FEA), optomechanical transfer codes, and an optimization solver. The FEA was conducted to determine mirror surface deformation; then, deformed surface nodal data were transferred into Zernike polynomials through MATLAB optomechanical transfer codes to calculate the resulting optical path difference (OPD) and optical aberrations. To achieve an optimum design, the optimization iterations of the FEA, optomechanical transfer codes, and optimization solver were automatically connected through a self-developed Tcl script. Two examples of optimization design were illustrated in this research, namely, an optimum lightweight design of a Zerodur primary mirror with an outer diameter of 566 mm that is used in a spaceborne telescope and an optimum bipod flexure design that supports the optimum lightweight primary mirror. Finally, optimum designs were successfully accomplished in both examples, achieving a minimum peak-to-valley (PV) value for the OPD of the deformed optical surface. The simulated optimization results showed that (1) the lightweight ratio of the primary mirror increased from 56% to 66%; and (2) the PV value of the mirror supported by optimum bipod flexures in the horizontal position effectively decreased from 228 to 61 nm.
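A minimal sketch of the optomechanical transfer step described above: FEA surface deformations are fit with a few Zernike terms by least squares and the peak-to-valley OPD is evaluated. The small Zernike subset, the node-data format, and the factor of 2 for reflection are illustrative assumptions, not the authors' MATLAB transfer codes.

    import numpy as np

    def zernike_basis(rho, theta):
        # A few low-order Zernike terms (piston, tilts, defocus, astigmatism); illustrative subset only.
        return np.column_stack([
            np.ones_like(rho),           # piston
            rho * np.cos(theta),         # tilt x
            rho * np.sin(theta),         # tilt y
            2.0 * rho**2 - 1.0,          # defocus
            rho**2 * np.cos(2 * theta),  # astigmatism 0/90
            rho**2 * np.sin(2 * theta),  # astigmatism 45
        ])

    def surface_to_opd(x, y, dz, aperture_radius):
        """Fit Zernike coefficients to mirror surface deformation dz at nodes (x, y)."""
        rho = np.hypot(x, y) / aperture_radius
        theta = np.arctan2(y, x)
        Z = zernike_basis(rho, theta)
        coeffs, *_ = np.linalg.lstsq(Z, dz, rcond=None)
        opd = 2.0 * dz                   # assumed: reflection doubles the surface error
        pv = opd.max() - opd.min()       # peak-to-valley figure used as the objective
        return coeffs, pv

In an optimization loop of the kind the abstract describes, pv would be the objective returned to the solver for each candidate lightweighting or flexure geometry.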
The Sizing and Optimization Language, (SOL): Computer language for design problems
NASA Technical Reports Server (NTRS)
Lucas, Stephen H.; Scotti, Stephen J.
1988-01-01
The Sizing and Optimization Language (SOL), a new high-level, special-purpose computer language, was developed to expedite application of numerical optimization to design problems and to make the process less error prone. SOL utilizes the ADS optimization software and provides a clear, concise syntax for describing an optimization problem, the OPTIMIZE description, which closely parallels the mathematical description of the problem. SOL offers language statements which can be used to model a design mathematically, with subroutines or code logic, and with existing FORTRAN routines. In addition, SOL provides error checking and clear output of the optimization results. Because of these language features, SOL is best suited to model and optimize a design concept when the model consists of mathematical expressions written in SOL. For such cases, SOL's unique syntax and error checking can be fully utilized. SOL is presently available for DEC VAX/VMS systems. A SOL package is available which includes the SOL compiler, runtime library routines, and a SOL reference manual.
NASA Technical Reports Server (NTRS)
Baecher, Juergen; Bandte, Oliver; DeLaurentis, Dan; Lewis, Kemper; Sicilia, Jose; Soboleski, Craig
1995-01-01
This report documents the efforts of a Georgia Tech High Speed Civil Transport (HSCT) aerospace student design team in completing a design methodology demonstration under NASA's Advanced Design Program (ADP). Aerodynamic and propulsion analyses are integrated into the synthesis code FLOPS in order to improve its prediction accuracy. Executing the integrated product and process development (IPPD) methodology proposed at the Aerospace Systems Design Laboratory (ASDL), an improved sizing process is described, followed by a combined aero-propulsion optimization, where the objective function, average yield per revenue passenger mile ($/RPM), is constrained by flight stability, noise, approach speed, and field length restrictions. Primary goals include successful demonstration of the application of the response surface methodology (RSM) to parameter design, introduction of higher fidelity disciplinary analysis than normally feasible at the conceptual and early preliminary level, and investigation of relationships between aerodynamic and propulsion design parameters and their effect on the objective function, $/RPM. A unique approach to aircraft synthesis is developed in which statistical methods, specifically design of experiments and the RSM, are used to more efficiently search the design space for optimum configurations. In particular, two uses of these techniques are demonstrated. First, response model equations are formed which represent complex analyses in the form of a regression polynomial. Next, a second regression equation is constructed, not for modeling purposes, but instead for the purpose of optimization at the system level. Such an optimization problem with the given tools would normally be difficult due to the need for hard connections between the various complex codes involved. The statistical methodology presents an alternative and is demonstrated via an example of aerodynamic modeling and planform optimization for an HSCT.
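A hedged sketch of the response-surface step: fit a second-order regression polynomial to a design-of-experiments sample of the expensive analyses, then optimize the cheap surrogate. The design-variable names, bounds, and solver choice are placeholders, not the FLOPS/ASDL setup.

    import numpy as np
    from itertools import combinations_with_replacement
    from scipy.optimize import minimize

    def quadratic_features(X):
        # Constant, linear, and full quadratic (including cross) terms of the design variables.
        X = np.atleast_2d(X)
        n = X.shape[1]
        cols = [np.ones(len(X))] + [X[:, i] for i in range(n)]
        cols += [X[:, i] * X[:, j] for i, j in combinations_with_replacement(range(n), 2)]
        return np.column_stack(cols)

    def fit_response_surface(X, y):
        # Least-squares regression polynomial standing in for the expensive analysis codes.
        beta, *_ = np.linalg.lstsq(quadratic_features(X), y, rcond=None)
        return lambda x: float(quadratic_features(x) @ beta)

    # Hypothetical usage: X holds the DOE points (e.g., sweep, thickness ratio, bypass ratio),
    # y the analysed $/RPM at those points, and `bounds` the design-variable ranges.
    # surrogate = fit_response_surface(X, y)
    # best = minimize(surrogate, x0=X.mean(axis=0), bounds=bounds)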
DOE Office of Scientific and Technical Information (OSTI.GOV)
Prather, Michael
This proposal seeks to maintain the DOE-ACME (offshoot of CESM) as one of the leading CCMs to evaluate near-term climate mitigation. It will implement, test, and optimize the new UCI photolysis codes within CESM CAM5 and new CAM versions in ACME. Fast-J is a high-order-accuracy (8 stream) code for calculating solar scattering and absorption in a single column atmosphere containing clouds, aerosols, and gases that was developed at UCI and implemented in CAM5 under the previous BER/SciDAC grant.
Wireless Visual Sensor Network Resource Allocation using Cross-Layer Optimization
2009-01-01
Rate-Compatible Punctured Convolutional (RCPC) codes for channel ... vol. 44, pp. 2943-2959, November 1998. [22] J. Hagenauer, "Rate-compatible punctured convolutional codes (RCPC codes) and their applications," IEEE ... coding rate for H.264/AVC video compression is determined. At the data link layer, the Rate-Compatible Punctured Convolutional (RCPC) channel coding
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castellano, T.; De Palma, L.; Laneve, D.
2015-07-01
A homemade computer code for designing a Side-Coupled Linear Accelerator (SCL) is written. It integrates a simplified model of SCL tanks with the Particle Swarm Optimization (PSO) algorithm. The computer code's main aim is to obtain useful guidelines for the design of Linear Accelerator (LINAC) resonant cavities. The design procedure, assisted via the aforesaid approach, seems very promising, allowing future improvements towards the optimization of actual accelerating geometries. (authors)
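A generic particle swarm optimization loop of the kind that could be coupled to a simplified SCL tank model; the objective function, bounds, and swarm parameters below are placeholders, not the authors' homemade code.

    import numpy as np

    def pso(objective, bounds, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5):
        """Minimize `objective` (a scalar figure of merit of the cavity geometry) over box bounds."""
        lo, hi = np.array(bounds, dtype=float).T
        dim = len(lo)
        x = lo + np.random.rand(n_particles, dim) * (hi - lo)
        v = np.zeros_like(x)
        pbest, pbest_val = x.copy(), np.array([objective(p) for p in x])
        gbest = pbest[pbest_val.argmin()].copy()
        for _ in range(iters):
            r1, r2 = np.random.rand(n_particles, dim), np.random.rand(n_particles, dim)
            v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # inertia + cognitive + social
            x = np.clip(x + v, lo, hi)
            vals = np.array([objective(p) for p in x])
            improved = vals < pbest_val
            pbest[improved], pbest_val[improved] = x[improved], vals[improved]
            gbest = pbest[pbest_val.argmin()].copy()
        return gbest, float(pbest_val.min())

Each particle is a candidate set of tank or cavity parameters; objective(p) would evaluate the simplified model and return, for example, the deviation from the target resonant frequency or field flatness.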
FRANOPP: Framework for analysis and optimization problems user's guide
NASA Technical Reports Server (NTRS)
Riley, K. M.
1981-01-01
Framework for analysis and optimization problems (FRANOPP) is a software aid for the study and solution of design (optimization) problems which provides the driving program and plotting capability for a user-generated programming system. In addition to FRANOPP, the programming system also contains the optimization code CONMIN, and two user-supplied codes, one for analysis and one for output. With FRANOPP the user is provided with five options for studying a design problem. Three of the options utilize the plot capability and present an in-depth study of the design problem. The study can be focused on a history of the optimization process or on the interaction of variables within the design problem.
Advanced techniques and technology for efficient data storage, access, and transfer
NASA Technical Reports Server (NTRS)
Rice, Robert F.; Miller, Warner
1991-01-01
Advanced techniques for efficiently representing most forms of data are being implemented in practical hardware and software form through the joint efforts of three NASA centers. These techniques adapt to local statistical variations to continually provide near optimum code efficiency when representing data without error. Demonstrated in several earlier space applications, these techniques are the basis of initial NASA data compression standards specifications. Since the techniques clearly apply to most NASA science data, NASA invested in the development of both hardware and software implementations for general use. This investment includes high-speed single-chip very large scale integration (VLSI) coding and decoding modules as well as machine-transferable software routines. The hardware chips were tested in the laboratory at data rates as high as 700 Mbits/s. A coding module's definition includes a predictive preprocessing stage and a powerful adaptive coding stage. The function of the preprocessor is to optimally process incoming data into a standard form data source that the second stage can handle. The built-in preprocessor of the VLSI coder chips is ideal for high-speed sampled data applications such as imaging and high-quality audio, but additionally, the second stage adaptive coder can be used separately with any source that can be externally preprocessed into the 'standard form'. This generic functionality assures that the applicability of these techniques and their recent high-speed implementations should be equally broad outside of NASA.
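A small sketch of the two-stage structure described above: a predictive preprocessor maps samples into small non-negative integers (the 'standard form'), and the adaptive stage picks, per block, the Golomb-Rice parameter that minimizes coded length. This is the general Rice-style technique, not the flight VLSI algorithm; the first-order predictor and the block handling are illustrative.

    def preprocess(samples):
        # Predictive preprocessing: first-order prediction, then map signed residuals
        # to non-negative integers (0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...).
        mapped, prev = [], 0
        for s in samples:
            d = s - prev
            prev = s
            mapped.append(2 * d if d >= 0 else -2 * d - 1)
        return mapped

    def rice_encode_block(mapped_block):
        # Adaptive stage: choose the Rice parameter k that gives the fewest bits for this block.
        def bits(k):
            return sum((m >> k) + 1 + k for m in mapped_block)
        k = min(range(16), key=bits)
        out = []
        for m in mapped_block:
            low = format(m & ((1 << k) - 1), "0%db" % k) if k else ""
            out.append("1" * (m >> k) + "0" + low)   # unary quotient, terminator bit, k low bits
        return k, "".join(out)

A decoder reverses the steps with the transmitted k, so near-optimum efficiency is kept as the source statistics drift from block to block.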
Relay selection in energy harvesting cooperative networks with rateless codes
NASA Astrophysics Data System (ADS)
Zhu, Kaiyan; Wang, Fei
2018-04-01
This paper investigates relay selection in energy harvesting cooperative networks, where the relays harvest energy from the radio frequency (RF) signals transmitted by a source, and the optimal relay is selected and uses the harvested energy to assist the information transmission from the source to its destination. Both the source and the selected relay transmit information using rateless codes, which allows the destination to recover the original information once the number of collected code bits marginally surpasses the entropy of the original information. In order to improve transmission performance and efficiently utilize the harvested power, the optimal relay is selected. The optimization problem is formulated to maximize the achievable information rate of the system. Simulation results demonstrate that our proposed relay selection scheme outperforms other strategies.
Optimizing the Galileo space communication link
NASA Technical Reports Server (NTRS)
Statman, J. I.
1994-01-01
The Galileo mission was originally designed to investigate Jupiter and its moons utilizing a high-rate, X-band (8415 MHz) communication downlink with a maximum rate of 134.4 kb/sec. However, following the failure of the high-gain antenna (HGA) to fully deploy, a completely new communication link design was established that is based on Galileo's S-band (2295 MHz), low-gain antenna (LGA). The new link relies on data compression, local and intercontinental arraying of antennas, a (14,1/4) convolutional code, a (255,M) variable-redundancy Reed-Solomon code, decoding feedback, and techniques to reprocess recorded data to greatly reduce data losses during signal acquisition. The combination of these techniques will enable return of significant science data from the mission.
Image coding using entropy-constrained residual vector quantization
NASA Technical Reports Server (NTRS)
Kossentini, Faouzi; Smith, Mark J. T.; Barnes, Christopher F.
1993-01-01
The residual vector quantization (RVQ) structure is exploited to produce a variable length codeword RVQ. Necessary conditions for the optimality of this RVQ are presented, and a new entropy-constrained RVQ (EC-RVQ) design algorithm is shown to be very effective in designing RVQ codebooks over a wide range of bit rates and vector sizes. The new EC-RVQ has several important advantages. It can outperform entropy-constrained VQ (ECVQ) in terms of peak signal-to-noise ratio (PSNR), memory, and computation requirements. It can also be used to design high rate codebooks and codebooks with relatively large vector sizes. Experimental results indicate that when the new EC-RVQ is applied to image coding, very high quality is achieved at relatively low bit rates.
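The entropy-constrained idea the abstract builds on can be sketched for a single quantizer stage: each vector is assigned to the codeword that minimizes distortion plus lambda times that codeword's ideal code length, and centroids and probabilities are then re-estimated. This is a generic single-stage illustration under an assumed Euclidean distortion, not the authors' multi-stage residual structure.

    import numpy as np

    def ec_assign(data, codebook, probs, lam):
        # Entropy-constrained nearest neighbour: minimize distortion + lambda * rate.
        lengths = -np.log2(np.maximum(probs, 1e-12))          # ideal code lengths in bits
        d = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        return np.argmin(d + lam * lengths[None, :], axis=1)

    def ec_quantizer_design(data, n_codewords, lam, iters=30, seed=0):
        rng = np.random.default_rng(seed)
        codebook = data[rng.choice(len(data), n_codewords, replace=False)].astype(float)
        probs = np.full(n_codewords, 1.0 / n_codewords)
        for _ in range(iters):
            idx = ec_assign(data, codebook, probs, lam)
            for j in range(n_codewords):
                members = data[idx == j]
                if len(members):
                    codebook[j] = members.mean(axis=0)        # centroid update
                probs[j] = len(members) / len(data)           # re-estimate the rate term
        return codebook, probs

Sweeping lambda traces out the rate-distortion trade-off; in a residual (multi-stage) setting the same assignment rule is applied stage by stage.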
Maximum-likelihood soft-decision decoding of block codes using the A* algorithm
NASA Technical Reports Server (NTRS)
Ekroot, L.; Dolinar, S.
1994-01-01
The A* algorithm finds the path in a finite depth binary tree that optimizes a function. Here, it is applied to maximum-likelihood soft-decision decoding of block codes where the function optimized over the codewords is the likelihood function of the received sequence given each codeword. The algorithm considers codewords one bit at a time, making use of the most reliable received symbols first and pursuing only the partially expanded codewords that might be maximally likely. A version of the A* algorithm for maximum-likelihood decoding of block codes has been implemented for block codes up to 64 bits in length. The efficiency of this algorithm makes simulations of codes up to length 64 feasible. This article details the implementation currently in use, compares the decoding complexity with that of exhaustive search and Viterbi decoding algorithms, and presents performance curves obtained with this implementation of the A* algorithm for several codes.
MC ray-tracing optimization of lobster-eye focusing devices with RESTRAX
NASA Astrophysics Data System (ADS)
Šaroun, Jan; Kulda, Jiří
2006-11-01
The enhanced functionalities of the latest version of the RESTRAX software, providing a high-speed Monte Carlo (MC) ray-tracing code to represent a virtual three-axis neutron spectrometer, include representation of parabolic and elliptic guide profiles and facilities for numerical optimization of parameter values, characterizing the instrument components. As examples, we present simulations of a doubly focusing monochromator in combination with cold neutron guides and lobster-eye supermirror devices, concentrating a monochromatic beam to small sample volumes. A Levenberg-Marquardt minimization algorithm is used to optimize simultaneously several parameters of the monochromator and lobster-eye guides. We compare the performance of optimized configurations in terms of monochromatic neutron flux and energy spread and demonstrate the effect of lobster-eye optics on beam transformations in real and momentum subspaces.
Jiansen Li; Jianqi Sun; Ying Song; Yanran Xu; Jun Zhao
2014-01-01
An effective way to improve the data acquisition speed of magnetic resonance imaging (MRI) is to use under-sampled k-space data, and dictionary learning methods can be used to maintain the reconstruction quality. Three-dimensional dictionary learning trains the atoms in the dictionary in the form of blocks, which can utilize the spatial correlation among slices. The dual-dictionary learning method includes a low-resolution dictionary and a high-resolution dictionary, for sparse coding and image updating respectively. However, the amount of data is huge for three-dimensional reconstruction, especially when the number of slices is large. Thus, the procedure is time-consuming. In this paper, we first utilize the NVIDIA Corporation's compute unified device architecture (CUDA) programming model to design parallel algorithms on the graphics processing unit (GPU) to accelerate the reconstruction procedure. The main optimizations operate in the dictionary learning algorithm and the image updating part, such as the orthogonal matching pursuit (OMP) algorithm and the k-singular value decomposition (K-SVD) algorithm. Then we develop another version of the CUDA code with algorithmic optimization. Experimental results show that a speedup of more than 324 times is achieved compared with the CPU-only code when the number of MRI slices is 24.
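A plain NumPy sketch of the orthogonal matching pursuit step that the paper moves to the GPU; the fixed sparsity target and the least-squares refit are the standard greedy formulation, not the authors' CUDA kernels.

    import numpy as np

    def omp(D, y, sparsity):
        """Greedy OMP: pick atoms (columns of D) until `sparsity` atoms approximate y."""
        residual = y.astype(float).copy()
        support = []
        coeffs = np.zeros(D.shape[1])
        x = np.zeros(0)
        for _ in range(sparsity):
            j = int(np.argmax(np.abs(D.T @ residual)))   # atom most correlated with the residual
            if j not in support:
                support.append(j)
            x, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)   # refit on the current support
            residual = y - D[:, support] @ x
        coeffs[support] = x
        return coeffs

In dictionary-learning reconstruction this routine runs independently for every image block, which is exactly the kind of embarrassingly parallel workload that maps well onto GPU threads.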
Data Sciences Summer Institute Topology Optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Watts, Seth
DSSI_TOPOPT is a 2D topology optimization code that designs stiff structures made of a single linear elastic material and void space. The code generates a finite element mesh of a rectangular design domain on which the user specifies displacement and load boundary conditions. The code iteratively designs a structure that minimizes the compliance (maximizes the stiffness) of the structure under the given loading, subject to an upper bound on the amount of material used. Depending on user options, the code can evaluate the performance of a user-designed structure, or create a design from scratch. Output includes the finite element mesh, design, and visualizations of the design.
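Compliance-minimization loops of this kind typically end each iteration with an optimality-criteria density update under the material bound; a hedged sketch of that update is below. The element sensitivities dc and dv would come from the finite element solve, and the move limit and bisection tolerances are conventional values, not necessarily those used in DSSI_TOPOPT.

    import numpy as np

    def oc_update(x, dc, dv, volfrac, move=0.2):
        """Optimality-criteria update of element densities x, given compliance sensitivities
        dc (non-positive) and volume sensitivities dv (positive); the Lagrange multiplier for
        the material bound is found by bisection."""
        l1, l2 = 1e-9, 1e9
        xnew = x
        while (l2 - l1) / (l1 + l2) > 1e-3:
            lmid = 0.5 * (l1 + l2)
            scale = np.sqrt(np.maximum(-dc / (dv * lmid), 0.0))
            xnew = np.clip(x * scale, np.maximum(x - move, 0.0), np.minimum(x + move, 1.0))
            if xnew.mean() > volfrac:   # too much material: raise the multiplier
                l1 = lmid
            else:
                l2 = lmid
        return xnew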
Cooperative optimization and their application in LDPC codes
NASA Astrophysics Data System (ADS)
Chen, Ke; Rong, Jian; Zhong, Xiaochun
2008-10-01
Cooperative optimization is a new way of finding global optima of complicated functions of many variables. The proposed algorithm is a class of message passing algorithms and has solid theoretical foundations. It can achieve good coding gains over the sum-product algorithm for LDPC codes. For (6561, 4096) LDPC codes, the proposed algorithm can achieve 2.0 dB gains over the sum-product algorithm at a BER of 4×10^-7. The decoding complexity of the proposed algorithm is lower than that of the sum-product algorithm; furthermore, the former achieves a much lower error floor than the latter once Eb/N0 exceeds 1.8 dB.
Fast H.264/AVC FRExt intra coding using belief propagation.
Milani, Simone
2011-01-01
In the H.264/AVC FRExt coder, the coding performance of Intra coding significantly surpasses that of previous still-image coding standards, like JPEG2000, thanks to a massive use of spatial prediction. Unfortunately, the adoption of an extensive set of predictors induces a significant increase in the computational complexity required by the rate-distortion optimization routine. The paper presents a complexity reduction strategy that aims at reducing the computational load of Intra coding with a small loss in compression performance. The proposed algorithm relies on selecting a reduced set of prediction modes according to their probabilities, which are estimated adopting a belief-propagation procedure. Experimental results show that the proposed method permits saving up to 60% of the coding time required by an exhaustive rate-distortion optimization method with a negligible loss in performance. Moreover, it permits accurate control of the computational complexity, unlike other methods where the computational complexity depends upon the coded sequence.
Haylen, Bernard T; Lee, Joseph; Maher, Chris; Deprest, Jan; Freeman, Robert
2014-06-01
Results of interobserver reliability studies for the International Urogynecological Association-International Continence Society (IUGA-ICS) Complication Classification coding can be greatly influenced by study design factors such as participant instruction, motivation, and test-question clarity. We attempted to optimize these factors. After a 15-min instructional lecture with eight clinical case examples (including images) and with classification/coding charts available, those clinicians attending an IUGA Surgical Complications workshop were presented with eight similar-style test cases over 10 min and asked to code them using the Category, Time and Site classification. Answers were compared to predetermined correct codes obtained by five instigators of the IUGA-ICS prostheses and grafts complications classification. Prelecture and postquiz participant confidence levels using a five-step Likert scale were assessed. Complete sets of answers to the questions (24 codings) were provided by 34 respondents, only three of whom reported prior use of the charts. Average score [n (%)] out of eight, as well as median score (range) for each coding category were: (i) Category: 7.3 (91 %); 7 (4-8); (ii) Time: 7.8 (98 %); 7 (6-8); (iii) Site: 7.2 (90 %); 7 (5-8). Overall, the equivalent calculations (out of 24) were 22.3 (93 %) and 22 (18-24). Mean prelecture confidence was 1.37 (out of 5), rising to 3.85 postquiz. Urogynecologists had the highest correlation with correct coding, followed closely by fellows and general gynecologists. Optimizing training and study design can lead to excellent results for interobserver reliability of the IUGA-ICS Complication Classification coding, with increased participant confidence in complication-coding ability.
SOL - SIZING AND OPTIMIZATION LANGUAGE COMPILER
NASA Technical Reports Server (NTRS)
Scotti, S. J.
1994-01-01
SOL is a computer language geared to solving design problems. SOL includes the mathematical modeling and logical capabilities of a computer language like FORTRAN but also includes the additional power of non-linear mathematical programming methods (i.e. numerical optimization) at the language level (as opposed to the subroutine level). The language-level use of optimization has several advantages over the traditional, subroutine-calling method of using an optimizer: first, the optimization problem is described in a concise and clear manner which closely parallels the mathematical description of optimization; second, a seamless interface is automatically established between the optimizer subroutines and the mathematical model of the system being optimized; third, the results of an optimization (objective, design variables, constraints, termination criteria, and some or all of the optimization history) are output in a form directly related to the optimization description; and finally, automatic error checking and recovery from an ill-defined system model or optimization description is facilitated by the language-level specification of the optimization problem. Thus, SOL enables rapid generation of models and solutions for optimum design problems with greater confidence that the problem is posed correctly. The SOL compiler takes SOL-language statements and generates the equivalent FORTRAN code and system calls. Because of this approach, the modeling capabilities of SOL are extended by the ability to incorporate existing FORTRAN code into a SOL program. In addition, SOL has a powerful MACRO capability. The MACRO capability of the SOL compiler effectively gives the user the ability to extend the SOL language and can be used to develop easy-to-use shorthand methods of generating complex models and solution strategies. The SOL compiler provides syntactic and semantic error-checking, error recovery, and detailed reports containing cross-references to show where each variable was used. The listings summarize all optimizations, listing the objective functions, design variables, and constraints. The compiler offers error-checking specific to optimization problems, so that simple mistakes will not cost hours of debugging time. The optimization engine used by and included with the SOL compiler is a version of Vanderplaats' ADS system (Version 1.1) modified specifically to work with the SOL compiler. SOL allows the use of over 100 ADS optimization choices such as Sequential Quadratic Programming, Modified Feasible Directions, interior and exterior penalty function and variable metric methods. Default choices of the many control parameters of ADS are made for the user; however, the user can override any of the ADS control parameters desired for each individual optimization. The SOL language and compiler were developed with an advanced compiler-generation system to ensure correctness and simplify program maintenance. Thus, SOL's syntax was defined precisely by an LALR(1) grammar and the SOL compiler's parser was generated automatically from the LALR(1) grammar with a parser-generator. Hence, unlike ad hoc, manually coded interfaces, the SOL compiler's lexical analysis ensures that the SOL compiler recognizes all legal SOL programs, can recover from and correct for many errors, and reports the location of errors to the user. This version of the SOL compiler has been implemented on VAX/VMS computer systems and requires 204 KB of virtual memory to execute.
Since the SOL compiler produces FORTRAN code, it requires the VAX FORTRAN compiler to produce an executable program. The SOL compiler consists of 13,000 lines of Pascal code. It was developed in 1986 and last updated in 1988. The ADS and other utility subroutines amount to 14,000 lines of FORTRAN code and were also updated in 1988.
Optimization of the MINERVA Exoplanet Search Strategy via Simulations
NASA Astrophysics Data System (ADS)
Nava, Chantell; Johnson, Samson; McCrady, Nate; Minerva
2015-01-01
Detection of low-mass exoplanets requires high spectroscopic precision and high observational cadence. MINERVA is a dedicated observatory capable of sub meter-per-second radial velocity precision. As a dedicated observatory, MINERVA can observe with every-clear-night cadence that is essential for low-mass exoplanet detection. However, this cadence complicates the determination of an optimal observing strategy. We simulate MINERVA observations to optimize our observing strategy and maximize exoplanet detections. A dispatch scheduling algorithm provides observations of MINERVA targets every day over a three-year observing campaign. An exoplanet population with a distribution informed by Kepler statistics is assigned to the targets, and radial velocity curves induced by the planets are constructed. We apply a correlated noise model that realistically simulates stellar astrophysical noise sources. The simulated radial velocity data is fed to the MINERVA planet detection code and the expected exoplanet yield is calculated. The full simulation provides a tool to test different strategies for scheduling observations of our targets and optimizing the MINERVA exoplanet search strategy.
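A small sketch of the planet-injection step: given an assigned planet, construct the induced radial-velocity curve at the scheduled observation times. The circular-orbit semi-amplitude formula and the white-noise stand-in (the study itself uses a correlated noise model) are standard approximations, not the MINERVA pipeline.

    import numpy as np

    def rv_semi_amplitude(period_days, mp_mearth, mstar_msun):
        # Circular orbit, edge-on (sin i = 1):
        # K [m/s] ~ 28.4329 * (P / 1 yr)^(-1/3) * (Mp / Mjup) * (M* / Msun)^(-2/3)
        mp_mjup = mp_mearth / 317.8
        return 28.4329 * (period_days / 365.25) ** (-1.0 / 3.0) * mp_mjup * mstar_msun ** (-2.0 / 3.0)

    def simulate_rv(times_days, period_days, mp_mearth, mstar_msun, phase, noise_ms, rng):
        k = rv_semi_amplitude(period_days, mp_mearth, mstar_msun)
        signal = k * np.sin(2.0 * np.pi * times_days / period_days + phase)
        return signal + rng.normal(0.0, noise_ms, size=np.shape(times_days))  # white-noise stand-in

The scheduler output supplies times_days for each target, and the resulting synthetic velocities are what the detection code would then be asked to recover.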
ODECS -- A computer code for the optimal design of S.I. engine control strategies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arsie, I.; Pianese, C.; Rizzo, G.
1996-09-01
The computer code ODECS (Optimal Design of Engine Control Strategies) for the design of spark ignition engine control strategies is presented. This code has been developed starting from the authors' activity in this field, drawing on some original contributions on engine stochastic optimization and dynamical models. The code has a modular structure and is composed of a user interface for the definition, execution, and analysis of different computations performed with 4 independent modules. These modules allow the following calculations: (1) definition of the engine mathematical model from steady-state experimental data; (2) engine cycle test trajectory corresponding to a vehicle transient simulation test such as the ECE15 or FTP drive test schedule; (3) evaluation of the optimal engine control maps with a steady-state approach; (4) engine dynamic cycle simulation and optimization of static control maps and/or dynamic compensation strategies, taking into account dynamical effects due to the unsteady fluxes of air and fuel and the influence of combustion chamber wall thermal inertia on fuel consumption and emissions. Moreover, in the last two modules it is possible to account for errors generated by the non-deterministic behavior of sensors and actuators and the related influence on global engine performance, and to compute robust strategies that are less sensitive to stochastic effects. In the paper the four modules are described together with significant results corresponding to the simulation and the calculation of optimal control strategies for dynamic transient tests.
Computational Particle Dynamic Simulations on Multicore Processors (CPDMu) Final Report Phase I
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schmalz, Mark S
2011-07-24
Statement of Problem - The Department of Energy has many legacy codes for simulation of computational particle dynamics and computational fluid dynamics applications that are designed to run on sequential processors and are not easily parallelized. Emerging high-performance computing architectures employ massively parallel multicore architectures (e.g., graphics processing units) to increase throughput. Parallelization of legacy simulation codes is a high priority, to achieve compatibility, efficiency, accuracy, and extensibility. General Statement of Solution - A legacy simulation application designed for implementation on mainly-sequential processors has been represented as a graph G. Mathematical transformations, applied to G, produce a graph representation G' for a high-performance architecture. Key computational and data movement kernels of the application were analyzed/optimized for parallel execution using the mapping G → G', which can be performed semi-automatically. This approach is widely applicable to many types of high-performance computing systems, such as graphics processing units or clusters comprised of nodes that contain one or more such units. Phase I Accomplishments - Phase I research decomposed/profiled computational particle dynamics simulation code for rocket fuel combustion into low and high computational cost regions (respectively, mainly sequential and mainly parallel kernels), with analysis of space and time complexity. Using the research team's expertise in algorithm-to-architecture mappings, the high-cost kernels were transformed, parallelized, and implemented on Nvidia Fermi GPUs. Measured speedups (GPU with respect to single-core CPU) were approximately 20-32X for realistic model parameters, without final optimization. Error analysis showed no loss of computational accuracy. Commercial Applications and Other Benefits - The proposed research will constitute a breakthrough in the solution of problems related to efficient parallel computation of particle and fluid dynamics simulations. These problems occur throughout DOE, military and commercial sectors: the potential payoff is high. We plan to license or sell the solution to contractors for military and domestic applications such as disaster simulation (aerodynamic and hydrodynamic), Government agencies (hydrological and environmental simulations), and medical applications (e.g., in tomographic image reconstruction). Keywords - High-performance Computing, Graphic Processing Unit, Fluid/Particle Simulation. Summary for Members of Congress - The Department of Energy has many simulation codes that must compute faster to be effective. The Phase I research parallelized particle/fluid simulations for rocket combustion, for high-performance computing systems.
Hybrid services efficient provisioning over the network coding-enabled elastic optical networks
NASA Astrophysics Data System (ADS)
Wang, Xin; Gu, Rentao; Ji, Yuefeng; Kavehrad, Mohsen
2017-03-01
As a variety of services have emerged, hybrid services have become more common in real optical networks. Although elastic spectrum resource optimization over elastic optical networks (EONs) has been widely investigated, little research has been carried out on routing and spectrum allocation (RSA) for hybrid services, especially over the network coding-enabled EON. We investigated the RSA for the unicast service and the network coding-based multicast service over the network coding-enabled EON under the constraints of time delay and transmission distance. To address this issue, a mathematical model was built to minimize the total spectrum consumption for the hybrid services over the network coding-enabled EON under the constraints of time delay and transmission distance. The model guarantees different routing constraints for different types of services. The intermediate nodes of the network coding-enabled EON are assumed to be capable of encoding the flows for different kinds of information. We propose an efficient heuristic algorithm, the network coding-based adaptive routing and layered graph-based spectrum allocation algorithm (NCAR-LGSA). From the simulation results, NCAR-LGSA shows highly efficient performance in terms of spectrum resource utilization under different network scenarios compared with the benchmark algorithms.
NASA Astrophysics Data System (ADS)
Liu, Maw-Yang; Hsu, Yi-Kai
2017-03-01
A three-arm dual-balanced detection scheme is studied in an optical code division multiple access system. As multiple-access interference (MAI) and beat noise are the main sources of system performance degradation, we utilize optical hard-limiters to alleviate such channel impairment. In addition, once the channel condition is improved effectively, the proposed two-dimensional error correction code can remarkably enhance the system performance. In our proposed scheme, the optimal thresholds of the optical hard-limiters and decision circuitry are fixed, and they will not change with other system parameters. Our proposed scheme can accommodate a large number of users simultaneously and is suitable for burst traffic with asynchronous transmission. Therefore, it is highly recommended as the platform for broadband optical access networks.
CTF (Subchannel) Calculations and Validation L3:VVI.H2L.P15.01
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gordon, Natalie
The goal of the Verification and Validation Implementation (VVI) High to Low (Hi2Lo) process is utilizing a validated model in a high resolution code to generate synthetic data for improvement of the same model in a lower resolution code. This process is useful in circumstances where experimental data does not exist or is not sufficient in quantity or resolution. Data from the high-fidelity code is treated as calibration data (with appropriate uncertainties and error bounds) which can be used to train parameters that affect solution accuracy in the lower-fidelity code model, thereby reducing uncertainty. This milestone presents a demonstration of the Hi2Lo process derived in the VVI focus area. The majority of the work performed herein describes the steps of the low-fidelity code used in the process with references to the work detailed in the companion high-fidelity code milestone (Reference 1). The CASL low-fidelity code used to perform this work was Cobra Thermal Fluid (CTF) and the high-fidelity code was STAR-CCM+ (STAR). The master branch version of CTF (pulled May 5, 2017 – Reference 2) was utilized for all CTF analyses performed as part of this milestone. The statistical and VVUQ components of the Hi2Lo framework were performed using Dakota version 6.6 (release date May 15, 2017 – Reference 3). Experimental data from Westinghouse Electric Company (WEC – Reference 4) was used throughout the demonstrated process to compare with the high-fidelity STAR results. A CTF parameter called Beta was chosen as the calibration parameter for this work. By default, Beta is defined as a constant mixing coefficient in CTF and is essentially a tuning parameter for mixing between subchannels. Since CTF does not have turbulence models like STAR, Beta is the parameter that performs the most similar function to the turbulence models in STAR. The purpose of the work performed in this milestone is to tune Beta to an optimal value that brings the CTF results closer to those measured in the WEC experiments.
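A schematic of the calibration loop the milestone describes: treat the high-fidelity STAR results as synthetic calibration data and tune the CTF mixing coefficient Beta to minimize the weighted misfit. In the milestone this role is played by Dakota's calibration methods; run_ctf below is a hypothetical wrapper standing in for launching CTF and extracting the compared quantities.

    import numpy as np
    from scipy.optimize import minimize_scalar

    def run_ctf(beta):
        """Hypothetical wrapper: run CTF with mixing coefficient `beta` and return the predicted
        quantities of interest (e.g., subchannel exit temperatures) as a NumPy array."""
        raise NotImplementedError

    def calibrate_beta(hi_fi_data, hi_fi_sigma, beta_bounds=(0.0, 0.1)):
        # Weighted least-squares misfit between CTF predictions and the high-fidelity data,
        # with the data uncertainties hi_fi_sigma acting as weights.
        def misfit(beta):
            pred = run_ctf(beta)
            return float(np.sum(((pred - hi_fi_data) / hi_fi_sigma) ** 2))
        result = minimize_scalar(misfit, bounds=beta_bounds, method="bounded")
        return result.x

The assumed Beta bounds are placeholders; a full study would also propagate the calibration uncertainty rather than report only the point estimate.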
Quantum Monte Carlo Endstation for Petascale Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lubos Mitas
2011-01-26
The NCSU research group has been focused on accomplishing the key goals of this initiative: establishing a new generation of quantum Monte Carlo (QMC) computational tools as a part of the Endstation petaflop initiative for use at the DOE ORNL computational facilities and for use by the computational electronic structure community at large; carrying out high accuracy quantum Monte Carlo demonstration projects in application of these tools to the forefront electronic structure problems in molecular and solid systems; expanding the impact of QMC methods and approaches; explaining and enhancing the impact of these advanced computational approaches. In particular, we have developed the quantum Monte Carlo code QWalk (www.qwalk.org), which was significantly expanded and optimized using funds from this support and at present has become an actively used tool in the petascale regime by ORNL researchers and beyond. These developments have been built upon efforts undertaken by the PI's group and collaborators over the period of the last decade. The code was optimized and tested extensively on a number of parallel architectures including the petaflop ORNL Jaguar machine. We have developed and redesigned a number of code modules such as evaluation of wave functions and orbitals, calculations of pfaffians and introduction of backflow coordinates, together with overall organization of the code and random walker distribution over multicore architectures. We have addressed several bottlenecks such as load balancing and verified efficiency and accuracy of the calculations with the other groups of the Endstation team. The QWalk package contains about 50,000 lines of high quality object-oriented C++ and also includes interfaces to data files from other conventional electronic structure codes such as Gamess, Gaussian, Crystal and others. This grant supported the PI for one month during summers, a full-time postdoc, and partially three graduate students over the period of the grant duration; it has resulted in 13 published papers and 15 invited talks and lectures nationally and internationally. My former graduate student and postdoc Dr. Michal Bajdich, who was supported by this grant, is currently a postdoc with ORNL in the group of Dr. F. Reboredo and Dr. P. Kent and is using the developed tools in a number of DOE projects. The QWalk package has become a truly important research tool used by the electronic structure community and has attracted several new developers in other research groups. Our tools use several types of correlated wavefunction approaches (variational, diffusion and reptation methods) and large-scale optimization methods for wavefunctions, and enable calculation of energy differences such as cohesion and electronic gaps, but also densities and other properties; using multiple runs one can obtain equations of state for given structures and beyond. Our codes use efficient numerical and Monte Carlo strategies (high accuracy numerical orbitals, multi-reference wave functions, highly accurate correlation factors, pairing orbitals, force biased and correlated sampling Monte Carlo), are robustly parallelized and enable runs on tens of thousands of cores very efficiently. Our demonstration applications were focused on challenging research problems in several fields of materials science such as transition metal solids. We note that our study of FeO solid was the first QMC calculation of transition metal oxides at high pressures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nakhai, B.
A new method for solving radiation transport problems is presented. The heart of the technique is a new cross section processing procedure for the calculation of group-to-point and point-to-group cross section sets. The method is ideally suited for problems which involve media with highly fluctuating cross sections, where the results of traditional multigroup calculations are beclouded by the group averaging procedures employed. Extensive computational efforts, which would be required to evaluate double integrals in the multigroup treatment numerically, prohibit iteration to optimize the energy boundaries. On the other hand, use of point-to-point techniques (as in the stochastic technique) is often prohibitively expensive due to the large computer storage requirement. The pseudo-point code is a hybrid of the two aforementioned methods (group-to-group and point-to-point) - hence the name pseudo-point - that reduces the computational effort of the former and the large core requirements of the latter. The pseudo-point code generates the group-to-point or the point-to-group transfer matrices, and can be coupled with existing transport codes to calculate pointwise energy-dependent fluxes. This approach yields much more detail than is available from conventional energy-group treatments. Due to the speed of this code, several iterations could be performed (with affordable computing effort) to optimize the energy boundaries and the weighting functions. The pseudo-point technique is demonstrated by solving six problems, each depicting a certain aspect of the technique. The results are presented as flux vs energy at various spatial intervals. The sensitivity of the technique to the energy grid and the savings in computational effort are clearly demonstrated.
Automating the generation of finite element dynamical cores with Firedrake
NASA Astrophysics Data System (ADS)
Ham, David; Mitchell, Lawrence; Homolya, Miklós; Luporini, Fabio; Gibson, Thomas; Kelly, Paul; Cotter, Colin; Lange, Michael; Kramer, Stephan; Shipton, Jemma; Yamazaki, Hiroe; Paganini, Alberto; Kärnä, Tuomas
2017-04-01
The development of a dynamical core is an increasingly complex software engineering undertaking. As the equations become more complete, the discretisations more sophisticated and the hardware acquires ever more fine-grained parallelism and deeper memory hierarchies, the problem of building, testing and modifying dynamical cores becomes increasingly complex. Here we present Firedrake, a code generation system for the finite element method with specialist features designed to support the creation of geoscientific models. Using Firedrake, the dynamical core developer writes the partial differential equations in weak form in a high level mathematical notation. Appropriate function spaces are chosen and time stepping loops written at the same high level. When the programme is run, Firedrake generates high performance C code for the resulting numerics which are executed in parallel. Models in Firedrake typically take a tiny fraction of the lines of code required by traditional hand-coding techniques. They support more sophisticated numerics than are easily achieved by hand, and the resulting code is frequently higher performance. Critically, debugging, modifying and extending a model written in Firedrake is vastly easier than by traditional methods due to the small, highly mathematical code base. Firedrake supports a wide range of key features for dynamical core creation: A vast range of discretisations, including both continuous and discontinuous spaces and mimetic (C-grid-like) elements which optimally represent force balances in geophysical flows. High aspect ratio layered meshes suitable for ocean and atmosphere domains. Curved elements for high accuracy representations of the sphere. Support for non-finite element operators, such as parametrisations. Access to PETSc, a world-leading library of programmable linear and nonlinear solvers. High performance adjoint models generated automatically by symbolically reasoning about the forward model. This poster will present the key features of the Firedrake system, as well as those of Gusto, an atmospheric dynamical core, and Thetis, a coastal ocean model, both of which are written in Firedrake.
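As a concrete flavour of the workflow described above, a Poisson problem can be specified in weak form in a few lines of Firedrake-style Python; the numerical kernels are then generated and executed in parallel by the framework. This is a minimal illustrative sketch, and exact API details can differ between Firedrake versions.

    from firedrake import (UnitSquareMesh, FunctionSpace, TrialFunction, TestFunction,
                           Function, DirichletBC, Constant, dot, grad, dx, solve)

    mesh = UnitSquareMesh(32, 32)              # simple structured test mesh
    V = FunctionSpace(mesh, "CG", 1)           # piecewise-linear continuous elements

    u = TrialFunction(V)
    v = TestFunction(V)
    f = Constant(1.0)                          # placeholder source term

    a = dot(grad(u), grad(v)) * dx             # bilinear form of the weak Poisson problem
    L = f * v * dx
    bc = DirichletBC(V, Constant(0.0), "on_boundary")

    u_h = Function(V)
    solve(a == L, u_h, bcs=bc)                 # code generation and parallel execution happen here

A dynamical core adds the function-space choices (e.g., mimetic elements), time stepping, and layered meshes on top of the same high-level pattern.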
Circular codes revisited: a statistical approach.
Gonzalez, D L; Giannerini, S; Rosa, R
2011-04-21
In 1996 Arquès and Michel [1996. A complementary circular code in the protein coding genes. J. Theor. Biol. 182, 45-58] discovered the existence of a common circular code in eukaryote and prokaryote genomes. Since then, circular code theory has provoked great interest and underwent a rapid development. In this paper we discuss some theoretical issues related to the synchronization properties of coding sequences and circular codes with particular emphasis on the problem of retrieval and maintenance of the reading frame. Motivated by the theoretical discussion, we adopt a rigorous statistical approach in order to try to answer different questions. First, we investigate the covering capability of the whole class of 216 self-complementary, C(3) maximal codes with respect to a large set of coding sequences. The results indicate that, on average, the code proposed by Arquès and Michel has the best covering capability but, still, there exists a great variability among sequences. Second, we focus on such code and explore the role played by the proportion of the bases by means of a hierarchy of permutation tests. The results show the existence of a sort of optimization mechanism such that coding sequences are tailored as to maximize or minimize the coverage of circular codes on specific reading frames. Such optimization clearly relates the function of circular codes with reading frame synchronization. Copyright © 2011 Elsevier Ltd. All rights reserved.
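The "covering capability" statistic discussed above reduces to a simple frame-wise frequency: the fraction of in-frame trinucleotides of a coding sequence that belong to a given code X. The sketch below uses a small placeholder trinucleotide set, not the actual 20-trinucleotide Arquès-Michel code.

    def coverage(sequence, code, frame=0):
        """Fraction of in-frame trinucleotides of `sequence` that belong to `code`."""
        seq = sequence.upper()
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        return sum(c in code for c in codons) / len(codons) if codons else 0.0

    # Illustrative placeholder set (NOT the Arques-Michel code X):
    EXAMPLE_CODE = {"AAC", "AAT", "ACC", "ATC", "CAG", "CTG", "GAA", "GAC"}
    # Comparing coverage(cds, EXAMPLE_CODE, frame=0) with frames 1 and 2 probes how strongly
    # a code marks the correct reading frame, which is the quantity the permutation tests examine.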
Towards a high performance geometry library for particle-detector simulations
Apostolakis, J.; Bandieramonte, M.; Bitzes, G.; ...
2015-05-22
Thread-parallelization and single-instruction multiple data (SIMD) "vectorisation" of software components in HEP computing has become a necessity to fully benefit from current and future computing hardware. In this context, the Geant-Vector/GPU simulation project aims to re-engineer current software for the simulation of the passage of particles through detectors in order to increase the overall event throughput. As one of the core modules in this area, the geometry library plays a central role and vectorising its algorithms will be one of the cornerstones towards achieving good CPU performance. Here, we report on the progress made in vectorising the shape primitives, as well as in applying new C++ template based optimizations of existing code available in the Geant4, ROOT or USolids geometry libraries. We will focus on a presentation of our software development approach that aims to provide optimized code for all use cases of the library (e.g., single particle and many-particle APIs) and to support different architectures (CPU and GPU) while keeping the code base small, manageable and maintainable. We report on a generic and templated C++ geometry library as a continuation of the AIDA USolids project. As a result, the experience gained with these developments will be beneficial to other parts of the simulation software, such as for the optimization of the physics library, and possibly to other parts of the experiment software stack, such as reconstruction and analysis.
Optimal design of composite hip implants using NASA technology
NASA Technical Reports Server (NTRS)
Blake, T. A.; Saravanos, D. A.; Davy, D. T.; Waters, S. A.; Hopkins, D. A.
1993-01-01
Using an adaptation of NASA software, we have investigated the use of numerical optimization techniques for the shape and material optimization of fiber composite hip implants. The original NASA in-house codes were developed for the optimization of aerospace structures. The adapted code, called OPORIM, couples numerical optimization algorithms with finite element analysis and composite laminate theory to perform design optimization using both shape and material design variables. The external and internal geometry of the implant and the surrounding bone is described with quintic spline curves. This geometric representation is then used to create an equivalent 2-D finite element model of the structure. Using laminate theory and the 3-D geometric information, equivalent stiffnesses are generated for each element of the 2-D finite element model, so that the 3-D stiffness of the structure can be approximated. The geometric information to construct the model of the femur was obtained from a CT scan. A variety of test cases were examined, incorporating several implant constructions and design variable sets. Typically the code was able to produce optimized shape and/or material parameters which substantially reduced stress concentrations in the bone adjacent to the implant. The results indicate that this technology can provide meaningful insight into the design of fiber composite hip implants.
On entanglement-assisted quantum codes achieving the entanglement-assisted Griesmer bound
NASA Astrophysics Data System (ADS)
Li, Ruihu; Li, Xueliang; Guo, Luobin
2015-12-01
The theory of entanglement-assisted quantum error-correcting codes (EAQECCs) is a generalization of the standard stabilizer formalism. Any quaternary (or binary) linear code can be used to construct EAQECCs under the entanglement-assisted (EA) formalism. We derive an EA-Griesmer bound for linear EAQECCs, which is a quantum analog of the Griesmer bound for classical codes. This EA-Griesmer bound is tighter than known bounds for EAQECCs in the literature. For a given quaternary linear code C, we show that the parameters of the EAQECC that is EA-stabilized by the dual of C can be determined by a zero radical quaternary code induced from C, and a necessary condition under which a linear EAQECC may achieve the EA-Griesmer bound is also presented. We construct four families of optimal EAQECCs and then show that the necessary condition for existence of EAQECCs is also sufficient for some low-dimensional linear EAQECCs. The four families of optimal EAQECCs are degenerate codes and go beyond earlier constructions. What is more, except for four codes, our [[n,k,d_{ea};c
Advanced GF(32) nonbinary LDPC coded modulation with non-uniform 9-QAM outperforming star 8-QAM.
Liu, Tao; Lin, Changyu; Djordjevic, Ivan B
2016-06-27
In this paper, we first describe a 9-symbol non-uniform signaling scheme based on a Huffman code, in which different symbols are transmitted with different probabilities. By using the Huffman procedure, a prefix code is designed to approach the optimal performance. Then, we introduce an algorithm to determine the optimal signal constellation sets for our proposed non-uniform scheme with the criterion of maximizing the constellation figure of merit (CFM). The proposed non-uniform polarization-multiplexed signaling 9-QAM scheme has the same spectral efficiency as the conventional 8-QAM. Additionally, we propose a specially designed GF(32) nonbinary quasi-cyclic LDPC code for the coded modulation system based on the 9-QAM non-uniform scheme. Further, we study the efficiency of our proposed non-uniform 9-QAM, combined with nonbinary LDPC coding, and demonstrate by Monte Carlo simulation that the proposed GF(32) nonbinary LDPC coded 9-QAM scheme outperforms nonbinary LDPC coded uniform 8-QAM by at least 0.8 dB.
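The probability shaping in such Huffman-based schemes is dyadic: a symbol assigned a prefix codeword of length l is transmitted with probability 2^-l when the input bits are equiprobable. The length assignment below is purely illustrative (it only needs to satisfy the Kraft equality for 9 symbols), not the constellation mapping of the paper.

    import math

    # Hypothetical prefix-code lengths for the 9 symbols; the sum of 2^-l must equal 1.
    lengths = [2, 3, 3, 3, 3, 4, 4, 4, 4]
    assert abs(sum(2.0 ** -l for l in lengths) - 1.0) < 1e-12

    probs = [2.0 ** -l for l in lengths]                 # induced non-uniform symbol probabilities
    entropy = -sum(p * math.log2(p) for p in probs)      # information carried per 9-ary symbol
    print(probs, entropy)

With this illustrative assignment the source entropy works out to 3 bits per symbol, which is how a 9-point constellation can keep the spectral efficiency of uniform 8-QAM while reshaping the signal set.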
Tang, Shiming; Zhang, Yimeng; Li, Zhihao; Li, Ming; Liu, Fang; Jiang, Hongfei; Lee, Tai Sing
2018-04-26
One general principle of sensory information processing is that the brain must optimize efficiency by reducing the number of neurons that process the same information. The sparseness of the sensory representations in a population of neurons reflects the efficiency of the neural code. Here, we employ large-scale two-photon calcium imaging to examine the responses of a large population of neurons within the superficial layers of area V1 with single-cell resolution, while simultaneously presenting a large set of natural visual stimuli, to provide the first direct measure of the population sparseness in awake primates. The results show that only 0.5% of neurons respond strongly to any given natural image - indicating a ten-fold increase in the inferred sparseness over previous measurements. These population activities are nevertheless necessary and sufficient to discriminate visual stimuli with high accuracy, suggesting that the neural code in the primary visual cortex is both super-sparse and highly efficient. © 2018, Tang et al.
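To make the sparseness measure concrete, the following sketch computes the fraction of "strongly responding" neurons per stimulus from a hypothetical image-by-neuron response matrix; the threshold choice and the synthetic data are illustrative assumptions, not the analysis pipeline of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical response matrix: rows = natural images, columns = neurons.
responses = rng.gamma(shape=0.3, scale=1.0, size=(1000, 5000))

# "Strong" response defined here, purely for illustration, as exceeding the neuron's
# mean response by three standard deviations.
threshold = responses.mean(axis=0) + 3.0 * responses.std(axis=0)
strong = responses > threshold                     # boolean (image, neuron) matrix

fraction_per_image = strong.mean(axis=1)           # fraction of neurons "on" per image
print(f"mean fraction of strongly responding neurons: {fraction_per_image.mean():.4f}")
```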
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aldridge, David Franklin; Collier, Sandra L.; Marlin, David H.
2005-05-01
This document is intended to serve as a users guide for the time-domain atmospheric acoustic propagation suite (TDAAPS) program developed as part of the Department of Defense High Performance Computing Modernization Program (HPCMP) Common High-Performance Computing Scalable Software Initiative (CHSSI). TDAAPS performs staggered-grid finite-difference modeling of the acoustic velocity-pressure system with the incorporation of spatially inhomogeneous winds. Wherever practical, the control structure of the codes is written in C++ using an object-oriented design. Sections of code where a large number of calculations are required are written in C or F77 in order to enable better compiler optimization of these sections. The TDAAPS program conforms to a UNIX-style calling interface. Most of the actions of the codes are controlled by adding flags to the invoking command line. This document presents a large number of examples and provides new users with the necessary background to perform acoustic modeling with TDAAPS.
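The staggered-grid velocity-pressure update at the heart of such codes can be illustrated in one dimension. This is only a sketch; TDAAPS itself is three-dimensional and includes inhomogeneous winds, which are omitted here.

```python
import numpy as np

# Minimal 1-D staggered-grid (leapfrog) update of the coupled velocity-pressure system:
#   dv/dt = -(1/rho) dp/dx,   dp/dt = -rho*c^2 dv/dx
nx, dx, dt = 400, 1.0, 1.0e-3
rho, c = 1.2, 340.0
p = np.zeros(nx)            # pressure at integer grid points
v = np.zeros(nx + 1)        # velocity at half grid points (staggered)
p[nx // 2] = 1.0            # initial pressure pulse

for _ in range(500):
    v[1:-1] -= dt / (rho * dx) * (p[1:] - p[:-1])        # update velocity from pressure gradient
    p       -= dt * rho * c**2 / dx * (v[1:] - v[:-1])   # update pressure from velocity divergence
```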
Multidisciplinary optimization of a controlled space structure using 150 design variables
NASA Technical Reports Server (NTRS)
James, Benjamin B.
1993-01-01
A controls-structures interaction design method is presented. The method coordinates standard finite-element structural analysis, multivariable controls, and nonlinear programming codes and allows simultaneous optimization of the structure and control system of a spacecraft. Global sensitivity equations are used to account for coupling between the disciplines. Use of global sensitivity equations helps solve optimization problems that have a large number of design variables and a high degree of coupling between disciplines. The preliminary design of a generic geostationary platform is used to demonstrate the multidisciplinary optimization method. Design problems using 15, 63, and 150 design variables to optimize truss member sizes and feedback gain values are solved and the results are presented. The goal is to reduce the total mass of the structure and the vibration control system while satisfying constraints on vibration decay rate. Incorporation of the nonnegligible mass of actuators causes an essential coupling between structural design variables and control design variables.
Language Recognition via Sparse Coding
2016-09-08
a posteriori (MAP) adaptation scheme that further optimizes the discriminative quality of sparse-coded speech features. We empirically validate the...significantly improve the discriminative quality of sparse-coded speech features. In Section 4, we evaluate the proposed approaches against an i-vector
Optimizing Dense Plasma Focus Neutron Yields with Fast Gas Jets
NASA Astrophysics Data System (ADS)
McMahon, Matthew; Kueny, Christopher; Stein, Elizabeth; Link, Anthony; Schmidt, Andrea
2016-10-01
We report a study using the particle-in-cell code LSP to perform fully kinetic simulations modeling dense plasma focus (DPF) devices with high-density gas jets on axis. The high-density jet models fast gas puffs, which allow for more mass on axis while maintaining the optimal pressure for the DPF. As the density of the jet relative to the background fill increases, we find that the neutron yield increases, as does the variability in the neutron yield. Introducing perturbations in the jet density allows for consistent seeding of the m = 0 instability, leading to more consistent ion acceleration and higher neutron yields with less variability. Jets with higher on-axis density are found to have the greatest yield. The optimal jet configuration is explored. This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
NASA Astrophysics Data System (ADS)
Kim, Jeongnim; Baczewski, Andrew D.; Beaudet, Todd D.; Benali, Anouar; Chandler Bennett, M.; Berrill, Mark A.; Blunt, Nick S.; Josué Landinez Borda, Edgar; Casula, Michele; Ceperley, David M.; Chiesa, Simone; Clark, Bryan K.; Clay, Raymond C., III; Delaney, Kris T.; Dewing, Mark; Esler, Kenneth P.; Hao, Hongxia; Heinonen, Olle; Kent, Paul R. C.; Krogel, Jaron T.; Kylänpää, Ilkka; Li, Ying Wai; Lopez, M. Graham; Luo, Ye; Malone, Fionn D.; Martin, Richard M.; Mathuriya, Amrita; McMinis, Jeremy; Melton, Cody A.; Mitas, Lubos; Morales, Miguel A.; Neuscamman, Eric; Parker, William D.; Pineda Flores, Sergio D.; Romero, Nichols A.; Rubenstein, Brenda M.; Shea, Jacqueline A. R.; Shin, Hyeondeok; Shulenburger, Luke; Tillack, Andreas F.; Townsend, Joshua P.; Tubman, Norm M.; Van Der Goetz, Brett; Vincent, Jordan E.; ChangMo Yang, D.; Yang, Yubo; Zhang, Shuai; Zhao, Luning
2018-05-01
QMCPACK is an open source quantum Monte Carlo package for ab initio electronic structure calculations. It supports calculations of metallic and insulating solids, molecules, atoms, and some model Hamiltonians. Implemented real space quantum Monte Carlo algorithms include variational, diffusion, and reptation Monte Carlo. QMCPACK uses Slater–Jastrow type trial wavefunctions in conjunction with a sophisticated optimizer capable of optimizing tens of thousands of parameters. The orbital space auxiliary-field quantum Monte Carlo method is also implemented, enabling cross validation between different highly accurate methods. The code is specifically optimized for calculations with large numbers of electrons on the latest high performance computing architectures, including multicore central processing unit and graphical processing unit systems. We detail the program’s capabilities, outline its structure, and give examples of its use in current research calculations. The package is available at http://qmcpack.org.
Flexible and evolutionary optical access networks
NASA Astrophysics Data System (ADS)
Hsueh, Yu-Li
Passive optical networks (PONs) are promising solutions that will open the first-mile bottleneck. Current PONs employ time division multiplexing (TDM) to share bandwidth among users, leading to low cost but limited capacity. In the future, wavelength division multiplexing (WDM) technologies will be deployed to achieve high performance. This dissertation describes several advanced technologies to enhance PON systems. A spectral shaping line coding scheme is developed to allow a simple and cost-effective overlay of high data-rate services in existing PONs, leaving field-deployed fibers and existing services untouched. Spectral shapes of coded signals can be manipulated to adapt to different systems. For a specific tolerable interference level, the optimal line code can be found which maximizes the data throughput. Experiments are conducted to demonstrate and compare several optimized line codes. A novel PON employing dynamic wavelength allocation to provide bandwidth sharing across multiple physical PONs is designed and experimentally demonstrated. Tunable lasers, arrayed waveguide gratings, and coarse/fine filtering combine to create a flexible optical access solution. The network's excellent scalability can bridge the gap between conventional TDM PONs and WDM PONs. Scheduling algorithms with quality of service support are also investigated. Simulation results show that the proposed architecture exhibits significant performance gain over conventional PON systems. Streaming video transmission is demonstrated on the prototype experimental testbed. The powerful architecture is a promising candidate for next-generation optical access networks. A new CDR circuit for receiving the bursty traffic in PONs is designed and analyzed. It detects data transition edges upon arrival of the data burst and quickly selects the best clock phase by a control logic circuit. Then, an analog delay-locked loop (DLL) keeps track of data transitions and removes phase errors throughout the burst. The combination of the fast phase detection mechanism and a feedback loop based on DLL allows both fast response and manageable jitter performance in the burst-mode application. A new efficient numerical algorithm is developed to analyze holey optical fibers. The algorithm has been verified against experimental data, and is exploited to design holey optical fibers optimized for the discrete Raman amplification.
A Fast Procedure for Optimizing Thermal Protection Systems of Re-Entry Vehicles
NASA Astrophysics Data System (ADS)
Ferraiuolo, M.; Riccio, A.; Tescione, D.; Gigliotti, M.
The aim of the present work is to introduce a fast procedure to optimize thermal protection systems for re-entry vehicles subjected to high thermal loads. A simplified one-dimensional optimization process, performed in order to find the optimum design variables (lengths, sections etc.), is the first step of the proposed design procedure. Simultaneously, the most suitable materials able to sustain high temperatures and meeting the weight requirements are selected and positioned within the design layout. In this stage of the design procedure, simplified (generalized plane strain) FEM models are used when boundary and geometrical conditions allow the reduction of the degrees of freedom. Those simplified local FEM models can be useful because they are time-saving and very simple to build; they are essentially one dimensional and can be used for optimization processes in order to determine the optimum configuration with regard to weight, temperature and stresses. A triple-layer and a double-layer body, subjected to the same aero-thermal loads, have been optimized to minimize the overall weight. Full two and three-dimensional analyses are performed in order to validate those simplified models. Thermal-structural analyses and optimizations are executed by adopting the Ansys FEM code.
Evaluating and minimizing noise impact due to aircraft flyover
NASA Technical Reports Server (NTRS)
Jacobson, I. D.; Cook, G.
1979-01-01
Existing techniques were used to assess the noise impact on a community due to aircraft operation and to optimize the flight paths of an approaching aircraft with respect to the annoyance produced. Major achievements are: (1) the development of a population model suitable for determining the noise impact, (2) generation of a numerical computer code which uses this population model along with the steepest descent algorithm to optimize approach/landing trajectories, (3) implementation of this optimization code in several fictitious cases as well as for the community surrounding Patrick Henry International Airport, Virginia.
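A sketch of the steepest-descent loop such a code is built around is shown below; the annoyance cost and the three trajectory parameters are hypothetical stand-ins, since the report does not give the actual population model or parameterization.

```python
import numpy as np

def noise_annoyance(traj_params):
    """Hypothetical stand-in for the community-noise cost of a parameterized approach path."""
    return np.sum((traj_params - np.array([3.0, -0.5, 1.2]))**2)

def numerical_gradient(f, x, h=1e-6):
    """Central-difference gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x = np.zeros(3)                       # initial trajectory parameters
for _ in range(200):                  # steepest-descent iterations
    g = numerical_gradient(noise_annoyance, x)
    x -= 0.1 * g                      # fixed step along the negative gradient
print(x)                              # approaches the (hypothetical) optimum [3.0, -0.5, 1.2]
```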
NASA Astrophysics Data System (ADS)
Xu, Jincheng; Liu, Wei; Wang, Jin; Liu, Linong; Zhang, Jianfeng
2018-02-01
De-absorption pre-stack time migration (QPSTM) compensates for the absorption and dispersion of seismic waves by introducing an effective Q parameter, thereby making it an effective tool for 3D, high-resolution imaging of seismic data. Although the optimal aperture obtained via stationary-phase migration reduces the computational cost of 3D QPSTM and yields 3D stationary-phase QPSTM, the associated computational efficiency is still the main problem in the processing of 3D, high-resolution images for real large-scale seismic data. In the current paper, we proposed a division method for large-scale, 3D seismic data to optimize the performance of stationary-phase QPSTM on clusters of graphics processing units (GPU). Then, we designed an imaging point parallel strategy to achieve an optimal parallel computing performance. Afterward, we adopted an asynchronous double buffering scheme for multi-stream to perform the GPU/CPU parallel computing. Moreover, several key optimization strategies of computation and storage based on the compute unified device architecture (CUDA) were adopted to accelerate the 3D stationary-phase QPSTM algorithm. Compared with the initial GPU code, the implementation of the key optimization steps, including thread optimization, shared memory optimization, register optimization and special function units (SFU), greatly improved the efficiency. A numerical example employing real large-scale, 3D seismic data showed that our scheme is nearly 80 times faster than the CPU-QPSTM algorithm. Our GPU/CPU heterogeneous parallel computing framework significantly reduces the computational cost and facilitates 3D high-resolution imaging for large-scale seismic data.
Davidsson, Marcus; Diaz-Fernandez, Paula; Schwich, Oliver D.; Torroba, Marcos; Wang, Gang; Björklund, Tomas
2016-01-01
Detailed characterization and mapping of oligonucleotide function in vivo is generally a very time-consuming effort that only allows for hypothesis-driven subsampling of the full sequence to be analysed. Recent advances in deep sequencing together with highly efficient parallel oligonucleotide synthesis and cloning techniques have, however, opened up entirely new ways to map genetic function in vivo. Here we present a novel, optimized protocol for the generation of universally applicable, barcode-labelled plasmid libraries. The libraries are designed to enable the production of viral vector preparations assessing coding or non-coding RNA function in vivo. When generating high diversity libraries, it is a challenge to achieve efficient cloning, unambiguous barcoding and detailed characterization using low-cost sequencing technologies. With the presented protocol, a diversity of more than 3 million uniquely barcoded adeno-associated viral (AAV) plasmids can be achieved in a single reaction through a process achievable in any molecular biology laboratory. This approach opens up a multitude of in vivo assessments, from the evaluation of enhancer and promoter regions to the optimization of genome editing. The generated plasmid libraries are also useful for validation of sequencing clustering algorithms, and here we validate the newly presented message passing clustering process named Starcode. PMID:27874090
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ortiz-Rodriguez, J. M.; Reyes Alfaro, A.; Reyes Haro, A.
In this work a neutron spectrum unfolding code based on artificial intelligence technology is presented. The code, called "Neutron Spectrometry and Dosimetry with Artificial Neural Networks and two Bonner spheres" (NSDann2BS), was designed in a graphical user interface under the LabVIEW programming environment. The main features of this code are to use an embedded artificial neural network architecture optimized with the "Robust design of artificial neural networks methodology" and to use two Bonner spheres as the only piece of information. In order to build the code here presented, once the net topology was optimized and properly trained, knowledge stored at synaptic weights was extracted and, using a graphical framework built on the LabVIEW programming environment, the NSDann2BS code was designed. This code is friendly, intuitive and easy to use for the end user. The code is freely available upon request to the authors. To demonstrate the use of the neural net embedded in the NSDann2BS code, the rate counts of 252Cf, 241AmBe and 239PuBe neutron sources measured with a Bonner spheres system were used.
NASA Astrophysics Data System (ADS)
Ortiz-Rodríguez, J. M.; Reyes Alfaro, A.; Reyes Haro, A.; Solís Sánches, L. O.; Miranda, R. Castañeda; Cervantes Viramontes, J. M.; Vega-Carrillo, H. R.
2013-07-01
In this work a neutron spectrum unfolding code based on artificial intelligence technology is presented. The code, called "Neutron Spectrometry and Dosimetry with Artificial Neural Networks and two Bonner spheres" (NSDann2BS), was designed in a graphical user interface under the LabVIEW programming environment. The main features of this code are to use an embedded artificial neural network architecture optimized with the "Robust design of artificial neural networks methodology" and to use two Bonner spheres as the only piece of information. In order to build the code here presented, once the net topology was optimized and properly trained, knowledge stored at synaptic weights was extracted and, using a graphical framework built on the LabVIEW programming environment, the NSDann2BS code was designed. This code is friendly, intuitive and easy to use for the end user. The code is freely available upon request to the authors. To demonstrate the use of the neural net embedded in the NSDann2BS code, the rate counts of 252Cf, 241AmBe and 239PuBe neutron sources measured with a Bonner spheres system were used.
DOE Office of Scientific and Technical Information (OSTI.GOV)
MAGEE,GLEN I.
Computers transfer data in a number of different ways. Whether through a serial port, a parallel port, over a modem, over an ethernet cable, or internally from a hard disk to memory, some data will be lost. To compensate for that loss, numerous error detection and correction algorithms have been developed. One of the most common error correction codes is the Reed-Solomon code, which is a special subset of BCH (Bose-Chaudhuri-Hocquenghem) linear cyclic block codes. In the AURA project, an unmanned aircraft sends the data it collects back to earth so it can be analyzed during flight and possible flight modifications made. To counter possible data corruption during transmission, the data is encoded using a multi-block Reed-Solomon implementation with a possibly shortened final block. In order to maximize the amount of data transmitted, it was necessary to reduce the computation time of a Reed-Solomon encoding to three percent of the processor's time. To achieve such a reduction, many code optimization techniques were employed. This paper outlines the steps taken to reduce the processing time of a Reed-Solomon encoding and the insight into modern optimization techniques gained from the experience.
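One optimization commonly used to speed up Reed-Solomon encoders (the report does not spell out which techniques were applied in AURA) is to replace Galois-field multiplications with logarithm/antilogarithm table lookups, as sketched below for GF(2^8) with a standard primitive polynomial.

```python
# Log/antilog tables turn GF(2^8) multiplications (the inner loop of a
# Reed-Solomon encoder) into two table lookups and one addition.
PRIM = 0x11D                          # a standard primitive polynomial for GF(2^8)
EXP = [0] * 512
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= PRIM
for i in range(255, 512):             # duplicate table so indices never need reducing mod 255
    EXP[i] = EXP[i - 255]

def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements using the precomputed tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

assert gf_mul(2, 4) == 8              # small sanity check: x * x^2 = x^3
```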
Optimal sensor placement for spatial lattice structure based on genetic algorithms
NASA Astrophysics Data System (ADS)
Liu, Wei; Gao, Wei-cheng; Sun, Yi; Xu, Min-jian
2008-10-01
Optimal sensor placement techniques play a key role in the structural health monitoring of spatial lattice structures. This paper considers the problem of locating sensors on a spatial lattice structure with the aim of maximizing the data information so that structural dynamic behavior can be fully characterized. Based on the criterion of optimal sensor placement for modal tests, an improved genetic algorithm is introduced to find the optimal placement of sensors. The modal strain energy (MSE) and the modal assurance criterion (MAC) have been taken as the fitness function, respectively, so that three placement designs were produced. A decimal two-dimensional array coding method is proposed to code the solution instead of the binary coding method. A forced mutation operator is introduced when identical genes appear via the crossover procedure. A computational simulation of a 12-bay plain truss model has been implemented to demonstrate the feasibility of the three optimal algorithms above. The obtained optimal sensor placements using the improved genetic algorithm are compared with those gained by the existing genetic algorithm using the binary coding method. Further, a comparison criterion based on the mean square error between the finite element method (FEM) mode shapes and the Guyan expansion mode shapes identified by the data-driven stochastic subspace identification (SSI-DATA) method is employed to demonstrate the advantage of the different fitness functions. The results showed that the innovations in the genetic algorithm proposed in this paper can enlarge the gene storage and improve the convergence of the algorithm. More importantly, the three optimal sensor placement methods can all provide reliable results and identify the vibration characteristics of the 12-bay plain truss model accurately.
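A sketch of how a MAC-based fitness for a candidate sensor set can be evaluated is given below; the mode shapes and the specific fitness definition are illustrative assumptions, not the exact formulation of the paper.

```python
import numpy as np

def mac_matrix(phi):
    """Modal assurance criterion between all pairs of mode shapes (columns of phi);
    small off-diagonal terms mean the modes stay distinguishable at the chosen sensors."""
    num = np.abs(phi.T @ phi)**2
    den = np.outer(np.sum(phi**2, axis=0), np.sum(phi**2, axis=0))
    return num / den

def fitness(sensor_idx, phi_full):
    """GA fitness of a candidate sensor set: penalize the largest off-diagonal MAC term."""
    phi = phi_full[sensor_idx, :]                 # mode shapes sampled at the sensor DOFs
    mac = mac_matrix(phi)
    off_diag = mac - np.diag(np.diag(mac))
    return 1.0 - off_diag.max()

rng = np.random.default_rng(1)
phi_full = rng.standard_normal((120, 6))          # hypothetical 120 DOFs, 6 target modes
candidate = rng.choice(120, size=12, replace=False)
print(fitness(candidate, phi_full))
```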
Robust hashing with local models for approximate similarity search.
Song, Jingkuan; Yang, Yi; Li, Xuelong; Huang, Zi; Yang, Yang
2014-07-01
Similarity search plays an important role in many applications involving high-dimensional data. Due to the known dimensionality curse, the performance of most existing indexing structures degrades quickly as the feature dimensionality increases. Hashing methods, such as locality sensitive hashing (LSH) and its variants, have been widely used to achieve fast approximate similarity search by trading search quality for efficiency. However, most existing hashing methods make use of randomized algorithms to generate hash codes without considering the specific structural information in the data. In this paper, we propose a novel hashing method, namely, robust hashing with local models (RHLM), which learns a set of robust hash functions to map the high-dimensional data points into binary hash codes by effectively utilizing local structural information. In RHLM, for each individual data point in the training dataset, a local hashing model is learned and used to predict the hash codes of its neighboring data points. The local models from all the data points are globally aligned so that an optimal hash code can be assigned to each data point. After obtaining the hash codes of all the training data points, we design a robust method by employing l2,1 -norm minimization on the loss function to learn effective hash functions, which are then used to map each database point into its hash code. Given a query data point, the search process first maps it into the query hash code by the hash functions and then explores the buckets, which have similar hash codes to the query hash code. Extensive experimental results conducted on real-life datasets show that the proposed RHLM outperforms the state-of-the-art methods in terms of search quality and efficiency.
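For context, the randomized baseline the abstract refers to, locality sensitive hashing with random hyperplanes, can be sketched in a few lines; RHLM replaces these data-independent projections with hash functions learned from local structure, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_hash(X, n_bits=16):
    """Random-hyperplane LSH: each bit is the sign of a random projection, so nearby
    points tend to share hash codes (this is only the randomized baseline RHLM improves on)."""
    planes = rng.standard_normal((X.shape[1], n_bits))
    return (X @ planes > 0).astype(np.uint8)

X = rng.standard_normal((1000, 64))          # hypothetical 64-dimensional features
codes = lsh_hash(X)
print(codes.shape)                            # (1000, 16) binary codes
```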
Cai, Yao; Hu, Huasi; Pan, Ziheng; Hu, Guang; Zhang, Tao
2018-05-17
To optimize a shield for neutrons and gamma rays that is compact and lightweight, a method combining the structure and components together was established employing genetic algorithms and the MCNP code. As a typical case, the fission energy spectrum of 235U, which mixes neutrons and gamma rays, was adopted in this study. Six types of materials were presented and optimized by the method. Spherical geometry was adopted in the optimization after checking the geometry effect. Simulations were made to verify the reliability of the optimization method and the efficiency of the optimized materials. To compare the materials visually and conveniently, the volume and weight needed to build a shield are employed. The results showed that the composite multilayer material has the best performance. Copyright © 2018 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lilienthal, P.
1997-12-01
This paper describes three different computer codes which have been written to model village power applications. The reasons which have driven the development of these codes include: the existence of limited field data; diverse applications can be modeled; models allow cost and performance comparisons; simulations generate insights into cost structures. The models which are discussed are: Hybrid2, a public code which provides detailed engineering simulations to analyze the performance of a particular configuration; HOMER, the hybrid optimization model for electric renewables, which provides economic screening for sensitivity analyses; and VIPOR, the village power model, which is a network optimization model for comparing mini-grids to individual systems. Examples of the output of these codes are presented for specific applications.
Performance and structure of single-mode bosonic codes
NASA Astrophysics Data System (ADS)
Albert, Victor V.; Noh, Kyungjoo; Duivenvoorden, Kasper; Young, Dylan J.; Brierley, R. T.; Reinhold, Philip; Vuillot, Christophe; Li, Linshu; Shen, Chao; Girvin, S. M.; Terhal, Barbara M.; Jiang, Liang
2018-03-01
The early Gottesman, Kitaev, and Preskill (GKP) proposal for encoding a qubit in an oscillator has recently been followed by cat- and binomial-code proposals. Numerically optimized codes have also been proposed, and we introduce codes of this type here. These codes have yet to be compared using the same error model; we provide such a comparison by determining the entanglement fidelity of all codes with respect to the bosonic pure-loss channel (i.e., photon loss) after the optimal recovery operation. We then compare achievable communication rates of the combined encoding-error-recovery channel by calculating the channel's hashing bound for each code. Cat and binomial codes perform similarly, with binomial codes outperforming cat codes at small loss rates. Despite not being designed to protect against the pure-loss channel, GKP codes significantly outperform all other codes for most values of the loss rate. We show that the performance of GKP and some binomial codes increases monotonically with increasing average photon number of the codes. In order to corroborate our numerical evidence of the cat-binomial-GKP order of performance occurring at small loss rates, we analytically evaluate the quantum error-correction conditions of those codes. For GKP codes, we find an essential singularity in the entanglement fidelity in the limit of vanishing loss rate. In addition to comparing the codes, we draw parallels between binomial codes and discrete-variable systems. First, we characterize one- and two-mode binomial as well as multiqubit permutation-invariant codes in terms of spin-coherent states. Such a characterization allows us to introduce check operators and error-correction procedures for binomial codes. Second, we introduce a generalization of spin-coherent states, extending our characterization to qudit binomial codes and yielding a multiqudit code.
Designing stellarator coils by a modified Newton method using FOCUS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhu, Caoxiang; Hudson, Stuart R.; Song, Yuntao
To find the optimal coils for stellarators, nonlinear optimization algorithms are applied in existing coil design codes. However, none of these codes have used the information from the second-order derivatives. In this paper, we present a modified Newton method in the recently developed code FOCUS. The Hessian matrix is calculated with analytically derived equations. Its inverse is approximated by a modified Cholesky factorization and applied in the iterative scheme of a classical Newton method. Using this method, FOCUS is able to recover the W7-X modular coils starting from a simple initial guess. Results demonstrate significant advantages.
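The core idea of a modified Newton step, shifting the Hessian until a Cholesky factorization succeeds and then solving with the triangular factors, can be sketched generically as below; this is a plain diagonal-shift variant for illustration, not FOCUS's actual factorization or its analytically derived Hessian.

```python
import numpy as np

def modified_newton_step(grad, hess, tau0=1e-3):
    """Newton step p solving (H + tau*I) p = -g, with tau increased until the shifted
    Hessian admits a Cholesky factorization (i.e. is positive definite)."""
    n = hess.shape[0]
    tau = 0.0
    while True:
        try:
            L = np.linalg.cholesky(hess + tau * np.eye(n))
            break
        except np.linalg.LinAlgError:
            tau = tau0 if tau == 0.0 else 10.0 * tau
    y = np.linalg.solve(L, -grad)        # forward solve with the lower-triangular factor
    return np.linalg.solve(L.T, y)       # backward solve with its transpose

# On an indefinite quadratic model the shifted step is still a descent direction:
H = np.array([[2.0, 0.0], [0.0, -1.0]])
g = np.array([1.0, 1.0])
p = modified_newton_step(g, H)
assert g @ p < 0.0
```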
Water cycle algorithm: A detailed standard code
NASA Astrophysics Data System (ADS)
Sadollah, Ali; Eskandar, Hadi; Lee, Ho Min; Yoo, Do Guen; Kim, Joong Hoon
Inspired by the observation of the water cycle process and the movements of rivers and streams toward the sea, a population-based metaheuristic algorithm, the water cycle algorithm (WCA), has recently been proposed. Lately, an increasing number of WCA applications have appeared, and the WCA has been utilized in different optimization fields. This paper provides detailed open source code for the WCA, whose performance and efficiency have been demonstrated for solving optimization problems. The WCA has an interesting and simple concept, and this paper aims to use its source code to provide a step-by-step explanation of the process it follows.
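The central movement rule of the WCA, in which a stream moves a random fraction of a constant C toward its river (and rivers toward the sea), can be sketched as follows; the full algorithm in the paper's source code also includes raindrop initialization, evaporation and raining steps, which are omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
C = 2.0                                     # movement constant, typically near 2 in the WCA

def move_toward(point, leader):
    """One WCA movement step: a stream moves a random fraction of C of the way toward
    its river (rivers move toward the sea in the same fashion)."""
    return point + rng.uniform(0.0, 1.0) * C * (leader - point)

# Toy illustration: a single stream drifting toward the sea (the current best solution).
sea = np.zeros(2)
stream = np.array([3.0, -4.0])
for _ in range(50):
    stream = move_toward(stream, sea)
print(stream)
```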
Design of 28 GHz, 200 kW Gyrotron for ECRH Applications
NASA Astrophysics Data System (ADS)
Yadav, Vivek; Singh, Udaybir; Kumar, Nitin; Kumar, Anil; Deorani, S. C.; Sinha, A. K.
2013-01-01
This paper presents the design of a 28 GHz, 200 kW gyrotron for the Indian TOKAMAK system. The paper reports the designs of the interaction cavity, the magnetron injection gun and the RF window. The EGUN code is used for the optimization of the electron gun parameters. The TE03 mode is selected as the operating mode by using the in-house developed code GCOMS. The simulation and optimization of the cavity parameters are carried out by using the three-dimensional (3-D) particle-in-cell electromagnetic simulation code MAGIC. An output power of more than 250 kW is achieved.
NASA Technical Reports Server (NTRS)
Clark, R. T.; Mccallister, R. D.
1982-01-01
The particular coding option identified as providing the best level of coding gain performance in an LSI-efficient implementation was the optimal constraint length five, rate one-half convolutional code. To determine the specific set of design parameters which optimally matches this decoder to the LSI constraints, a breadboard MCD (maximum-likelihood convolutional decoder) was fabricated and used to generate detailed performance trade-off data. The extensive performance testing data gathered during this design tradeoff study are summarized, and the functional and physical MCD chip characteristics are presented.
NASA Electronic Library System (NELS) optimization
NASA Technical Reports Server (NTRS)
Pribyl, William L.
1993-01-01
This is a compilation of NELS (NASA Electronic Library System) Optimization progress/problem, interim, and final reports for all phases. The NELS database was examined, particularly with respect to memory, disk contention, and CPU usage, to discover bottlenecks. Methods to increase the speed of the NELS code were investigated. The tasks included restructuring the existing code to interact with other components more effectively. Error reporting code was added to help detect and remove bugs in NELS. Report writing tools were recommended for integration with the ASV3 system. The Oracle database management system and tools were to be installed on a Sun workstation, intended for demonstration purposes.
ABINIT: Plane-Wave-Based Density-Functional Theory on High Performance Computers
NASA Astrophysics Data System (ADS)
Torrent, Marc
2014-03-01
For several years, a continuous effort has been made to adapt electronic structure codes based on Density-Functional Theory to future computing architectures. Among these codes, ABINIT is based on a plane-wave description of the wave functions which allows systems of any kind to be treated. Porting such a code to petascale architectures poses difficulties related to the many-body nature of the DFT equations. To improve the performance of ABINIT - especially for standard LDA/GGA ground-state and response-function calculations - several strategies have been followed: A full multi-level MPI parallelisation scheme has been implemented, exploiting all possible levels and distributing both computation and memory. It allows the number of distributed processes to be increased and could not be achieved without a strong restructuring of the code. The core algorithm used to solve the eigenproblem (``Locally Optimal Blocked Conjugate Gradient''), a Blocked-Davidson-like algorithm, is based on a distribution of processes combining plane-waves and bands. In addition to the distributed memory parallelization, a full hybrid scheme has been implemented, using standard shared-memory directives (OpenMP/OpenACC) or porting some consuming code sections to Graphics Processing Units (GPU). As no simple performance model exists, the complexity of use has been increased; the code efficiency strongly depends on the distribution of processes among the numerous levels. ABINIT is able to predict the performances of several process distributions and automatically choose the most favourable one. On the other hand, a big effort has been carried out to analyse the performance of the code on petascale architectures, showing which sections of code have to be improved; they are all related to matrix algebra (diagonalisation, orthogonalisation). The different strategies employed to improve the code scalability will be described. They are based on the exploration of new diagonalization algorithms, as well as the use of external optimized libraries. Part of this work has been supported by the European PRACE project (Partnership for Advanced Computing in Europe) in the framework of its work package 8.
Hia, Fabian; Chionh, Yok Hian; Pang, Yan Ling Joy; DeMott, Michael S; McBee, Megan E; Dedon, Peter C
2015-03-11
A major challenge in the study of mycobacterial RNA biology is the lack of a comprehensive RNA isolation method that overcomes the unusual cell wall to faithfully yield the full spectrum of non-coding RNA (ncRNA) species. Here, we describe a simple and robust procedure optimized for the isolation of total ncRNA, including 5S, 16S and 23S ribosomal RNA (rRNA) and tRNA, from mycobacteria, using Mycobacterium bovis BCG to illustrate the method. Based on a combination of mechanical disruption and liquid and solid-phase technologies, the method produces all major species of ncRNA in high yield and with high integrity, enabling direct chemical and sequence analysis of the ncRNA species. The reproducibility of the method with BCG was evident in bioanalyzer electrophoretic analysis of isolated RNA, which revealed quantitatively significant differences in the ncRNA profiles of exponentially growing and non-replicating hypoxic bacilli. The method also overcame an historical inconsistency in 5S rRNA isolation, with direct sequencing revealing a novel post-transcriptional processing of 5S rRNA to its functional form and with chemical analysis revealing seven post-transcriptional ribonucleoside modifications in the 5S rRNA. This optimized RNA isolation procedure thus provides a means to more rigorously explore the biology of ncRNA species in mycobacteria. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Optimal quantum error correcting codes from absolutely maximally entangled states
NASA Astrophysics Data System (ADS)
Raissi, Zahra; Gogolin, Christian; Riera, Arnau; Acín, Antonio
2018-02-01
Absolutely maximally entangled (AME) states are pure multi-partite generalizations of the bipartite maximally entangled states with the property that all reduced states of at most half the system size are in the maximally mixed state. AME states are of interest for multipartite teleportation and quantum secret sharing and have recently found new applications in the context of high-energy physics in toy models realizing the AdS/CFT-correspondence. We work out in detail the connection between AME states of minimal support and classical maximum distance separable (MDS) error correcting codes and, in particular, provide explicit closed form expressions for AME states of n parties with local dimension \
Wakefield Computations for the CLIC PETS using the Parallel Finite Element Time-Domain Code T3P
DOE Office of Scientific and Technical Information (OSTI.GOV)
Candel, A; Kabel, A.; Lee, L.
In recent years, SLAC's Advanced Computations Department (ACD) has developed the high-performance parallel 3D electromagnetic time-domain code, T3P, for simulations of wakefields and transients in complex accelerator structures. T3P is based on advanced higher-order Finite Element methods on unstructured grids with quadratic surface approximation. Optimized for large-scale parallel processing on leadership supercomputing facilities, T3P allows simulations of realistic 3D structures with unprecedented accuracy, aiding the design of the next generation of accelerator facilities. Applications to the Compact Linear Collider (CLIC) Power Extraction and Transfer Structure (PETS) are presented.
Comparative Evaluation of Different Optimization Algorithms for Structural Design Applications
NASA Technical Reports Server (NTRS)
Patnaik, Surya N.; Coroneos, Rula M.; Guptill, James D.; Hopkins, Dale A.
1996-01-01
Non-linear programming algorithms play an important role in structural design optimization. Fortunately, several algorithms with computer codes are available. At NASA Lewis Research Centre, a project was initiated to assess the performance of eight different optimizers through the development of a computer code CometBoards. This paper summarizes the conclusions of that research. CometBoards was employed to solve sets of small, medium and large structural problems, using the eight different optimizers on a Cray-YMP8E/8128 computer. The reliability and efficiency of the optimizers were determined from the performance of these problems. For small problems, the performance of most of the optimizers could be considered adequate. For large problems, however, three optimizers (two sequential quadratic programming routines, DNCONG of IMSL and SQP of IDESIGN, along with Sequential Unconstrained Minimizations Technique SUMT) outperformed others. At optimum, most optimizers captured an identical number of active displacement and frequency constraints but the number of active stress constraints differed among the optimizers. This discrepancy can be attributed to singularity conditions in the optimization and the alleviation of this discrepancy can improve the efficiency of optimizers.
Performance Trend of Different Algorithms for Structural Design Optimization
NASA Technical Reports Server (NTRS)
Patnaik, Surya N.; Coroneos, Rula M.; Guptill, James D.; Hopkins, Dale A.
1996-01-01
Nonlinear programming algorithms play an important role in structural design optimization. Fortunately, several algorithms with computer codes are available. At NASA Lewis Research Center, a project was initiated to assess performance of different optimizers through the development of a computer code CometBoards. This paper summarizes the conclusions of that research. CometBoards was employed to solve sets of small, medium and large structural problems, using different optimizers on a Cray-YMP8E/8128 computer. The reliability and efficiency of the optimizers were determined from the performance of these problems. For small problems, the performance of most of the optimizers could be considered adequate. For large problems however, three optimizers (two sequential quadratic programming routines, DNCONG of IMSL and SQP of IDESIGN, along with the sequential unconstrained minimizations technique SUMT) outperformed others. At optimum, most optimizers captured an identical number of active displacement and frequency constraints but the number of active stress constraints differed among the optimizers. This discrepancy can be attributed to singularity conditions in the optimization and the alleviation of this discrepancy can improve the efficiency of optimizers.
Journal Bearing Analysis Suite Released for Planetary Gear System Evaluation
NASA Technical Reports Server (NTRS)
Brewe, David E.; Clark, David A.
2005-01-01
Planetary gear systems are an efficient means of achieving high reduction ratios with minimum space and weight. They are used in helicopter, aerospace, automobile, and many industrial applications. High-speed planetary gear systems will have significant dynamic loading and high heat generation. Hence, they need jet lubrication and associated cooling systems. For units operating in critical applications that necessitate high reliability and long life, that have very large torque loading, and that have downtime costs that are significantly greater than the initial cost, hydrodynamic journal bearings are a must. Computational and analytical tools are needed for sufficiently accurate modeling to facilitate optimal design of these systems. Sufficient physics is needed in the model to facilitate parametric studies of design conditions that enable optimal designs. The first transient journal bearing code to implement the Jacobsson-Floberg-Olsson boundary conditions, using a mass-conserving algorithm devised by Professor Emeritus Harold Elrod of Columbia University, was written by David E. Brewe of the U.S. Army at the NASA Lewis Research Center in 1983. Since then, new features and improved modifications have been built into the code by several contributors supported through Army and NASA funding via cooperative agreements with the University of Toledo (Professor Ted Keith, Jr., and Dr. Desikakary Vijayaraghavan) and National Research Council Programs (Dr. Vijayaraghavan). All this was conducted with the close consultation of Professor Elrod and the project management of David Brewe.
Formulation for Simultaneous Aerodynamic Analysis and Design Optimization
NASA Technical Reports Server (NTRS)
Hou, G. W.; Taylor, A. C., III; Mani, S. V.; Newman, P. A.
1993-01-01
An efficient approach for simultaneous aerodynamic analysis and design optimization is presented. This approach does not require the performance of many flow analyses at each design optimization step, which can be an expensive procedure. Thus, this approach brings us one step closer to meeting the challenge of incorporating computational fluid dynamic codes into gradient-based optimization techniques for aerodynamic design. An adjoint-variable method is introduced to nullify the effect of the increased number of design variables in the problem formulation. The method has been successfully tested on one-dimensional nozzle flow problems, including a sample problem with a normal shock. Implementations of the above algorithm are also presented that incorporate Newton iterations to secure a high-quality flow solution at the end of the design process. Implementations with iterative flow solvers are possible and will be required for large, multidimensional flow problems.
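The adjoint-variable bookkeeping referred to above can be summarized as follows (generic notation, assumed rather than taken from the paper): with flow state w defined implicitly by the residual R(w,x)=0 and objective f(w,x),

```latex
\begin{aligned}
&\mathcal{L}(w,x,\lambda) = f(w,x) + \lambda^{T} R(w,x), \qquad
 \left(\frac{\partial R}{\partial w}\right)^{\!T}\lambda
   = -\left(\frac{\partial f}{\partial w}\right)^{\!T}
 \quad\text{(adjoint equation)}\\
&\frac{df}{dx} = \frac{\partial f}{\partial x}
   + \lambda^{T}\,\frac{\partial R}{\partial x}
 \qquad\text{(one adjoint solve yields the gradient with respect to all design variables).}
\end{aligned}
```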
NASA Astrophysics Data System (ADS)
Gambino, James; Tarver, Craig; Springer, H. Keo; White, Bradley; Fried, Laurence
2017-06-01
We present a novel method for optimizing parameters of the Ignition and Growth reactive flow (I&G) model for high explosives. The I&G model can yield accurate predictions of experimental observations. However, calibrating the model is a time-consuming task especially with multiple experiments. In this study, we couple the differential evolution global optimization algorithm to simulations of shock initiation experiments in the multi-physics code ALE3D. We develop parameter sets for HMX based explosives LX-07 and LX-10. The optimization finds the I&G model parameters that globally minimize the difference between calculated and experimental shock time of arrival at embedded pressure gauges. This work was performed under the auspices of the U.S. DOE by LLNL under contract DE-AC52-07NA27344. LLNS, LLC LLNL-ABS- 724898.
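A minimal sketch of the differential evolution loop (DE/rand/1/bin) being coupled to the simulations is shown below; the misfit function here is a hypothetical stand-in for the gauge arrival-time mismatch that would actually be returned by an ALE3D run.

```python
import numpy as np

rng = np.random.default_rng(0)

def misfit(params):
    """Hypothetical stand-in for |simulated - measured| shock arrival times
    (the real objective would run an ALE3D simulation)."""
    target = np.array([1.0, 0.2, 4.0])
    return np.sum((params - target)**2)

# Minimal DE/rand/1/bin loop
n_pop, n_dim, F, CR = 20, 3, 0.8, 0.9
pop = rng.uniform(-5, 5, size=(n_pop, n_dim))
cost = np.array([misfit(p) for p in pop])

for _ in range(200):
    for i in range(n_pop):
        a, b, c = pop[rng.choice([j for j in range(n_pop) if j != i], 3, replace=False)]
        mutant = a + F * (b - c)                          # differential mutation
        cross = rng.random(n_dim) < CR
        cross[rng.integers(n_dim)] = True                 # ensure at least one gene crosses
        trial = np.where(cross, mutant, pop[i])
        if (c_trial := misfit(trial)) < cost[i]:          # greedy selection
            pop[i], cost[i] = trial, c_trial

print(pop[np.argmin(cost)])                               # approaches the hypothetical target
```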
NASA Technical Reports Server (NTRS)
2004-01-01
Ever wonder whether a still shot from a home video could serve as a "picture perfect" photograph worthy of being framed and proudly displayed on the mantle? Wonder no more. A critical imaging code used to enhance video footage taken from spaceborne imaging instruments is now available within a portable photography tool capable of producing an optimized, high-resolution image from multiple video frames.
Soft-Decision Decoding of Binary Linear Block Codes Based on an Iterative Search Algorithm
NASA Technical Reports Server (NTRS)
Lin, Shu; Kasami, Tadao; Moorthy, H. T.
1997-01-01
This correspondence presents a suboptimum soft-decision decoding scheme for binary linear block codes based on an iterative search algorithm. The scheme uses an algebraic decoder to iteratively generate a sequence of candidate codewords one at a time using a set of test error patterns that are constructed based on the reliability information of the received symbols. When a candidate codeword is generated, it is tested based on an optimality condition. If it satisfies the optimality condition, then it is the most likely (ML) codeword and the decoding stops. If it fails the optimality test, a search for the ML codeword is conducted in a region which contains the ML codeword. The search region is determined by the current candidate codeword and the reliability of the received symbols. The search is conducted through a purged trellis diagram for the given code using the Viterbi algorithm. If the search fails to find the ML codeword, a new candidate is generated using a new test error pattern, and the optimality test and search are renewed. The process of testing and search continues until either the ML codeword is found or all the test error patterns are exhausted and the decoding process is terminated. Numerical results show that the proposed decoding scheme achieves either practically optimal performance or a performance only a fraction of a decibel away from the optimal maximum-likelihood decoding with a significant reduction in decoding complexity compared with the Viterbi decoding based on the full trellis diagram of the codes.
NASA Astrophysics Data System (ADS)
Tian, Lei; Waller, Laura
2017-05-01
Microscope lenses can have either large field of view (FOV) or high resolution, not both. Computational microscopy based on illumination coding circumvents this limit by fusing images from different illumination angles using nonlinear optimization algorithms. The result is a Gigapixel-scale image having both wide FOV and high resolution. We demonstrate an experimentally robust reconstruction algorithm based on a second-order quasi-Newton method, combined with a novel phase initialization scheme. To further extend the Gigapixel imaging capability to 3D, we develop a reconstruction method to process the 4D light field measurements from sequential illumination scanning. The algorithm is based on a 'multislice' forward model that incorporates both 3D phase and diffraction effects, as well as multiple forward scatterings. To solve the inverse problem, an iterative update procedure that combines both phase retrieval and 'error back-propagation' is developed. To avoid local minimum solutions, we further develop a novel physical model-based initialization technique that accounts for both the geometric-optic and first-order phase effects. The result is robust reconstructions of Gigapixel 3D phase images having both wide FOV and super resolution in all three dimensions. Experimental results from an LED array microscope were demonstrated.
Numerical optimization of a picosecond pulse driven Ni-like Nb x-ray laser at 20.3 nm
NASA Astrophysics Data System (ADS)
Lu, X.; Zhong, J. Y.; Li, Y. J.; Zhang, J.
2003-07-01
Detailed simulations of a Ni-like Nb x-ray laser pumped by a nanosecond prepulse followed by a picosecond main pulse are presented. The atomic physics data are obtained using the Cowan code [R. D. Cowan, The Theory of Atomic Structure and Spectra (University of California Press, Berkeley, CA, 1981)]. The optimization calculations are performed in terms of the intensity of prepulse and the time delay between the prepulse and the main pulse. A high gain over 150 cm-1 is obtained for the optimized drive pulse configuration. The ray-tracing calculations suggest that the total pump energy for a saturated x-ray laser can be reduced to less than 1 J.
Memory-efficient dynamic programming backtrace and pairwise local sequence alignment.
Newberg, Lee A
2008-08-15
A backtrace through a dynamic programming algorithm's intermediate results in search of an optimal path, or to sample paths according to an implied probability distribution, or as the second stage of a forward-backward algorithm, is a task of fundamental importance in computational biology. When there is insufficient space to store all intermediate results in high-speed memory (e.g. cache), existing approaches store selected stages of the computation, and recompute missing values from these checkpoints on an as-needed basis. Here we present an optimal checkpointing strategy, and demonstrate its utility with pairwise local sequence alignment of sequences of length 10,000. Sample C++ code for optimal backtrace is available in the Supplementary Materials. Supplementary data is available at Bioinformatics online.
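The idea behind checkpointed backtrace, storing only every k-th row of the dynamic programming table during the forward pass and recomputing the rows of one block at a time while tracing back, can be sketched for the simpler longest-common-subsequence recurrence as below; the paper's contribution, the provably optimal checkpoint schedule, is not reproduced here.

```python
import numpy as np

def lcs_rows(a, b, row0, start, end):
    """Recompute LCS DP rows start..end (row0 is row start-1); returns [row0, ..., row_end]."""
    rows = [row0]
    for i in range(start, end + 1):
        prev, cur = rows[-1], np.zeros(len(b) + 1, dtype=int)
        for j in range(1, len(b) + 1):
            cur[j] = prev[j - 1] + 1 if a[i - 1] == b[j - 1] else max(prev[j], cur[j - 1])
        rows.append(cur)
    return rows

def lcs_checkpointed(a, b, k=32):
    """LCS backtrace that stores only every k-th DP row; rows inside a block are
    recomputed on demand during the backtrace."""
    checkpoints = {0: np.zeros(len(b) + 1, dtype=int)}
    row = checkpoints[0]
    for i in range(1, len(a) + 1):                 # forward pass, keeping checkpoints only
        row = lcs_rows(a, b, row, i, i)[1]
        if i % k == 0:
            checkpoints[i] = row
    out, i, j = [], len(a), len(b)
    while i > 0 and j > 0:
        base = ((i - 1) // k) * k                  # nearest stored checkpoint below i
        block = lcs_rows(a, b, checkpoints[base], base + 1, i)
        while i > base and j > 0:                  # trace back within the recomputed block
            cur, prev = block[i - base], block[i - base - 1]
            if a[i - 1] == b[j - 1]:
                out.append(a[i - 1]); i -= 1; j -= 1
            elif prev[j] >= cur[j - 1]:
                i -= 1
            else:
                j -= 1
    return "".join(reversed(out))

assert lcs_checkpointed("dynamic", "programming") == "ami"
```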
Schnabel, M; Mann, D; Efe, T; Schrappe, M; V Garrel, T; Gotzen, L; Schaeg, M
2004-10-01
The introduction of the German Diagnostic Related Groups (D-DRG) system requires redesigning administrative patient management strategies. Wrong coding leads to inaccurate grouping and endangers the reimbursement of treatment costs. This situation emphasizes the roles of documentation and coding as factors of economic success. The aims of this study were to assess the quantity and quality of initial documentation and coding (ICD-10 and OPS-301) and to find operative strategies to improve efficiency and strategic means to ensure optimal documentation and coding quality. In a prospective study, documentation and coding quality were evaluated in a standardized way by weekly assessment. Clinical data from 1385 inpatients were processed for initial correctness and quality of documentation and coding. Principal diagnoses were found to be accurate in 82.7% of cases, inexact in 7.1%, and wrong in 10.1%. Effects on financial returns occurred in 16% of cases. Based on these findings, an optimized, interdisciplinary, and multiprofessional workflow on medical documentation, coding, and data control was developed. A workflow incorporating regular assessment of documentation and coding quality is required by the DRG system to ensure efficient accounting of hospital services. Interdisciplinary and multiprofessional cooperation is recognized to be an important factor in establishing an efficient workflow in medical documentation and coding.
Magnet system optimization for segmented adaptive-gap in-vacuum undulator
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kitegi, C., E-mail: ckitegi@bnl.gov; Chubar, O.; Eng, C.
2016-07-27
Segmented Adaptive Gap in-vacuum Undulator (SAGU), in which different segments have different gaps and periods, promises a considerable spectral performance gain over a conventional undulator with uniform gap and period. According to calculations, this gain can be comparable to the gain achievable with a superior undulator technology (e.g. a room-temperature in-vacuum hybrid SAGU would perform as a cryo-cooled hybrid in-vacuum undulator with uniform gap and period). However, for reaching the high spectral performance, the SAGU magnetic design has to include compensation of the kicks experienced by the electron beam at segment junctions because of different deflection parameter values in the segments. We show that such compensation can to a large extent be accomplished by using a passive correction; however, simple correction coils are nevertheless required as well to reach perfect compensation over the whole SAGU tuning range. Magnetic optimizations performed with the Radia code, and the resulting undulator radiation spectra calculated using the SRW code, demonstrating the possibility of nearly perfect correction, are presented.
Advanced complex trait analysis.
Gray, A; Stewart, I; Tenesa, A
2012-12-01
The Genome-wide Complex Trait Analysis (GCTA) software package can quantify the contribution of genetic variation to phenotypic variation for complex traits. However, as those datasets of interest continue to increase in size, GCTA becomes increasingly computationally prohibitive. We present an adapted version, Advanced Complex Trait Analysis (ACTA), demonstrating dramatically improved performance. We restructure the genetic relationship matrix (GRM) estimation phase of the code and introduce the highly optimized parallel Basic Linear Algebra Subprograms (BLAS) library combined with manual parallelization and optimization. We introduce the Linear Algebra PACKage (LAPACK) library into the restricted maximum likelihood (REML) analysis stage. For a test case with 8999 individuals and 279,435 single nucleotide polymorphisms (SNPs), we reduce the total runtime, using a compute node with two multi-core Intel Nehalem CPUs, from ∼17 h to ∼11 min. The source code is fully available under the GNU Public License, along with Linux binaries. For more information see http://www.epcc.ed.ac.uk/software-products/acta. a.gray@ed.ac.uk Supplementary data are available at Bioinformatics online.
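The GRM construction whose runtime dominates that first phase amounts to standardizing the genotype matrix and forming one large matrix product, which is exactly where an optimized multi-threaded BLAS pays off. A sketch with a hypothetical random cohort follows; the standard 0/1/2 allele-count standardization is assumed.

```python
import numpy as np

def genetic_relationship_matrix(genotypes):
    """GRM from an (individuals x SNPs) matrix of 0/1/2 allele counts: standardize each
    SNP, then A = W W^T / M. The single matrix product is delegated to BLAS (dgemm)."""
    n, m = genotypes.shape
    p = genotypes.mean(axis=0) / 2.0                    # per-SNP allele frequencies
    w = (genotypes - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return (w @ w.T) / m

rng = np.random.default_rng(0)
geno = rng.integers(0, 3, size=(500, 2000)).astype(float)   # hypothetical cohort
grm = genetic_relationship_matrix(geno)
print(grm.shape)   # (500, 500)
```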
The NASA Neutron Star Grand Challenge: The coalescences of Neutron Star Binary System
NASA Astrophysics Data System (ADS)
Suen, Wai-Mo
1998-04-01
NASA funded a Grand Challenge Project (9/1996-1999) for the development of a multi-purpose numerical treatment for relativistic astrophysics and gravitational wave astronomy. The coalescence of binary neutron stars is chosen as the model problem for the code development. The institutes involved are the Argonne Lab, the Livermore Lab, the Max-Planck Institute at Potsdam, Stony Brook, the University of Illinois, and Washington University. We have recently succeeded in constructing a highly optimized parallel code which is capable of solving the full Einstein equations coupled with relativistic hydrodynamics, running at over 50 GFLOPS on a T3E (the second milestone point of the project). We are presently working on the head-on collisions of two neutron stars, and the inclusion of realistic equations of state into the code. The code will be released to the relativity and astrophysics community in April of 1998. With the full dynamics of the spacetime, relativistic hydro and microphysics all combined into a unified 3D code for the first time, many interesting large scale calculations in general relativistic astrophysics can now be carried out on massively parallel computers.
Pressure profiles of the BRing based on the simulation used in the CSRm
NASA Astrophysics Data System (ADS)
Wang, J. C.; Li, P.; Yang, J. C.; Yuan, Y. J.; Wu, B.; Chai, Z.; Luo, C.; Dong, Z. Q.; Zheng, W. H.; Zhao, H.; Ruan, S.; Wang, G.; Liu, J.; Chen, X.; Wang, K. D.; Qin, Z. M.; Yin, B.
2017-07-01
HIAF-BRing, the booster ring of the new multipurpose High Intensity heavy-ion Accelerator Facility (HIAF) project, requires an extremely high vacuum, lower than 10^-11 mbar, to fulfill the requirements of radioactive beam physics and high energy density physics. To achieve the required pressure, the benchmarked codes VAKTRAK and Molflow+ are used to simulate the pressure profiles of the BRing system. To verify the accuracy of the VAKTRAK implementation, its computational results are checked against measured pressure data and against a new simulation code, BOLIDE, on the existing synchrotron CSRm. With VAKTRAK verified, the pressure profiles of the BRing are calculated for different parameters such as conductance, outgassing rates and pumping speeds. Based on the computational results, the optimal parameters are selected to achieve the required pressure for the BRing.
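For orientation, the sketch below shows a minimal version of the kind of 1D pressure-profile calculation such codes perform, using the steady-state diffusion model c p'' - s p + q = 0 solved by finite differences; it is not VAKTRAK or Molflow+, and all parameter values are illustrative assumptions.

    import numpy as np

    def pressure_profile(L=10.0, n=200, c=50.0, s=1.0, q=1e-9, p_pump=1e-12):
        # c: specific conductance [l*m/s], s: distributed pumping speed per metre [l/(s*m)],
        # q: outgassing per metre [mbar*l/(s*m)], p_pump: pressure at the end pumps [mbar]
        z = np.linspace(0.0, L, n)
        h = z[1] - z[0]
        A = np.zeros((n, n))
        b = np.full(n, -q)                        # c p'' - s p = -q
        for i in range(1, n - 1):
            A[i, i - 1] = c / h**2
            A[i, i] = -2.0 * c / h**2 - s
            A[i, i + 1] = c / h**2
        A[0, 0] = A[-1, -1] = 1.0                 # Dirichlet: lumped pumps fix the end pressure
        b[0] = b[-1] = p_pump
        return z, np.linalg.solve(A, b)

    z, p = pressure_profile()
    print(f"peak pressure {p.max():.2e} mbar at z = {z[p.argmax()]:.1f} m")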
A high-throughput exploration of magnetic materials by using structure predicting methods
NASA Astrophysics Data System (ADS)
Arapan, S.; Nieves, P.; Cuesta-López, S.
2018-02-01
We study the capability of a structure-predicting method based on a genetic/evolutionary algorithm for the high-throughput exploration of magnetic materials. We use the USPEX and VASP codes to predict stable structures and to generate low-energy metastable structures for a set of representative magnetic systems comprising intermetallic alloys, oxides, interstitial compounds, and systems containing rare-earth elements, with both ferromagnetic and antiferromagnetic ordering. We have modified the interface between the USPEX and VASP codes to improve the performance of structural optimization as well as to perform calculations in a high-throughput manner. We show that exploring the structural phase space with a structure-predicting technique reveals large sets of low-energy metastable structures, which not only improve existing databases but may also provide understanding and solutions to stabilize and synthesize magnetic materials suitable for permanent magnet applications.
Zhang, Chenchu; Hu, Yanlei; Du, Wenqiang; Wu, Peichao; Rao, Shenglong; Cai, Ze; Lao, Zhaoxin; Xu, Bing; Ni, Jincheng; Li, Jiawen; Zhao, Gang; Wu, Dong; Chu, Jiaru; Sugioka, Koji
2016-01-01
Rapid integration of high-quality functional devices in microchannels is in highly demand for miniature lab-on-a-chip applications. This paper demonstrates the embellishment of existing microfluidic devices with integrated micropatterns via femtosecond laser MRAF-based holographic patterning (MHP) microfabrication, which proves two-photon polymerization (TPP) based on spatial light modulator (SLM) to be a rapid and powerful technology for chip functionalization. Optimized mixed region amplitude freedom (MRAF) algorithm has been used to generate high-quality shaped focus field. Base on the optimized parameters, a single-exposure approach is developed to fabricate 200 × 200 μm microstructure arrays in less than 240 ms. Moreover, microtraps, QR code and letters are integrated into a microdevice by the advanced method for particles capture and device identification. These results indicate that such a holographic laser embellishment of microfluidic devices is simple, flexible and easy to access, which has great potential in lab-on-a-chip applications of biological culture, chemical analyses and optofluidic devices. PMID:27619690
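The MRAF algorithm mentioned above is a mixed-region variant of the Gerchberg-Saxton iteration; the sketch below shows one plausible form of that loop. The mixing factor, iteration count, uniform input beam, and array sizes are assumptions for illustration, not the authors' optimized parameters.

    import numpy as np

    def mraf_phase(target, signal_mask, m=0.4, n_iter=50):
        # target: desired intensity pattern; signal_mask: boolean signal region
        amp_target = np.sqrt(target / target.sum())
        amp_in = np.ones_like(amp_target) / np.sqrt(target.size)   # uniform input beam
        phase = 2 * np.pi * np.random.rand(*target.shape)          # random initial guess
        for _ in range(n_iter):
            img = np.fft.fft2(amp_in * np.exp(1j * phase))
            # MRAF mixing: enforce the target amplitude in the signal region,
            # leave the amplitude in the noise region free
            mixed = np.where(signal_mask, m * amp_target, (1 - m) * np.abs(img))
            back = np.fft.ifft2(mixed * np.exp(1j * np.angle(img)))
            phase = np.angle(back)                                  # keep only the SLM phase
        return phase

    target = np.zeros((64, 64)); target[24:40, 24:40] = 1.0
    mask = target > 0
    slm_phase = mraf_phase(target, mask)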
Coordinated design of coding and modulation systems
NASA Technical Reports Server (NTRS)
Massey, J. L.; Ancheta, T.; Johannesson, R.; Lauer, G.; Lee, L.
1976-01-01
The joint optimization of the coding and modulation systems employed in telemetry systems was investigated. Emphasis was placed on formulating inner and outer coding standards used by the Goddard Space Flight Center. Convolutional codes were found that are nearly optimum for use with Viterbi decoding as the inner code of concatenated coding systems. A new type of convolutional code, the unit-memory code, was discovered; it is well suited for inner-code usage because of its byte-oriented structure. Simulations of sequential decoding on the deep-space channel were carried out to compare directly various convolutional codes that are proposed for use in deep-space systems.
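As a small illustration of the kind of convolutional encoding discussed here, the sketch below implements a generic rate-1/2 encoder; the constraint-length-7 generators (171, 133 octal) are a common deep-space choice and are assumed for illustration rather than taken from this study.

    def conv_encode(bits, g1=0o171, g2=0o133, k=7):
        # shift-register encoder: two parity bits out per input bit (rate 1/2)
        state = 0
        out = []
        for b in bits:
            state = ((state << 1) | b) & ((1 << k) - 1)
            out.append(bin(state & g1).count("1") % 2)   # parity of taps selected by g1
            out.append(bin(state & g2).count("1") % 2)   # parity of taps selected by g2
        return out

    print(conv_encode([1, 0, 1, 1, 0, 0]))               # 12 coded bits for 6 input bits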
A 3DHZETRN Code in a Spherical Uniform Sphere with Monte Carlo Verification
NASA Technical Reports Server (NTRS)
Wilson, John W.; Slaba, Tony C.; Badavi, Francis F.; Reddell, Brandon D.; Bahadori, Amir A.
2014-01-01
The computationally efficient HZETRN code has been used in recent trade studies for lunar and Martian exploration and is currently being used in the engineering development of the next generation of space vehicles, habitats, and extravehicular activity equipment. A new version (3DHZETRN), capable of transporting high charge (Z) and energy (HZE) ions and light ions (including neutrons) under space-like boundary conditions with enhanced neutron and light ion propagation, is under development. In the present report, new algorithms for light ion and neutron propagation with well-defined convergence criteria in 3D objects are developed and tested against Monte Carlo simulations to verify the solution methodology. The code will be available through the software system OLTARIS for shield design and validation, and provides a basis for personal computer software capable of space shield analysis and optimization.
Performance of an Optimized Eta Model Code on the Cray T3E and a Network of PCs
NASA Technical Reports Server (NTRS)
Kouatchou, Jules; Rancic, Miodrag; Geiger, Jim
2000-01-01
In the year 2001, NASA will launch the satellite TRIANA, which will be the first Earth observing mission to provide a continuous, full disk view of the sunlit Earth. As a part of the HPCC Program at NASA GSFC, we have started a project whose objectives are to develop and implement a 3D cloud data assimilation system, by combining TRIANA measurements with model simulation, and to produce accurate statistics of global cloud coverage as an important element of the Earth's climate. For simulation of the atmosphere within this project we are using the NCEP/NOAA operational Eta model. In order to compare TRIANA and the Eta model data on approximately the same grid without significant downscaling, the Eta model will be integrated at a resolution of about 15 km. The integration domain (from -70 to +70 deg in latitude and 150 deg in longitude) will cover most of the sunlit Earth disc and will continuously rotate around the globe following TRIANA. The cloud data assimilation is supposed to run and produce 3D clouds on a near real-time basis. Such a numerical setup and integration design is very ambitious and computationally demanding. Thus, though the Eta model code has been very carefully developed and its computational efficiency has been systematically polished during the years of operational implementation at NCEP, the current MPI version may still have problems with memory and efficiency for the TRIANA simulations. Within this work, we optimize a parallel version of the Eta model code on a Cray T3E and a network of PCs (theHIVE) in order to improve its overall efficiency. Our optimization procedure consists of introducing dynamically allocated arrays to reduce the size of static memory, and optimizing on a single processor by splitting loops to limit the number of streams. All the presented results are derived using an integration domain centered at the equator, with a size of 60 x 60 deg, and with horizontal resolutions of 1/2 and 1/3 deg. In accompanying charts we report the elapsed time, the speedup and the Mflops as a function of the number of processors for the non-optimized version of the code on the T3E and theHIVE. The large amount of communication required for model integration explains its poor performance on theHIVE. Our initial implementation of the dynamic memory allocation has contributed to an approximately 12% reduction in memory but has introduced a 3% overhead in computing time. This overhead was removed by performing loop splitting in some of the highly demanding subroutines. When the Eta code is fully optimized in order to meet the memory requirement for TRIANA simulations, a non-negligible overhead may appear that may seriously affect the efficiency of the code. To alleviate this problem, we are considering implementation of a new algorithm for the horizontal advection that is computationally less expensive, and also a new approach for marching in time.
Numerical optimization of three-dimensional coils for NSTX-U
NASA Astrophysics Data System (ADS)
Lazerson, S. A.; Park, J.-K.; Logan, N.; Boozer, A.
2015-10-01
A tool for the calculation of optimal three-dimensional (3D) perturbative magnetic fields in tokamaks has been developed. The IPECOPT code builds upon the stellarator optimization code STELLOPT to allow for optimization of the linear ideal magnetohydrodynamic perturbed equilibrium (IPEC). This tool has been applied to NSTX-U equilibria, addressing which fields are most effective at driving NTV torques. The NTV torque calculation is performed by the PENT code. Optimization of the normal field spectrum shows that fields with n = 1 character can drive a large core torque. It is also shown that fields with n = 3 features are capable of driving edge torque and some core torque. Coil current optimization (using the planned in-vessel and existing RWM coils) on NSTX-U suggests the planned coil set is adequate for core and edge torque control. Comparison between error field correction experiments on DIII-D and the optimizer shows good agreement. Notice: This manuscript has been authored by Princeton University under Contract Number DE-AC02-09CH11466 with the U.S. Department of Energy. The publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.
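Because the perturbed normal field is linear in the coil currents, choosing currents that best reproduce a desired field spectrum under amplitude limits reduces to bounded linear least squares; the sketch below illustrates that step only. The coupling matrix, target spectrum, and current bounds are random or assumed stand-ins, and this is not the IPECOPT workflow.

    import numpy as np
    from scipy.optimize import lsq_linear

    n_modes, n_coils = 30, 12
    M = np.random.randn(n_modes, n_coils)      # field-per-unit-current coupling matrix (stand-in)
    b_target = np.random.randn(n_modes)        # desired normal-field spectrum (stand-in)

    res = lsq_linear(M, b_target, bounds=(-3.0e3, 3.0e3))   # assumed coil current limits [A]
    print("optimal currents:", np.round(res.x, 1))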
Implementation of Soft X-ray Tomography on NSTX
NASA Astrophysics Data System (ADS)
Tritz, K.; Stutman, D.; Finkenthal, M.; Granetz, R.; Menard, J.; Park, W.
2003-10-01
A set of poloidal ultrasoft X-ray arrays is operated by the Johns Hopkins group on NSTX. To enable MHD mode analysis independent of the magnetic reconstruction, the McCormick-Granetz tomography code developed at MIT is being adapted to the NSTX geometry. Tests of the code using synthetic data show that the present X-ray system is adequate for m=1 tomography. In addition, we have found that spline basis functions may be better suited than Bessel functions for the reconstruction of radially localized phenomena in NSTX. The tomography code was also used to determine the necessary array expansion and optimal array placement for the characterization of higher m modes (m=2,3) in the future. Initial reconstruction of experimental soft X-ray data has been performed for m=1 internal modes, which are often encountered in high beta NSTX discharges. The reconstruction of these modes will be compared to predictions from the M3D code and magnetic measurements.
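A minimal sketch of the basis-function idea mentioned above: chord-integrated brightnesses are linear in the emissivity, so expanding the emissivity in a few radial basis functions turns the reconstruction into a least-squares fit. The bump-shaped basis functions and random chord geometry below are illustrative assumptions, not the McCormick-Granetz code.

    import numpy as np

    n_chords, n_basis, n_radii = 20, 6, 100
    r = np.linspace(0.0, 1.0, n_radii)
    # local bump-shaped basis functions (illustrative, not true B-splines)
    centers = np.linspace(0.0, 1.0, n_basis)
    basis = np.maximum(0.0, 1.0 - ((r[None, :] - centers[:, None]) / 0.25) ** 2) ** 2

    geometry = np.random.rand(n_chords, n_radii) / n_radii       # stand-in chord weights
    signals = geometry @ (2.0 * np.exp(-(r / 0.3) ** 2))         # synthetic brightness data

    # least-squares fit of basis coefficients, then reconstructed emissivity profile
    coeffs, *_ = np.linalg.lstsq(geometry @ basis.T, signals, rcond=None)
    emissivity = basis.T @ coeffs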
AutoBayes Program Synthesis System Users Manual
NASA Technical Reports Server (NTRS)
Schumann, Johann; Jafari, Hamed; Pressburger, Tom; Denney, Ewen; Buntine, Wray; Fischer, Bernd
2008-01-01
Program synthesis is the systematic, automatic construction of efficient executable code from high-level declarative specifications. AutoBayes is a fully automatic program synthesis system for the statistical data analysis domain; in particular, it solves parameter estimation problems. It has seen many successful applications at NASA and is currently being used, for example, to analyze simulation results for Orion. The input to AutoBayes is a concise description of a data analysis problem composed of a parameterized statistical model and a goal that is a probability term involving parameters and input data. The output is optimized and fully documented C/C++ code computing the values for those parameters that maximize the probability term. AutoBayes can solve many subproblems symbolically rather than having to rely on numeric approximation algorithms, thus yielding effective, efficient, and compact code. Statistical analysis is faster and more reliable, because effort can be focused on model development and validation rather than manual development of solution algorithms and code.
Applications of Coding in Network Communications
ERIC Educational Resources Information Center
Chang, Christopher SungWook
2012-01-01
This thesis uses the tool of network coding to investigate fast peer-to-peer file distribution, anonymous communication, robust network construction under uncertainty, and prioritized transmission. In a peer-to-peer file distribution system, we use a linear optimization approach to show that the network coding framework significantly simplifies…
Computational modeling of Krypton gas puffs with tailored mass density profiles on Z
Jennings, Christopher A.; Ampleford, David J.; Lamppa, Derek C.; ...
2015-05-18
Large diameter multi-shell gas puffs rapidly imploded by high current (~20 MA, ~100 ns) on the Z generator of Sandia National Laboratories are able to produce high-intensity Krypton K-shell emission at ~13 keV. Efficiently radiating at these high photon energies is a significant challenge which requires the careful design and optimization of the gas distribution. To facilitate this, we hydrodynamically model the gas flow out of the nozzle and then model its implosion using a 3-dimensional resistive, radiative MHD code (GORGON). This approach enables us to iterate between modeling the implosion and gas flow from the nozzle to optimize radiative output from this combined system. Furthermore, guided by our implosion calculations, we have designed gas profiles that help mitigate disruption from Magneto-Rayleigh–Taylor implosion instabilities, while preserving sufficient kinetic energy to thermalize to the high temperatures required for K-shell emission.
Schneider, Adam D.; Jamali, Mohsen; Carriot, Jerome; Chacron, Maurice J.
2015-01-01
Efficient processing of incoming sensory input is essential for an organism's survival. A growing body of evidence suggests that sensory systems have developed coding strategies that are constrained by the statistics of the natural environment. Consequently, it is necessary to first characterize neural responses to natural stimuli to uncover the coding strategies used by a given sensory system. Here we report for the first time the statistics of vestibular rotational and translational stimuli experienced by rhesus monkeys during natural (e.g., walking, grooming) behaviors. We find that these stimuli can reach intensities as high as 1500 deg/s and 8 G. Recordings from afferents during naturalistic rotational and linear motion further revealed strongly nonlinear responses in the form of rectification and saturation, which could not be accurately predicted by traditional linear models of vestibular processing. Accordingly, we used linear–nonlinear cascade models and found that these could accurately predict responses to naturalistic stimuli. Finally, we tested whether the statistics of natural vestibular signals constrain the neural coding strategies used by peripheral afferents. We found that both irregular otolith and semicircular canal afferents, because of their higher sensitivities, were more optimized for processing natural vestibular stimuli as compared with their regular counterparts. Our results therefore provide the first evidence supporting the hypothesis that the neural coding strategies used by the vestibular system are matched to the statistics of natural stimuli. PMID:25855169
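A minimal sketch of a linear-nonlinear (LN) cascade of the type used in this study: a linear filter followed by a static rectifying and saturating nonlinearity. The filter shape, gain, saturation level, and synthetic stimulus are assumptions for illustration, not fits to the recorded afferent data.

    import numpy as np

    def ln_response(stimulus, kernel, gain=50.0, threshold=0.0, r_max=300.0):
        drive = np.convolve(stimulus, kernel, mode="same")      # linear stage
        rate = gain * (drive - threshold)
        return np.clip(rate, 0.0, r_max)                        # rectification + saturation

    t = np.arange(0, 2.0, 1e-3)
    stimulus = np.random.randn(t.size)                          # broadband stand-in stimulus
    kernel = np.exp(-np.arange(0, 0.05, 1e-3) / 0.01)           # assumed 50 ms exponential filter
    rate = ln_response(stimulus, kernel)                        # predicted firing rate [spikes/s]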
Integration of Rotor Aerodynamic Optimization with the Conceptual Design of a Large Civil Tiltrotor
NASA Technical Reports Server (NTRS)
Acree, C. W., Jr.
2010-01-01
Coupling of aeromechanics analysis with vehicle sizing is demonstrated with the CAMRAD II aeromechanics code and NDARC sizing code. The example is optimization of cruise tip speed with rotor/wing interference for the Large Civil Tiltrotor (LCTR2) concept design. Free-wake models were used for both rotors and the wing. This report is part of a NASA effort to develop an integrated analytical capability combining rotorcraft aeromechanics, structures, propulsion, mission analysis, and vehicle sizing. The present paper extends previous efforts by including rotor/wing interference explicitly in the rotor performance optimization and implicitly in the sizing.
Bit selection using field drilling data and mathematical investigation
NASA Astrophysics Data System (ADS)
Momeni, M. S.; Ridha, S.; Hosseini, S. J.; Meyghani, B.; Emamian, S. S.
2018-03-01
A drilling process cannot be completed without a drill bit, so bit selection is an important task in drilling optimization. Selecting a bit is a significant issue in planning and designing a well, simply because the drilling bit accounts for a large share of the total cost. To perform this task, a back-propagation ANN model is developed and trained using drilling bit records from several offset wells. In this project, two ANN models are developed: one to predict the IADC bit code and one to predict the ROP. In Stage 1, the IADC bit code is predicted using all the given field data; the output is the targeted IADC bit code. In Stage 2, the predicted ROP values are found using the IADC bit code obtained in Stage 1. In Stage 3, the predicted ROP value is fed back into the data set to obtain the predicted IADC bit code. Thus, in the end, there are two models that give the predicted ROP values and the predicted IADC bit code values.
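A hedged sketch of the two-stage idea described above, using a generic multilayer perceptron in place of the authors' back-propagation network; the feature set, network sizes, random stand-in data, and the use of scikit-learn are all assumptions.

    import numpy as np
    from sklearn.neural_network import MLPClassifier, MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.random((500, 6))          # e.g. depth, WOB, RPM, flow rate, mud weight, formation index
    iadc = rng.integers(0, 8, 500)    # stand-in IADC bit-code classes
    rop = rng.random(500) * 30.0      # stand-in rate of penetration [m/h]

    # Stage 1: predict the IADC bit code from field data
    stage1 = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000).fit(X, iadc)
    # Stage 2: predict ROP using the field data plus the predicted bit code
    X2 = np.column_stack([X, stage1.predict(X)])
    stage2 = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000).fit(X2, rop)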
Dai, Shengfa; Wei, Qingguo
2017-01-01
The common spatial pattern algorithm is widely used to estimate spatial filters in motor-imagery-based brain-computer interfaces. However, using a large number of channels makes common spatial pattern prone to over-fitting and makes the classification of electroencephalographic signals time-consuming. To overcome these problems, it is necessary to choose an optimal subset of all channels to save computational time and improve classification accuracy. In this paper, a novel method named the backtracking search optimization algorithm is proposed to automatically select the optimal channel set for common spatial pattern. Each individual in the population is an N-dimensional binary vector, with each component representing one channel. A population of binary codes is generated randomly at the beginning, and channels are then selected according to the evolution of these codes. The number and positions of 1's in a code denote the number and positions of the chosen channels. The objective function of the backtracking search optimization algorithm is defined as a combination of the classification error rate and the relative number of channels. Experimental results suggest that higher classification accuracy can be achieved with far fewer channels compared to standard common spatial pattern with all channels.
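A minimal sketch of the fitness function described above: each candidate is a binary vector over channels, and its cost combines the cross-validated error rate with the relative number of selected channels. The weighting factor and the placeholder error-rate evaluator are assumptions, not the authors' settings.

    import numpy as np

    def fitness(code, error_rate_fn, weight=0.1):
        # code: 0/1 vector, 1 = channel kept; error_rate_fn: evaluates CSP + classifier
        selected = np.flatnonzero(code)
        if selected.size == 0:
            return 1.0                                    # empty channel sets are invalid
        err = error_rate_fn(selected)                     # cross-validated error on chosen channels
        return err + weight * selected.size / code.size   # accuracy vs. channel-count trade-off

    # usage with a dummy evaluator that favours a small fixed subset of channels
    dummy = lambda ch: 0.5 - 0.02 * np.isin(ch, [3, 7, 11]).sum()
    print(fitness(np.random.randint(0, 2, 32), dummy))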
The Limits of Coding with Joint Constraints on Detected and Undetected Error Rates
NASA Technical Reports Server (NTRS)
Dolinar, Sam; Andrews, Kenneth; Pollara, Fabrizio; Divsalar, Dariush
2008-01-01
We develop a remarkably tight upper bound on the performance of a parameterized family of bounded angle maximum-likelihood (BA-ML) incomplete decoders. The new bound for this class of incomplete decoders is calculated from the code's weight enumerator, and is an extension of Poltyrev-type bounds developed for complete ML decoders. This bound can also be applied to bound the average performance of random code ensembles in terms of an ensemble average weight enumerator. We also formulate conditions defining a parameterized family of optimal incomplete decoders, defined to minimize both the total codeword error probability and the undetected error probability for any fixed capability of the decoder to detect errors. We illustrate the gap between optimal and BA-ML incomplete decoding via simulation of a small code.
NASA Technical Reports Server (NTRS)
Cheng, Michael K.; Lyubarev, Mark; Nakashima, Michael A.; Andrews, Kenneth S.; Lee, Dennis
2008-01-01
Low-density parity-check (LDPC) codes are the state of the art in forward error correction (FEC) technology, exhibiting capacity-approaching performance. The Jet Propulsion Laboratory (JPL) has designed a family of LDPC codes that are similar in structure and therefore lead to a single decoder implementation. The Accumulate-Repeat-by-4-Jagged-Accumulate (AR4JA) code design offers a family of codes with rates 1/2, 2/3, 4/5 and lengths 1024, 4096, 16384 information bits. Performance is less than one dB from capacity for all combinations. Integrating a stand-alone LDPC decoder with a commercial-off-the-shelf (COTS) receiver poses additional challenges compared with building a single receiver-decoder unit from scratch. In this work, we outline the issues and show that these additional challenges can be overcome by simple solutions. To demonstrate that an LDPC decoder can be made to work seamlessly with a COTS receiver, we interface an AR4JA LDPC decoder developed on a field-programmable gate array (FPGA) with a modern high-data-rate receiver and measure the combined receiver-decoder performance. Through optimizations that include an improved frame synchronizer and different soft-symbol scaling algorithms, we show that a combined implementation loss of less than one dB is possible and therefore most of the coding gain evident in theory can also be obtained in practice. Our techniques can benefit any modem that utilizes an advanced FEC code.
Scheduling Operations for Massive Heterogeneous Clusters
NASA Technical Reports Server (NTRS)
Humphrey, John; Spagnoli, Kyle
2013-01-01
High-performance computing (HPC) programming has become increasingly difficult with the advent of hybrid supercomputers consisting of multicore CPUs and accelerator boards such as the GPU. Manual tuning of software to achieve high performance on this type of machine has been performed by programmers. This is needlessly difficult and prone to being invalidated by new hardware, new software, or changes in the underlying code. A system was developed for task-based representation of programs, which when coupled with a scheduler and runtime system, allows for many benefits, including higher performance and utilization of computational resources, easier programming and porting, and adaptations of code during runtime. The system consists of a method of representing computer algorithms as a series of data-dependent tasks. The series forms a graph, which can be scheduled for execution on many nodes of a supercomputer efficiently by a computer algorithm. The schedule is executed by a dispatch component, which is tailored to understand all of the hardware types that may be available within the system. The scheduler is informed by a cluster mapping tool, which generates a topology of available resources and their strengths and communication costs. Software is decoupled from its hardware, which aids in porting to future architectures. A computer algorithm schedules all operations, which for systems of high complexity (i.e., most NASA codes), cannot be performed optimally by a human. The system aids in reducing repetitive code, such as communication code, and aids in the reduction of redundant code across projects. It adds new features to code automatically, such as recovering from a lost node or the ability to modify the code while running. In this project, the innovators at the time of this reporting intend to develop two distinct technologies that build upon each other and both of which serve as building blocks for more efficient HPC usage. First is the scheduling and dynamic execution framework, and the second is scalable linear algebra libraries that are built directly on the former.
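For orientation, a toy list scheduler over a task graph illustrates the representation described above (tasks as nodes, data dependencies as edges, ready tasks dispatched to the next free worker); it is a simplified stand-in, not the system developed in this work, and the task names and costs are invented.

    from collections import deque

    def schedule(tasks, deps, n_workers=4):
        # tasks: {name: cost}; deps: {name: [prerequisite names]}
        remaining = {t: set(deps.get(t, [])) for t in tasks}
        ready = deque(t for t, d in remaining.items() if not d)
        workers = [0.0] * n_workers                 # time at which each worker becomes free
        finish = {}
        while ready:
            t = ready.popleft()
            w = min(range(n_workers), key=workers.__getitem__)
            start = max([workers[w]] + [finish[d] for d in deps.get(t, [])])
            finish[t] = start + tasks[t]
            workers[w] = finish[t]
            for u, d in remaining.items():          # release tasks whose inputs are now ready
                if t in d:
                    d.remove(t)
                    if not d and u not in finish and u not in ready:
                        ready.append(u)
        return finish                               # completion time of each task

    print(schedule({"A": 2, "B": 3, "C": 1, "D": 2}, {"C": ["A", "B"], "D": ["C"]}))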
Optimizing Dense Plasma Focus Neutron Yields With Fast Gas Jets
NASA Astrophysics Data System (ADS)
McMahon, Matthew; Stein, Elizabeth; Higginson, Drew; Kueny, Christopher; Link, Anthony; Schmidt, Andrea
2017-10-01
We report a study using the particle-in-cell code LSP to perform fully kinetic simulations modeling dense plasma focus (DPF) devices with high density gas jets on axis. The high-density jets are modeled in the large-eddy Navier-Stokes code CharlesX, which is suitable for modeling both sub-sonic and supersonic gas flow. The gas pattern, which is essentially static on z-pinch time scales, is imported from CharlesX to LSP for neutron yield predictions. Fast gas puffs allow for more mass on axis while maintaining the optimal pressure for the DPF. As the density of a subsonic jet increases relative to the background fill, we find the neutron yield increases, as does the variability in the neutron yield. Introducing perturbations in the jet density via super-sonic flow (also known as Mach diamonds) allow for consistent seeding of the m =0 instability leading to more consistent ion acceleration and higher neutron yields with less variability. Jets with higher on axis density are found to have the greatest yield. The optimal jet configuration and the necessary jet conditions for increasing neutron yield and reducing yield variability are explored. Simulations of realistic jet profiles are performed and compared to the ideal scenario. This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 and supported by the Laboratory Directed Research and Development Program (15-ERD-034) at LLNL.
Advances in Engineering Software for Lift Transportation Systems
NASA Astrophysics Data System (ADS)
Kazakoff, Alexander Borisoff
2012-03-01
In this paper an attempt is made at computer modelling of ropeway ski lift systems. The logic of these systems is based on travel between the two terminals, operating with high-capacity cabins, chairs, gondolas or draw-bars. The computer codes AUTOCAD, MATLAB and Compaq Visual Fortran (version 6.6) are used in the computer modelling. The computer modelling of the rope systems is organized in two stages in this paper. The first stage is the preparation of the ground relief profile and the design of the lift system as a whole, according to the terrain profile and the climatic and atmospheric conditions. The ground profile is prepared by the geodesists and is presented in an AUTOCAD view. The next step is the design of the lift itself, which is performed by programmes written in the computer code MATLAB. The second stage of the computer modelling is performed after the optimization of the co-ordinates and the lift profile using MATLAB. The co-ordinates and the parameters are then inserted into a program written in Compaq Visual Fortran (version 6.6), which calculates 171 lift parameters, organized in 42 tables. The objective of the work presented in this paper is computer modelling of the design and parameter derivation of ropeway systems and their variation and optimization by computer.
The Design and Evaluation of "CAPTools"--A Computer Aided Parallelization Toolkit
NASA Technical Reports Server (NTRS)
Yan, Jerry; Frumkin, Michael; Hribar, Michelle; Jin, Haoqiang; Waheed, Abdul; Johnson, Steve; Cross, Jark; Evans, Emyr; Ierotheou, Constantinos; Leggett, Pete;
1998-01-01
Writing applications for high performance computers is a challenging task. Although writing code by hand still offers the best performance, it is extremely costly and often not very portable. The Computer Aided Parallelization Tools (CAPTools) are a toolkit designed to help automate the mapping of sequential FORTRAN scientific applications onto multiprocessors. CAPTools consists of the following major components: an inter-procedural dependence analysis module that incorporates user knowledge; a 'self-propagating' data partitioning module driven via user guidance; an execution control mask generation and optimization module for the user to fine tune parallel processing of individual partitions; a program transformation/restructuring facility for source code clean up and optimization; a set of browsers through which the user interacts with CAPTools at each stage of the parallelization process; and a code generator supporting multiple programming paradigms on various multiprocessors. Besides describing the rationale behind the architecture of CAPTools, the parallelization process is illustrated via case studies involving structured and unstructured meshes. The programming process and the performance of the generated parallel programs are compared against other programming alternatives based on the NAS Parallel Benchmarks, ARC3D and other scientific applications. Based on these results, a discussion on the feasibility of constructing architectural independent parallel applications is presented.
Sparse bursts optimize information transmission in a multiplexed neural code.
Naud, Richard; Sprekeler, Henning
2018-06-22
Many cortical neurons combine the information ascending and descending the cortical hierarchy. In the classical view, this information is combined nonlinearly to give rise to a single firing-rate output, which collapses all input streams into one. We analyze the extent to which neurons can simultaneously represent multiple input streams by using a code that distinguishes spike timing patterns at the level of a neural ensemble. Using computational simulations constrained by experimental data, we show that cortical neurons are well suited to generate such multiplexing. Interestingly, this neural code maximizes information for short and sparse bursts, a regime consistent with in vivo recordings. Neurons can also demultiplex this information, using specific connectivity patterns. The anatomy of the adult mammalian cortex suggests that these connectivity patterns are used by the nervous system to maintain sparse bursting and optimal multiplexing. Contrary to firing-rate coding, our findings indicate that the physiology and anatomy of the cortex may be interpreted as optimizing the transmission of multiple independent signals to different targets. Copyright © 2018 the Author(s). Published by PNAS.
NASA Technical Reports Server (NTRS)
Berke, Laszlo; Patnaik, Surya N.; Murthy, Pappu L. N.
1993-01-01
The application of artificial neural networks to capture structural design expertise is demonstrated. The principal advantage of a trained neural network is that it requires trivial computational effort to produce an acceptable new design. For the class of problems addressed, the development of a conventional expert system would be extremely difficult. In the present effort, a structural optimization code with multiple nonlinear programming algorithms and an artificial neural network code NETS were used. A set of optimum designs for a ring and two aircraft wings for static and dynamic constraints were generated by using the optimization codes. The optimum design data were processed to obtain input and output pairs, which were used to develop a trained artificial neural network with the code NETS. Optimum designs for new design conditions were predicted by using the trained network. Neural net prediction of optimum designs was found to be satisfactory for most of the output design parameters. However, results from the present study indicate that caution must be exercised to ensure that all design variables are within selected error bounds.
NASA Astrophysics Data System (ADS)
Darazi, R.; Gouze, A.; Macq, B.
2009-01-01
Reproducing natural, real scenes as we see them in the real world every day is becoming more and more popular. Stereoscopic and multi-view techniques are used to this end. However, because more information is displayed, supporting technologies such as digital compression are required to ensure the storage and transmission of the sequences. In this paper, a new scheme for stereo image coding is proposed. The original left and right images are jointly coded. The main idea is to optimally exploit the existing correlation between the two images. This is done by the design of an efficient transform that reduces the existing redundancy in the stereo image pair. The approach was inspired by the lifting scheme (LS). The novelty in our work is that the prediction step has been replaced by a hybrid step consisting of disparity compensation followed by luminance correction and an optimized prediction step. The proposed scheme can be used for lossless and for lossy coding. Experimental results show improvement in terms of performance and complexity compared to recently proposed methods.
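A hedged sketch of the prediction idea: one image of the pair is predicted from a disparity-compensated, luminance-corrected version of the other, and only the residual would be entropy coded. The brute-force global disparity search and single gain factor below are deliberately simplistic stand-ins for the paper's hybrid prediction step.

    import numpy as np

    def predict_right(left, right, max_disp=16):
        best, best_err = 0, np.inf
        for d in range(max_disp):                          # brute-force global disparity search
            err = np.mean((np.roll(left, -d, axis=1) - right) ** 2)
            if err < best_err:
                best, best_err = d, err
        shifted = np.roll(left, -best, axis=1)
        gain = right.mean() / max(shifted.mean(), 1e-9)    # crude luminance correction
        return gain * shifted, best

    left = np.random.rand(64, 64)
    right = np.roll(left, -5, axis=1) * 0.9                # synthetic stereo pair
    pred, disp = predict_right(left, right)
    residual = right - pred                                # residual that would be entropy coded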
Compression performance of HEVC and its format range and screen content coding extensions
NASA Astrophysics Data System (ADS)
Li, Bin; Xu, Jizheng; Sullivan, Gary J.
2015-09-01
This paper presents a comparison-based test of the objective compression performance of the High Efficiency Video Coding (HEVC) standard, its format range extensions (RExt), and its draft screen content coding extensions (SCC). The current dominant standard, H.264/MPEG-4 AVC, is used as an anchor reference in the comparison. The conditions used for the comparison tests were designed to reflect relevant application scenarios and to enable a fair comparison to the maximum extent feasible - i.e., using comparable quantization settings, reference frame buffering, intra refresh periods, rate-distortion optimization decision processing, etc. It is noted that such PSNR-based objective comparisons generally provide more conservative estimates of HEVC benefit than are found in subjective studies. The experimental results show that, when compared with H.264/MPEG-4 AVC, HEVC version 1 provides a bit rate savings for equal PSNR of about 23% for all-intra coding, 34% for random access coding, and 38% for low-delay coding. This is consistent with prior studies and the general characterization that HEVC can provide a bit rate savings of about 50% for equal subjective quality for most applications. The HEVC format range extensions provide a similar bit rate savings of about 13-25% for all-intra coding, 28-33% for random access coding, and 32-38% for low-delay coding at different bit rate ranges. For lossy coding of screen content, the HEVC screen content coding extensions achieve a bit rate savings of about 66%, 63%, and 61% for all-intra coding, random access coding, and low-delay coding, respectively. For lossless coding, the corresponding bit rate savings are about 40%, 33%, and 32%, respectively.
NASA Astrophysics Data System (ADS)
Frantz, Cathy; Fritsch, Andreas; Uhlig, Ralf
2017-06-01
In solar tower power plants the receiver is one of the critical components. It converts the solar radiation into heat and must withstand high heat flux densities and high daily or even hourly gradients (due to the passage of clouds). For this reason, the challenge during receiver design is to find a reasonable compromise between receiver efficiency, reliability, lifetime and cost. There is a strong interaction between the heliostat field, the receiver and the heat transfer fluid; therefore, a proper receiver design needs to consider these components within the receiver optimization. There are several design and optimization tools for receivers, but most of them focus only on the receiver, ignoring the heliostat field and other parts of the plant. During the last years DLR developed the ASTRIDcode for tubular receiver concept simulation. The code comprises both a high-detail and a low-detail model. The low-detail model utilizes a number of simplifications which allow the user to screen a large number of receiver concepts for optimization purposes. The high-detail model uses an FE model and is able to compute local absorber and salt temperatures with high accuracy. One key strength of the ASTRIDcode is its interface to a ray tracing software which simulates realistic heat flux distributions on the receiver surface. The results generated by the ASTRIDcode have been validated by CFD simulations and measurement data.
Kim, Dong-Sun; Kwon, Jin-San
2014-01-01
Research on real-time health systems has received great attention in recent years, and the need for high-quality personal multichannel medical signal compression for personal medical product applications is increasing. The international MPEG-4 audio lossless coding (ALS) standard supports a joint channel-coding scheme for improving the compression performance of multichannel signals, and it is a very efficient compression method for multichannel biosignals. However, the computational complexity of such a multichannel coding scheme is significantly greater than that of other lossless audio encoders. In this paper, we present a multichannel hardware encoder based on a low-complexity joint-coding technique and a shared multiplier scheme for portable devices. A joint-coding decision method and a reference channel selection scheme are modified for a low-complexity joint coder. The proposed joint-coding decision method determines the optimized joint-coding operation based on the relationship between the cross correlation of residual signals and the compression ratio. The reference channel selection is designed to select a channel for the entropy coding of the joint coding. The hardware encoder operates at a 40 MHz clock frequency and supports two-channel parallel encoding for the multichannel monitoring system. Experimental results show that the compression ratio increases by 0.06%, whereas the computational complexity decreases by 20.72% compared to the MPEG-4 ALS reference software encoder. In addition, the compression ratio increases by about 11.92% compared to a single-channel-based biosignal lossless data compressor. PMID:25237900
Coding visual features extracted from video sequences.
Baroffio, Luca; Cesana, Matteo; Redondi, Alessandro; Tagliasacchi, Marco; Tubaro, Stefano
2014-05-01
Visual features are successfully exploited in several applications (e.g., visual search, object recognition and tracking, etc.) due to their ability to efficiently represent image content. Several visual analysis tasks require features to be transmitted over a bandwidth-limited network, thus calling for coding techniques to reduce the required bit budget, while attaining a target level of efficiency. In this paper, we propose, for the first time, a coding architecture designed for local features (e.g., SIFT, SURF) extracted from video sequences. To achieve high coding efficiency, we exploit both spatial and temporal redundancy by means of intraframe and interframe coding modes. In addition, we propose a coding mode decision based on rate-distortion optimization. The proposed coding scheme can be conveniently adopted to implement the analyze-then-compress (ATC) paradigm in the context of visual sensor networks. That is, sets of visual features are extracted from video frames, encoded at remote nodes, and finally transmitted to a central controller that performs visual analysis. This is in contrast to the traditional compress-then-analyze (CTA) paradigm, in which video sequences acquired at a node are compressed and then sent to a central unit for further processing. In this paper, we compare these coding paradigms using metrics that are routinely adopted to evaluate the suitability of visual features in the context of content-based retrieval, object recognition, and tracking. Experimental results demonstrate that, thanks to the significant coding gains achieved by the proposed coding scheme, ATC outperforms CTA with respect to all evaluation metrics.
Design and characterization of very high frequency pulse tube prototypes
NASA Astrophysics Data System (ADS)
Lopes, Diogo; Duval, Jean-Marc; Charles, Ivan; Butterworth, James; Trollier, Thierry; Tanchon, Julien; Ravex, Alain; Daniel, Christophe
2012-06-01
Weight and size are important features of a cryocooler when it comes to space applications. Given their reliability and low level of exported vibrations (due to the absence of moving cold parts), pulse tubes are good candidates for spatial purposes and their miniaturization has been the focus of many studies. We report on the design and performance of a small-scale very high frequency pulse tube prototype, modeled after two previous prototypes which were optimized with a numerical code.
NASA Astrophysics Data System (ADS)
Jung, Sang-Young
Design procedures for aircraft wing structures with control surfaces are presented using multidisciplinary design optimization. Several disciplines such as stress analysis, structural vibration, aerodynamics, and controls are considered simultaneously and combined for design optimization. Vibration data and aerodynamic data including those in the transonic regime are calculated by existing codes. Flutter analyses are performed using those data. A flutter suppression method is studied using control laws in the closed-loop flutter equation. For the design optimization, optimization techniques such as approximation, design variable linking, temporary constraint deletion, and optimality criteria are used. Sensitivity derivatives of stresses and displacements for static loads, natural frequency, flutter characteristics, and control characteristics with respect to design variables are calculated for an approximate optimization. The objective function is the structural weight. The design variables are the section properties of the structural elements and the control gain factors. Existing multidisciplinary optimization codes (ASTROS* and MSC/NASTRAN) are used to perform single and multiple constraint optimizations of fully built up finite element wing structures. Three benchmark wing models are developed and/or modified for this purpose. The models are tested extensively.
Optimization of an electrokinetic mixer for microfluidic applications.
Bockelmann, Hendryk; Heuveline, Vincent; Barz, Dominik P J
2012-06-01
This work is concerned with the investigation of the concentration fields in an electrokinetic micromixer and its optimization in order to achieve high mixing rates. The mixing concept is based on the combination of an alternating electrical excitation applied to a pressure-driven base flow in a meandering microchannel geometry. The electrical excitation induces a secondary electrokinetic velocity component, which results in a complex flow field within the meander bends. A mathematical model describing the physicochemical phenomena present within the micromixer is implemented in an in-house finite-element-method code. We first perform simulations comparable to experiments concerned with the investigation of the flow field in the bends. The comparison of the complex flow topology found in simulation and experiment reveals excellent agreement. Hence, the validated model and numerical schemes are employed for a numerical optimization of the micromixer performance. In detail, we optimize the secondary electrokinetic flow by finding the best electrical excitation parameters, i.e., frequency and amplitude, for a given waveform. Two optimized electrical excitations featuring a discrete and a continuous waveform are discussed with respect to characteristic time scales of our mixing problem. The results demonstrate that the micromixer is able to achieve high mixing degrees very rapidly.
Optimizing Aspect-Oriented Mechanisms for Embedded Applications
NASA Astrophysics Data System (ADS)
Hundt, Christine; Stöhr, Daniel; Glesner, Sabine
As applications for small embedded mobile devices are getting larger and more complex, it becomes inevitable to adopt more advanced software engineering methods from the field of desktop application development. Aspect-oriented programming (AOP) is a promising approach due to its advanced modularization capabilities. However, existing AOP languages tend to add a substantial overhead in both execution time and code size which restricts their practicality for small devices with limited resources. In this paper, we present optimizations for aspect-oriented mechanisms at the level of the virtual machine. Our experiments show that these optimizations yield a considerable performance gain along with a reduction of the code size. Thus, our optimizations establish the base for using advanced aspect-oriented modularization techniques for developing Java applications on small embedded devices.
Surveying multidisciplinary aspects in real-time distributed coding for Wireless Sensor Networks.
Braccini, Carlo; Davoli, Franco; Marchese, Mario; Mongelli, Maurizio
2015-01-27
Wireless Sensor Networks (WSNs), where a multiplicity of sensors observe a physical phenomenon and transmit their measurements to one or more sinks, pertain to the class of multi-terminal source and channel coding problems of Information Theory. In this category, "real-time" coding is often encountered for WSNs, referring to the problem of finding the minimum distortion (according to a given measure), under transmission power constraints, attainable by encoding and decoding functions, with stringent limits on delay and complexity. On the other hand, the Decision Theory approach seeks to determine the optimal coding/decoding strategies or some of their structural properties. Since encoder(s) and decoder(s) possess different information, though sharing a common goal, the setting here is that of Team Decision Theory. A more pragmatic vision rooted in Signal Processing consists of fixing the form of the coding strategies (e.g., to linear functions) and, consequently, finding the corresponding optimal decoding strategies and the achievable distortion, generally by applying parametric optimization techniques. All approaches have a long history of past investigations and recent results. The goal of the present paper is to provide the taxonomy of the various formulations, a survey of the vast related literature, examples from the authors' own research, and some highlights on the inter-play of the different theories.
NASA Astrophysics Data System (ADS)
Panda, Satyasen
2018-05-01
This paper proposes a modified artificial bee colony (ABC) optimization algorithm based on Lévy-flight swarm intelligence, referred to as artificial bee colony Lévy-flight stochastic walk (ABC-LFSW) optimization, for optical code division multiple access (OCDMA) networks. The ABC-LFSW algorithm is used to solve the asset assignment problem based on signal-to-noise ratio (SNR) optimization in OCDMA networks with quality of service constraints. The proposed optimization using the ABC-LFSW algorithm provides methods for minimizing various noises and interferences, regulating the transmitted power and optimizing the network design to improve the power efficiency of the optical code path (OCP) from source node to destination node. In this regard, an optical system model is proposed for improving the network performance with optimized input parameters. A detailed discussion and simulation results based on transmitted power allocation and the power efficiency of OCPs are included. The experimental results prove the superiority of the proposed network in terms of power efficiency and spectral efficiency in comparison to networks without any power allocation approach.
NASA Astrophysics Data System (ADS)
Zhao, Hui; Wei, Jingxuan
2014-09-01
The key to the concept of tunable wavefront coding lies in detachable phase masks. Ojeda-Castaneda et al. (Progress in Electronics Research Symposium Proceedings, Cambridge, USA, July 5-8, 2010) described a typical design in which two components with cosinusoidal phase variation operate together to make the defocus sensitivity tunable. The present study proposes an improved design and makes three contributions: (1) A mathematical derivation based on the stationary phase method explains why the detachable phase mask of Ojeda-Castaneda et al. tunes the defocus sensitivity. (2) The mathematical derivations show that the effective bandwidth of the wavefront-coded imaging system is also tunable when each component of the detachable phase mask is moved asymmetrically. An improved Fisher-information-based optimization procedure is also designed to ascertain the optimal mask parameters corresponding to a specific bandwidth. (3) Possible applications of the tunable bandwidth are demonstrated by simulated imaging.
User's manual for the BNW-II optimization code for dry/wet-cooled power plants
DOE Office of Scientific and Technical Information (OSTI.GOV)
Braun, D.J.; Bamberger, J.A.; Braun, D.J.
1978-05-01
The User's Manual describes how to operate BNW-II, a computer code developed by the Pacific Northwest Laboratory (PNL) as a part of its activities under the Department of Energy (DOE) Dry Cooling Enhancement Program. The computer program offers a comprehensive method of evaluating the cost savings potential of dry/wet-cooled heat rejection systems. Going beyond simple "figure-of-merit" cooling tower optimization, this method includes such items as the cost of annual replacement capacity, and the optimum split between plant scale-up and replacement capacity, as well as the purchase and operating costs of all major heat rejection components. Hence the BNW-II code is a useful tool for determining potential cost savings of new dry/wet surfaces, new piping, or other components as part of an optimized system for a dry/wet-cooled plant.
Navy Recruit Optimization, Post-1980: Separation Process.
1981-04-01
Extracted fragment: the report tabulates separation reason codes, including Erroneous Enlistment (RGA), Minority (RGB), Death (RGC), Pregnancy (RGD), Enuresis (RGE), and Sleepwalking (RGF); the last two are noted as difficult to diagnose precisely as medical or mental conditions. Table 6 compares medical/psychological separations across RTCs.
Optimization of solenoid based low energy beam transport line for high current H+ beams
NASA Astrophysics Data System (ADS)
Pande, R.; Singh, P.; Rao, S. V. L. S.; Roy, S.; Krishnagopal, S.
2015-02-01
A 20 MeV, 30 mA CW proton linac is being developed at BARC, Mumbai. This linac will consist of an ECR ion source followed by a Radio Frequency Quadrupole (RFQ) and a Drift Tube Linac (DTL). The low energy beam transport (LEBT) line is used to match the beam from the ion source to the RFQ with minimum beam loss and minimum emittance growth. The LEBT is also used to prevent unwanted ions such as H2+ and H3+ from entering the RFQ. In addition, space charge compensation is required for the transport of such high beam currents. All this requires careful design and optimization. Detailed beam dynamics simulations have been done to optimize the design of the LEBT using the particle-in-cell code TRACEWIN. We find that with careful optimization it is possible to transport a 30 mA CW proton beam through the LEBT with 100% transmission and minimal emittance blow-up, while at the same time suppressing the unwanted species H2+ and H3+ to less than 3.3% of the total beam current.
Long distance quantum communication with quantum Reed-Solomon codes
NASA Astrophysics Data System (ADS)
Muralidharan, Sreraman; Zou, Chang-Ling; Li, Linshu; Jiang, Liang; Jianggroup Team
We study the construction of quantum Reed-Solomon codes from classical Reed-Solomon codes and show that they achieve the capacity of the quantum erasure channel for multi-level quantum systems. We extend the application of quantum Reed-Solomon codes to long distance quantum communication, investigate the local resource overhead needed for the functioning of one-way quantum repeaters with these codes, and numerically identify the parameter regime where these codes perform better than the known quantum polynomial codes and quantum parity codes. Finally, we discuss the implementation of these codes into time-bin photonic states of qubits and qudits, respectively, and optimize the performance for one-way quantum repeaters.
Putting Priors in Mixture Density Mercer Kernels
NASA Technical Reports Server (NTRS)
Srivastava, Ashok N.; Schumann, Johann; Fischer, Bernd
2004-01-01
This paper presents a new methodology for automatic knowledge-driven data mining based on the theory of Mercer kernels, which are highly nonlinear symmetric positive definite mappings from the original image space to a very high, possibly infinite dimensional feature space. We describe a new method called Mixture Density Mercer Kernels to learn kernel functions directly from data, rather than using predefined kernels. These data-adaptive kernels can encode prior knowledge in the kernel using a Bayesian formulation, thus allowing physical information to be encoded in the model. We compare the results with existing algorithms on data from the Sloan Digital Sky Survey (SDSS). The code for these experiments has been generated with the AUTOBAYES tool, which automatically generates efficient and documented C/C++ code from abstract statistical model specifications. The core of the system is a schema library which contains templates for learning and knowledge discovery algorithms such as different versions of EM, or numeric optimization methods such as conjugate gradient methods. The template instantiation is supported by symbolic-algebraic computations, which allows AUTOBAYES to find closed-form solutions and, where possible, to integrate them into the code. The results show that the Mixture Density Mercer Kernel described here outperforms tree-based classification in distinguishing high-redshift galaxies from low-redshift galaxies by approximately 16% on test data, bagged trees by approximately 7%, and bagged trees built on a much larger sample of data by approximately 2%.
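A minimal sketch of a mixture-density-style kernel in the spirit described above: fit a mixture model and take K(x, y) as the inner product of component-posterior vectors, which is positive semi-definite by construction. The Gaussian mixture, component count, and synthetic data are assumptions, and this is not the AutoBayes-generated code used in the paper.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(4, 1, (100, 3))])   # synthetic data

    gmm = GaussianMixture(n_components=4, random_state=0).fit(X)
    posteriors = gmm.predict_proba(X)          # feature map Phi(x): P(component | x)
    K = posteriors @ posteriors.T              # kernel matrix: K(x, y) = Phi(x) . Phi(y)
    print(K.shape, np.all(np.linalg.eigvalsh(K) > -1e-9))   # positive semi-definite check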
Entropy-Based Bounds On Redundancies Of Huffman Codes
NASA Technical Reports Server (NTRS)
Smyth, Padhraic J.
1992-01-01
Report presents an extension of the theory of the redundancy of binary prefix codes of the Huffman type, including the derivation of a variety of bounds expressed in terms of the entropy of the source and the size of the alphabet. Recent developments yielded bounds on the redundancy of a Huffman code in terms of the probabilities of various components in the source alphabet. In practice, redundancies of optimal prefix codes are often closer to 0 than to 1.
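A small worked example of the quantity being bounded: the redundancy of a binary Huffman code is its average codeword length minus the source entropy. The four-symbol distribution below is an arbitrary illustration.

    import heapq
    from math import log2

    def huffman_avg_length(probs):
        # each heap item: (probability, unique id for tie-breaking, list of (symbol, depth))
        heap = [(p, i, [(i, 0)]) for i, p in enumerate(probs)]
        heapq.heapify(heap)
        uid = len(probs)
        while len(heap) > 1:
            p1, _, l1 = heapq.heappop(heap)
            p2, _, l2 = heapq.heappop(heap)
            merged = [(s, d + 1) for s, d in l1 + l2]      # merged leaves sit one level deeper
            heapq.heappush(heap, (p1 + p2, uid, merged))
            uid += 1
        lengths = dict(heap[0][2])
        return sum(probs[s] * d for s, d in lengths.items())

    probs = [0.4, 0.3, 0.2, 0.1]
    H = -sum(p * log2(p) for p in probs)                   # source entropy
    L = huffman_avg_length(probs)                          # Huffman average length
    print(f"entropy {H:.3f} bits, Huffman length {L:.3f} bits, redundancy {L - H:.3f}")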
Liner Optimization Studies Using the Ducted Fan Noise Prediction Code TBIEM3D
NASA Technical Reports Server (NTRS)
Dunn, M. H.; Farassat, F.
1998-01-01
In this paper we demonstrate the usefulness of the ducted fan noise prediction code TBIEM3D as a liner optimization design tool. Boundary conditions on the interior duct wall allow for hard walls or a locally reacting liner with axially segmented, circumferentially uniform impedance. Two liner optimization studies are considered in which the farfield noise attenuation due to the presence of a liner is maximized by adjusting the liner impedance. In the first example, the dependence of the optimal liner impedance on frequency and liner length is examined. Results show that both the optimal impedance and attenuation levels are significantly influenced by liner length and frequency. In the second example, TBIEM3D is used to compare radiated sound pressure levels between optimal and non-optimal liner cases at conditions designed to simulate take-off. It is shown that significant noise reduction is achieved for most of the sound field by selecting the optimal or near-optimal liner impedance. Our results also indicate that there is a relatively large region of the impedance plane over which optimal or near-optimal liner behavior is attainable. This is an important conclusion for the designer since there are variations in liner characteristics due to manufacturing imprecision.
Modeling of High Speed Reacting Flows: Established Practices and Future Challenges
NASA Technical Reports Server (NTRS)
Baurle, R. A.
2004-01-01
Computational fluid dynamics (CFD) has proven to be an invaluable tool for the design and analysis of high-speed propulsion devices. Massively parallel computing, together with the maturation of robust CFD codes, has made it possible to perform simulations of complete engine flowpaths. Steady-state Reynolds-Averaged Navier-Stokes simulations are now routinely used in the scramjet engine development cycle to determine optimal fuel injector arrangements, investigate trends noted during testing, and extract various measures of engine efficiency. Unfortunately, the turbulence and combustion models used in these codes have not changed significantly over the past decade. Hence, the CFD practitioner must often rely heavily on existing measurements (at similar flow conditions) to calibrate model coefficients on a case-by-case basis. This paper provides an overview of the modeled equations typically employed by commercial-quality CFD codes for high-speed combustion applications. Careful attention is given to the approximations employed for each of the unclosed terms in the averaged equation set. The salient features (and shortcomings) of common models used to close these terms are covered in detail, and several academic efforts aimed at addressing these shortcomings are discussed.
Software Developed for Analyzing High-Speed Rolling-Element Bearings
NASA Technical Reports Server (NTRS)
Fleming, David P.
2005-01-01
COBRA-AHS (Computer Optimized Ball & Roller Bearing Analysis--Advanced High Speed, J.V. Poplawski & Associates, Bethlehem, PA) is used for the design and analysis of rolling element bearings operating at high speeds under complex mechanical and thermal loading. The code estimates bearing fatigue life by calculating three-dimensional subsurface stress fields developed within the bearing raceways. It provides a state-of-the-art interactive design environment for bearing engineers within a single easy-to-use design-analysis package. The code analyzes flexible or rigid shaft systems containing up to five bearings acted upon by radial, thrust, and moment loads in 5 degrees of freedom. Bearing types include high-speed ball, cylindrical roller, and tapered roller bearings. COBRA-AHS is the first major upgrade in 30 years of such commercially available bearing software. The upgrade was developed under a Small Business Innovation Research contract from the NASA Glenn Research Center, and incorporates the results of 30 years of NASA and industry bearing research and technology.
Behavior and neural basis of near-optimal visual search
Ma, Wei Ji; Navalpakkam, Vidhya; Beck, Jeffrey M; van den Berg, Ronald; Pouget, Alexandre
2013-01-01
The ability to search efficiently for a target in a cluttered environment is one of the most remarkable functions of the nervous system. This task is difficult under natural circumstances, as the reliability of sensory information can vary greatly across space and time and is typically a priori unknown to the observer. In contrast, visual-search experiments commonly use stimuli of equal and known reliability. In a target detection task, we randomly assigned high or low reliability to each item on a trial-by-trial basis. An optimal observer would weight the observations by their trial-to-trial reliability and combine them using a specific nonlinear integration rule. We found that humans were near-optimal, regardless of whether distractors were homogeneous or heterogeneous and whether reliability was manipulated through contrast or shape. We present a neural-network implementation of near-optimal visual search based on probabilistic population coding. The network matched human performance. PMID:21552276
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wemhoff, A P; Burnham, A K
2006-04-05
Cross-comparison of the results of two computer codes for the same problem provides a mutual validation of their computational methods. This cross-validation exercise was performed for LLNL's ALE3D code and AKTS's Thermal Safety code, using the thermal ignition of HMX in two standard LLNL cookoff experiments: the One-Dimensional Time to Explosion (ODTX) test and the Scaled Thermal Explosion (STEX) test. The chemical kinetics model used in both codes was the extended Prout-Tompkins model, a relatively new addition to ALE3D. This model was applied using ALE3D's new pseudospecies feature. In addition, an advanced isoconversional kinetic approach was used in the AKTS code. The mathematical constants in the Prout-Tompkins code were calibrated using DSC data from hermetically sealed vessels and the LLNL optimization code Kinetics05. The isoconversional kinetic parameters were optimized using the AKTS Thermokinetics code. We found that the Prout-Tompkins model calculations agree fairly well between the two codes, and the isoconversional kinetic model gives results very similar to those of the Prout-Tompkins model. We also found that an autocatalytic approach in the beta-delta phase transition model does affect the times to explosion for some conditions, especially STEX-like simulations at ramp rates above 100 C/hr, and further exploration of that effect is warranted.
Multivariable optimization of liquid rocket engines using particle swarm algorithms
NASA Astrophysics Data System (ADS)
Jones, Daniel Ray
Liquid rocket engines are highly reliable, controllable, and efficient compared to other conventional forms of rocket propulsion. As such, they have seen wide use in the space industry and have become the standard propulsion system for launch vehicles, orbit insertion, and orbital maneuvering. Though these systems are well understood, historical optimization techniques are often inadequate due to the highly non-linear nature of the engine performance problem. In this thesis, a Particle Swarm Optimization (PSO) variant was applied to maximize the specific impulse of a finite-area combustion chamber (FAC) equilibrium flow rocket performance model by controlling the engine's oxidizer-to-fuel ratio and de Laval nozzle expansion and contraction ratios. In addition to the PSO-controlled parameters, engine performance was calculated based on propellant chemistry, combustion chamber pressure, and ambient pressure, which are provided as inputs to the program. The performance code was validated by comparison with NASA's Chemical Equilibrium with Applications (CEA) and the commercially available Rocket Propulsion Analysis (RPA) tool. Similarly, the PSO algorithm was validated by comparison with brute-force optimization, which calculates all possible solutions and subsequently determines which is the optimum. Particle Swarm Optimization was shown to be an effective optimizer capable of quick and reliable convergence for complex functions of multiple non-linear variables.
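A minimal sketch of the particle swarm idea described above, assuming a placeholder surrogate for specific impulse as a function of oxidizer-to-fuel ratio and expansion ratio (the thesis's FAC equilibrium model is not reproduced; the function isp() and all numbers here are illustrative):

// Minimal particle swarm optimization sketch (not the thesis code): maximizes a
// placeholder "specific impulse" surrogate isp(ofRatio, expansionRatio). In the real
// application this evaluation would call the equilibrium-flow performance model.
#include <algorithm>
#include <array>
#include <iostream>
#include <random>
#include <vector>

// Hypothetical smooth surrogate with a single interior maximum; stands in for Isp.
double isp(double ofRatio, double expansion) {
    return 450.0 - 8.0 * (ofRatio - 6.0) * (ofRatio - 6.0)
                 - 0.02 * (expansion - 80.0) * (expansion - 80.0);
}

int main() {
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> u01(0.0, 1.0);
    const std::array<double, 2> lo{3.0, 10.0}, hi{9.0, 200.0};  // bounds on (O/F, expansion ratio)
    const int nParticles = 30, nIters = 200;
    const double w = 0.7, c1 = 1.5, c2 = 1.5;  // inertia and acceleration coefficients

    struct Particle { std::array<double, 2> x, v, best; double bestVal; };
    std::vector<Particle> swarm(nParticles);
    std::array<double, 2> gBest{};
    double gBestVal = -1e300;

    for (auto& p : swarm) {
        for (int d = 0; d < 2; ++d) {
            p.x[d] = lo[d] + u01(rng) * (hi[d] - lo[d]);
            p.v[d] = 0.0;
        }
        p.best = p.x;
        p.bestVal = isp(p.x[0], p.x[1]);
        if (p.bestVal > gBestVal) { gBestVal = p.bestVal; gBest = p.x; }
    }

    for (int it = 0; it < nIters; ++it) {
        for (auto& p : swarm) {
            for (int d = 0; d < 2; ++d) {
                p.v[d] = w * p.v[d] + c1 * u01(rng) * (p.best[d] - p.x[d])
                                    + c2 * u01(rng) * (gBest[d] - p.x[d]);
                p.x[d] = std::clamp(p.x[d] + p.v[d], lo[d], hi[d]);
            }
            double val = isp(p.x[0], p.x[1]);
            if (val > p.bestVal) { p.bestVal = val; p.best = p.x; }
            if (val > gBestVal)  { gBestVal = val; gBest = p.x; }
        }
    }
    std::cout << "best O/F = " << gBest[0] << ", expansion = " << gBest[1]
              << ", Isp = " << gBestVal << "\n";
}

In the actual application, each fitness evaluation would invoke the equilibrium-flow performance code, and the nozzle contraction ratio would add a third search dimension.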
Design optimization of beta- and photovoltaic conversion devices
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wichner, R.; Blum, A.; Fischer-Colbrie, E.
1976-01-08
This report presents the theoretical and experimental results of an LLL Electronics Engineering research program aimed at optimizing the design and electronic-material parameters of beta- and photovoltaic p-n junction conversion devices. To meet this objective, a comprehensive computer code has been developed that can handle a broad range of practical conditions. The physical model upon which the code is based is described first. Then, an example is given of a set of optimization calculations along with the resulting optimized efficiencies for silicon (Si) and gallium-arsenide (GaAs) devices. The model we have developed, however, is not limited to these materials. It can handle any appropriate material--single or polycrystalline--provided energy absorption and electron-transport data are available. To check code validity, the performance of experimental silicon p-n junction devices (produced in-house) was measured under various light intensities and spectra as well as under tritium beta irradiation. The results of these tests were then compared with predicted results based on the known or best estimated device parameters. The comparison showed very good agreement between the calculated and the measured results.
Optimizing legacy molecular dynamics software with directive-based offload
NASA Astrophysics Data System (ADS)
Michael Brown, W.; Carrillo, Jan-Michael Y.; Gavhane, Nitin; Thakkar, Foram M.; Plimpton, Steven J.
2015-10-01
Directive-based programming models are one solution for exploiting many-core coprocessors to increase simulation rates in molecular dynamics. They offer the potential to reduce code complexity with offload models that can selectively target computations to run on the CPU, the coprocessor, or both. In this paper, we describe modifications to the LAMMPS molecular dynamics code to enable concurrent calculations on a CPU and coprocessor. We demonstrate that standard molecular dynamics algorithms can run efficiently on both the CPU and an x86-based coprocessor using the same subroutines. As a consequence, we demonstrate that code optimizations for the coprocessor also result in speedups on the CPU; in extreme cases up to 4.7X. We provide results for LAMMPS benchmarks and for production molecular dynamics simulations using the Stampede hybrid supercomputer with both Intel® Xeon Phi™ coprocessors and NVIDIA GPUs. The optimizations presented have increased simulation rates by over 2X for organic molecules and over 7X for liquid crystals on Stampede. The optimizations are available as part of the "Intel package" supplied with LAMMPS.
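The following is a minimal illustration, not LAMMPS source, of what directive-based offload looks like: a single loop is annotated so it can execute on the host or on a coprocessor, assuming a compiler with OpenMP target-offload support; the force expression is a placeholder, not an actual LAMMPS pair style.

// Sketch of directive-based offload (not LAMMPS code): the same loop body can run
// on the host or be offloaded to a coprocessor/accelerator via an OpenMP target
// directive, which is the general idea behind selectively targeting computations.
#include <iostream>
#include <vector>

int main() {
    const int n = 1 << 20;
    std::vector<double> x(n), f(n, 0.0);
    for (int i = 0; i < n; ++i) x[i] = 0.001 * i;

    double* xp = x.data();
    double* fp = f.data();
    // If compiled with OpenMP target offload support, this loop runs on the device;
    // otherwise it falls back to the host, keeping a single source for both paths.
    #pragma omp target teams distribute parallel for map(to: xp[0:n]) map(tofrom: fp[0:n])
    for (int i = 0; i < n; ++i) {
        double r2 = xp[i] * xp[i] + 1.0;     // placeholder pairwise-style distance term
        double inv2 = 1.0 / r2;
        double inv6 = inv2 * inv2 * inv2;
        fp[i] = 24.0 * inv6 * (2.0 * inv6 - 1.0) * inv2;  // Lennard-Jones-like force magnitude
    }
    std::cout << "f[10] = " << fp[10] << "\n";
}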
NASA Astrophysics Data System (ADS)
Teplukhina, A. A.; Sauter, O.; Felici, F.; Merle, A.; Kim, D.; the TCV Team; the ASDEX Upgrade Team; the EUROfusion MST1 Team
2017-12-01
The present work demonstrates the capabilities of the transport code RAPTOR as a fast and reliable simulator of plasma profiles for the entire plasma discharge, i.e. from ramp-up to ramp-down. This code focuses, at this stage, on the simulation of electron temperature and poloidal flux profiles using prescribed equilibrium and some kinetic profiles. In this work we extend the RAPTOR transport model to include a time-varying plasma equilibrium geometry and verify the changes via comparison with ASTRA code simulations. In addition, a new ad hoc transport model based on constant gradients and suitable for simulations of L-H and H-L mode transitions has been incorporated into the RAPTOR code and validated with rapid simulations of the time evolution of the safety factor and the electron temperature over the entire AUG and TCV discharges. An optimization procedure for the plasma termination phase has also been developed during this work. We define the goal of the optimization as ramping down the plasma current as fast as possible while avoiding any disruptions caused by reaching physical or technical limits. Our numerical study of this problem shows that a fast decrease of plasma elongation during current ramp-down can help in reducing plasma internal inductance. An early transition from H- to L-mode allows us to reduce the drop in poloidal beta, which is also important for plasma MHD stability and control. This work shows how these complex nonlinear interactions can be optimized automatically using relevant cost functions and constraints. Preliminary experimental results for TCV are presented.
Domain Specific Language Support for Exascale
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sadayappan, Ponnuswamy
Domain-Specific Languages (DSLs) offer an attractive path to Exascale software since they provide expressive power through appropriate abstractions and enable domain-specific optimizations. But the advantages of a DSL compete with the difficulties of implementing a DSL, even for a narrowly defined domain. The DTEC project addresses how a variety of DSLs can be easily implemented to leverage existing compiler analysis and transformation capabilities within the ROSE open source compiler as part of a research program focusing on Exascale challenges. The OSU contributions to the DTEC project are in the area of code generation from high-level DSL descriptions, as well as verification of the automatically generated code.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fu, Pengchen; Settgast, Randolph R.; Johnson, Scott M.
2014-12-17
GEOS is a massively parallel, multi-physics simulation application utilizing high performance computing (HPC) to address subsurface reservoir stimulation activities with the goal of optimizing current operations and evaluating innovative stimulation methods. GEOS enables coupling of different solvers associated with the various physical processes occurring during reservoir stimulation in unique and sophisticated ways, adapted to various geologic settings, materials and stimulation methods. Developed at the Lawrence Livermore National Laboratory (LLNL) as a part of a Laboratory-Directed Research and Development (LDRD) Strategic Initiative (SI) project, GEOS represents the culmination of a multi-year ongoing code development and improvement effort that has leveraged existing code capabilities and staff expertise to design new computational geosciences software.
NASA Technical Reports Server (NTRS)
Scott, Elaine P.
1993-01-01
Thermal stress analyses are an important aspect in the development of aerospace vehicles such as the National Aero-Space Plane (NASP) and the High-Speed Civil Transport (HSCT) at NASA-LaRC. These analyses require knowledge of the temperature within the structures which consequently necessitates the need for thermal property data. The initial goal of this research effort was to develop a methodology for the estimation of thermal properties of aerospace structural materials at room temperature and to develop a procedure to optimize the estimation process. The estimation procedure was implemented utilizing a general purpose finite element code. In addition, an optimization procedure was developed and implemented to determine critical experimental parameters to optimize the estimation procedure. Finally, preliminary experiments were conducted at the Aircraft Structures Branch (ASB) laboratory.
Pattern-based integer sample motion search strategies in the context of HEVC
NASA Astrophysics Data System (ADS)
Maier, Georg; Bross, Benjamin; Grois, Dan; Marpe, Detlev; Schwarz, Heiko; Veltkamp, Remco C.; Wiegand, Thomas
2015-09-01
The H.265/MPEG-H High Efficiency Video Coding (HEVC) standard provides a significant increase in coding efficiency compared to its predecessor, the H.264/MPEG-4 Advanced Video Coding (AVC) standard, which, however, comes at the cost of a high computational burden for a compliant encoder. Motion estimation (ME), which is a part of the inter-picture prediction process, typically consumes a high amount of computational resources, while significantly increasing the coding efficiency. Although both the H.265/MPEG-H HEVC and H.264/MPEG-4 AVC standards allow processing motion information at a fractional sample level, motion search algorithms operating at the integer sample level remain an integral part of ME. In this paper, a flexible integer sample ME framework is proposed that allows a significant reduction of ME computation time to be traded off against a coding efficiency penalty in terms of bit rate overhead. As a result, through extensive experimentation, an integer sample ME algorithm that provides a good trade-off is derived, incorporating a combination and optimization of known predictive, pattern-based and early termination techniques. The proposed ME framework is implemented on the basis of the HEVC Test Model (HM) reference software and compared to the state-of-the-art fast search algorithm that is a native part of HM. It is observed that for high-resolution sequences, the integer sample ME process can be sped up by factors varying from 3.2 to 7.6, resulting in bit-rate overheads of 1.5% and 0.6% for the Random Access (RA) and Low Delay P (LDP) configurations, respectively. In addition, a similar speed-up is observed for sequences with mainly Computer-Generated Imagery (CGI) content, at a bit rate overhead of up to 5.2%.
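As a rough illustration of a pattern-based integer sample search (this is a generic small-diamond search, not the specific algorithm derived in the paper), the sketch below starts from a predicted motion vector, evaluates a four-point diamond around the current best, recenters, and stops early when the center wins; cost() is a hypothetical stand-in for the block matching cost.

// Illustrative small-diamond pattern search over integer motion vectors (not HM code).
#include <cstdlib>
#include <iostream>

// Hypothetical matching cost; a real encoder would compute the SAD between the
// current block and the reference block displaced by (mvx, mvy).
double cost(int mvx, int mvy) {
    return (mvx - 5) * (mvx - 5) + (mvy + 3) * (mvy + 3);  // minimum at (5, -3)
}

int main() {
    int cx = 0, cy = 0;                 // start at the motion-vector predictor
    double best = cost(cx, cy);
    const int range = 64;               // search range in integer samples
    const int dx[4] = {1, -1, 0, 0}, dy[4] = {0, 0, 1, -1};

    bool moved = true;
    while (moved) {                     // early termination when the center is best
        moved = false;
        int bx = cx, by = cy;
        for (int k = 0; k < 4; ++k) {
            int nx = cx + dx[k], ny = cy + dy[k];
            if (std::abs(nx) > range || std::abs(ny) > range) continue;
            double c = cost(nx, ny);
            if (c < best) { best = c; bx = nx; by = ny; moved = true; }
        }
        cx = bx; cy = by;               // recenter on the best candidate and repeat
    }
    std::cout << "best integer MV = (" << cx << ", " << cy << "), cost = " << best << "\n";
}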
NASA Technical Reports Server (NTRS)
Geiselhart, Karl A.; Ozoroski, Lori P.; Fenbert, James W.; Shields, Elwood W.; Li, Wu
2011-01-01
This paper documents the development of a conceptual-level integrated process for design and analysis of efficient and environmentally acceptable supersonic aircraft. To overcome the technical challenges in achieving this goal, a conceptual design capability is needed that allows users to examine the integrated solution across all disciplines and facilitates the application of multidisciplinary design, analysis, and optimization on a scale greater than previously achieved. The described capability is both an interactive design environment and a high-powered optimization system, with a unique blend of low-, mixed-, and high-fidelity engineering tools combined in the software integration framework ModelCenter. The various modules are described and the capabilities of the system are demonstrated. The current limitations and proposed future enhancements are also discussed.
NASA Astrophysics Data System (ADS)
Abdellah, Skoudarli; Mokhtar, Nibouche; Amina, Serir
2015-11-01
The H.264/AVC video coding standard is used in a wide range of applications, from video conferencing to high-definition television, owing to its high compression efficiency. This efficiency is mainly derived from the newly allowed prediction schemes, including variable block modes. However, these schemes require high computational complexity to select the optimal mode. Consequently, complexity reduction in the H.264/AVC encoder has recently become a very challenging task in the video compression domain, especially when implementing the encoder in real-time applications. Fast mode decision algorithms play an important role in reducing the overall complexity of the encoder. In this paper, we propose an adaptive fast intermode algorithm based on motion activity, temporal stationarity, and spatial homogeneity. This algorithm predicts the motion activity of the current macroblock from its neighboring blocks and identifies temporally stationary regions and spatially homogeneous regions using adaptive threshold values based on video content features. Extensive experimental work has been done in the High profile, and results show that the proposed source-coding algorithm effectively reduces the computational complexity by 53.18% on average compared with the reference software encoder, while maintaining the high coding efficiency of H.264/AVC, incurring only a 0.097 dB loss in total peak signal-to-noise ratio and a 0.228% increase in total bit rate.
NASA Astrophysics Data System (ADS)
Lee, Eun Seok
2000-10-01
Improved aerodynamic performance of a turbine cascade shape can be achieved through an understanding of the flow-field associated with the stator-rotor interaction. In this research, an axial gas turbine airfoil cascade shape is optimized for improved aerodynamic performance by using an unsteady Navier-Stokes solver and a parallel genetic algorithm. The objective of the research is twofold: (1) to develop a computational fluid dynamics code having a faster convergence rate and unsteady flow simulation capabilities, and (2) to optimize a turbine airfoil cascade shape with unsteady passing wakes for improved aerodynamic performance. The computer code solves the Reynolds-averaged Navier-Stokes equations. It is based on the explicit, finite difference, Runge-Kutta time marching scheme and the Diagonalized Alternating Direction Implicit (DADI) scheme, with the Baldwin-Lomax algebraic and k-epsilon turbulence models. Improvements in the code focused on the cascade shape design capability, convergence acceleration, and unsteady formulation. First, the inverse shape design method was implemented in the code to provide the design capability, where a surface transpiration concept was employed as an inverse technique to modify the geometry satisfying the user-specified pressure distribution on the airfoil surface. Second, an approximation storage multigrid method was implemented as an acceleration technique. Third, the preconditioning method was adopted to speed up the convergence rate in solving low Mach number flows. Finally, the implicit dual time stepping method was incorporated in order to simulate unsteady flow-fields. For the unsteady code validation, Stokes' second problem and the Poiseuille flow were chosen, and the computed results were compared with analytic solutions. To test the code's ability to capture natural unsteady flow phenomena, vortex shedding past a cylinder and the shock oscillation over a bicircular airfoil were simulated and compared with experiments and other research results. The rotor cascade shape optimization with unsteady passing wakes was performed to obtain an improved aerodynamic performance using the unsteady Navier-Stokes solver. Two objective functions were defined as minimization of total pressure loss and maximization of lift, while the mass flow rate was fixed. A parallel genetic algorithm was used as the optimizer, and the penalty method was introduced. Each individual's objective function was computed simultaneously by using a 32-processor distributed memory computer. One optimization took about four days.
Finite element analysis of wirelessly interrogated implantable bio-MEMS
NASA Astrophysics Data System (ADS)
Dissanayake, Don W.; Al-Sarawi, Said F.; Lu, Tien-Fu; Abbott, Derek
2008-12-01
Wirelessly interrogated bio-MEMS devices are becoming more popular as they address challenges such as improving diagnosis, monitoring, and patient wellbeing. The authors present here a passive, low-power and small-area device, which can be interrogated wirelessly using a uniquely coded signal for secure and reliable operation. The proposed new approach relies on converting the interrogating coded signal to a surface acoustic wave that is then correlated with an embedded code. The suggested method is implemented to operate a micropump, which consists of a specially designed corrugated microdiaphragm to modulate the fluid flow in microchannels. Finite element analysis of the micropump operation is presented and its performance analysed. Design parameters of the diaphragm were fine-tuned for optimal performance, and different polymer-based materials were used in various parts of the micropump to allow for better flexibility and high reliability.
Development of Fuel Shuffling Module for PHISICS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Allan Mabe; Andrea Alfonsi; Cristian Rabiti
2013-06-01
The PHISICS (Parallel and Highly Innovative Simulation for the INL Code System) [4] code toolkit has been under development at the Idaho National Laboratory. This package is intended to provide a modern analysis tool for reactor physics investigation. It is designed to maximize accuracy for a given availability of computational resources and to give state-of-the-art tools to the modern nuclear engineer. This is achieved by implementing several different algorithms and meshing approaches among which the user can choose, in order to balance computational resources and accuracy needs. The software is completely modular in order to simplify the independent development of modules by different teams and future maintenance. The package is coupled with the thermal-hydraulic code RELAP5-3D [3]. In the following, the structure of the different PHISICS modules is briefly recalled, focusing on the new shuffling module (SHUFFLE), the subject of this paper.
SOC-DS computer code provides tool for design evaluation of homogeneous two-material nuclear shield
NASA Technical Reports Server (NTRS)
Disney, R. K.; Ricks, L. O.
1967-01-01
The SOC-DS code (Shield Optimization Code - Direct Search) selects a nuclear shield material of optimum volume, weight, or cost to meet the requirements of a given radiation dose rate or energy transmission constraint. It is applicable to evaluating neutron and gamma ray shields for all nuclear reactors.
Integration of Dakota into the NEAMS Workbench
DOE Office of Scientific and Technical Information (OSTI.GOV)
Swiler, Laura Painton; Lefebvre, Robert A.; Langley, Brandon R.
2017-07-01
This report summarizes a NEAMS (Nuclear Energy Advanced Modeling and Simulation) project focused on integrating Dakota into the NEAMS Workbench. The NEAMS Workbench, developed at Oak Ridge National Laboratory, is a new software framework that provides a graphical user interface, input file creation, parsing, validation, job execution, workflow management, and output processing for a variety of nuclear codes. Dakota is a tool developed at Sandia National Laboratories that provides a suite of uncertainty quantification and optimization algorithms. Providing Dakota within the NEAMS Workbench allows users of nuclear simulation codes to perform uncertainty and optimization studies on their nuclear codes from within a common, integrated environment. Details of the integration and parsing are provided, along with an example of Dakota running a sampling study on the fuels performance code, BISON, from within the NEAMS Workbench.
Efficient transformation of an auditory population code in a small sensory system.
Clemens, Jan; Kutzki, Olaf; Ronacher, Bernhard; Schreiber, Susanne; Wohlgemuth, Sandra
2011-08-16
Optimal coding principles are implemented in many large sensory systems. They include the systematic transformation of external stimuli into a sparse and decorrelated neuronal representation, enabling a flexible readout of stimulus properties. Are these principles also applicable to size-constrained systems, which have to rely on a limited number of neurons and may only have to fulfill specific and restricted tasks? We studied this question in an insect system--the early auditory pathway of grasshoppers. Grasshoppers use genetically fixed songs to recognize mates. The first steps of neural processing of songs take place in a small three-layer feed-forward network comprising only a few dozen neurons. We analyzed the transformation of the neural code within this network. Indeed, grasshoppers create a decorrelated and sparse representation, in accordance with optimal coding theory. Whereas the neuronal input layer is best read out as a summed population, a labeled-line population code for temporal features of the song is established after only two processing steps. At this stage, information about song identity is maximal for a population decoder that preserves neuronal identity. We conclude that optimal coding principles do apply to the early auditory system of the grasshopper, despite its size constraints. The inputs, however, are not encoded in a systematic, map-like fashion as in many larger sensory systems. Already at its periphery, part of the grasshopper auditory system seems to focus on behaviorally relevant features, and is in this property more reminiscent of higher sensory areas in vertebrates.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Adams, Brian M.; Ebeida, Mohamed Salah; Eldred, Michael S.
The Dakota (Design Analysis Kit for Optimization and Terascale Applications) toolkit provides a flexible and extensible interface between simulation codes and iterative analysis methods. Dakota contains algorithms for optimization with gradient and nongradient-based methods; uncertainty quantification with sampling, reliability, and stochastic expansion methods; parameter estimation with nonlinear least squares methods; and sensitivity/variance analysis with design of experiments and parameter study methods. These capabilities may be used on their own or as components within advanced strategies such as surrogate-based optimization, mixed integer nonlinear programming, or optimization under uncertainty. By employing object-oriented design to implement abstractions of the key components required for iterative systems analyses, the Dakota toolkit provides a flexible and extensible problem-solving environment for design and performance analysis of computational models on high performance computers. This report serves as a user's manual for the Dakota software and provides capability overviews and procedures for software execution, as well as a variety of example studies.
Multidisciplinary optimization of controlled space structures with global sensitivity equations
NASA Technical Reports Server (NTRS)
Padula, Sharon L.; James, Benjamin B.; Graves, Philip C.; Woodard, Stanley E.
1991-01-01
A new method for the preliminary design of controlled space structures is presented. The method coordinates standard finite element structural analysis, multivariable controls, and nonlinear programming codes and allows simultaneous optimization of the structures and control systems of a spacecraft. Global sensitivity equations are a key feature of this method. The preliminary design of a generic geostationary platform is used to demonstrate the multidisciplinary optimization method. Fifteen design variables are used to optimize truss member sizes and feedback gain values. The goal is to reduce the total mass of the structure and the vibration control system while satisfying constraints on vibration decay rate. Incorporating the nonnegligible mass of actuators causes an essential coupling between structural design variables and control design variables. The solution of the demonstration problem is an important step toward a comprehensive preliminary design capability for structures and control systems. Use of global sensitivity equations helps solve optimization problems that have a large number of design variables and a high degree of coupling between disciplines.
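For two coupled disciplines with outputs Y1 and Y2 depending on the design variables X and on each other, the global sensitivity equations can be written in one common form (a sketch in our notation, not necessarily the paper's) as the linear system

\[
\begin{bmatrix}
 I & -\dfrac{\partial Y_1}{\partial Y_2} \\[6pt]
 -\dfrac{\partial Y_2}{\partial Y_1} & I
\end{bmatrix}
\begin{bmatrix}
 \dfrac{d Y_1}{d X} \\[6pt]
 \dfrac{d Y_2}{d X}
\end{bmatrix}
=
\begin{bmatrix}
 \dfrac{\partial Y_1}{\partial X} \\[6pt]
 \dfrac{\partial Y_2}{\partial X}
\end{bmatrix}
\]

so that total derivatives of the coupled responses are recovered from local partial derivatives supplied independently by each discipline's analysis code.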
DOE Office of Scientific and Technical Information (OSTI.GOV)
2017-08-04
This code is an enhancement to the existing FLORIS code, SWR 14-20. In particular, this enhancement computes overall thrust and turbulence intensity throughout a wind plant. This information is used to form a description of the fatigue loads experienced throughout the wind plant. FLORIS has been updated to include an optimization routine that minimizes thrust and turbulence intensity (and therefore loads) across the wind plant. Previously, FLORIS had been designed to optimize power out of a wind plant. However, as turbines age, more wind plant owner/operators are looking for ways to reduce their fatigue loads without sacrificing too much power.
NASA Technical Reports Server (NTRS)
Johnson, Theodore F.; Waters, W. Allen; Singer, Thomas N.; Haftka, Raphael T.
2004-01-01
A next generation reusable launch vehicle (RLV) will require thermally efficient and light-weight cryogenic propellant tank structures. Since these tanks will be weight-critical, analytical tools must be developed to aid in sizing the thickness of insulation layers and structural geometry for optimal performance. Finite element method (FEM) models of the tank and insulation layers were created to analyze the thermal performance of the cryogenic insulation layer and thermal protection system (TPS) of the tanks. The thermal conditions of ground-hold and re-entry/soak-through for a typical RLV mission were used in the thermal sizing study. A general-purpose nonlinear FEM analysis code, capable of using temperature and pressure dependent material properties, was used as the thermal analysis code. Mechanical loads from ground handling and proof-pressure testing were used to size the structural geometry of an aluminum cryogenic tank wall. Nonlinear deterministic optimization and reliability optimization techniques were the analytical tools used to size the geometry of the isogrid stiffeners and thickness of the skin. The results from the sizing study indicate that a commercial FEM code can be used for thermal analyses to size the insulation thicknesses where the temperature and pressure were varied. The results from the structural sizing study show that using combined deterministic and reliability optimization techniques can obtain alternate and lighter designs than the designs obtained from deterministic optimization methods alone.
Scalable video transmission over Rayleigh fading channels using LDPC codes
NASA Astrophysics Data System (ADS)
Bansal, Manu; Kondi, Lisimachos P.
2005-03-01
In this paper, we investigate an important problem of efficiently utilizing the available resources for video transmission over wireless channels while maintaining a good decoded video quality and resilience to channel impairments. Our system consists of the video codec based on 3-D set partitioning in hierarchical trees (3-D SPIHT) algorithm and employs two different schemes using low-density parity check (LDPC) codes for channel error protection. The first method uses the serial concatenation of the constant-rate LDPC code and rate-compatible punctured convolutional (RCPC) codes. Cyclic redundancy check (CRC) is used to detect transmission errors. In the other scheme, we use the product code structure consisting of a constant rate LDPC/CRC code across the rows of the `blocks' of source data and an erasure-correction systematic Reed-Solomon (RS) code as the column code. In both the schemes introduced here, we use fixed-length source packets protected with unequal forward error correction coding ensuring a strictly decreasing protection across the bitstream. A Rayleigh flat-fading channel with additive white Gaussian noise (AWGN) is modeled for the transmission. The rate-distortion optimization algorithm is developed and carried out for the selection of source coding and channel coding rates using Lagrangian optimization. The experimental results demonstrate the effectiveness of this system under different wireless channel conditions and both the proposed methods (LDPC+RCPC/CRC and RS+LDPC/CRC) outperform the more conventional schemes such as those employing RCPC/CRC.
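As a schematic of the Lagrangian selection step (illustrative only; the operating points, numbers, and packet structure below are hypothetical and do not come from the paper), each packet's source/channel rate combination can be chosen by minimizing D + lambda*R over its candidate points:

// Sketch of Lagrangian rate-distortion selection: for each packet, pick the
// (rate, distortion) operating point minimizing D + lambda*R. Sweeping or bisecting
// lambda then matches the total rate to the available budget.
#include <iostream>
#include <limits>
#include <vector>

struct Point { double rate; double distortion; };  // hypothetical operating points

int main() {
    // Candidate source/channel coding-rate combinations per packet (illustrative numbers).
    std::vector<std::vector<Point>> packets = {
        {{1.0, 40.0}, {2.0, 22.0}, {3.0, 15.0}},
        {{1.0, 30.0}, {2.0, 12.0}, {3.0, 10.0}},
    };
    const double lambda = 10.0;  // Lagrange multiplier

    double totalRate = 0.0, totalDist = 0.0;
    for (const auto& cand : packets) {
        double bestJ = std::numeric_limits<double>::max();
        Point bestP{};
        for (const auto& p : cand) {
            double J = p.distortion + lambda * p.rate;  // Lagrangian cost
            if (J < bestJ) { bestJ = J; bestP = p; }
        }
        totalRate += bestP.rate;
        totalDist += bestP.distortion;
    }
    std::cout << "total rate = " << totalRate << ", total distortion = " << totalDist << "\n";
}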
NASA Astrophysics Data System (ADS)
Chen, Zhenzhong; Han, Junwei; Ngan, King Ngi
2005-10-01
MPEG-4 treats a scene as a composition of several objects, or so-called video object planes (VOPs), that are separately encoded and decoded. Such a flexible video coding framework makes it possible to code different video objects with different distortion scales. It is necessary to analyze the priority of the video objects according to their semantic importance, intrinsic properties, and psycho-visual characteristics, such that the bit budget can be distributed properly among video objects to improve the perceptual quality of the compressed video. This paper aims to provide an automatic video object priority definition method based on an object-level visual attention model and further proposes an optimization framework for video object bit allocation. One significant contribution of this work is that human visual system characteristics are incorporated into the video coding optimization process. Another advantage is that the priority of the video object can be obtained automatically instead of fixing weighting factors before encoding or relying on user interactivity. To evaluate the performance of the proposed approach, we compare it with traditional verification model bit allocation and the optimal multiple video object bit allocation algorithms. Compared with traditional bit allocation algorithms, the objective quality of the object with higher priority is significantly improved under this framework. These results demonstrate the usefulness of this unsupervised subjective quality lifting framework.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baumann, K; Weber, U; Simeonov, Y
Purpose: Aim of this study was to optimize the magnetic field strengths of two quadrupole magnets in a particle therapy facility in order to obtain a beam quality suitable for spot beam scanning. Methods: The particle transport through an ion-optic system of a particle therapy facility consisting of the beam tube, two quadrupole magnets and a beam monitor system was calculated with the help of Matlab by using matrices that solve the equation of motion of a charged particle in a magnetic field and field-free region, respectively. The magnetic field strengths were optimized in order to obtain a circular and thin beam spot at the iso-center of the therapy facility. These optimized field strengths were subsequently transferred to the Monte-Carlo code FLUKA and the transport of 80 MeV/u C12-ions through this ion-optic system was calculated by using a user-routine to implement magnetic fields. The fluence along the beam-axis and at the iso-center was evaluated. Results: The magnetic field strengths could be optimized by using Matlab and transferred to the Monte-Carlo code FLUKA. The implementation via a user-routine was successful. By analyzing the fluence pattern along the beam axis, the characteristic focusing and defocusing effects of the quadrupole magnets could be reproduced. Furthermore the beam spot at the iso-center was circular and significantly thinner compared to an unfocused beam. Conclusion: In this study a Matlab tool was developed to optimize magnetic field strengths for an ion-optic system consisting of two quadrupole magnets as part of a particle therapy facility. These magnetic field strengths could subsequently be transferred to and implemented in the Monte-Carlo code FLUKA to simulate the particle transport through this optimized ion-optic system.
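A minimal sketch of the transfer-matrix approach mentioned in the Methods (generic beam-optics matrices with illustrative lengths and strengths, not the facility's actual lattice or the authors' Matlab tool):

// A particle's transverse state (x, x') is propagated through drifts and a focusing
// quadrupole by 2x2 transfer matrices; scanning the quadrupole strength k shows how
// the spot position at a downstream plane depends on the field strength.
#include <array>
#include <cmath>
#include <iostream>

using Mat = std::array<std::array<double, 2>, 2>;

Mat mul(const Mat& a, const Mat& b) {
    Mat c{};
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            c[i][j] = a[i][0] * b[0][j] + a[i][1] * b[1][j];
    return c;
}

Mat drift(double L) { return {{{1.0, L}, {0.0, 1.0}}}; }

// Thick focusing quadrupole of length L and strength k [1/m] in the focusing plane.
Mat quadFocus(double k, double L) {
    return {{{std::cos(k * L), std::sin(k * L) / k},
             {-k * std::sin(k * L), std::cos(k * L)}}};
}

int main() {
    const double L1 = 1.0, Lq = 0.3, L2 = 2.0;      // drift, quad, drift lengths [m]
    const double x0 = 0.002, xp0 = 0.001;           // initial offset [m] and divergence [rad]
    for (double k = 1.0; k <= 6.0; k += 1.0) {      // scan quadrupole strength k [1/m]
        Mat M = mul(drift(L2), mul(quadFocus(k, Lq), drift(L1)));
        double x = M[0][0] * x0 + M[0][1] * xp0;     // position at the downstream plane
        std::cout << "k = " << k << " 1/m, |x| at iso-plane = " << std::fabs(x) << " m\n";
    }
}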
NASA Astrophysics Data System (ADS)
Moon, Hongsik
What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited by the increased computing power the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared by running benchmark software, and the metric was FLoating-point Operations Per Second (FLOPS), which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore systems? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to the type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPS, and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the changing computer hardware platforms in order to provide fast, accurate and efficient solutions to large, complex electromagnetic problems. The research in this dissertation proves that the performance of parallel code is intimately related to the configuration of the computer hardware and can be maximized for different hardware platforms. To benchmark and optimize the performance of parallel CEM software, a variety of large, complex projects are created and executed on a variety of computer platforms. The computer platforms used in this research are detailed in this dissertation. The projects run as benchmarks are also described in detail and results are presented. The parameters that affect parallel CEM software on High Performance Computing Clusters (HPCC) are investigated. This research demonstrates methods to maximize the performance of parallel CEM software code.
Optimal superdense coding over memory channels
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shadman, Z.; Kampermann, H.; Bruss, D.
2011-10-15
We study the superdense coding capacity in the presence of quantum channels with correlated noise. We investigate both the cases of unitary and nonunitary encoding. Pauli channels for arbitrary dimensions are treated explicitly. The superdense coding capacity for some special channels and resource states is derived for unitary encoding. We also provide an example of a memory channel where nonunitary encoding leads to an improvement in the superdense coding capacity.
Advanced launch system trajectory optimization using suboptimal control
NASA Technical Reports Server (NTRS)
Shaver, Douglas A.; Hull, David G.
1993-01-01
The maximum-final mass trajectory of a proposed configuration of the Advanced Launch System is presented. A model for the two-stage rocket is given; the optimal control problem is formulated as a parameter optimization problem; and the optimal trajectory is computed using a nonlinear programming code called VF02AD. Numerical results are presented for the controls (angle of attack and velocity roll angle) and the states. After the initial rotation, the angle of attack goes to a positive value to keep the trajectory as high as possible, returns to near zero to pass through the transonic regime and satisfy the dynamic pressure constraint, returns to a positive value to keep the trajectory high and to take advantage of minimum drag at positive angle of attack due to aerodynamic shading of the booster, and then rolls off to negative values to satisfy the constraints. Because the engines cannot be throttled, the maximum dynamic pressure occurs at a single point; there is no maximum dynamic pressure subarc. To test approximations for obtaining analytical solutions for guidance, two additional optimal trajectories are computed: one using untrimmed aerodynamics and one using no atmospheric effects except for the dynamic pressure constraint. It is concluded that untrimmed aerodynamics has a negligible effect on the optimal trajectory and that approximate optimal controls should be able to be obtained by treating atmospheric effects as perturbations.
Design and optimization of a portable LQCD Monte Carlo code using OpenACC
NASA Astrophysics Data System (ADS)
Bonati, Claudio; Coscetti, Simone; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Calore, Enrico; Schifano, Sebastiano Fabio; Silvi, Giorgio; Tripiccione, Raffaele
The present panorama of HPC architectures is extremely heterogeneous, ranging from traditional multi-core CPU processors, supporting a wide class of applications but delivering moderate computing performance, to many-core Graphics Processor Units (GPUs), exploiting aggressive data-parallelism and delivering higher performances for streaming computing applications. In this scenario, code portability (and performance portability) becomes necessary for easy maintainability of applications; this is very relevant in scientific computing, where code changes are very frequent, making it tedious and error-prone to keep different code versions aligned. In this work, we present the design and optimization of a state-of-the-art production-level LQCD Monte Carlo application, using the directive-based OpenACC programming model. OpenACC abstracts parallel programming to a descriptive level, relieving programmers from specifying how codes should be mapped onto the target architecture. We describe the implementation of a code fully written in OpenACC, and show that we are able to target several different architectures, including state-of-the-art traditional CPUs and GPUs, with the same code. We also measure performance, evaluating the computing efficiency of our OpenACC code on several architectures, comparing with GPU-specific implementations and showing that a good level of performance-portability can be reached.
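A minimal OpenACC-style example in the same spirit (a toy streaming kernel, not the LQCD application; the pragma shown is standard OpenACC, but the data sizes and loop body are placeholders):

// The same C++ loop is annotated so an OpenACC-capable compiler can map it to a GPU
// or run it on the CPU, leaving the loop body itself unchanged.
#include <iostream>
#include <vector>

int main() {
    const int n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);
    float* pa = a.data(); float* pb = b.data(); float* pc = c.data();

    // With an OpenACC-enabled compiler, this loop is offloaded; without one, the
    // pragma is ignored and the code still runs sequentially on the CPU.
    #pragma acc parallel loop copyin(pa[0:n], pb[0:n]) copyout(pc[0:n])
    for (int i = 0; i < n; ++i) {
        pc[i] = 0.5f * pa[i] + 0.5f * pb[i];   // placeholder streaming kernel
    }
    std::cout << "c[123] = " << pc[123] << "\n";
}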
NASA Technical Reports Server (NTRS)
Pindera, Marek-Jerzy; Salzar, Robert S.
1996-01-01
The objective of this work was the development of efficient, user-friendly computer codes for optimizing fabrication-induced residual stresses in metal matrix composites through the use of homogeneous and heterogeneous interfacial layer architectures and processing parameter variation. To satisfy this objective, three major computer codes have been developed and delivered to the NASA-Lewis Research Center, namely MCCM, OPTCOMP, and OPTCOMP2. MCCM is a general research-oriented code for investigating the effects of microstructural details, such as layered morphology of SCS-6 SiC fibers and multiple homogeneous interfacial layers, on the inelastic response of unidirectional metal matrix composites under axisymmetric thermomechanical loading. OPTCOMP and OPTCOMP2 combine the major analysis module resident in MCCM with a commercially-available optimization algorithm and are driven by user-friendly interfaces which facilitate input data construction and program execution. OPTCOMP enables the user to identify those dimensions, geometric arrangements and thermoelastoplastic properties of homogeneous interfacial layers that minimize thermal residual stresses for the specified set of constraints. OPTCOMP2 provides additional flexibility in the residual stress optimization through variation of the processing parameters (time, temperature, external pressure and axial load) as well as the microstructure of the interfacial region which is treated as a heterogeneous two-phase composite. Overviews of the capabilities of these codes are provided together with a summary of results that addresses the effects of various microstructural details of the fiber, interfacial layers and matrix region on the optimization of fabrication-induced residual stresses in metal matrix composites.
GAME: GAlaxy Machine learning for Emission lines
NASA Astrophysics Data System (ADS)
Ucci, G.; Ferrara, A.; Pallottini, A.; Gallerani, S.
2018-06-01
We present an updated, optimized version of GAME (GAlaxy Machine learning for Emission lines), a code designed to infer key interstellar medium physical properties from emission line intensities of ultraviolet/optical/far-infrared galaxy spectra. The improvements concern (a) an enlarged spectral library including Pop III stars, (b) the inclusion of spectral noise in the training procedure, and (c) an accurate evaluation of uncertainties. We extensively validate the optimized code and compare its performance against empirical methods and other available emission line codes (PYQZ and HII-CHI-MISTRY) on a sample of 62 SDSS stacked galaxy spectra and 75 observed HII regions. Very good agreement is found for metallicity. However, ionization parameters derived by GAME tend to be higher. We show that this is due to the use of too limited libraries in the other codes. The main advantages of GAME are the simultaneous use of all the measured spectral lines and the extremely short computational times. We finally discuss the code's potential and limitations.
Iterative channel decoding of FEC-based multiple-description codes.
Chang, Seok-Ho; Cosman, Pamela C; Milstein, Laurence B
2012-03-01
Multiple description coding has been receiving attention as a robust transmission framework for multimedia services. This paper studies the iterative decoding of FEC-based multiple description codes. The proposed decoding algorithms take advantage of the error detection capability of Reed-Solomon (RS) erasure codes. The information of correctly decoded RS codewords is exploited to enhance the error correction capability of the Viterbi algorithm at the next iteration of decoding. In the proposed algorithm, an intradescription interleaver is synergistically combined with the iterative decoder. The interleaver does not affect the performance of noniterative decoding but greatly enhances the performance when the system is iteratively decoded. We also address the optimal allocation of RS parity symbols for unequal error protection. For the optimal allocation in iterative decoding, we derive mathematical equations from which the probability distributions of description erasures can be generated in a simple way. The performance of the algorithm is evaluated over an orthogonal frequency-division multiplexing system. The results show that the performance of the multiple description codes is significantly enhanced.
The effect of total noise on two-dimension OCDMA codes
NASA Astrophysics Data System (ADS)
Dulaimi, Layth A. Khalil Al; Badlishah Ahmed, R.; Yaakob, Naimah; Aljunid, Syed A.; Matem, Rima
2017-11-01
In this research, we evaluate the effect of total noise on the performance of two-dimensional (2-D) optical code-division multiple access (OCDMA) systems using the 2-D Modified Double Weight (MDW) code under various link parameters. We consider the impact of multiple-access interference (MAI) and other noise sources on system performance. The 2-D MDW code is compared mathematically with other codes which use similar techniques. We analyzed and optimized the data rate and effective received power. The performance and optimization of the MDW code in an OCDMA system are reported; the bit error rate (BER) can be significantly improved when the desired 2-D MDW code parameters are selected, especially the cross-correlation properties. The code reduces MAI in the system, improving the BER, and mitigates phase-induced intensity noise (PIIN) in incoherent OCDMA. The analysis permits a thorough understanding of the impact of PIIN, shot noise, and thermal noise on 2-D MDW OCDMA system performance. PIIN is the main noise factor in the OCDMA network.
Label consistent K-SVD: learning a discriminative dictionary for recognition.
Jiang, Zhuolin; Lin, Zhe; Davis, Larry S
2013-11-01
A label consistent K-SVD (LC-KSVD) algorithm to learn a discriminative dictionary for sparse coding is presented. In addition to using class labels of training data, we also associate label information with each dictionary item (columns of the dictionary matrix) to enforce discriminability in sparse codes during the dictionary learning process. More specifically, we introduce a new label consistency constraint called "discriminative sparse-code error" and combine it with the reconstruction error and the classification error to form a unified objective function. The optimal solution is efficiently obtained using the K-SVD algorithm. Our algorithm learns a single overcomplete dictionary and an optimal linear classifier jointly. The incremental dictionary learning algorithm is presented for the situation of limited memory resources. It yields dictionaries so that feature points with the same class labels have similar sparse codes. Experimental results demonstrate that our algorithm outperforms many recently proposed sparse-coding techniques for face, action, scene, and object category recognition under the same learning conditions.
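The unified objective described above is commonly written in a form similar to the following (a sketch; the symbols and weighting may differ from the paper's exact formulation):

\[
\min_{D,\,A,\,W,\,X}\;
\|Y - DX\|_F^2
\;+\; \alpha\,\|Q - AX\|_F^2
\;+\; \beta\,\|H - WX\|_F^2
\quad \text{s.t.}\quad \|x_i\|_0 \le T \;\;\forall i,
\]

where Y holds the training signals, D the dictionary, X the sparse codes, Q the discriminative sparse-code targets enforcing label consistency through the linear transform A, H the label matrix predicted by the linear classifier W, and T the sparsity level.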
HITEMP Material and Structural Optimization Technology Transfer
NASA Technical Reports Server (NTRS)
Collier, Craig S.; Arnold, Steve (Technical Monitor)
2001-01-01
The feasibility of adding viscoelasticity and the Generalized Method of Cells (GMC) for micromechanical viscoelastic behavior into the commercial HyperSizer structural analysis and optimization code was investigated. The viscoelasticity methodology was developed in four steps. First, a simplified algorithm was devised to test the iterative time stepping method for simple one-dimensional multiple ply structures. Second, the GMC code was made into a callable subroutine and incorporated into the one-dimensional code to test the accuracy and usability of the code. Third, the viscoelastic time-stepping and iterative scheme was incorporated into HyperSizer for homogeneous, isotropic viscoelastic materials. Finally, the GMC was included in a version of HyperSizer. MS Windows executable files implementing each of these steps are delivered with this report, along with source code. The findings of this research are that both viscoelasticity and GMC are feasible and valuable additions to HyperSizer and that the door is open for more advanced nonlinear capability, such as viscoplasticity.
Coded excitation with spectrum inversion (CEXSI) for ultrasound array imaging.
Wang, Yao; Metzger, Kurt; Stephens, Douglas N; Williams, Gregory; Brownlie, Scott; O'Donnell, Matthew
2003-07-01
In this paper, a scheme called coded excitation with spectrum inversion (CEXSI) is presented. An established optimal binary code whose spectrum has no nulls and possesses the least variation is encoded as a burst for transmission. Using this optimal code, the decoding filter can be derived directly from its inverse spectrum. Various transmission techniques can be used to improve energy coupling within the system pass-band. We demonstrate its potential to achieve excellent decoding with very low (< 80 dB) side-lobes. For a 2.6 μs code, an array element with a center frequency of 10 MHz and fractional bandwidth of 38%, range side-lobes of about 40 dB have been achieved experimentally with little compromise in range resolution. The signal-to-noise ratio (SNR) improvement also has been characterized at about 14 dB. Along with simulations and experimental data, we present a formulation of the scheme, according to which CEXSI can be extended to improve SNR in sparse array imaging in general.
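Schematically (our notation, not the paper's), spectrum-inversion decoding relies on the code spectrum having no nulls:

\[
r(t) = c(t) * h(t) + n(t)
\;\;\longrightarrow\;\;
R(f) = C(f)\,H(f) + N(f),
\qquad
\hat{H}(f) = \frac{R(f)}{C(f)} = H(f) + \frac{N(f)}{C(f)},
\]

which is well conditioned only because the chosen binary code's spectrum C(f) has no nulls and little variation across the band, limiting noise amplification by the inverse filter.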
Adaptive partially hidden Markov models with application to bilevel image coding.
Forchhammer, S; Rasmussen, T S
1999-01-01
Partially hidden Markov models (PHMMs) have previously been introduced. The transition and emission/output probabilities from hidden states, as known from the HMMs, are conditioned on the past. This way, the HMM may be applied to images introducing the dependencies of the second dimension by conditioning. In this paper, the PHMM is extended to multiple sequences with a multiple token version and adaptive versions of PHMM coding are presented. The different versions of the PHMM are applied to lossless bilevel image coding. To reduce and optimize the model cost and size, the contexts are organized in trees and effective quantization of the parameters is introduced. The new coding methods achieve results that are better than the JBIG standard on selected test images, although at the cost of increased complexity. By the minimum description length principle, the methods presented for optimizing the code length may apply as guidance for training (P)HMMs for, e.g., segmentation or recognition purposes. Thereby, the PHMM models provide a new approach to image modeling.
Optimizing LX-17 Thermal Decomposition Model Parameters with Evolutionary Algorithms
NASA Astrophysics Data System (ADS)
Moore, Jason; McClelland, Matthew; Tarver, Craig; Hsu, Peter; Springer, H. Keo
2017-06-01
We investigate and model the cook-off behavior of LX-17 because this knowledge is critical to understanding system response in abnormal thermal environments. Thermal decomposition of LX-17 has been explored in conventional ODTX (One-Dimensional Time-to-eXplosion), PODTX (ODTX with pressure-measurement), TGA (thermogravimetric analysis), and DSC (differential scanning calorimetry) experiments using varied temperature profiles. These experimental data are the basis for developing multiple reaction schemes with coupled mechanics in LLNL's multi-physics hydrocode, ALE3D (Arbitrary Lagrangian-Eulerian code in 2D and 3D). We employ evolutionary algorithms to optimize reaction rate parameters on high performance computing clusters. Once experimentally validated, this model will be scalable to a number of applications involving LX-17 and can be used to develop more sophisticated experimental methods. Furthermore, the optimization methodology developed herein should be applicable to other high explosive materials. This work was performed under the auspices of the U.S. DOE by LLNL under contract DE-AC52-07NA27344. LLNS, LLC.
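A toy sketch of the evolutionary fitting loop (synthetic data and a one-step Arrhenius-type model; this is not the LX-17 reaction scheme, the ALE3D coupling, or LLNL data):

// Evolves two rate parameters (lnA, E) against synthetic "time to explosion" data
// with a simple (mu + lambda)-style loop: keep the best half, refill with mutants.
#include <algorithm>
#include <array>
#include <cmath>
#include <iostream>
#include <random>
#include <vector>

constexpr double Rgas = 8.314;  // J/(mol K)

// Simple one-step model: ln(t_explosion) = E/(R*T) - ln(A).
double lnTime(double lnA, double E, double T) { return E / (Rgas * T) - lnA; }

int main() {
    // Synthetic observations generated from "true" parameters lnA = 30, E = 150 kJ/mol.
    std::vector<double> temps = {450, 470, 490, 510, 530};
    std::vector<double> obs;
    for (double T : temps) obs.push_back(lnTime(30.0, 150e3, T));

    auto misfit = [&](double lnA, double E) {
        double s = 0.0;
        for (size_t i = 0; i < temps.size(); ++i) {
            double d = lnTime(lnA, E, temps[i]) - obs[i];
            s += d * d;
        }
        return s;
    };

    std::mt19937 rng(1);
    std::normal_distribution<double> gauss(0.0, 1.0);
    std::uniform_real_distribution<double> uA(20.0, 40.0), uE(100e3, 200e3);

    const int pop = 40, gens = 300;
    std::vector<std::array<double, 2>> P(pop);
    for (auto& ind : P) ind = {uA(rng), uE(rng)};

    auto byFit = [&](const std::array<double, 2>& a, const std::array<double, 2>& b) {
        return misfit(a[0], a[1]) < misfit(b[0], b[1]);
    };
    for (int g = 0; g < gens; ++g) {
        std::sort(P.begin(), P.end(), byFit);                 // elites first
        for (int i = pop / 2; i < pop; ++i) {                 // refill with mutated elites
            const auto& parent = P[i - pop / 2];
            P[i] = {parent[0] + 0.2 * gauss(rng), parent[1] + 2e3 * gauss(rng)};
        }
    }
    std::sort(P.begin(), P.end(), byFit);
    std::cout << "fit: lnA = " << P[0][0] << ", E = " << P[0][1] / 1e3 << " kJ/mol\n";
}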
Topology-Aware Performance Optimization and Modeling of Adaptive Mesh Refinement Codes for Exascale
Chan, Cy P.; Bachan, John D.; Kenny, Joseph P.; ...
2017-01-26
Here, we introduce a topology-aware performance optimization and modeling workflow for AMR simulation that includes two new modeling tools, ProgrAMR and Mota Mapper, which interface with the BoxLib AMR framework and the SSTmacro network simulator. ProgrAMR allows us to generate and model the execution of task dependency graphs from high-level specifications of AMR-based applications, which we demonstrate by analyzing two example AMR-based multigrid solvers with varying degrees of asynchrony. Mota Mapper generates multiobjective, network topology-aware box mappings, which we apply to optimize the data layout for the example multigrid solvers. While the sensitivity of these solvers to layout and execution strategy appears to be modest for balanced scenarios, the impact of better mapping algorithms can be significant when performance is highly constrained by network hop latency. Furthermore, we show that network latency in the multigrid bottom solve is the main contributing factor preventing good scaling on exascale-class machines.
Kim, Jeongnim; Baczewski, Andrew T.; Beaudet, Todd D.; ...
2018-04-19
QMCPACK is an open source quantum Monte Carlo package for ab-initio electronic structure calculations. It supports calculations of metallic and insulating solids, molecules, atoms, and some model Hamiltonians. Implemented real space quantum Monte Carlo algorithms include variational, diffusion, and reptation Monte Carlo. QMCPACK uses Slater-Jastrow type trial wave functions in conjunction with a sophisticated optimizer capable of optimizing tens of thousands of parameters. The orbital space auxiliary field quantum Monte Carlo method is also implemented, enabling cross validation between different highly accurate methods. The code is specifically optimized for calculations with large numbers of electrons on the latest high performance computing architectures, including multicore central processing unit (CPU) and graphical processing unit (GPU) systems. We detail the program's capabilities, outline its structure, and give examples of its use in current research calculations. The package is available at http://www.qmcpack.org.
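For readers unfamiliar with the method, here is a minimal variational Monte Carlo loop in the spirit of the real-space algorithms listed above: Metropolis sampling of |psi|^2 for a 1-D harmonic oscillator with a single-parameter Gaussian trial function. QMCPACK's Slater-Jastrow wave functions and large-scale optimizer are far more general; this is only a toy illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def local_energy(x, alpha):
    """Local energy of psi = exp(-alpha x^2) for H = -1/2 d^2/dx^2 + 1/2 x^2."""
    return alpha + x**2 * (0.5 - 2.0 * alpha**2)

def vmc_energy(alpha, n_steps=50_000, step=1.0):
    """Metropolis sampling of |psi|^2 followed by averaging of the local energy."""
    x, e_sum, n_acc = 0.0, 0.0, 0
    for i in range(n_steps):
        x_new = x + rng.uniform(-step, step)
        # acceptance ratio |psi(x_new)|^2 / |psi(x)|^2
        if rng.random() < np.exp(-2.0 * alpha * (x_new**2 - x**2)):
            x, n_acc = x_new, n_acc + 1
        if i > n_steps // 10:          # discard the equilibration phase
            e_sum += local_energy(x, alpha)
    return e_sum / (n_steps - n_steps // 10)

# Scan the single variational parameter; the minimum sits at alpha = 0.5 (E = 0.5).
for alpha in (0.3, 0.4, 0.5, 0.6):
    print(f"alpha = {alpha:.1f}  <E> ~ {vmc_energy(alpha):.3f}")
```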
CFD Analysis of Emissions for a Candidate N+3 Combustor
NASA Technical Reports Server (NTRS)
Ajmani, Kumud
2015-01-01
An effort was undertaken to analyze the performance of a model Lean-Direct Injection (LDI) combustor designed to meet emissions and performance goals for NASA's N+3 program. Computational predictions of Emissions Index (EINOx) and combustor exit temperature were obtained for operation at typical power conditions expected of a small-core, high pressure-ratio (greater than 50), high T3 inlet temperature (greater than 950K) N+3 combustor. Reacting-flow computations were performed with the National Combustion Code (NCC) for a model N+3 LDI combustor, which consisted of a nine-element LDI flame-tube derived from a previous generation (N+2) thirteen-element LDI design. A consistent approach to mesh-optimization, spray-modeling and kinetics-modeling was used, in order to leverage the lessons learned from previous N+2 flame-tube analysis with the NCC. The NCC predictions for the current, non-optimized N+3 combustor operation indicated a 74% increase in NOx emissions as compared to that of the emissions-optimized, parent N+2 LDI combustor.
NASA Astrophysics Data System (ADS)
Bird, Robert; Nystrom, David; Albright, Brian
2017-10-01
The ability of scientific simulations to effectively deliver performant computation is increasingly being challenged by successive generations of high-performance computing architectures. Code development to support efficient computation on these modern architectures is both expensive and highly complex; if it is approached without due care, it may also not be directly transferable between subsequent hardware generations. Previous works have discussed techniques to support the process of adapting a legacy code for modern hardware generations, but despite the breakthroughs in the areas of mini-app development, portable performance, and cache-oblivious algorithms, the problem remains largely unsolved. In this work we demonstrate how a focus on platform-agnostic modern code development can be applied to Particle-in-Cell (PIC) simulations to facilitate effective scientific delivery. This work builds directly on our previous work optimizing VPIC, in which we replaced intrinsics-based vectorization with compiler-generated auto-vectorization to improve the performance and portability of VPIC. In this work we present the use of a specialized SIMD queue for processing some particle operations, and also preview a GPU-capable OpenMP variant of VPIC. Finally, we include lessons learned. Work performed under the auspices of the U.S. Dept. of Energy by the Los Alamos National Security, LLC Los Alamos National Laboratory under contract DE-AC52-06NA25396 and supported by the LANL LDRD program.
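VPIC itself is a C/C++ code and the optimizations above concern compiler auto-vectorization rather than Python, but the structure-of-arrays, whole-array particle push below conveys the underlying idea: express the update over entire particle arrays so it can be vectorized, instead of hand-coding per-particle intrinsics. The non-relativistic Boris push and uniform fields are simplifications chosen for illustration.

```python
import numpy as np

def boris_push(x, v, E, B, qm, dt):
    """Array-at-a-time (structure-of-arrays) non-relativistic Boris push.

    x, v : (N, 3) particle positions and velocities
    E, B : (3,) uniform electric and magnetic fields (illustrative)
    qm   : charge-to-mass ratio, dt : time step
    """
    v_minus = v + 0.5 * qm * dt * E          # first half electric kick
    t = 0.5 * qm * dt * B                    # rotation vector
    s = 2.0 * t / (1.0 + np.dot(t, t))
    v_prime = v_minus + np.cross(v_minus, t)
    v_plus = v_minus + np.cross(v_prime, s)  # magnetic rotation
    v_new = v_plus + 0.5 * qm * dt * E       # second half electric kick
    x_new = x + dt * v_new                   # position update
    return x_new, v_new

# Example: one million particles advanced in one call; no per-particle Python loop.
rng = np.random.default_rng(2)
x = rng.random((1_000_000, 3))
v = rng.normal(size=(1_000_000, 3))
E = np.array([0.0, 0.0, 1.0])
B = np.array([0.0, 0.0, 5.0])
x, v = boris_push(x, v, E, B, qm=-1.0, dt=1e-3)
```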
Optimal high- and low-thrust geocentric transfer
NASA Technical Reports Server (NTRS)
Sackett, L. L.; Edelbaum, T. N.
1974-01-01
A computer code which rapidly calculates time optimal combined high- and low-thrust transfers between two geocentric orbits in the presence of a strong gravitational field has been developed as a mission analysis tool. The low-thrust portion of the transfer can be between any two arbitrary ellipses. There is an option for including the effect of two initial high-thrust impulses which would raise the spacecraft from a low, initially circular orbit to the initial orbit for the low-thrust portion of the transfer. In addition, the effect of a single final impulse after the low-thrust portion of the transfer may be included. The total Delta V for the initial two impulses must be specified as well as the Delta V for the final impulse. Either solar electric or nuclear electric propulsion can be assumed for the low-thrust phase of the transfer.
ClusCo: clustering and comparison of protein models.
Jamroz, Michal; Kolinski, Andrzej
2013-02-22
The development, optimization and validation of protein modeling methods require efficient tools for structural comparison. Frequently, a large number of models need to be compared with the target native structure. The main reason for the development of the Clusco software was to create a high-throughput tool for all-versus-all comparison, because calculating the similarity matrix is one of the bottlenecks in the protein modeling pipeline. Clusco is fast and easy-to-use software for high-throughput comparison of protein models with different similarity measures (cRMSD, dRMSD, GDT_TS, TM-Score, MaxSub, Contact Map Overlap) and clustering of the comparison results with standard methods: K-means Clustering or Hierarchical Agglomerative Clustering. The application was highly optimized and written in C/C++, including the code for parallel execution on CPU and GPU, which resulted in a significant speedup over similar clustering and scoring computation programs.
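A minimal sketch of the all-versus-all comparison plus clustering workflow that Clusco accelerates is given below; it uses the superposition-free dRMSD measure and SciPy's hierarchical clustering on random stand-in models, and does not reproduce Clusco's optimized C/C++/GPU kernels or its other similarity measures.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)

# Stand-in "models": 50 decoys of a 100-residue CA trace (random perturbations).
n_models, n_res = 50, 100
ref = rng.normal(size=(n_res, 3)) * 10.0
models = ref + rng.normal(scale=rng.uniform(0.5, 3.0, (n_models, 1, 1)),
                          size=(n_models, n_res, 3))

def drmsd(a, b):
    """Distance-matrix RMSD: superposition-free similarity of two CA traces."""
    da, db = pdist(a), pdist(b)
    return np.sqrt(np.mean((da - db) ** 2))

# All-versus-all comparison (the bottleneck Clusco parallelizes on CPU/GPU).
mat = np.zeros((n_models, n_models))
for i in range(n_models):
    for j in range(i + 1, n_models):
        mat[i, j] = mat[j, i] = drmsd(models[i], models[j])

# Hierarchical agglomerative clustering on the condensed distance matrix.
labels = fcluster(linkage(squareform(mat), method="average"),
                  t=3, criterion="maxclust")
print("cluster sizes:", np.bincount(labels)[1:])
```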
Optimization of a Turboprop UAV for Maximum Loiter and Specific Power Using Genetic Algorithm
NASA Astrophysics Data System (ADS)
Dinc, Ali
2016-09-01
In this study, an original code was developed for the optimization of selected parameters of a turboprop engine for an unmanned aerial vehicle (UAV) by employing an elitist genetic algorithm. First, preliminary sizing of a UAV and its turboprop engine was performed by the code for a given mission profile. Second, single- and multi-objective optimizations were carried out for selected engine parameters to maximize the loiter duration of the UAV, the specific power of the engine, or both. In the first single-objective case, UAV loiter time was improved by 17.5% over the baseline within the given bounds or constraints on compressor pressure ratio and burner exit temperature. In the second case, specific power was enhanced by 12.3% over the baseline. In the multi-objective case, where the previous two objectives are considered together, loiter time and specific power were increased by 14.2% and 9.7% over the baseline, respectively, for the same constraints.
Compressor and Turbine Multidisciplinary Design for Highly Efficient Micro-gas Turbine
NASA Astrophysics Data System (ADS)
Barsi, Dario; Perrone, Andrea; Qu, Yonglei; Ratto, Luca; Ricci, Gianluca; Sergeev, Vitaliy; Zunino, Pietro
2018-06-01
Multidisciplinary design optimization (MDO) is widely employed to enhance the efficiency of turbomachinery components. The aim of this work is to describe a complete tool for the aero-mechanical design of a radial inflow turbine and a centrifugal compressor. The high rotational speed of such machines and the high exhaust gas temperature (for the turbine only) expose the blades to very high stresses, and therefore the aerodynamic design has to be coupled with the mechanical one through an integrated procedure. The described approach employs a fully 3D Reynolds-Averaged Navier-Stokes (RANS) solver for the aerodynamics and an open-source Finite Element Analysis (FEA) solver for the mechanical integrity assessment. Due to the high computational cost of these two solvers, a meta-model, such as an artificial neural network (ANN), is used to speed up the design optimization process. The interaction between the two codes, the mesh generation, and the post-processing of the results are achieved via in-house scripting modules. The results obtained are presented and discussed in detail.
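A minimal sketch of the surrogate-assisted loop described above follows: sample an expensive solver (replaced here by a cheap analytic stand-in), train an ANN meta-model on the samples, and optimize on the meta-model. The objective, bounds, and network size are invented for illustration; the actual tool couples RANS and FEA solvers.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from scipy.optimize import minimize

rng = np.random.default_rng(4)

def expensive_solver(x):
    """Stand-in for the coupled RANS/FEA evaluation (illustrative analytic function)."""
    return (x[..., 0] - 0.3) ** 2 + 2.0 * (x[..., 1] - 0.7) ** 2 + 0.1 * np.sin(8 * x[..., 0])

# 1. Design of experiments: sample the design space and run the "solver".
X = rng.random((80, 2))                 # two normalized design variables
y = expensive_solver(X)

# 2. Train the ANN meta-model on the samples.
surrogate = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000,
                         random_state=0).fit(X, y)

# 3. Optimize on the cheap surrogate instead of the expensive solver.
res = minimize(lambda x: surrogate.predict(x.reshape(1, -1))[0],
               x0=np.array([0.5, 0.5]), bounds=[(0, 1), (0, 1)])

# 4. Verify the surrogate optimum with one true solver call (in practice this
#    would be looped to refine the meta-model around the optimum).
print("surrogate optimum:", res.x, "true objective there:", expensive_solver(res.x))
```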
Space Radiation Transport Code Development: 3DHZETRN
NASA Technical Reports Server (NTRS)
Wilson, John W.; Slaba, Tony C.; Badavi, Francis F.; Reddell, Brandon D.; Bahadori, Amir A.
2015-01-01
The space radiation transport code, HZETRN, has been used extensively for research, vehicle design optimization, risk analysis, and related applications. One of the simplifying features of the HZETRN transport formalism is the straight-ahead approximation, wherein all particles are assumed to travel along a common axis. This reduces the governing equation to one spatial dimension allowing enormous simplification and highly efficient computational procedures to be implemented. Despite the physical simplifications, the HZETRN code is widely used for space applications and has been found to agree well with fully 3D Monte Carlo simulations in many circumstances. Recent work has focused on the development of 3D transport corrections for neutrons and light ions (Z < 2) for which the straight-ahead approximation is known to be less accurate. Within the development of 3D corrections, well-defined convergence criteria have been considered, allowing approximation errors at each stage in model development to be quantified. The present level of development assumes the neutron cross sections have an isotropic component treated within N explicit angular directions and a forward component represented by the straight-ahead approximation. The N = 1 solution refers to the straight-ahead treatment, while N = 2 represents the bi-directional model in current use for engineering design. The figure below shows neutrons, protons, and alphas for various values of N at locations in an aluminum sphere exposed to a solar particle event (SPE) spectrum. The neutron fluence converges quickly in simple geometry with N > 14 directions. The improved code, 3DHZETRN, transports neutrons, light ions, and heavy ions under space-like boundary conditions through general geometry while maintaining a high degree of computational efficiency. A brief overview of the 3D transport formalism for neutrons and light ions is given, and extensive benchmarking results with the Monte Carlo codes Geant4, FLUKA, and PHITS are provided for a variety of boundary conditions and geometries. Improvements provided by the 3D corrections are made clear in the comparisons. Developments needed to connect 3DHZETRN to vehicle design and optimization studies will be discussed. Future theoretical development will relax the forward plus isotropic interaction assumption to more general angular dependence.
Surface interactions and high-voltage current collection
NASA Technical Reports Server (NTRS)
Mandell, M. J.; Katz, I.
1985-01-01
Spacecraft of the future will be larger and have higher power requirements than any flown to date. For several reasons, it is desirable to operate a high power system at high voltage. While the optimal voltages for many future missions are in the range 500 to 5000 volts, the highest voltage yet flown is approximately 100 volts. The NASCAP/LEO code is being developed to embody the phenomenology needed to model the environmental interactions of high voltage spacecraft. Some plasma environments are discussed. The treatment of the surface conductivity associated with emitted electrons and some NASCAP/LEO simulations of ground-based high-voltage interaction experiments are described.
NASA Technical Reports Server (NTRS)
Martini, William R.
1989-01-01
A FORTRAN computer code is described that could be used to design and optimize a free-displacer, free-piston Stirling engine similar to the RE-1000 engine made by Sunpower. The code contains options for specifying displacer and power piston motion or for allowing these motions to be calculated by a force balance. The engine load may be a dashpot, inertial compressor, hydraulic pump or linear alternator. Cycle analysis may be done by isothermal analysis or adiabatic analysis. Adiabatic analysis may be done using the Martini moving gas node analysis or the Rios second-order Runge-Kutta analysis. Flow loss and heat loss equations are included. Graphical displays of engine motions, pressures, and temperatures are included. Programming for optimizing up to 15 independent dimensions is included. Sample performance results are shown for both specified and unconstrained piston motions; these results are shown as generated by each of the two Martini analyses. Two sample optimization searches are shown using specified piston motion with isothermal analysis. One is for three adjustable inputs and one is for four. Also, two optimization searches for calculated piston motion are presented for three and for four adjustable inputs. The effect of leakage is evaluated. Suggestions for further work are given.
Pianowski, Giselle; Meyer, Gregory J; Villemor-Amaral, Anna Elisa de
2016-01-01
Exner ( 1989 ) and Weiner ( 2003 ) identified 3 types of Rorschach codes that are most likely to contain personally relevant projective material: Distortions, Movement, and Embellishments. We examine how often these types of codes occur in normative data and whether their frequency changes for the 1st, 2nd, 3rd, 4th, or last response to a card. We also examine the impact on these variables of the Rorschach Performance Assessment System's (R-PAS) statistical modeling procedures that convert the distribution of responses (R) from Comprehensive System (CS) administered protocols to match the distribution of R found in protocols obtained using R-optimized administration guidelines. In 2 normative reference databases, the results indicated that about 40% of responses (M = 39.25) have 1 type of code, 15% have 2 types, and 1.5% have all 3 types, with frequencies not changing by response number. In addition, there were no mean differences in the original CS and R-optimized modeled records (M Cohen's d = -0.04 in both databases). When considered alongside findings showing minimal differences between the protocols of people randomly assigned to CS or R-optimized administration, the data suggest R-optimized administration should not alter the extent to which potential projective material is present in a Rorschach protocol.
Autonomous Modelling of X-ray Spectra Using Robust Global Optimization Methods
NASA Astrophysics Data System (ADS)
Rogers, Adam; Safi-Harb, Samar; Fiege, Jason
2015-08-01
The standard approach to model fitting in X-ray astronomy is by means of local optimization methods. However, these local optimizers suffer from a number of problems, such as a tendency for the fit parameters to become trapped in local minima, and can require an involved process of detailed user intervention to guide them through the optimization process. In this work we introduce a general GUI-driven global optimization method for fitting models to X-ray data, written in MATLAB, which searches for optimal models with minimal user interaction. We directly interface with the commonly used XSPEC libraries to access the full complement of pre-existing spectral models that describe a wide range of physics appropriate for modelling astrophysical sources, including supernova remnants and compact objects. Our algorithm is powered by the Ferret genetic algorithm and Locust particle swarm optimizer from the Qubist Global Optimization Toolbox, which are robust at finding families of solutions and identifying degeneracies. This technique will be particularly instrumental for multi-parameter models and high-fidelity data. In this presentation, we provide details of the code and use our techniques to analyze X-ray data obtained from a variety of astrophysical sources.
NASA Technical Reports Server (NTRS)
Frumkin, Michael; Yan, Jerry
1999-01-01
We present an HPF (High Performance Fortran) implementation of ARC3D code along with the profiling and performance data on SGI Origin 2000. Advantages and limitations of HPF as a parallel programming language for CFD applications are discussed. For achieving good performance results we used the data distributions optimized for implementation of implicit and explicit operators of the solver and boundary conditions. We compare the results with MPI and directive based implementations.
Control and System Theory, Optimization, Inverse and Ill-Posed Problems
1988-09-14
Grant AFOSR-87-0350, 1987-1988. The report covers a considerable variety of research investigations within the grant areas (control and system theory, optimization, and inverse and ill-posed problems).
NASA Astrophysics Data System (ADS)
Guan, Zhen; Pekurovsky, Dmitry; Luce, Jason; Thornton, Katsuyo; Lowengrub, John
The structural phase field crystal (XPFC) model can be used to model grain growth in polycrystalline materials at diffusive time-scales while maintaining atomic scale resolution. However, the governing equation of the XPFC model is an integral partial differential equation (IPDE), which poses implementation challenges on high performance computing (HPC) platforms. In collaboration with the XSEDE Extended Collaborative Support Service, we developed a distributed memory HPC solver for the XPFC model, which combines parallel multigrid and P3DFFT. The performance benchmarking on the Stampede supercomputer indicates near-linear strong and weak scaling for both multigrid and transfer time between multigrid and FFT modules up to 1024 cores. Scalability of the FFT module begins to decline at 128 cores, but it is sufficient for the type of problem we will be examining. We have demonstrated simulations using 1024 cores, and we expect to achieve 4096 cores and beyond. Ongoing work involves optimization of MPI/OpenMP-based codes for the Intel KNL Many-Core Architecture. This optimizes the code for upcoming pre-exascale systems, in particular many-core systems such as Stampede 2.0 and Cori 2 at NERSC, without sacrificing efficiency on other general HPC systems.
A study of power cycles using supercritical carbon dioxide as the working fluid
NASA Astrophysics Data System (ADS)
Schroder, Andrew Urban
A real fluid heat engine power cycle analysis code has been developed for analyzing the zero dimensional performance of a general recuperated, recompression, precompression supercritical carbon dioxide power cycle with reheat and a unique shaft configuration. With the proposed shaft configuration, several smaller compressor-turbine pairs could be placed inside a pressure vessel in order to avoid high-speed, high-pressure rotating seals. The small compressor-turbine pairs would bear some resemblance to a turbocharger assembly. Variation in fluid properties within the heat exchangers is taken into account by discretizing the zero dimensional heat exchangers. The cycle analysis code allows for multiple reheat stages, as well as an option for the main compressor to be powered by a dedicated turbine or an electrical motor. Variation in performance with respect to design heat exchanger pressure drops and minimum temperature differences, precompressor pressure ratio, main compressor pressure ratio, recompression mass fraction, main compressor inlet pressure, and low temperature recuperator mass fraction has been explored throughout a range of each design parameter. Turbomachinery isentropic efficiencies are implemented and the sensitivity of the cycle performance and the optimal design parameters is explored. Sensitivity of the cycle performance and optimal design parameters is studied with respect to the minimum heat rejection temperature and the maximum heat addition temperature. A hybrid stochastic and gradient-based optimization technique has been used to optimize critical design parameters for maximum engine thermal efficiency. A parallel design exploration mode was also developed in order to rapidly conduct the parameter sweeps in this design space exploration. A cycle thermal efficiency of 49.6% is predicted with a 320K [47°C] minimum temperature and 923K [650°C] maximum temperature. The real fluid heat engine power cycle analysis code was expanded to study a theoretical recuperated Lenoir cycle using supercritical carbon dioxide as the working fluid. The real fluid cycle analysis code was also enhanced to study a combined cycle engine cascade. Two engine cascade configurations were studied. The first consisted of a traditional open loop gas turbine, coupled with a series of recuperated, recompression, precompression supercritical carbon dioxide power cycles, with a predicted combined cycle thermal efficiency of 65.0% using a peak temperature of 1,890K [1,617°C]. The second configuration consisted of a hybrid natural-gas-powered solid oxide fuel cell and gas turbine, coupled with a series of recuperated, recompression, precompression supercritical carbon dioxide power cycles, with a predicted combined cycle thermal efficiency of 73.1%. Both configurations had a minimum temperature of 306K [33°C]. The hybrid stochastic and gradient-based optimization technique was used to optimize all engine design parameters for each engine in the cascade such that the entire engine cascade achieved the maximum thermal efficiency. The parallel design exploration mode was also utilized in order to understand the impact of different design parameters on the overall engine cascade thermal efficiency. Two-dimensional conjugate heat transfer (CHT) numerical simulations of a straight, equal-height channel heat exchanger using supercritical carbon dioxide were conducted at various Reynolds numbers and channel lengths.
DAKOTA Design Analysis Kit for Optimization and Terascale
DOE Office of Scientific and Technical Information (OSTI.GOV)
Adams, Brian M.; Dalbey, Keith R.; Eldred, Michael S.
2010-02-24
The DAKOTA (Design Analysis Kit for Optimization and Terascale Applications) toolkit provides a flexible and extensible interface between simulation codes (computational models) and iterative analysis methods. By employing object-oriented design to implement abstractions of the key components required for iterative systems analyses, the DAKOTA toolkit provides a flexible and extensible problem-solving environment for design and analysis of computational models on high performance computers. A user provides a set of DAKOTA commands in an input file and launches DAKOTA. DAKOTA invokes instances of the computational models, collects their results, and performs systems analyses. DAKOTA contains algorithms for optimization with gradient- and nongradient-based methods; uncertainty quantification with sampling, reliability, polynomial chaos, stochastic collocation, and epistemic methods; parameter estimation with nonlinear least squares methods; and sensitivity/variance analysis with design of experiments and parameter study methods. These capabilities may be used on their own or as components within advanced strategies such as hybrid optimization, surrogate-based optimization, mixed integer nonlinear programming, or optimization under uncertainty. Services for parallel computing, simulation interfacing, approximation modeling, fault tolerance, restart, and graphics are also included.
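The loose, file-based coupling pattern described above can be sketched as a generic analysis driver: the framework writes a parameters file, the driver runs the simulation code and writes a results file back. The file formats, variable names, and the placeholder executable below are illustrative only and do not reflect DAKOTA's actual input syntax.

```python
import subprocess
from pathlib import Path

def analysis_driver(params_file: str, results_file: str) -> None:
    """Generic black-box interface between an iterative framework and a simulation.

    Reads 'name value' pairs, runs a (hypothetical) simulation executable,
    and writes the objective back for the framework to collect.
    """
    params = {}
    for line in Path(params_file).read_text().splitlines():
        name, value = line.split()
        params[name] = float(value)

    # Launch the simulation code with the current design variables.
    # 'my_simulation' is a placeholder executable, not a real DAKOTA component.
    subprocess.run(["my_simulation",
                    f"--thickness={params['thickness']}",
                    f"--pressure={params['pressure']}",
                    "--out=sim_output.txt"], check=True)

    # Extract the quantity of interest and hand it back to the framework.
    objective = float(Path("sim_output.txt").read_text().split()[-1])
    Path(results_file).write_text(f"{objective:.12e} objective\n")

if __name__ == "__main__":
    analysis_driver("params.in", "results.out")
```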
Distribution path robust optimization of electric vehicle with multiple distribution centers
Hao, Wei; He, Ruichun; Jia, Xiaoyan; Pan, Fuquan; Fan, Jing; Xiong, Ruiqi
2018-01-01
To identify electric vehicle (EV) distribution paths with high robustness, insensitivity to uncertainty factors, and detailed road-by-road schemes, optimization of the distribution path problem of EVs with multiple distribution centers and considering the charging facilities is necessary. With the minimum transport time as the goal, a robust optimization model of the EV distribution path with adjustable robustness is established based on Bertsimas’ theory of robust discrete optimization. An enhanced three-segment genetic algorithm is also developed to solve the model, such that the optimal distribution scheme initially contains all road-by-road path data using the three-segment mixed coding and decoding method. During genetic manipulation, different crossover and mutation operations are carried out on different chromosomes, while, during population evolution, infeasible solutions are naturally avoided. A part of the road network of Xifeng District in Qingyang City is taken as an example to test the model and the algorithm in this study, and the concrete transportation paths are utilized in the final distribution scheme. Therefore, more robust EV distribution paths with multiple distribution centers can be obtained using the robust optimization model.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Koskela, Tuomas S.; Lobet, Mathieu; Deslippe, Jack
In this session we show, in two case studies, how the roofline feature of Intel Advisor has been utilized to optimize the performance of kernels of the XGC1 and PICSAR codes in preparation for the Intel Knights Landing architecture. The impact of the implemented optimizations and the benefits of using the automatic roofline feature of Intel Advisor to study performance of large applications will be presented. This demonstrates an effective optimization strategy that has enabled these science applications to achieve up to 4.6 times speed-up and prepare for future exascale architectures. # Goal/Relevance of Session The roofline model [1,2] is a powerful tool for analyzing the performance of applications with respect to the theoretical peak achievable on a given computer architecture. It allows one to graphically represent the performance of an application in terms of operational intensity, i.e., the ratio of flops performed to bytes moved from memory, in order to guide optimization efforts. Given the scale and complexity of modern science applications, it can often be a tedious task for the user to perform the analysis on the level of functions or loops to identify where performance gains can be made. With new Intel tools, it is now possible to automate this task, as well as base the estimates of peak performance on measurements rather than vendor specifications. The goal of this session is to demonstrate how the roofline feature of Intel Advisor can be used to balance memory- vs. computation-related optimization efforts and effectively identify performance bottlenecks. A series of typical optimization techniques (cache blocking, structure refactoring, data alignment, and vectorization), illustrated by the kernel cases, will be addressed. # Description of the codes ## XGC1 The XGC1 code [3] is a magnetic fusion Particle-In-Cell code that uses an unstructured mesh for its Poisson solver that allows it to accurately resolve the edge plasma of a magnetic fusion device. After recent optimizations to its collision kernel [4], most of the computing time is spent in the electron push (pushe) kernel, where these optimization efforts have been focused. The kernel code scaled well with MPI+OpenMP but had almost no automatic compiler vectorization, in part due to indirect memory addresses and in part due to low trip counts of low-level loops that would be candidates for vectorization. Particle blocking and sorting have been implemented to increase trip counts of low-level loops and improve memory locality, and OpenMP directives have been added to vectorize compute-intensive loops that were identified by Advisor. The optimizations have improved the performance of the pushe kernel 2x on Haswell processors and 1.7x on KNL. The KNL node-for-node performance has been brought to within 30% of a NERSC Cori Phase I Haswell node and we expect to bridge this gap by reducing the memory footprint of compute-intensive routines to improve cache reuse. ## PICSAR PICSAR is a Fortran/Python high-performance Particle-In-Cell library targeting MIC architectures, first designed to be coupled with the PIC code WARP for the simulation of laser-matter interaction and particle accelerators. PICSAR also contains a FORTRAN stand-alone kernel for performance studies and benchmarks. An MPI domain decomposition is used between NUMA domains and a tile decomposition (cache-blocking) handled by OpenMP has been added for shared-memory parallelism and better cache management.
The so-called current deposition and field gathering steps that compose the PIC time loop constitute major hotspots that have been rewritten to enable more efficient vectorization. Particle communications between tiles and MPI domains have been merged and parallelized. Taken together, these improvements provide speedups of 3.1 for order 1 and 4.6 for order 3 interpolation shape factors on KNL configured in SNC4 quadrant flat mode. Performance is similar between a Cori Phase 1 node and KNL at order 1, and better on KNL by a factor of 1.6 at order 3 with the considered test case (homogeneous thermal plasma).
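A minimal sketch of the roofline bookkeeping that the Advisor feature automates: from a kernel's flop count and bytes moved, compute its arithmetic intensity and the attainable performance under a two-ceiling (peak flops, peak bandwidth) roofline. The KNL-like ceilings below are round illustrative numbers, not measured machine values.

```python
def roofline_attainable(flops, bytes_moved, peak_gflops=3000.0, peak_bw_gbs=450.0):
    """Attainable GFLOP/s under a simple two-ceiling roofline model.

    peak_gflops and peak_bw_gbs are round, KNL-like illustrative ceilings
    (double-precision peak and MCDRAM bandwidth), not measured values.
    """
    ai = flops / bytes_moved                      # arithmetic intensity (flop/byte)
    attainable = min(peak_gflops, ai * peak_bw_gbs)
    bound = "memory-bound" if attainable < peak_gflops else "compute-bound"
    return ai, attainable, bound

# A stream-like triad (2 flops per 24 bytes) versus a hypothetical cache-blocked
# kernel that reuses each loaded byte many times before evicting it.
for name, flops, nbytes in [("triad-like", 2, 24), ("cache-blocked", 256, 24)]:
    ai, perf, bound = roofline_attainable(flops, nbytes)
    print(f"{name:13s} AI = {ai:6.2f} flop/byte  attainable ~ {perf:7.1f} GFLOP/s  ({bound})")
```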
Particle-gas dynamics in the protoplanetary nebula
NASA Technical Reports Server (NTRS)
Cuzzi, Jeffrey N.; Champney, Joelle M.; Dobrovolskis, Anthony R.
1991-01-01
In the past year we made significant progress in improving our fundamental understanding of the physics of particle-gas dynamics in the protoplanetary nebula. Having brought our code to a state of fairly robust functionality, we devoted significant effort to optimizing it for running long cases. We optimized the code for vectorization to the extent that it now runs eight times faster than before. The following subject areas are covered: physical improvements to the model; numerical results; Reynolds averaging of fluid equations; and modeling of turbulence and viscosity.
Connectivity Restoration in Wireless Sensor Networks via Space Network Coding.
Uwitonze, Alfred; Huang, Jiaqing; Ye, Yuanqing; Cheng, Wenqing
2017-04-20
The problem of finding the number and optimal positions of relay nodes for restoring the network connectivity in partitioned Wireless Sensor Networks (WSNs) is Non-deterministic Polynomial-time hard (NP-hard) and thus heuristic methods are preferred to solve it. This paper proposes a novel polynomial time heuristic algorithm, namely, Relay Placement using Space Network Coding (RPSNC), to solve this problem, where Space Network Coding, also called Space Information Flow (SIF), is a new research paradigm that studies network coding in Euclidean space, in which extra relay nodes can be introduced to reduce the cost of communication. Unlike contemporary schemes that are often based on Minimum Spanning Tree (MST), Euclidean Steiner Minimal Tree (ESMT) or a combination of MST with ESMT, RPSNC is a new min-cost multicast space network coding approach that combines Delaunay triangulation and non-uniform partitioning techniques for generating a number of candidate relay nodes, and then linear programming is applied for choosing the optimal relay nodes and computing their connection links with terminals. Subsequently, an equilibrium method is used to refine the locations of the optimal relay nodes, by moving them to balanced positions. RPSNC can adapt to any density distribution of relay nodes and terminals, as well as any density distribution of terminals. The performance and complexity of RPSNC are analyzed and its performance is validated through simulation experiments.
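A minimal sketch of the candidate-generation step is given below: triangulate the terminal positions with a Delaunay triangulation and take triangle centroids as candidate relay locations. RPSNC additionally applies non-uniform partitioning and then selects the optimal relays by linear programming, neither of which is reproduced here; the terminal coordinates are random stand-ins.

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(5)

# Terminals of a partitioned WSN (positions are illustrative).
terminals = rng.uniform(0.0, 100.0, size=(12, 2))

# Delaunay triangulation of the terminals.
tri = Delaunay(terminals)

# Candidate relay positions: one per triangle (its centroid). RPSNC refines
# this set with non-uniform partitioning and then picks relays via an LP.
candidates = terminals[tri.simplices].mean(axis=1)

print(f"{len(terminals)} terminals -> {len(candidates)} candidate relay nodes")
for c in candidates[:5]:
    print(f"  candidate at ({c[0]:6.1f}, {c[1]:6.1f})")
```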
Preliminary Results from the Application of Automated Adjoint Code Generation to CFL3D
NASA Technical Reports Server (NTRS)
Carle, Alan; Fagan, Mike; Green, Lawrence L.
1998-01-01
This report describes preliminary results obtained using an automated adjoint code generator for Fortran to augment a widely-used computational fluid dynamics flow solver to compute derivatives. These preliminary results with this augmented code suggest that, even in its infancy, the automated adjoint code generator can accurately and efficiently deliver derivatives for use in transonic Euler-based aerodynamic shape optimization problems with hundreds to thousands of independent design variables.
Numerical design and analysis of parasitic mode oscillations for 95 GHz gyrotron beam tunnel
NASA Astrophysics Data System (ADS)
Kumar, Nitin; Singh, Udaybir; Yadav, Vivek; Kumar, Anil; Sinha, A. K.
2013-05-01
The beam tunnel, equipped with highly lossy ceramics, is designed for a 95 GHz gyrotron. The geometry of the beam tunnel is optimized considering the maximum RF absorption (ideally 100%) and the suppression of parasitic oscillations. The excitation of parasitic modes is a serious concern for high frequency, high power gyrotrons. Considering the problem of parasitic mode excitation in the beam tunnel, a detailed analysis is performed for the suppression of these kinds of modes. The trajectory code EGUN and CST Microwave Studio are used for the simulations of the electron beam trajectory and the electromagnetic analysis, respectively.
Dimitrov, I. K.; Zhang, X.; Solovyov, V. F.; ...
2015-07-07
Recent advances in second-generation (YBCO) high-temperature superconducting wire could potentially enable the design of super-high-performance energy storage devices that combine the high energy density of chemical storage with the high power of superconducting magnetic storage. However, the high aspect ratio and the considerable filament size of these wires require the concomitant development of dedicated optimization methods that account for the critical current density in type-II superconductors. In this study, we report on the novel application and results of a CPU-efficient semianalytical computer code based on the Radia 3-D magnetostatics software package. Our algorithm is used to simulate and optimize the energy density of a superconducting magnetic energy storage device model, based on design constraints, such as overall size and number of coils. The rapid performance of the code rests on analytical calculations of the magnetic field based on an efficient implementation of the Biot-Savart law for a large variety of 3-D “base” geometries in the Radia package. The significantly reduced CPU time and simple data input in conjunction with the consideration of realistic input variables, such as material-specific, temperature, and magnetic-field-dependent critical current densities, have enabled the Radia-based algorithm to outperform finite-element approaches in CPU time at the same accuracy levels. Comparative simulations of MgB2 and YBCO-based devices are performed at 4.2 K, in order to ascertain the realistic efficiency of the design configurations.
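A minimal numpy sketch of the kind of Biot-Savart evaluation such a semianalytical approach builds on: the field of a circular coil is obtained by summing contributions of straight current segments and checked against the textbook on-axis formula. The coil parameters and discretization are illustrative; Radia's 3-D base geometries and the critical-current constraints are not reproduced.

```python
import numpy as np

MU0 = 4e-7 * np.pi  # vacuum permeability (T m / A)

def biot_savart_loop(points, radius=0.1, current=100.0, n_seg=400):
    """Magnetic field of a circular loop in the z=0 plane, summed over
    straight segments via the Biot-Savart law (illustrative discretization)."""
    phi = np.linspace(0.0, 2.0 * np.pi, n_seg + 1)
    nodes = np.column_stack([radius * np.cos(phi), radius * np.sin(phi),
                             np.zeros_like(phi)])
    dl = np.diff(nodes, axis=0)                  # segment vectors
    mid = 0.5 * (nodes[:-1] + nodes[1:])         # segment midpoints
    B = np.zeros((len(points), 3))
    for k, r in enumerate(points):
        rvec = r - mid                           # midpoints -> field point
        dist = np.linalg.norm(rvec, axis=1, keepdims=True)
        B[k] = MU0 * current / (4 * np.pi) * np.sum(
            np.cross(dl, rvec) / dist**3, axis=0)
    return B

# Check against the textbook on-axis result B_z = mu0 I R^2 / (2 (R^2 + z^2)^(3/2)).
z = np.array([[0.0, 0.0, 0.05]])
bz_numeric = biot_savart_loop(z)[0, 2]
bz_exact = MU0 * 100.0 * 0.1**2 / (2 * (0.1**2 + 0.05**2) ** 1.5)
print(f"numeric Bz = {bz_numeric:.6e} T,  analytic Bz = {bz_exact:.6e} T")
```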
NASA Technical Reports Server (NTRS)
Sandlin, Doral R.; Bauer, Brent Alan
1993-01-01
This paper discusses the development of a FORTRAN computer code to perform agility analysis on aircraft configurations. This code is to be part of the NASA-Ames ACSYNT (AirCraft SYNThesis) design code. This paper begins with a discussion of contemporary agility research in the aircraft industry and a survey of a few agility metrics. The methodology, techniques and models developed for the code are then presented. Finally, example trade studies using the agility module along with ACSYNT are illustrated. These trade studies were conducted using a Northrop F-20 Tigershark aircraft model. The studies show that the agility module is effective in analyzing the influence of common parameters such as thrust-to-weight ratio and wing loading on agility criteria. The module can compare the agility potential between different configurations. In addition one study illustrates the module's ability to optimize a configuration's agility performance.
Introduction of the ASGARD code (Automated Selection and Grouping of events in AIA Regional Data)
NASA Astrophysics Data System (ADS)
Bethge, Christian; Winebarger, Amy; Tiwari, Sanjiv K.; Fayock, Brian
2017-08-01
We have developed the ASGARD code to automatically detect and group brightenings ("events") in AIA data. The event selection and grouping can be optimized for the respective dataset with a multitude of control parameters. The code was initially written for IRIS data, but has since been optimized for AIA. However, the underlying algorithm is not limited to either and could be used for other data as well. Results from datasets in various AIA channels show that brightenings are reliably detected and that coherent coronal structures can be isolated by using the obtained information about the start, peak, and end times of events. We are presently working on a follow-up algorithm to automatically determine the heating and cooling timescales of coronal structures. This will be done by correlating the information from different AIA channels with different temperature responses. We will present the code and preliminary results.
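A minimal sketch of automated brightening detection in the spirit of ASGARD: threshold a synthetic image cube above a robust per-pixel background, label connected bright pixels with scipy.ndimage, and report each group's start, peak, and end frames. The detection rule, minimum group size, and synthetic data are illustrative stand-ins for ASGARD's control parameters.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(6)

# Synthetic "AIA-like" cube (time, y, x): Gaussian background plus two transients.
cube = rng.normal(100.0, 5.0, size=(60, 64, 64))
cube[10:20, 20:24, 20:24] += 80.0        # event 1
cube[35:50, 40:45, 10:15] += 120.0       # event 2

# Robust per-pixel background and noise, then a 5-sigma detection threshold.
background = np.median(cube, axis=0)
noise = 1.4826 * np.median(np.abs(cube - background), axis=0)
mask = cube > background + 5.0 * noise

# Group connected bright pixels in (t, y, x) into candidate events and report
# start/peak/end frames; tiny groups are discarded as noise.
labels, n_labels = ndimage.label(mask)
for ev in range(1, n_labels + 1):
    t, y, x = np.where(labels == ev)
    if t.size < 20:
        continue
    peak_frame = t[np.argmax(cube[t, y, x])]
    print(f"event {ev}: start frame {t.min()}, peak frame {peak_frame}, "
          f"end frame {t.max()}, {t.size} pixels")
```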
NASA Astrophysics Data System (ADS)
Papior, Nick; Lorente, Nicolás; Frederiksen, Thomas; García, Alberto; Brandbyge, Mads
2017-03-01
We present novel methods implemented within the non-equilibrium Green function (NEGF) code TRANSIESTA, based on density functional theory (DFT). Our flexible, next-generation DFT-NEGF code handles devices with one or multiple electrodes (Ne ≥ 1) with individual chemical potentials and electronic temperatures. We describe its novel methods for electrostatic gating, contour optimizations, and assertion of charge conservation, as well as the newly implemented algorithms for optimized and scalable matrix inversion, performance-critical pivoting, and hybrid parallelization. Additionally, a generic NEGF "post-processing" code (TBTRANS/PHTRANS) for electron and phonon transport is presented with several novelties such as Hamiltonian interpolations, Ne ≥ 1 electrode capability, bond-currents, a generalized interface for user-defined tight-binding transport, transmission projection using eigenstates of a projected Hamiltonian, and fast inversion algorithms for large-scale simulations easily exceeding 10^6 atoms on workstation computers. The new features of both codes are demonstrated and benchmarked for relevant test systems.
Deep Learning Methods for Improved Decoding of Linear Codes
NASA Astrophysics Data System (ADS)
Nachmani, Eliya; Marciano, Elad; Lugosch, Loren; Gross, Warren J.; Burshtein, David; Be'ery, Yair
2018-02-01
The problem of low-complexity, close-to-optimal channel decoding of linear codes with short to moderate block length is considered. It is shown that deep learning methods can be used to improve a standard belief propagation decoder, despite the large example space. Similar improvements are obtained for the min-sum algorithm. It is also shown that tying the parameters of the decoders across iterations, so as to form a recurrent neural network architecture, can be implemented with comparable results. The advantage is that significantly fewer parameters are required. We also introduce a recurrent neural decoder architecture based on the method of successive relaxation. Improvements over standard belief propagation are also observed on sparser Tanner graph representations of the codes. Furthermore, we demonstrate that the neural belief propagation decoder can be used to improve the performance, or alternatively reduce the computational complexity, of a close-to-optimal decoder of short BCH codes.
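For reference, the plain min-sum decoder that the neural versions start from can be sketched as below, using the (7,4) Hamming parity-check matrix as a toy example; the learned per-edge weights that make the decoder "neural" are omitted, and the BPSK/AWGN setup is illustrative.

```python
import numpy as np

# Parity-check matrix of the (7,4) Hamming code (small illustrative example).
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

def min_sum_decode(llr, H, n_iter=10):
    """Plain min-sum belief-propagation decoding on the Tanner graph of H.

    llr: channel log-likelihood ratios (positive favours bit 0).
    Returns the hard-decision codeword estimate.
    """
    m, n = H.shape
    msg_vc = H * llr                      # variable-to-check messages
    for _ in range(n_iter):
        # Check-node update: product of signs times minimum magnitude of the
        # other incoming messages, computed per edge by exclusion.
        msg_cv = np.zeros_like(msg_vc)
        for i in range(m):
            idx = np.flatnonzero(H[i])
            for j in idx:
                others = msg_vc[i, idx[idx != j]]
                msg_cv[i, j] = np.prod(np.sign(others)) * np.min(np.abs(others))
        # Variable-node update: channel LLR plus all other check messages.
        total = llr + msg_cv.sum(axis=0)
        msg_vc = H * (total - msg_cv)
        hard = (total < 0).astype(int)
        if not np.any((H @ hard) % 2):    # stop once all parity checks pass
            break
    return hard

# BPSK over AWGN: transmit the all-zero codeword, decode from noisy LLRs.
rng = np.random.default_rng(7)
noise_std = 0.8
received = 1.0 + rng.normal(0.0, noise_std, H.shape[1])   # +1 encodes bit 0
llr = 2.0 * received / noise_std**2
print("decoded bits:", min_sum_decode(llr, H))
```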
Context-sensitive trace inlining for Java.
Häubl, Christian; Wimmer, Christian; Mössenböck, Hanspeter
2013-12-01
Method inlining is one of the most important optimizations in method-based just-in-time (JIT) compilers. It widens the compilation scope and therefore allows optimizing multiple methods as a whole, which increases the performance. However, if method inlining is used too frequently, the compilation time increases and too much machine code is generated. This has negative effects on the performance. Trace-based JIT compilers only compile frequently executed paths, so-called traces, instead of whole methods. This may result in faster compilation, less generated machine code, and better optimized machine code. In previous work, we implemented a trace recording infrastructure and a trace-based compiler for Java by modifying the Java HotSpot VM. Based on this work, we evaluate the effect of trace inlining on the performance and the amount of generated machine code. Trace inlining has several major advantages when compared to method inlining. First, trace inlining is more selective than method inlining, because only frequently executed paths are inlined. Second, the recorded traces may capture information about virtual calls, which simplifies inlining. A third advantage is that trace information is context sensitive so that different method parts can be inlined depending on the specific call site. These advantages allow more aggressive inlining while the amount of generated machine code is still reasonable. We evaluate several inlining heuristics on the benchmark suites DaCapo 9.12 Bach, SPECjbb2005, and SPECjvm2008 and show that our trace-based compiler achieves an up to 51% higher peak performance than the method-based Java HotSpot client compiler. Furthermore, we show that the large compilation scope of our trace-based compiler has a positive effect on other compiler optimizations such as constant folding or null check elimination.
Subsonic Aircraft With Regression and Neural-Network Approximators Designed
NASA Technical Reports Server (NTRS)
Patnaik, Surya N.; Hopkins, Dale A.
2004-01-01
At the NASA Glenn Research Center, NASA Langley Research Center's Flight Optimization System (FLOPS) and the design optimization testbed COMETBOARDS with regression and neural-network-analysis approximators have been coupled to obtain a preliminary aircraft design methodology. For a subsonic aircraft, the optimal design, that is, the airframe-engine combination, is obtained by the simulation. The aircraft is powered by two high-bypass-ratio engines with a nominal thrust of about 35,000 lbf. It is to carry 150 passengers at a cruise speed of Mach 0.8 over a range of 3000 n mi and to operate on a 6000-ft runway. The aircraft design utilized a neural network and a regression-approximations-based analysis tool, along with a multioptimizer cascade algorithm that uses sequential linear programming, sequential quadratic programming, the method of feasible directions, and then sequential quadratic programming again. Optimal aircraft weight versus the number of design iterations is shown. The central processing unit (CPU) time to solution is given. It is shown that the regression-method-based analyzer exhibited a smoother convergence pattern than the FLOPS code. The optimum weight obtained by the approximation technique and the FLOPS code differed by 1.3 percent. Prediction by the approximation technique exhibited no error for the aircraft wing area and turbine entry temperature, whereas it was within 2 percent for most other parameters. A cascade strategy was required by FLOPS as well as by the approximators. The regression method had a tendency to hug the data points, whereas the neural network exhibited a propensity to follow a mean path. The performance of the neural network and regression methods was considered adequate. It was at about the same level for small, standard, and large models with redundancy ratios (defined as the number of input-output pairs to the number of unknown coefficients) of 14, 28, and 57, respectively. On an SGI Octane workstation (Silicon Graphics, Inc., Mountain View, CA), the regression training required a fraction of a CPU second, whereas neural network training took between 1 and 9 min. For a single analysis cycle, the 3-sec CPU time required by the FLOPS code was reduced to milliseconds by the approximators. For design calculations, the time with the FLOPS code was 34 min. It was reduced to 2 sec with the regression method and to 4 min by the neural network technique. The performance of the regression and neural network methods was found to be satisfactory for the analysis and design optimization of the subsonic aircraft.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Redi, M.H.; Mynick, H.E.; Suewattana, M.
Hamiltonian coordinate, guiding-center code calculations of the confinement of suprathermal ions in quasi-axisymmetric stellarator (QAS) designs have been carried out to evaluate the attractiveness of compact configurations which are optimized for ballooning stability. A new stellarator particle-following code is used to predict ion loss rates and particle confinement for thermal and neutral beam ions in a small experiment with R = 145 cm, B = 1-2 T and for alpha particles in a reactor-size device. In contrast to tokamaks, it is found that high edge poloidal flux has limited value in improving ion confinement in QAS, since collisional pitch-angle scattering drives ions into ripple wells and stochastic field regions, where they are quickly lost. The necessity for reduced stellarator ripple fields is emphasized. The high neutral beam ion loss predicted for these configurations suggests that more interesting physics could be explored with an experiment of less constrained size and magnetic field geometry.
A high performance scientific cloud computing environment for materials simulations
NASA Astrophysics Data System (ADS)
Jorissen, K.; Vila, F. D.; Rehr, J. J.
2012-09-01
We describe the development of a scientific cloud computing (SCC) platform that offers high performance computation capability. The platform consists of a scientific virtual machine prototype containing a UNIX operating system and several materials science codes, together with essential interface tools (an SCC toolset) that offers functionality comparable to local compute clusters. In particular, our SCC toolset provides automatic creation of virtual clusters for parallel computing, including tools for execution and monitoring performance, as well as efficient I/O utilities that enable seamless connections to and from the cloud. Our SCC platform is optimized for the Amazon Elastic Compute Cloud (EC2). We present benchmarks for prototypical scientific applications and demonstrate performance comparable to local compute clusters. To facilitate code execution and provide user-friendly access, we have also integrated cloud computing capability in a JAVA-based GUI. Our SCC platform may be an alternative to traditional HPC resources for materials science or quantum chemistry applications.
Modeling multi-GeV class laser-plasma accelerators with INF&RNO
NASA Astrophysics Data System (ADS)
Benedetti, Carlo; Schroeder, Carl; Bulanov, Stepan; Geddes, Cameron; Esarey, Eric; Leemans, Wim
2016-10-01
Laser plasma accelerators (LPAs) can produce accelerating gradients on the order of tens to hundreds of GV/m, making them attractive as compact particle accelerators for radiation production or as drivers for future high-energy colliders. Understanding and optimizing the performance of LPAs requires detailed numerical modeling of the nonlinear laser-plasma interaction. We present simulation results, obtained with the computationally efficient, PIC/fluid code INF&RNO (INtegrated Fluid & paRticle simulatioN cOde), concerning present (multi-GeV stages) and future (10 GeV stages) LPA experiments performed with the BELLA PW laser system at LBNL. In particular, we will illustrate the issues related to the guiding of a high-intensity, short-pulse, laser when a realistic description for both the laser driver and the background plasma is adopted. Work Supported by the U.S. Department of Energy under contract No. DE-AC02-05CH11231.
Conditional Entropy-Constrained Residual VQ with Application to Image Coding
NASA Technical Reports Server (NTRS)
Kossentini, Faouzi; Chung, Wilson C.; Smith, Mark J. T.
1996-01-01
This paper introduces an extension of entropy-constrained residual vector quantization (VQ) where intervector dependencies are exploited. The method, which we call conditional entropy-constrained residual VQ, employs a high-order entropy conditioning strategy that captures local information in the neighboring vectors. When applied to coding images, the proposed method is shown to achieve better rate-distortion performance than that of entropy-constrained residual vector quantization with less computational complexity and lower memory requirements. Moreover, it can be designed to support progressive transmission in a natural way. It is also shown to outperform some of the best predictive and finite-state VQ techniques reported in the literature. This is due partly to the joint optimization between the residual vector quantizer and a high-order conditional entropy coder as well as the efficiency of the multistage residual VQ structure and the dynamic nature of the prediction.
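A minimal sketch of the multistage residual VQ structure underlying the method: each stage quantizes the residual left by the previous stage, and the reconstruction is the sum of the selected stage codewords. Plain k-means codebooks trained on synthetic data stand in for the entropy-constrained, conditionally entropy-coded design described above.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(8)

# Training "image blocks": 4x4 blocks flattened to 16-dim vectors (synthetic data).
blocks = rng.normal(size=(2000, 16)) @ rng.normal(size=(16, 16)) * 0.5

def train_residual_vq(data, n_stages=3, codebook_size=16):
    """Multistage residual VQ: each stage quantizes the previous stage's residual."""
    codebooks, residual = [], data.copy()
    for _ in range(n_stages):
        km = KMeans(n_clusters=codebook_size, n_init=4, random_state=0).fit(residual)
        codebooks.append(km.cluster_centers_)
        residual = residual - km.cluster_centers_[km.labels_]
    return codebooks

def encode_decode(x, codebooks):
    """Encode a vector as one index per stage; reconstruction is the codeword sum."""
    indices, recon, residual = [], np.zeros_like(x), x.copy()
    for cb in codebooks:
        idx = int(np.argmin(np.sum((cb - residual) ** 2, axis=1)))
        indices.append(idx)
        recon += cb[idx]
        residual = x - recon
    return indices, recon

codebooks = train_residual_vq(blocks)
x = blocks[0]
indices, recon = encode_decode(x, codebooks)
print("stage indices:", indices, " per-sample MSE:", np.mean((x - recon) ** 2))
```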